
Splitting a large text file by searching for a keyword.

 
Greenhorn
Posts: 26

I have over 200 books in a single text file. I want to split the books apart, make an individual directory for each book, and then split each book into chapters. How can I do this in C, on the command line, or in Perl?

I have tried gawk, but it is not built into macOS. I have also tried csplit and could not get it to work.

The layout of the text file is:

CHAPTER ONE
Book Title
..
..
CHAPTER TWO
..
..
CHAPTER ONE
New Book Title

I want to start a new file every time "CHAPTER ONE" appears, and then, to divide each book into chapters, start a new txt file every time "CHAPTER" appears. Suggestions on the logic would be appreciated; I have never programmed anything beyond arithmetic. Below is what I think I need to do, but I have no idea how to do it.



 
Marshal
Posts: 79180
I merged your stuff with the following thread. I hope that is okay by you.
 
Jeremiah Parrack
Greenhorn
Posts: 26
This program only splits the input text file in half, but I can also split it by line number. I need to split on a word instead. I have over 200 books in one text file, and I want to split the file every time the phrase "CHAPTER ONE" appears. I have no idea how to do this, but I am sure I'm not too far off. If you can tell me how to adjust this, that would be great.
 
Campbell Ritchie
Marshal
Posts: 79180
Welcome to the Ranch

You have posted two questions with the same name but different content, so I think I should merge the two because they seem related. I have also corrected the code tags (thank you for using them); you can read more about the tags here. Change the word java in the opening tag to c or c++ for better syntax highlighting.

How do you know that the word CHAPTER will always be in the same char* (that is, the same char[] buffer)? Are you reading line by line or 256 chars at a time? Do you expect CHAPTER ONE to appear more than once in the book? Is there a built-in function to find substrings embedded in a char*/char[]? Presumably you find the first occurrence of the C in CHAPTER, use strcpy for the remainder of the string, and put a \0 in place of the C. A little wasteful of memory, maybe, but it will only be a few kB per book.
If CHAPTER XXX is always on a line by itself, all you need to do is to find whether CHAPTER is a "prefix" of that particular char*/char[].

If you cannot find a built-in prefix procedure, you can write your own. One suggestion: copy the first seven letters into a new char*/char[] and see whether that is equal to CHAPTER with strcmp.
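For what it's worth, the standard library already covers both cases mentioned above: strstr finds a substring anywhere in a string, and strncmp compares only the first n characters, so no copy is needed for a prefix test. A minimal sketch follows; the helper name starts_with is made up for illustration and is not from any code posted in this thread.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when `line` begins with `prefix`.
   strncmp compares at most strlen(prefix) characters, so the rest of
   the line (including any line ending) is ignored. */
bool starts_with(const char *line, const char *prefix)
{
    return strncmp(line, prefix, strlen(prefix)) == 0;
}
```

With this, starts_with(line, "CHAPTER") is true for any chapter heading, and starts_with(line, "CHAPTER ONE") is true only at the start of a book, provided the headings really do begin in the first column.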
 
Campbell Ritchie
Marshal
Posts: 79180
If you are repeatedly reading lines, it is quite easy to use strcmp to find whether a line is equal to CHAPTER ONE, and you can use the title as the name of the next file. Remember that strcmp returns a negative, zero, or positive value depending on how the two strings compare, so it returns 0 for equality. That depends on the lines being exactly equal, including case and any spaces. Beware of trailing spaces, because CHAPTER ONE and CHAPTER ONE    are different.
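One way to guard against the trailing-space problem is to trim the end of each line before comparing. This is only a sketch: trim_trailing and is_book_start are hypothetical names, and it assumes the line is held in a writable buffer.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: removes trailing whitespace (spaces, tabs,
   '\r', '\n') from `s` in place. */
void trim_trailing(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1])) {
        s[--len] = '\0';
    }
}

/* Hypothetical helper: trims `line`, then tests for an exact,
   case-sensitive match with "CHAPTER ONE". */
bool is_book_start(char *line)
{
    trim_trailing(line);
    return strcmp(line, "CHAPTER ONE") == 0;
}
```

Note that is_book_start modifies its argument, so copy the line first if the original text (with its line ending) still has to be written out.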

I seem to have merged the two discussions in the wrong order. Sorry.
You can use c, cpp or c++ in the code tags, it would appear. More details in the ranch guide (scroll to the bottom).
 
Campbell Ritchie
Marshal
Posts: 79180
... and welcome to the Ranch
 
Jeremiah Parrack
Greenhorn
Posts: 26

Campbell Ritchie wrote: How do you know that the word CHAPTER will always be in the same char* (that is, the same char[] buffer)? Are you reading line by line or 256 chars at a time? Do you expect CHAPTER ONE to appear more than once in the book? Is there a built-in function to find substrings embedded in a char*/char[]?



I know this due to the format of this large text file. The text file has 200+ books in it so if I separate them according to CHAPTER ONE it should separate them all into 200+ individual text files.


Campbell Ritchie wrote: Presumably you find the first occurrence of the C in CHAPTER, use strcpy for the remainder of the string, and put a \0 in place of the C. A little wasteful of memory, maybe, but it will only be a few kB per book.
If CHAPTER XXX is always on a line by itself, all you need to do is to find whether CHAPTER is a "prefix" of that particular char*/char[].

If you cannot find a built-in prefix procedure, you can write your own. One suggestion: copy the first seven letters into a new char*/char[] and see whether that is equal to CHAPTER with strcmp.




Yes, CHAPTER is on a line by itself. I will play with this. Thank you for the extremely helpful information!


 
Jeremiah Parrack
Greenhorn
Posts: 26

Campbell Ritchie wrote: If you are repeatedly reading lines, it is quite easy to use strcmp to find whether a line is equal to CHAPTER ONE, and you can use the title as the name of the next file. Remember that strcmp returns a negative, zero, or positive value depending on how the two strings compare, so it returns 0 for equality. That depends on the lines being exactly equal, including case and any spaces. Beware of trailing spaces, because CHAPTER ONE and CHAPTER ONE    are different.

I seem to have merged the two discussions in the wrong order. Sorry.
You can use c, cpp or c++ in the code tags, it would appear. More details in the ranch guide (scroll to the bottom).





OK, I have edited the code from https://www.codingunit.com/c-tutorial-splitting-a-text-file-into-multiple-files
That code splits the file every 5 lines.

I used strcmp to write a new file every time it sees the word "Turn" (this is a smaller text file I'm using). However, it's not doing what I want: none of the strcmp calls return 0. Any suggestions?








The output I am getting is:


 
Campbell Ritchie
Marshal
Posts: 79180
Add a print statement at your line 27 that uses %s inside escaped quotes, \"%s\", to print the exact value of line, and see whether you have any trailing whitespace or anything.
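The quoting trick might look like the sketch below. The helper quote_line is a made-up name for illustration; in the posted program a single printf("\"%s\"\n", line) would do the same job.

```c
#include <stdio.h>

/* Hypothetical helper: writes `line` wrapped in double quotes into
   `out` (at most `size` bytes, always terminated), so that trailing
   spaces or a line-end character become visible when printed. */
void quote_line(char *out, size_t size, const char *line)
{
    snprintf(out, size, "\"%s\"", line);
}
```

If the closing quote shows up on the next line of output, or with blanks in front of it, the string holds more than the word you are comparing against.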
 
Campbell Ritchie
Marshal
Posts: 79180
There is something wrong with the way you are reading lines. Nowhere do you use a \n escape, yet the output goes onto a new line. It appears you are reading a line-end character, and it is being printed between lines 26 and 27. My trick with \"%s\" should show you an error like that.
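If the lines are being read with fgets (an assumption, since fgets keeps the line-end character in the buffer), then strcmp against "Turn" or "CHAPTER ONE" can never return 0 until that character is removed. Here is a rough sketch of the whole split under that assumption. The names chomp and split_books, the MAX_LINE limit, and the output naming scheme are all made up for illustration, and trailing spaces are not handled.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024  /* assumed upper bound on line length */

/* Hypothetical helper: cuts the string at the first '\r' or '\n',
   removing the line ending that fgets leaves in the buffer. */
void chomp(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}

/* Hypothetical splitter: copies `in` into <prefix>_001.txt,
   <prefix>_002.txt, ... and starts a new file whenever a line equals
   "CHAPTER ONE". Lines before the first "CHAPTER ONE" are skipped.
   Returns the number of files written, or -1 if a file cannot be
   opened. */
int split_books(FILE *in, const char *prefix)
{
    char line[MAX_LINE];
    char key[MAX_LINE];
    char name[256];
    FILE *out = NULL;
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        strcpy(key, line);  /* compare a cleaned copy, write the original */
        chomp(key);
        if (strcmp(key, "CHAPTER ONE") == 0) {
            if (out != NULL) {
                fclose(out);
            }
            snprintf(name, sizeof name, "%s_%03d.txt", prefix, ++count);
            out = fopen(name, "w");
            if (out == NULL) {
                return -1;
            }
        }
        if (out != NULL) {
            fputs(line, out);
        }
    }
    if (out != NULL) {
        fclose(out);
    }
    return count;
}
```

Calling split_books(fp, "book") on an open input file would produce book_001.txt, book_002.txt and so on. The same loop could be run again over each book file, testing for the prefix "CHAPTER" instead of the whole line, to break a book into chapters.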
 