text file extractor

Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Hello, I would like to know if anyone could help me with this. I have a .txt file, and from every line I need to extract the second sentence, e.g.:

aa01;Some stuff1
aa02;Some stuff2
aa03;Some stuff3

So from that file I need each line without the aa0x code, or better yet, only the part of each line after the ";". The result would look like this:
Some stuff1
Some stuff2
Some stuff3

Thank you.
Kemal Sokolovic
Bartender

Joined: Jun 19, 2010
Posts: 825
    

What have you done so far? Since the forum is NotACodeMill, nobody is going to give you a complete solution. You will have to start working on your problem and during the process we will all be glad to help with any specific problem you might encounter.

So, what exactly is the issue? Are you having a problem reading the file, extracting the part of each line you want, or something else I didn't think of?


The quieter you are, the more you are able to hear.
fred rosenberger
lowercase baba
Bartender

Joined: Oct 02, 2003
Posts: 11161
    

There are many ways to approach this, depending on your specific needs. For example:

Will there always be exactly one semi-colon?
Will there always be exactly five characters to skip?
Will every line have those five characters?
What is the exact pattern of the leading characters? Will it always be the literal "aa" followed by two digits, or will those letters change?

And as Kemal said...without knowing exactly where you are stuck, we can't help. We don't know if you need help installing the JDK, compiling a simple "Hello World", opening the file, reading the file, parsing out individual lines, printing out the results...

you need to be SPECIFIC if you want help.
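To illustrate why the answers to those questions matter, here is a minimal sketch of two possible approaches for the "aa01;Some stuff1" sample. It is only one way to do it, not code from this thread: the class and method names are hypothetical, and the second method assumes every line really does start with exactly five characters to skip.

```java
// Hypothetical helper class; the names are illustrative and not from the thread.
class AfterSeparator {
    // Returns the text after the first ';', or the whole line if it has no ';'.
    static String afterFirstSemicolon(String line) {
        int idx = line.indexOf(';');
        return idx < 0 ? line : line.substring(idx + 1);
    }

    // Assumes each line begins with exactly five characters to skip, such as "aa01;".
    static String skipFixedPrefix(String line) {
        return line.length() > 5 ? line.substring(5) : "";
    }

    public static void main(String[] args) {
        System.out.println(afterFirstSemicolon("aa01;Some stuff1"));
        System.out.println(skipFixedPrefix("aa02;Some stuff2"));
    }
}
```

The first method keeps working if the prefix changes length or the text itself contains more semicolons; the second only works while the five-character assumption holds.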


There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors
Randall Twede
Ranch Hand

Joined: Oct 21, 2000
Posts: 4340
    

i am guessing his problem is with getting the second sentence in each line. it has been so long since i have done that i can't remember how. StringTokenizer comes to mind. i am sure there are tutorials about this. try going to google and typing in java parse text file


SCJP
Phil English
Ranch Hand

Joined: Jun 18, 2012
Posts: 62

Use of StringTokenizer is discouraged in new code (although it is not formally deprecated), and its Javadoc recommends alternatives such as String#split or the java.util.regex package: java.util.StringTokenizer
Kemal Sokolovic
Bartender

Joined: Jun 19, 2010
Posts: 825
    

Randall Twede wrote:i am guessing his problem is with getting the second sentence in each line. it has been so long since i have done that i can't remember how. StringTokenizer comes to mind. i am sure there are tutorials about this. try going to google and typing in java parse text file

String#split(String regex) would do that job. But we still don't know exactly what problem the OP is facing, so we can only guess.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19653
    

Am I the only one who's thinking about using a proper CSV library? Because that format looks like a CSV file that uses ; as the column separator.
Our AccessingFileFormats FAQ page mentions several libraries you can use. I rather like opencsv.


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Hi, here are the code and the file:



and the file being read contains this:
AAA;BBB;CCC
AAA1;BBB1;CCC1
AAA2;BBB2;CCC2
AAA3;BBB3;CCC3
AAA4;BBB4;CCC4

I can't get the second column from each line, i.e. I need the following result from the file:
BBB
BBB1
BBB2
BBB3
BBB4

I have tried several ways but I can't get it. Thank you.


Kemal Sokolovic
Bartender

Joined: Jun 19, 2010
Posts: 825
    

You were already advised against using StringTokenizer, and I agree with the point.

A much easier and more elegant solution can be achieved with the String#split(String) method I referred you to in my previous post. If you have a String like this:

you can get each value with code like this:

With that code you will have these values:
data[0] = "value1";
data[1] = "value2";
data[2] = "value3";
data[3] = "value4";

As you can see, the logic is pretty straightforward. For more information you can look at the API or a tutorial, but with the given example you should be able to accomplish your task.
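A minimal sketch of the String#split approach described in this post. The sample string "value1;value2;value3;value4" is an assumption inferred from the data[0] to data[3] values listed above, and the class and method names are hypothetical.

```java
// Hypothetical example class; the input string is inferred from the listed results.
class SplitExample {
    // Splits a line on the ';' separator and returns the individual values.
    static String[] splitValues(String line) {
        return line.split(";");
    }

    public static void main(String[] args) {
        String line = "value1;value2;value3;value4"; // assumed sample input
        String[] data = splitValues(line);
        for (int i = 0; i < data.length; i++) {
            System.out.println("data[" + i + "] = " + data[i]);
        }
    }
}
```

Note that the argument to split is a regular expression, and that this one-argument form drops trailing empty strings, so a line ending in ";" yields no empty last element.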
Phil English
Ranch Hand

Joined: Jun 18, 2012
Posts: 62

In what way is the code you have supplied not working? When I run it the second iteration of your while loop seems to return the column you want.

I was going to say that string.split would give you the data in a more accessible format but Kemal beat me to the punch.
Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Kemal Sokolovic wrote:You were already advised against using StringTokenizer, and I agree with the point.

A much easier and more elegant solution can be achieved with the String#split(String) method I referred you to in my previous post. If you have a String like this:

you can get each value with code like this:

With that code you will have these values:
data[0] = "value1";
data[1] = "value2";
data[2] = "value3";
data[3] = "value4";


As you can see, the logic is pretty straightforward. For more information you can look at the API or a tutorial, but with the given example you should be able to accomplish your task.


Your code works fine, but it doesn't give the result I need. See the following code, in which I implemented your code:



and this is the result:
AAA1
BBB1
CCC1

whereas I need the second column from each line, like this:
BBB
BBB1
BBB2
BBB3
BBB4

where the original file.txt contains:
AAA;BBB;CCC
AAA1;BBB1;CCC1
AAA2;BBB2;CCC2
AAA3;BBB3;CCC3
AAA4;BBB4;CCC4
Phil English
Ranch Hand

Joined: Jun 18, 2012
Posts: 62

Your x variable is incrementing with each line, not with each token. If your text file is only ever structured as you say, then you know that the second element of a will be the string you require for each line of your file.
Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Phil English wrote:Your x variable is incrementing with each line, not with each token. If your text file is only ever structured as you say, then you know that the second element of a will be the string you require for each line of your file.


Sorry, I don't understand what you mean. Can you give an example?
Phil English
Ranch Hand

Joined: Jun 18, 2012
Posts: 62

What is happening in your latest code is that you are calling x++ every time you read a new line, so x==2 will only be true on your second line. As a result, your code prints the entire String array a, but only for the second line.

Let's say you run your code just for the first two lines of the file.

On the first iteration your String array a will contain ["AAA", "BBB", "CCC"] so a[0] = "AAA", a[1] = "BBB", a[2] = "CCC"
On the second iteration a is overwritten and now contains ["AAA1", "BBB1", "CCC1"] so a[0] = "AAA1", a[1] = "BBB1", a[2] = "CCC1"

As a suggestion, get rid of the if and the for loop and just try printing individual elements of the array, e.g. System.out.println(a[0]).
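A rough sketch of that suggestion, assuming the file is structured exactly as shown in the thread. The poster's original code is not reproduced here, so the class and method names are hypothetical; only the array name a and the sample lines come from the discussion.

```java
// Hypothetical sketch; the class and method names are illustrative only.
class SecondColumnPrinter {
    // Splits one line on ';' and returns the second element, or "" if there is none.
    static String secondColumn(String line) {
        String[] a = line.split(";");
        return a.length > 1 ? a[1] : "";
    }

    public static void main(String[] args) {
        // Sample lines taken from the file shown in the thread.
        String[] lines = {"AAA;BBB;CCC", "AAA1;BBB1;CCC1", "AAA2;BBB2;CCC2"};
        for (String line : lines) {
            // No line counter and no inner loop: a[1] is the wanted value on every line.
            System.out.println(secondColumn(line));
        }
    }
}
```

Because a is rebuilt for every line, there is no need to track which line is being read; the index into a selects the column.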
Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Phil English wrote:What is happening in your latest code is that you are calling x++ every time you read a new line, so x==2 will only be true on your second line. As a result, your code prints the entire String array a, but only for the second line.

Let's say you run your code just for the first two lines of the file.

On the first iteration your String array a will contain ["AAA", "BBB", "CCC"] so a[0] = "AAA", a[1] = "BBB", a[2] = "CCC"
On the second iteration a is overwritten and now contains ["AAA1", "BBB1", "CCC1"] so a[0] = "AAA1", a[1] = "BBB1", a[2] = "CCC1"

As a suggestion, get rid of the if and the for loop and just try printing individual elements of the array, e.g. System.out.println(a[0]).


OK, I understand now, but with a file.txt of more than 100 lines it is very difficult to print each line manually. That's why I need a better method to do it.
Phil English
Ranch Hand

Joined: Jun 18, 2012
Posts: 62

You know that on every pass through the loop (once per line in your file) a[1] contains the string you want. Why not store it in a new variable on every iteration? That new variable will grow by one element each time you iterate, and when you are finished it will contain all the strings you want.
Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Phil English wrote:You know that on every pass through the loop (once per line in your file) a[1] contains the string you want. Why not store it in a new variable on every iteration? That new variable will grow by one element each time you iterate, and when you are finished it will contain all the strings you want.


OK, I never thought of that. Thank you, I am going to test it.
Phil English
Ranch Hand

Joined: Jun 18, 2012
Posts: 62

No problem. Have a look at java.util.ArrayList; it is probably as good a place as any to store these strings, especially if your input file could change in length.
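One possible way to put that together, as a sketch rather than the poster's actual solution: read the lines with a BufferedReader, split each on ";", and add the second element to an ArrayList. The class and method names are hypothetical, and the example reads from an in-memory string so it runs without a file; for a real file, a FileReader for file.txt could be wrapped instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the class and method names are illustrative only.
class SecondColumnCollector {
    // Reads every line and collects the second ';'-separated column into a list.
    static List<String> collectSecondColumn(BufferedReader reader) {
        List<String> result = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] a = line.split(";");
                if (a.length > 1) {      // skip lines that have no second column
                    result.add(a[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the file contents shown in the thread.
        String content = "AAA;BBB;CCC\nAAA1;BBB1;CCC1\nAAA2;BBB2;CCC2\n";
        List<String> columns = collectSecondColumn(new BufferedReader(new StringReader(content)));
        for (String value : columns) {
            System.out.println(value);
        }
    }
}
```

The list grows by one element per line, so the same code handles 5 lines or 500 without changes.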
Hernan Tavella
Ranch Hand

Joined: Apr 28, 2012
Posts: 42
Thank you all, it works great.
 