Regular Expressions on a .csv file

Elisha Cassidy
Greenhorn

Joined: Aug 23, 2006
Posts: 9
Hi,

I was wondering if someone could help me. I have a .csv file with a few useless lines of text at the beginning that I would like to ignore, reading in only the lines that begin with numbers. The file is properly formatted, with each value in its own column. I tried to use regular expressions to extract only the lines where the first field is a number, but I can't get the pattern matching to work. Can you please help, as I am really stuck? My data is:

123,Fri Aug 11 11:21:25 2006,2,C:\Documents and Settings\Test\continues till end of file path,18

where the commas separate the columns. The C:\Documents and Settings part changes, i.e. it could be C:\Test, but it always begins with C:\
Is there a way to look at just the first field of every line and, if it begins with a number, take that whole line? For example, if the first field is 123, return 123,Fri...,2,C:\...,18 as it appears above. I only want the lines where the first field contains numbers.
thanks in advance for your help,

kedklok

Here is what i tried:

Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896

Originally posted by Elisha Cassidy:
Is there a way to just look at the first field of every line and if it begins with a number then just take that whole line i.e. if first field is 123 then get that and return 123,Fri...,2,C:\...,18 as it is above. i only want the lines where the first field contain numbers


Well, in your code, you are already reading the file line by line. You just need to check if the line starts with a number and if true do something with it. There is also no need to return the line, as you already have each line.

Basically, you just need the regex to tell you if the line, that you already have, starts with a number.

Anyway, try this...
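
The snippet posted at this point is not reproduced here. As a stand-in, here is a minimal sketch of the check described above, assuming the first field must be all digits followed by a comma. The class and method names are made up for illustration, and the pattern is compiled inside the method on every call, which is the simplest form but not the fastest.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class and method names, for illustration only.
class FirstFieldCheck {

    // Returns true when everything before the first comma is one or more digits.
    static boolean firstFieldIsNumber(String line) {
        // ^\d+, means: start of input, one or more digits, then the field separator.
        Pattern p = Pattern.compile("^\\d+,");
        Matcher m = p.matcher(line);
        // find() looks for the pattern anywhere, but ^ pins it to the start.
        return m.find();
    }

    public static void main(String[] args) {
        String data = "123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18";
        String header = "Some header text to ignore";
        System.out.println(firstFieldIsNumber(data));
        System.out.println(firstFieldIsNumber(header));
    }
}
```

Since each line is already in hand while reading the file, the method only needs to answer yes or no; the caller decides what to do with the line.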



Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896

Alan,

The request was to check if the first field was a valid number -- not if the first character was a digit. The first field contains all the characters up to the comma separator.

Henry
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
Henry, out of curiosity, was there a specific reason you explicitly created a Pattern/Matcher instance instead of using the String.matches() convenience method? Is there a performance gain that can be realized by doing it that way?


Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peter
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896

Originally posted by Garrett Rowe:
Henry, out of curiosity, was there a specific reason you explicitly created a Pattern/Matcher instance instead of using the String.matches() convenience method? Is there a performance gain that can be realized by doing it that way?


There is a performance gain -- but not in the way I did it. The reason I did that was just out of habit...

Since the pattern does not change, there is no reason to compile it repeatedly. The "p" variable can be instantiated once, instead of every time the method is called.
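
A minimal sketch of that idea, assuming the same digits-then-comma pattern as the question implies. The class, constant, and method names are hypothetical. The String.matches() variant is included only for comparison: it compiles its regex on each call and has to match the whole line, hence the trailing .* in its pattern.

```java
import java.util.regex.Pattern;

// Hypothetical names, for illustration only.
class PrecompiledCheck {

    // Compiled once when the class is initialized, then shared by every call.
    // Pattern instances are immutable, so sharing one is safe.
    private static final Pattern NUMERIC_FIRST_FIELD = Pattern.compile("^\\d+,");

    static boolean firstFieldIsNumber(String line) {
        // Only a lightweight Matcher is created per call.
        return NUMERIC_FIRST_FIELD.matcher(line).find();
    }

    // Convenience form: compiles the regex on each call and must match
    // the entire string, so the rest of the line is covered by ".*".
    static boolean firstFieldIsNumberViaStringMatches(String line) {
        return line.matches("\\d+,.*");
    }
}
```

For a file of a few hundred lines the difference is unlikely to be noticeable; it matters mainly when the check runs a very large number of times.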

Henry
Alan Moore
Ranch Hand

Joined: May 06, 2004
Posts: 262
Originally posted by Henry Wong:
The request was to check if the first field was a valid number -- not if the first character was a digit. The first field contains all the characters up to the comma separator.


Yes, but it's probably safe to assume that, if a line starts with a digit, the entire first field is numeric (but of course, only the OP will know for sure). People who are just starting to use regexes have a tendency to make their regexes more specific than they need to be (and thus more complicated and error-prone), or to use regexes where something simple like Character.isDigit() will suffice.
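
A minimal sketch of the non-regex alternative, under the assumption stated above that a leading digit implies a numeric first field. The class and method names are hypothetical.

```java
// Hypothetical names, for illustration only.
class LeadingDigitCheck {

    // True when the line is non-empty and its first character is a digit.
    // This does NOT verify that the whole first field is numeric.
    static boolean startsWithDigit(String line) {
        return !line.isEmpty() && Character.isDigit(line.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigit("123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18"));
        System.out.println(startsWithDigit("Some header text"));
        // A first field such as "12a" also passes, which is the trade-off
        // being discussed in this thread.
        System.out.println(startsWithDigit("12a,foo"));
    }
}
```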

Originally posted by Garrett Rowe:
Henry, out of curiosity, was there a specific reason you explicitly created a Pattern/Matcher instance instead of using the String.matches() convenience method? Is there a performance gain that can be realized by doing it that way?


The way Henry used it there's no benefit, but pre-compiling the regex can save a lot of overhead if you're using the regex in a tight loop. And if performance is really critical, you can save a little more overhead by pre-instantiating the Matcher. Another benefit is that you can use Matcher's other matching methods: find(), find(int), and lookingAt(). I used lookingAt() because it requires the match to start at the beginning of the target text but doesn't require it to match all the way to the end. That makes it slightly more efficient than matches(), but again, you won't notice that unless you're doing some heavy-duty text processing.
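
To make the difference between those methods concrete, here is a minimal sketch that reuses one Matcher through reset(). The class and method names are hypothetical, and the digits-then-comma pattern is an assumption based on the original question. A reused Matcher holds state, so an instance of this class should not be shared between threads.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names, for illustration only.
class MatcherMethodsDemo {

    // No ^ anchor here: lookingAt() and matches() anchor at the start themselves.
    private static final Pattern DIGITS_THEN_COMMA = Pattern.compile("\\d+,");

    // One Matcher, pointed at each new line with reset(CharSequence).
    private final Matcher matcher = DIGITS_THEN_COMMA.matcher("");

    // Match must begin at the start of the line; the rest of the line is ignored.
    boolean startsWithNumericField(String line) {
        return matcher.reset(line).lookingAt();
    }

    // The pattern must account for the entire line.
    boolean wholeLineMatches(String line) {
        return matcher.reset(line).matches();
    }

    // The pattern may occur anywhere in the line.
    boolean containsNumericField(String line) {
        return matcher.reset(line).find();
    }
}
```

With the unanchored pattern, a line like abc,45,x satisfies find() but not lookingAt(), which is why find() needs the ^ anchor to do the same job.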
Elisha Cassidy
Greenhorn

Joined: Aug 23, 2006
Posts: 9
Hi,

Thanks very much for the help. I have tried Henry's code above and it works grand for me, except that it keeps asking me for a return statement. I have modified the code to get it to work with my program, but now it returns a blank line. I only want it to return the lines where the first field is a number, as I then need to put the results into an SWT table.

thanks in advance for your help

Elisha

Anand Hariharan
Rancher

Joined: Aug 22, 2006
Posts: 257

Originally posted by Elisha Cassidy:

(...)
grand for me except that it keeps asking me for a return statement. i have modified the code to get it to work with my program but now it returns a blank line and i only want it to just return the lines where the first field is a number as i need to then put the results into an swt table.

(...)


How about getting it to return a true/false instead?
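
A minimal sketch of that suggestion, assuming the file is read line by line with a BufferedReader. All names are hypothetical, and a StringReader stands in for the real file so the example is self-contained. The boolean method only answers the question; the loop decides what to keep, so blank and header lines are simply skipped rather than returned.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical names, for illustration only.
class NumericLineFilter {

    private static final Pattern NUMERIC_FIRST_FIELD = Pattern.compile("^\\d+,");

    // True when the first field is all digits; false for headers and blank lines.
    static boolean isDataLine(String line) {
        return NUMERIC_FIRST_FIELD.matcher(line).find();
    }

    // Collects only the lines that pass the check, in file order.
    static List<String> dataLines(BufferedReader reader) throws IOException {
        List<String> kept = new ArrayList<String>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (isDataLine(line)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file contents.
        String sample = "Export log\n\n123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18\n";
        BufferedReader reader = new BufferedReader(new StringReader(sample));
        for (String line : dataLines(reader)) {
            System.out.println(line);
        }
    }
}
```

The resulting list could then be used to fill the table, one row per kept line.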



HTH,
- Anand


"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery
Elisha Cassidy
Greenhorn

Joined: Aug 23, 2006
Posts: 9
Hi,

I have the line printing out in full, but does anyone know how to split it so I can print specific columns of the file? This is what I tried:

Thanks again for all your help

Elisha

[ August 30, 2006: Message edited by: Elisha Cassidy ]
Garrett Rowe
Ranch Hand

Joined: Jan 17, 2006
Posts: 1296
how about:
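
The suggestion posted here is not reproduced. One common way to do it, sketched under the assumption that no field contains a comma of its own (true for the sample line in the question), is String.split(). The class and method names are hypothetical, and the path is shortened to C:\Test.

```java
// Hypothetical names, for illustration only.
class CsvColumns {

    // Splits on every comma. The limit of -1 keeps trailing empty fields,
    // which split(",") alone would silently drop.
    static String[] fields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18";
        String[] f = fields(line);
        // Print selected columns by index (0-based).
        System.out.println("id:   " + f[0]);
        System.out.println("date: " + f[1]);
        System.out.println("path: " + f[3]);
    }
}
```

If a field could ever contain a quoted comma, a plain split would cut it in two, and a real CSV parser would be the safer choice.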

[ August 30, 2006: Message edited by: Garrett Rowe ]
K Terr
Greenhorn

Joined: Jun 20, 2006
Posts: 14
Hi,

I am trying to read in a .csv file, but I only want the lines that have URL or REDR in the first field. Does anyone know the regular expression for this? I can't use [a-z], as the first few lines contain text that I want to ignore. I only want URL and REDR.

thanks in advance for the help

K Terr
Elisha Cassidy
Greenhorn

Joined: Aug 23, 2006
Posts: 9
hi all,

thanks for all the help, it is now working. K Terr, I have no idea how to do that; I too am new to regular expressions.

Elisha
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18896

K. Terr,

Please start a new topic -- and provide examples of what you are looking for.

Henry
K Terr
Greenhorn

Joined: Jun 20, 2006
Posts: 14
its ok i got it to work

for anyone else stuck on this, use the following:



gets the lines starting with URL

K Terr
[ September 01, 2006: Message edited by: K Terr ]
Anand Hariharan
Rancher

Joined: Aug 22, 2006
Posts: 257

Originally posted by K Terr:


gets the lines starting with URL


You don't need the parentheses, and you'd be better off including the comma in your RE.

Perhaps something like "^URL,"?
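
A minimal sketch of that pattern in use, with a second pattern covering the original request for both URL and REDR. The names are hypothetical, and the alternation form is one possible way to write it, not something posted in this thread.

```java
import java.util.regex.Pattern;

// Hypothetical names, for illustration only.
class FirstFieldKeyword {

    // First field is exactly URL: the comma stops "URLS,..." from matching.
    private static final Pattern URL_ONLY = Pattern.compile("^URL,");

    // First field is exactly URL or REDR; (?: ) groups without capturing.
    private static final Pattern URL_OR_REDR = Pattern.compile("^(?:URL|REDR),");

    static boolean startsWithUrl(String line) {
        return URL_ONLY.matcher(line).find();
    }

    static boolean startsWithUrlOrRedr(String line) {
        return URL_OR_REDR.matcher(line).find();
    }
}
```

Here the grouping is needed only because of the alternation; for a single keyword the plain "^URL," form is enough.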
 