| Author |
regular expression question
|
Viv Singh
Ranch Hand
Joined: Nov 08, 2008
Posts: 73
|
|
Hi,
I would like to extract some information from a text. I guess I will have to do that using regular expressions in Java.
Example:
OS info
*******
UOS = Windows Vista 32-bit Service Pack 1
Admin=NO
From this text I would like to extract the information that the operating system is Windows Vista 32-bit Service Pack 1 and that the user was not the admin, and store it in two variables, let's say String os = "Windows Vista 32-bit Service Pack 1" and String admin = "NO".
How could I do that?
thanks in advance.
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16695
|
|
I know that this is unlikely to be a homework problem, but JavaRanch is still a learning site... so, what have you tried so far? And what issues are you having?
Henry
|
Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
|
 |
Viv Singh
Ranch Hand
Joined: Nov 08, 2008
Posts: 73
|
|
For example to extract the Operating system, I have tried the following:
I want it to return the whole string: "Windows Vista 32-bit Service Pack 1". It is also possible that in the input there is something like "Windows Vista 32-bit Service Pack 1 abcdef", but even then I still just want "Windows Vista 32-bit Service Pack 1".
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16695
|
|
In your regex, you are only extracting the first group of word characters. So, it will only get the first word, as the space won't match.
It is also possible that in the input there is something like "Windows Vista 32-bit Service Pack 1 abcdef", but even then I still just want "Windows Vista 32-bit Service Pack 1".
Well, to do this part, you'll need to have a mechanism to define what is a valid OS name. Your program doesn't magically know what names are valid, and what names are not. Where is this validity data coming from?
Henry
|
 |
Viv Singh
Ranch Hand
Joined: Nov 08, 2008
Posts: 73
|
|
Isn't there any way to read till the end of the line?
Because the problem is that I do not know the exact data. There could be many variations, like Windows 2000 Service Pack 1, Windows 2000 Service Pack 2, Windows XP Service Pack 1, Windows 98...
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16695
|
|
Viv Singh wrote: Isn't there any way to read till the end of the line?
Because the problem is that I do not know the exact data. There could be many variations, like Windows 2000 Service Pack 1, Windows 2000 Service Pack 2, Windows XP Service Pack 1, Windows 98...
Sure, you can change your match criteria to everything but the carriage return / line feed, and it will match to the end of line. Or you can read it a line at a time, then match everything, which is to the end of line.
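As a rough illustration of the second option (read a line at a time, then match everything), here is a minimal sketch. The class name `LineByLine`, the method `parse`, and the `key = value` pattern are assumptions made for this example, not something from the original post. Within a single line, `.*` already runs to the end of the line.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reads the text one line at a time and keeps "key = value" pairs.
class LineByLine {
    // Key is a run of word characters; the value is everything after '=' on that line.
    private static final Pattern KEY_VALUE = Pattern.compile("\\s*(\\w+)\\s*=\\s*(.*)");

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                Matcher m = KEY_VALUE.matcher(scanner.nextLine());
                // Lines such as "OS info" or "*******" do not match and are skipped.
                if (m.matches()) {
                    result.put(m.group(1), m.group(2).trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        Map<String, String> info = parse(text);
        String os = info.get("UOS");
        String admin = info.get("Admin");
        System.out.println(os);
        System.out.println(admin);
    }
}
```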
However, in your example...
Windows Vista 32-bit Service Pack 1 abcdef
It wasn't separated by an EOL -- in this case, how do you know if abcdef isn't part of the OS name?
Henry
|
 |
Viv Singh
Ranch Hand
Joined: Nov 08, 2008
Posts: 73
|
|
Henry Wong wrote:
Sure, you can change your match criteria to everything but the carriage return / line feed, and it will match to the end of line. Or you can read it a line at a time, then match everything, which is to the end of line.
How can I match everything but the carriage return?
Windows Vista 32-bit Service Pack 1 abcdef
It wasn't separated by an EOL -- in this case, how do you know if abcdef isn't part of the OS name?
This is a problem, I will have to think of some solution for this problem.
|
 |
Rob Spoor
Sheriff
Joined: Oct 27, 2005
Posts: 19216
|
|
|
If I see that file format, I'm thinking of java.util.Properties to do the hard work for me.
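A minimal sketch of that idea, assuming the input really is plain `key = value` lines like the sample in the first post. The class name `OsInfoProperties` and the method `load` are made up for this example. `Properties` splits each line at the first `=`, `:` or whitespace, so the heading lines simply become unused keys.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical example class: lets java.util.Properties parse the "key = value" lines.
class OsInfoProperties {
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        Properties props = load(text);
        String os = props.getProperty("UOS");
        String admin = props.getProperty("Admin");
        System.out.println(os);
        System.out.println(admin);
    }
}
```

One caveat: `Properties` treats a backslash as an escape character, so values that contain Windows paths would need extra care.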
|
SCJP 1.4 - SCJP 6 - SCWCD 5
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16695
|
|
How can I match everything but the carriage return?
In regex, to match everything but a certain set of characters, you do this... [^abc] .... meaning don't match a, b, or c.
So... to not match the CR and LF, you have to do this .... [^\\r\\n]
If you use this in your original pattern, instead of \\w, meaning ... "UOS =\\s*([^\\r\\n]*)" ... this should extract to the EOL.
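A minimal sketch of how that pattern could be used with `java.util.regex`. The class name `OsInfoRegex`, the helper `extract`, and the second pattern for the Admin line are assumptions added for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class showing the [^\r\n]* pattern from the post above.
class OsInfoRegex {
    // Returns capture group 1 of the first match, or null if the pattern is not found.
    static String extract(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        // Pattern taken from the post: capture everything up to the CR/LF.
        String os = extract(text, "UOS =\\s*([^\\r\\n]*)");
        // Assumed analogous pattern for the Admin line.
        String admin = extract(text, "Admin\\s*=\\s*([^\\r\\n]*)");
        System.out.println(os);
        System.out.println(admin);
    }
}
```

Note that `\\s*` also matches line breaks, so if a value is empty the group would capture the following line instead. Using `[ \\t]*` in place of `\\s*` is one way to avoid that.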
Henry
|
 |
Henry Wong
author
Sheriff
Joined: Sep 28, 2004
Posts: 16695
|
|
Rob Prime wrote:If I see that file format, I'm thinking of java.util.Properties to do the hard work for me.
Based on the format, I agree. But I am guessing that not everything is being shown here...
Henry
|
 |