JavaRanch » Java Forums » Java » Beginning Java

regular expression question

Viv Singh
Ranch Hand

Joined: Nov 08, 2008
Posts: 73
Hi,

I would like to extract some information from a piece of text. I guess I will have to do that using regular expressions in Java.

Example:

OS info
*******
UOS = Windows Vista 32-bit Service Pack 1
Admin=NO

From this text I would like to extract the information that the operating system is Windows Vista 32-bit Service Pack 1 and that the user was not the admin, and store it in 2 variables, let's say String os = "Windows Vista 32-bit Service Pack 1" and String admin = "NO".

How could I do that?

Thanks in advance.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18989
    


I know that this is unlikely to be a homework problem, but JavaRanch is still a learning site... so, what have you tried so far? And what issues are you having?

Henry


Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
Viv Singh
Ranch Hand

Joined: Nov 08, 2008
Posts: 73
For example, to extract the operating system, I have tried the following:





I want it to return the whole string: "Windows Vista 32-bit Service Pack 1". It is also possible that the input contains something like "Windows Vista 32-bit Service Pack 1 abcdef", but even then I still just want "Windows Vista 32-bit Service Pack 1".
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18989
    

In your regex, you are only extracting the first group of word characters. So, it will only get the first word, as the space won't match.
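
To illustrate: a minimal sketch, assuming the original pattern was something like "UOS =\\s*(\\w+)". The actual pattern isn't shown in the thread, so this is a hypothetical stand-in, and the class and method names are made up. \w only matches letters, digits and underscore, so the group stops at the first space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordGroupDemo {
    // Hypothetical stand-in for the poster's pattern (not shown in the thread).
    private static final Pattern WORD_ONLY = Pattern.compile("UOS =\\s*(\\w+)");

    // Returns group 1 of the first match, or null if the pattern does not match.
    static String capture(String text) {
        Matcher m = WORD_ONLY.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "UOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO";
        // \w is [a-zA-Z_0-9], so the space after "Windows" ends the group.
        System.out.println(capture(text)); // prints "Windows"
    }
}
```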

It is also possible that the input contains something like "Windows Vista 32-bit Service Pack 1 abcdef", but even then I still just want "Windows Vista 32-bit Service Pack 1".


Well, to do this part, you'll need to have a mechanism to define what is a valid OS name. Your program doesn't magically know what names are valid, and what names are not. Where is this validity data coming from?
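
One possible mechanism, sketched under assumptions: keep a list of names you consider valid and accept the longest one that the extracted text starts with. The list below is hypothetical (built from the examples in this thread), as are the class and method names; in practice that data would have to come from somewhere you trust.

```java
import java.util.List;

class KnownOsNames {
    // Hypothetical whitelist; a real program would load this from trusted data.
    private static final List<String> KNOWN = List.of(
            "Windows Vista 32-bit Service Pack 1",
            "Windows 2000 Service Pack 1",
            "Windows 2000 Service Pack 2",
            "Windows XP Service Pack 1",
            "Windows 98");

    // Returns the longest known name that the raw value starts with, or null.
    static String match(String raw) {
        String best = null;
        for (String name : KNOWN) {
            if (!raw.startsWith(name)) {
                continue;
            }
            // Require a non-alphanumeric character (or end of input) after the
            // name, so "... Pack 1" is not accepted for "... Pack 10".
            boolean boundary = raw.length() == name.length()
                    || !Character.isLetterOrDigit(raw.charAt(name.length()));
            if (boundary && (best == null || name.length() > best.length())) {
                best = name;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(match("Windows Vista 32-bit Service Pack 1 abcdef"));
    }
}
```

Preferring the longest match matters once the list contains names that are prefixes of each other; anything not on the list comes back as null, which forces you to decide what to do with unknown values.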

Henry
Viv Singh
Ranch Hand

Joined: Nov 08, 2008
Posts: 73
Isn't there any way to read to the end of the line?
The problem is that I do not know the exact data. There could be many variations, like Windows 2000 Service Pack 1, Windows 2000 Service Pack 2, Windows XP Service Pack 1, Windows 98, and so on.
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18989
    

Viv Singh wrote: Isn't there any way to read to the end of the line?
The problem is that I do not know the exact data. There could be many variations, like Windows 2000 Service Pack 1, Windows 2000 Service Pack 2, Windows XP Service Pack 1, Windows 98, and so on.


Sure, you can change your match criteria to everything but the carriage return / line feed, and it will match to the end of line. Or you can read it a line at a time, then match everything, which is to the end of line.
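
A minimal sketch of the second option (read a line at a time, then match everything), assuming the key/value lines look like "UOS = <value>" as in the sample. BufferedReader.readLine() strips the line terminator, so ".*" runs exactly to the end of that line. The class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineByLine {
    // Assumed line format "UOS = <value>"; whitespace around '=' is optional.
    private static final Pattern UOS_LINE = Pattern.compile("UOS\\s*=\\s*(.*)");

    // Returns the value on the first "UOS" line, or null if there is none.
    static String findOs(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The terminator is already gone, so ".*" stops at end of line.
                Matcher m = UOS_LINE.matcher(line);
                if (m.matches()) {
                    return m.group(1);
                }
            }
        } catch (IOException e) {
            // readLine() and close() declare IOException, even for a StringReader.
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        System.out.println(findOs(text)); // Windows Vista 32-bit Service Pack 1
    }
}
```

For a real file you would wrap a FileReader (or use Files.newBufferedReader) instead of the StringReader; the loop stays the same.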

However, in your example...

Windows Vista 32-bit Service Pack 1 abcdef


It wasn't separated by an EOL -- in this case, how do you know if abcdef isn't part of the OS name?

Henry
Viv Singh
Ranch Hand

Joined: Nov 08, 2008
Posts: 73
Henry Wong wrote:
Sure, you can change your match criteria to everything but the carriage return / line feed, and it will match to the end of line. Or you can read it a line at a time, then match everything, which is to the end of line.


How can I match everything but the carriage return?

Windows Vista 32-bit Service Pack 1 abcdef

It wasn't separated by an EOL -- in this case, how do you know if abcdef isn't part of the OS name?


That is a problem; I will have to think of some solution for it.
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19760
    

If I see that file format, I'm thinking of java.util.Properties to do the hard work for me.
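
A minimal sketch of that suggestion, using java.util.Properties.load(Reader) on the sample text from the first post (the class and method names are made up). Keep in mind that Properties has its own parsing rules: a backslash is an escape character, lines starting with # or ! are comments, and a line such as "OS info" is read as key "OS" with value "info". So this is only safe if the real input sticks to the key/value format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class OsInfoProperties {
    // Parses "key = value" and "key=value" lines with java.util.Properties.
    // Header lines like "OS info" or "*******" become extra, harmless entries.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        Properties props = parse(text);
        String os = props.getProperty("UOS");
        String admin = props.getProperty("Admin");
        System.out.println(os + " / " + admin);
    }
}
```

Properties skips the whitespace around the "=", so no trimming is needed for the sample lines.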


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18989
    

How can I match everything but the carriage return?



In regex, to match everything but a certain set of characters, you use a negated character class... [^abc] .... meaning match any single character that is not a, b, or c.

So... to not match the CR and LF, you have to do this .... [^\\r\\n] (the backslashes are doubled because the pattern is written as a Java string literal).

If you use this in your original pattern instead of \\w, meaning ... "UOS =\\s*([^\\r\\n]*)" ... this should extract everything up to the EOL.
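
A minimal sketch putting that pattern to work on the whole text at once, including CRLF line endings. The Admin pattern is an assumption modelled on the "Admin=NO" line in the sample; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // [^\r\n]* consumes every character except CR and LF, i.e. up to the EOL.
    static final Pattern OS = Pattern.compile("UOS =\\s*([^\\r\\n]*)");
    // Assumed from the sample line "Admin=NO".
    static final Pattern ADMIN = Pattern.compile("Admin\\s*=\\s*([^\\r\\n]*)");

    // Returns group 1 of the first match in text, or null if there is none.
    static String extract(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "OS info\r\n*******\r\nUOS = Windows Vista 32-bit Service Pack 1\r\nAdmin=NO\r\n";
        String os = extract(OS, text);
        String admin = extract(ADMIN, text);
        System.out.println(os);    // Windows Vista 32-bit Service Pack 1
        System.out.println(admin); // NO
    }
}
```

One caveat: \s also matches CR and LF, so if a value is ever empty ("UOS =" followed directly by a line break), \s* will run onto the next line and the group will capture that line instead. Using [ \t]* in place of \s* avoids that.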

Henry
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 18989
    

Rob Prime wrote:If I see that file format, I'm thinking of java.util.Properties to do the hard work for me.



Based on the format, I agree. But I am guessing that not everything is being shown here...

Henry
 
 