
regular expression question

 
Viv Singh
Ranch Hand
Posts: 73
Hi,

I would like to extract some information from a text. I guess I will have to do that using regular expressions in Java.

Example:

OS info
*******
UOS = Windows Vista 32-bit Service Pack 1
Admin=NO

From this text I would like to extract the information that the operating system is Windows Vista 32-bit Service Pack 1 and that the user was not the admin, and store it in two variables, let's say String os = "Windows Vista 32-bit Service Pack 1" and String admin = "NO".

How could I do that?

thanks in advance.
 
Henry Wong
author
Marshal
Posts: 20996

I know that this is unlikely to be a homework problem, but JavaRanch is still a learning site, so what have you tried so far? And what issues are you having?

Henry
 
Viv Singh
Ranch Hand
Posts: 73
For example to extract the Operating system, I have tried the following:





I want it to return the whole string: "Windows Vista 32-bit Service Pack 1". It is also possible that the input contains something like "Windows Vista 32-bit Service Pack 1 abcdef", but even then I still just want "Windows Vista 32-bit Service Pack 1".
 
Henry Wong
author
Marshal
Posts: 20996
In your regex, you are only extracting the first group of word characters. So, it will only get the first word, as the space won't match.
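The code from the previous post did not survive in this thread, so the following is only a minimal sketch of the behavior described here. It assumes a pattern of the form "UOS =\\s*(\\w+)"; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordGroupDemo {
    // Assumed pattern: the original snippet is not shown in the thread.
    // \w+ matches letters, digits and underscores only, so it stops at the first space.
    static final Pattern WORD_ONLY = Pattern.compile("UOS =\\s*(\\w+)");

    // Returns the captured group, or null if the pattern is not found.
    static String firstGroup(String text) {
        Matcher m = WORD_ONLY.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "UOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO";
        System.out.println(firstGroup(text)); // prints "Windows"
    }
}
```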

Viv Singh wrote: It is also possible that the input contains something like "Windows Vista 32-bit Service Pack 1 abcdef", but even then I still just want "Windows Vista 32-bit Service Pack 1".


Well, to do this part, you'll need to have a mechanism to define what is a valid OS name. Your program doesn't magically know what names are valid, and what names are not. Where is this validity data coming from?

Henry
 
Viv Singh
Ranch Hand
Posts: 73
Isn't there any way to read till the end of the line?
The problem is that I do not know the exact data. There could be many variations, like Windows 2000 Service Pack 1, Windows 2000 Service Pack 2, Windows XP Service Pack 1, Windows 98, and so on.
 
Henry Wong
author
Marshal
Posts: 20996
Viv Singh wrote: Isn't there any way to read till the end of the line?
The problem is that I do not know the exact data. There could be many variations, like Windows 2000 Service Pack 1, Windows 2000 Service Pack 2, Windows XP Service Pack 1, Windows 98, and so on.


Sure, you can change your match criteria to everything but the carriage return / line feed, and it will match to the end of the line. Or you can read the input a line at a time and then match everything, which takes you to the end of the line.

However, in your example...

Windows Vista 32-bit Service Pack 1 abcdef


It wasn't separated by an EOL -- in this case, how do you know if abcdef isn't part of the OS name?
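The line-at-a-time option mentioned above could look roughly like this. It is a sketch under assumptions: the input is held in a String, java.util.Scanner is used to split it into lines (any line reader would do), and the class and method names are hypothetical.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineByLineDemo {
    // Applied to one line at a time, so ".*" simply runs to the end of that line.
    static final Pattern OS_LINE = Pattern.compile("UOS =\\s*(.*)");

    // Returns the text after "UOS =" on the first matching line, or null if none matches.
    static String findOs(String text) {
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                Matcher m = OS_LINE.matcher(scanner.nextLine());
                if (m.matches()) {
                    return m.group(1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        System.out.println(findOs(text)); // prints "Windows Vista 32-bit Service Pack 1"
    }
}
```

Note that a trailing "abcdef" on the same line would be returned as part of the value, which is exactly the open question raised above.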

Henry
 
Viv Singh
Ranch Hand
Posts: 73
Henry Wong wrote:
Sure, you can change your match criteria to everything but the carriage return / line feed, and it will match to the end of the line. Or you can read the input a line at a time and then match everything, which takes you to the end of the line.


How can I match everything but the carriage return?

Windows Vista 32-bit Service Pack 1 abcdef

It wasn't separated by an EOL -- in this case, how do you know if abcdef isn't part of the OS name?


This is a problem; I will have to think of a solution for it.
 
Rob Spoor
Sheriff
Posts: 20511
If I see that file format, I'm thinking of java.util.Properties to do the hard work for me.
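A minimal sketch of the java.util.Properties idea, assuming the whole text really is in key=value form as in the example from the first post. The class and method names are made up for illustration. The heading lines "OS info" and "*******" are not rejected by Properties; they just end up as extra entries that can be ignored.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class PropertiesDemo {
    // Parses key=value text; whitespace around the '=' is skipped by Properties.load.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = "OS info\n*******\nUOS = Windows Vista 32-bit Service Pack 1\nAdmin=NO\n";
        Properties props = parse(text);
        String os = props.getProperty("UOS");
        String admin = props.getProperty("Admin");
        System.out.println(os);    // prints "Windows Vista 32-bit Service Pack 1"
        System.out.println(admin); // prints "NO"
    }
}
```

Keys are case-sensitive, and trailing whitespace in a value is kept, so a trim() may be wanted on real input.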
 
Henry Wong
author
Marshal
Posts: 20996
Viv Singh wrote: How can I match everything but the carriage return?



In regex, to match everything but a certain set of characters, you use a negated character class such as [^abc], which matches any single character except a, b, or c.

So, to exclude the CR and LF, you use [^\\r\\n] (written here as it appears inside a Java string literal).

If you use this in your original pattern instead of \\w, that is, "UOS =\\s*([^\\r\\n]*)", it should extract to the EOL.
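Put together, the pattern above can be tried out like this. The OS pattern is the one given in this post; the Admin pattern, the class name and the helper method are assumptions added only to cover the second value asked for in the first post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EolRegexDemo {
    // Pattern from the post: capture everything up to (not including) CR or LF.
    static final Pattern OS = Pattern.compile("UOS =\\s*([^\\r\\n]*)");
    // Assumed counterpart for the Admin line; not part of the original post.
    static final Pattern ADMIN = Pattern.compile("Admin\\s*=\\s*([^\\r\\n]*)");

    // Returns group 1 of the first match, or null if the pattern is not found.
    static String extract(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "OS info\r\n*******\r\nUOS = Windows Vista 32-bit Service Pack 1\r\nAdmin=NO\r\n";
        String os = extract(OS, text);
        String admin = extract(ADMIN, text);
        System.out.println(os);    // prints "Windows Vista 32-bit Service Pack 1"
        System.out.println(admin); // prints "NO"
    }
}
```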

Henry
 
Henry Wong
author
Marshal
Posts: 20996
Rob Spoor wrote: If I see that file format, I'm thinking of java.util.Properties to do the hard work for me.



Based on the format, I agree. But I am guessing that not everything is being shown here...

Henry
 