
Parsing

 
Ranch Hand
Posts: 48
Hi Folks,
I'm hoping that someone could help me out with the following problem. I'm trying to parse the following text.

As you can see, there is a sort of column and row heading. For example, RES/FUEL means reserved fuel. I'm a little clueless as to how to approach this. I just started this job a few days ago (right out of college) and the company has absolutely no documentation on similar solutions to help me out. If I took the time to try to figure out the other developer's solution, it would take me forever because it's too complex for me at this point.
One idea I had is to create a multidimensional array. That way, for example, when I try to extract the value for the reserved fuel attribute, I could just use indices. Another issue is dealing with the heading inside the text document, which is MAX POSS LOAD.
I don't know. Any help would truly be appreciated.
[ April 01, 2003: Message edited by: Jim Yingst ]
 
Chris Cairns
Ranch Hand
Posts: 48
As you can see, the text didn't properly format when I pasted it.
 
Chris Cairns
Ranch Hand
Posts: 48
Actually, it did come out all right. Just make sure your text size is lowered.
 
Wanderer
Posts: 18671
I added [code] tags to preserve the indentation.
First, you can break this into separate lines by reading it with a BufferedReader, which has a readLine() method. For each line, you then need to separate the different fields. It looks like you can identify fields just by counting characters - e.g. the DIST field seems to start in the 12th or 13th column, and end in the 15th. You can extract this with String's substring() method. (Read the API for this carefully: the start index is zero-based and inclusive, the end index is exclusive.) Then you can convert the data to a non-String format using methods like trim() and Integer.parseInt(), for example:
String line = buffReader.readLine();
...
String distStr = line.substring(11, 15);
int dist = Integer.parseInt(distStr.trim());
This approach should work if the column numbers are consistent throughout your input file. If they're not, well, you'll have to study the file structure more to look for consistent patterns which you can use.
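To make the column-counting idea a bit more concrete, here is a minimal sketch. The class name FixedWidthFields, the helpers field() and dist(), and both sample rows are made up for illustration; only the DIST offsets (11 to 15) come from the snippet above, and they would need checking against the real file.

```java
// Hypothetical helper class for illustration; not part of any library.
class FixedWidthFields {

    // Returns the trimmed text between two 0-based offsets (end exclusive).
    // Offsets are clamped to the line length so that short lines yield an
    // empty string instead of throwing StringIndexOutOfBoundsException.
    static String field(String line, int start, int end) {
        int from = Math.min(start, line.length());
        int to = Math.min(end, line.length());
        return line.substring(from, to).trim();
    }

    // Parses the DIST column, assumed here to occupy offsets 11..14.
    // Throws NumberFormatException if that range is blank or not numeric.
    static int dist(String line) {
        return Integer.parseInt(field(line, 11, 15));
    }

    public static void main(String[] args) {
        // Invented sample rows; the real layout may differ.
        System.out.println(dist("TRIP-SAEZ   670")); // prints 670
        System.out.println(dist("ALTN-XXXX    45")); // prints 45
    }
}
```

Clamping the offsets is a design choice for this sketch: it turns a too-short line into an empty field rather than an exception, which may or may not be what you want for bad input.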
 
Sheriff
Posts: 6450
Let's begin by defining the problem set. What are the different fields each line contains, and what are the possible permutations thereof? Are there any field delimiters? It looks to me like it's just spacing things out, but it's still something to think about. Can we guarantee that each field occupies a certain range of character positions in all cases? For example, can we guarantee that the first field will always be found in the first nine characters?
 
Chris Cairns
Ranch Hand
Posts: 48
I like your approach, Jason. To answer your first question, neither I nor anyone else at my company actually knows what the possible permutations are. We're using this particular text file as an example of a possible text that would be sent to us from an external data provider. When we begin to test the classes "live", we will have to adapt the code to any permutations. (A shitty approach if you ask me, but not much I can do.) The delimiters, as Jim suggested, are the spaces in between. Yes, at this point we have to assume each field occupies an exact range.
[ April 01, 2003: Message edited by: Chris Cairns ]
 
Chris Cairns
Ranch Hand
Posts: 48
Jim,
I think your approach would be good, but I need to treat the textfile as a String. For example, here is a larger portion of the textfile that's being parsed.

Ignore most of that. The point I'm trying to make is that I'm breaking the text into blocks, then parsing the blocks. Each block is actually a String, so I have to operate on that. The block I pasted in my previous post is the one I have to work on.
[ April 01, 2003: Message edited by: Chris Cairns ]
 
Jason Menard
Sheriff
Posts: 6450
Just so I'm sure I understand you, are you saying you have that block as one String? So it would roughly be the equivalent of this?

[ April 01, 2003: Message edited by: Jason Menard ]
 
Chris Cairns
Ranch Hand
Posts: 48
Exactly, Jason.
 
Chris Cairns
Ranch Hand
Posts: 48
I'm thinking that one way to do this is to assume that the fields hold a fixed position. However, let's say that the DIST value for TRIP-SAEZ, currently 670, becomes 5,000. I could grab a substring starting a few character positions earlier, then just trim it. That way, if one of the fields does change width, it would be compensated for.
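A rough sketch of that idea, assuming the numbers are right-aligned so the column always ends at the same offset. The class name WideWindowDemo, the method rightAligned(), the slack parameter, and the sample rows are all hypothetical.

```java
// Hypothetical sketch; names and sample data are invented.
class WideWindowDemo {

    // Reads a right-aligned number ending at offset `end` (exclusive),
    // starting `slack` characters before the nominal start of the column,
    // then trims away the extra padding before parsing.
    static int rightAligned(String line, int nominalStart, int end, int slack) {
        int to = Math.min(end, line.length());
        int from = Math.min(Math.max(0, nominalStart - slack), to);
        return Integer.parseInt(line.substring(from, to).trim());
    }

    public static void main(String[] args) {
        // Nominal DIST column: offsets 11..14, widened by 2 characters.
        System.out.println(rightAligned("TRIP-SAEZ   670", 11, 15, 2)); // prints 670
        System.out.println(rightAligned("TRIP-SAEZ  5000", 11, 15, 2)); // prints 5000
        System.out.println(rightAligned("TRIP-SAEZ 12345", 11, 15, 2)); // prints 12345
    }
}
```

The slack only helps while the widened window still stops short of the previous field. In these invented rows the label ends at offset 8, so a slack of 2 is the most that is safe; any more and the window would pick up label characters and parseInt() would fail.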
 
Jason Menard
Sheriff
Posts: 6450
I'm thinking regexes. See if this code points you in the right direction. I might have a couple of different patterns to handle different permutations if it gets too messy with one regex, but hopefully you get the idea.

I only worked up to the first three groups, but hopefully it will get you started.
 
Jason Menard
Sheriff
Posts: 6450
Let me break the regex down a little better in case you are unfamiliar with them:
1: ^
The beginning of the String...
2: ([A-Z-]{1,9})
...followed by 1-9 characters which may each be either capital A through Z, or the '-' character. Capture this sequence to group 1.
3: \\s{2,8}
Followed by 2 - 8 whitespace characters...
4: (\\d{1,4})?
...followed by zero or one occurrences of a 1-4 digit sequence, which is captured to group 2.
5: \\s{3,11}
Followed by 3 - 11 whitespace characters...
6: (\\d{1,5})
followed by a 1-5 digit sequence which is captured to group 3.
7: .*
Followed by zero or more occurrences of any other character.
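Putting the seven pieces together gives one pattern. The sketch below is only an illustration of how it might be used: the class name RowRegexDemo, the parse() helper, and the sample lines are invented, and what groups 2 and 3 actually mean depends on the real file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; class, method, and sample lines are invented.
class RowRegexDemo {

    // The pattern assembled from the numbered pieces above.
    static final Pattern ROW = Pattern.compile(
            "^([A-Z-]{1,9})\\s{2,8}(\\d{1,4})?\\s{3,11}(\\d{1,5}).*");

    // Returns {group 1, group 2 (null if absent), group 3},
    // or null when the line does not match at all.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] full = parse("TRIP-SAEZ   670     12345");
        System.out.println(full[0] + " / " + full[1] + " / " + full[2]);

        // Here the optional second field is missing, so group 2 is null.
        String[] partial = parse("ALTN          4500");
        System.out.println(partial[0] + " / " + partial[1] + " / " + partial[2]);
    }
}
```

One thing to watch: a label containing '/' (RES/FUEL, if it appears as a row label) would not match group 1 as written, because the character class [A-Z-] only allows capital letters and '-'.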
HTH
[ April 01, 2003: Message edited by: Jason Menard ]
 
Jim Yingst
Wanderer
Posts: 18671
The fact that all your input is in one big string doesn't prevent you from using a BufferedReader. You can construct it like this:
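A minimal sketch of one way to do that, wrapping the String in a StringReader. The class name StringLines, the lines() helper, and the sample text are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class name and sample text are invented.
class StringLines {

    // Splits an in-memory block into lines using BufferedReader.readLine().
    static List<String> lines(String block) {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(block))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : lines("first line\nsecond line\r\nthird line")) {
            System.out.println("[" + line + "]");
        }
    }
}
```

readLine() treats \n, \r, and \r\n as line terminators and strips them, so the block does not need normalising first.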

You could also use regular expressions for this (as Jason is doing) - if you're familiar with them and/or have time to learn, they're extremely powerful and flexible. But I'm pretty confident you can do this with BufferedReader and StringReader too, if you can understand the format properly.
From your longer file example, it seems as if the biggest problem is not figuring out how to parse the individual lines that have the data you want, but rather, how do you parse just those lines, ignoring the other stuff (which I assume for now that you don't need)? Based on what you've said so far, I might suggest: read lines until you find one that says

That indicates the start of data, as far as you're concerned. Now read each subsequent line and try to parse it. If it's null or blank, that indicates the end of the table of useful data, so you can stop reading.
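The skip-then-collect loop described above might look roughly like this. The class name TableBlock, the method rowsAfter(), the headerMarker parameter, and the sample text are all assumptions; the actual header text has to come from the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and sample text are invented.
class TableBlock {

    // Returns the lines that follow the first line containing headerMarker,
    // stopping at the first blank line or at the end of the input.
    static List<String> rowsAfter(String text, String headerMarker) {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            // Skip everything up to and including the header line.
            while ((line = reader.readLine()) != null && !line.contains(headerMarker)) {
                // nothing to do, keep skipping
            }
            // Collect data rows until a blank line or end of input.
            while ((line = reader.readLine()) != null && !line.trim().isEmpty()) {
                rows.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String text = "preamble\nLABEL  COL-A  COL-B\nrow one\nrow two\n\ntrailer";
        System.out.println(rowsAfter(text, "COL-A")); // prints [row one, row two]
    }
}
```

If the marker never appears, the result is simply an empty list, so the caller can tell that the table was not found.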
Of course, if the other parts of the file also contain useful data that you need to understand, but in a different format, then you will have to study the format more to decide how to approach it.
Good luck...
[ April 01, 2003: Message edited by: Jim Yingst ]
 
Jason Menard
Sheriff
Posts: 6450
Even if you are using regexes, my preference would be to do as Jim suggested and read in one line at a time.
 
Chris Cairns
Ranch Hand
Posts: 48
Okay, thank you, guys. I really appreciate it. I have a lot to learn!
 
Ranch Hand
Posts: 1067
I say do it in perl!
 