Parsing a Text File

 
Greenhorn
I am trying to parse a text file which has tags similar to those of XML.
I am able to read the text file fine and return its output. My problem is how to read the tags within the file and return the contents of those elements.
My text file looks like this:


I want to extract, for example, the DOCNO number within the DOCNO tag. Which methods in Java would I use to do that?

Here is my code so far. It's just reading and outputting all of the file:


 
Sheriff
Why aren't you just using XML parsing?
 
Rancher
You should never do this when doing I/O:

You must handle exceptions instead of ignoring them. At least print the error message (better yet: the stack trace) to where you will see it.
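
For anyone following along, here is a minimal sketch of what handling the exception could look like. It is not the original poster's code: the class name LineReaderSketch, the method readAllLines and the in-memory sample text are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not taken from the original post.
final class LineReaderSketch {

    // Reads every line from the given reader; an IOException propagates to the caller.
    static List<String> readAllLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        try {
            // A StringReader stands in for the real file so the sketch runs anywhere.
            List<String> lines = readAllLines(new StringReader("first\nsecond\n"));
            lines.forEach(System.out::println);
        } catch (IOException e) {
            // Never leave this block empty: report the failure where you will see it.
            e.printStackTrace();
        }
    }
}
```

With a real file you would pass something like a FileReader instead of the StringReader; the catch block stays the same.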
 
Marshal
And welcome to the Ranch
 
Ranch Hand
The way I've been doing it is to read the text file into an array, one line per array element, then iterate through the array and perform my manipulation etc.

But in terms of parsing out your data, I would take the file, read it into an array, then build another array from the first, making everything from <DOC> to </DOC> appear on a single line. Then I would use Pattern and Matcher (from java.util.regex) to find what you need. Once you have the file read into an array, I would do something like this:


This basically iterates through your main array and looks at the current index. If it is <DOC>, it continues to look at each following line, adding it to a single string until it sees the end of the main array or the next <DOC>, at which point it bails out.

Once you have your new array list, you search for your patterns like this, if, for example, you wanted the DOCNO:



That's how I would do it, but I'm far from anything more than an amateur programmer.
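
Since the code from this post is not shown here, the following is only a rough sketch of the approach described above, not the poster's original. It assumes each document starts with a line containing only <DOC> and ends with a line containing only </DOC>, and that the DOCNO value sits between <DOCNO> and </DOCNO>. The class name DocnoSketch, the method names and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the tag layout is an assumption about the file.
final class DocnoSketch {

    // Captures the text between <DOCNO> and </DOCNO>, ignoring surrounding whitespace.
    private static final Pattern DOCNO = Pattern.compile("<DOCNO>\\s*(.*?)\\s*</DOCNO>");

    // Joins the lines of each <DOC> ... </DOC> block into one string per document.
    // A block that never reaches </DOC> is dropped.
    static List<String> joinDocuments(List<String> lines) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.equals("<DOC>")) {
                current = new StringBuilder();
            }
            if (current != null) {
                current.append(trimmed).append(' ');
                if (trimmed.equals("</DOC>")) {
                    documents.add(current.toString().trim());
                    current = null;
                }
            }
        }
        return documents;
    }

    // Returns the first DOCNO value found in each joined document.
    static List<String> extractDocnos(List<String> documents) {
        List<String> docnos = new ArrayList<>();
        for (String document : documents) {
            Matcher matcher = DOCNO.matcher(document);
            if (matcher.find()) {
                docnos.add(matcher.group(1));
            }
        }
        return docnos;
    }

    public static void main(String[] args) {
        // Made-up sample lines standing in for the real file.
        List<String> lines = Arrays.asList(
                "<DOC>", "<DOCNO> AB-123 </DOCNO>", "<TEXT>", "hello", "</TEXT>", "</DOC>",
                "<DOC>", "<DOCNO>CD-456</DOCNO>", "</DOC>");
        System.out.println(extractDocnos(joinDocuments(lines))); // prints [AB-123, CD-456]
    }
}
```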
 
Campbell Ritchie
Marshal
You have obviously programmed in C.
Please space your code out so you don't have two {} on the same line (except in array initialisers). It is by no means easy to read. Variable names like ct, i and bStop (which you can maybe delete because you don't seem to be using it anywhere) don't tell us what they are supposed to mean.

I still think you would do better to find yourself an XML parser.
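
As a sketch of the XML parser route, and only under the assumption that the file's tags are properly nested and closed (nobody in this thread has confirmed that), the JDK's built-in DOM parser could be used roughly like this. The class name XmlDocnoSketch, the synthetic ROOT wrapper element and the sample text are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; assumes the file content is well-formed apart from lacking a root.
final class XmlDocnoSketch {

    static List<String> extractDocnos(String fileContents)
            throws ParserConfigurationException, SAXException, IOException {
        // A file holding several <DOC> elements has no single root, so wrap it in one.
        String wrapped = "<ROOT>" + fileContents + "</ROOT>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(wrapped)));

        // Collect the trimmed text of every DOCNO element, in document order.
        NodeList nodes = document.getElementsByTagName("DOCNO");
        List<String> docnos = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            docnos.add(nodes.item(i).getTextContent().trim());
        }
        return docnos;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample content standing in for the real file.
        String sample = "<DOC><DOCNO> AB-123 </DOCNO><TEXT>hello</TEXT></DOC>\n"
                + "<DOC><DOCNO>CD-456</DOCNO></DOC>";
        System.out.println(extractDocnos(sample)); // prints [AB-123, CD-456]
    }
}
```

If the file contains unescaped characters such as & or unclosed tags, the parser will throw a SAXException, and the regex approach may be the more forgiving option.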
 
Michael D Sims
Ranch Hand

Campbell Ritchie wrote:You have obviously programmed in C.
Please space your code out so you don't have two {} on the same line (except in array initialisers). It is by no means easy to read. Variable names like ct, i and bStop (which you can maybe delete because you don't seem to be using it anywhere) don't tell us what they are supposed to mean.

I still think you would do better to find yourself an XML parser.



Actually, I've never written in C. My first language was Commodore Basic back in 1982 (I was 12), then the Amiga, then Basic for DOS, then Visual Basic, then some dabblings in .NET when they were transitioning over to OOP but had not quite gotten there yet. My first true OOP was Java, and that consisted of three courses that I took when I got my CIS degree in business at Cal Poly, Pomona. I am a Network Engineer by trade and my true specialties are servers, routers, switches, VoIP deployments etc. I dabble in programming from time to time because I enjoy it, but by no means would I consider my programming skills worthy of a wage.

I've always had trouble naming variables, and somewhere along the way I seem to have settled on a lower-case data type prefix followed by a capitalised name. So in my example, bStop is a boolean variable called Stop. I preface string variables with str, integers with i, etc., but I'm not always consistent, it would seem.

I have noticed in my IDE, every time I start a loop or a function, it wants to format it like this:



But for some reason, the layout doesn't sit well with me ... I need those curly braces to be lined up so it has a logical visual structure, like this:



And I've seen this done on several web sites in simple if statements ... seemed easy enough to read to me so sometimes I use it:



If this is not acceptable, I will do my best to conform to what is acceptable. I am not unteachable and I love to learn.

The way code looks on this web site is kind of a difficult read in and of itself.


And I do agree with you ... an XML parser would probably save Sash Ko a ton of time. I was just offering my $.02

Mike
 
Campbell Ritchie
Marshal
The conventions for C and those other languages were mostly developed when a megabyte of memory cost about the same as a large London house. And probably occupied about the same space. They tried to squeeze as much as possible into each line because each keystroke had a cost. Now that you can buy a megabyte of RAM for about the same as one sip of coffee, things have changed and spacing is considered important. Longer variable names and variable names which are self‑explanatory are now the thing. So bStop would be regarded as a poor name. The b is unnecessary because the name should make it clear that it is a boolean variable, maybe called shouldStop or canStop. Those convey more information than stop, and the format makes it clear that they are booleans. If the name is something like text, who needs to add str to it?
You can often set up options on IDEs to get braces to line up like this:
{
   something();
}
It varies from IDE to IDE.
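
To illustrate the naming and brace advice together, here is a small sketch. The method and every name in it are invented for this example and are not a rewrite of the code posted earlier; the point is only that shouldStop and lineCount explain themselves where bStop and ct do not.

```java
// Hypothetical example class showing self-explanatory names and lined-up braces.
final class NamingSketch {

    // Counts the lines that come before the first line equal to endMarker.
    static int countLinesBefore(String[] lines, String endMarker)
    {
        int lineCount = 0;            // says what is being counted
        boolean shouldStop = false;   // reads as a yes/no question, so no "b" prefix is needed
        for (int index = 0; index < lines.length && !shouldStop; index++)
        {
            if (lines[index].equals(endMarker))
            {
                shouldStop = true;
            }
            else
            {
                lineCount++;
            }
        }
        return lineCount;
    }
}
```

For example, countLinesBefore(new String[] {"a", "b", "</DOC>", "c"}, "</DOC>") returns 2.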
 
Michael D Sims
Ranch Hand

Campbell Ritchie wrote:The conventions for C and those other languages were mostly developed when a megabyte of memory cost about the same as a large London house. And probably occupied about the same space. They tried to squeeze as much as possible into each line because each keystroke had a cost.



I once calculated the cost of 8 gigs of RAM using the 1983 price of the Commodore 64 as a base. It had 64k of RAM and it cost around $400 for the whole computer. 1 gig of RAM is 1,048,576k ... divide that by 64 and you get 16,384 Commodore 64’s in a gig of RAM. At the $400 per 64k price, it would have cost $6,553,600.00 (6.5 MILLION dollars) back in 1983 for a gig of RAM. Today’s typical computer has about 8 gigs of RAM (8,388,608k), so at the 1983 price, you’d be looking at $52,428,800.00 (52 MILLION dollars) for what today can be purchased for around $100.00. I don’t think anything in life or nature has depreciated that quickly over such a short period anywhere in our history. Technology is bizarre, economically speaking.
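
The arithmetic above can be checked with a few lines of Java. This is just a sketch; the class name RamCostSketch and the method costOf are made up, and it takes a gig to mean 1024 × 1024 k.

```java
// Hypothetical example class for checking the Commodore 64 figures.
final class RamCostSketch {

    static final long KIB_PER_GIB = 1024L * 1024L; // 1,048,576 k in one gig

    // Cost of buying the given number of gigs in units of unitKib k at unitPrice each.
    static long costOf(long gigs, long unitKib, long unitPrice) {
        long unitsNeeded = gigs * KIB_PER_GIB / unitKib;
        return unitsNeeded * unitPrice;
    }

    public static void main(String[] args) {
        System.out.println(KIB_PER_GIB / 64);   // 16384 Commodore 64s per gig
        System.out.println(costOf(1, 64, 400)); // 6553600
        System.out.println(costOf(8, 64, 400)); // 52428800
    }
}
```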

Campbell Ritchie wrote:...things have changed and spacing is considered important. Longer variable names and variable names which are self‑explanatory are now the thing.



Thank you ... I will get my mind geared in that direction. I hate typing long variable names personally, but I understand the need to have readable code.

Campbell Ritchie wrote:You can often set up options on IDEs to get braces to line up like this
{
   something();
}
It varies from IDE to IDE.



Funny you mention this, as just before replying to this post, I found those settings in my IDE - which is IntelliJ IDEA 13 ... nice IDE, but jam packed with a zillion settings I’ll never use nor care to understand. Right now I’m having a problem with IntelliJ telling me that I can’t use a string in a switch statement because I need to use a Java version newer than 6 or something like that ... yet I’m using 1.8 so I dunno ... but I’ll start a new thread on that one.

Thanks again for your time and mentoring...

Mike Sims
 
Rancher

Michael D Sims wrote:I once calculated the cost of 8 gigs of RAM using the 1983 price of the Commodore 64 as a base. It had 64k of RAM and it cost around $400 for the whole computer. 1 gig of RAM is 1,048,576k ... divide that by 64 and you get 16,384 Commodore 64’s in a gig of RAM. At the $400 per 64k price, it would have cost $6,553,600.00 (6.5 MILLION dollars) back in 1983 for a gig of RAM. Today’s typical computer has about 8 gigs of RAM (8,388,608k), so at the 1983 price, you’d be looking at $52,428,800.00 (52 MILLION dollars) for what today can be purchased for around $100.00. I don’t think anything in life or nature has depreciated that quickly over such a short period anywhere in our history. Technology is bizarre, economically speaking.



If you want raw memory, look at the ZX81 wobbly RAM pack.
£49.95 at launch in 1981 (excluding Blu Tack) for 16k.
That works out at about £26 million just for the 8 GB of memory.

Michael D Sims wrote:
Funny you mention this, as just before replying to this post, I found those settings in my IDE - which is IntelliJ IDEA 13 ... nice IDE, but jam packed with a zillion settings I’ll never use nor care to understand. Right now I’m having a problem with IntelliJ telling me that I can’t use a string in a switch statement because I need to use a Java version newer than 6 or something like that ... yet I’m using 1.8 so I dunno ... but I’ll start a new thread on that one.

Thanks again for your time and mentoring...

Mike Sims



That's odd, because 13 is up to date with Java 8. 12 is stuck on Java 7.
You sure you have the latest updates?
 
Java Cowboy

Dave Tolls wrote:That's odd, because 13 is up to date with Java 8. 12 is stuck on Java 7.
You sure you have the latest updates?


The first thing I would look at is the project SDK and language level settings: File / Project Structure / Project, check if the Project SDK is set to 1.8 and the language level to 8.0.
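
A quick way to confirm the setting has taken effect is to compile something that switches on a String, which is only allowed at language level 7 or higher. The class below is a made-up sketch for that purpose; the names and case labels are hypothetical.

```java
// Hypothetical example class; compiles only at language level 7 or higher.
final class StringSwitchSketch {

    // Maps a tag name to a short description using a switch on a String.
    static String describe(String tagName) {
        switch (tagName) {
            case "DOC":
                return "document";
            case "DOCNO":
                return "document number";
            default:
                return "unknown";
        }
    }
}
```

If the IDE still flags the switch after the project settings are changed, the language level of the individual module is worth checking as well.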
 
Campbell Ritchie
Marshal

On Sunday, I wrote: . . . a megabyte of memory cost about the same as a large London house. . . .

Correction: about the same as London
Oh no, you were calculating gigabytes. So I wasn't that far out.
 
Campbell Ritchie
Marshal

Michael D Sims wrote: . . . ... nice IDE, but jam packed with a zillion settings . . .

That is why we usually warn newbies to avoid IDEs until they are more experienced.

I think I shall duplicate this discussion in the IDEs forum.
 
Campbell Ritchie
Marshal
Remember C came out in 1972 and those prices for memory were current in 1981; house prices rose by at least 3× during that decade and memory prices probably came down by 3×.
Actually memory prices have hardly changed: if you look here, you find they are still in a range straddling £49.95!
 
Michael D Sims
Ranch Hand

Campbell Ritchie wrote:Remember C came out in 1972 and those prices for memory were current in 1981; house prices rose by at least 3× during that decade and memory prices probably came down by 3×.
Actually memory prices have hardly changed: if you look here, you find they are still in a range straddling £49.95!

That must be the magic number for manufacturers to be able to kick out a product while maintaining a profit of some kind. The only thing that changes is the technology they produce gets smaller and faster all the time. It's a good time to be alive on planet Earth!
 