I'm trying to make a program that can scrape some data out of a text file and get certain important parts. The text file contains the source code for an eBay auction. What I am trying to do is grab key parts like the price, item location, etc.
I can get the source code and save it to a file and I can read and write files. What I need to figure out is how to grab that particular part of the file.
I know that I can use regular expressions to match things that I'm looking for, but how can I grab a part of the file that is after the match to the regex? I was able to match a line in the file that has the particular piece of info I want and store that in a separate string, but the line is quite long and I'm not sure how to break it down to get at what I need, or handle a situation where the rest of the information is on the next line.
Ideally, I would like to get whatever is left on the line after the regex match, then run a string tokenizer with a space delimiter to capture the data, and have it stop when it reaches something like a < or a ".
I suppose I could take a whole line and tokenize it but there has to be a better way. Any assistance is appreciated.
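One way to get at the text after a match, without tokenizing the whole line, is to put a capture group right after the label in the regex. The following is only a minimal sketch: the class name `AfterMatch`, the helper `valueAfter`, and the sample markup are all made up for illustration, and real eBay page source will look different. It assumes the whole file has been read into a single string, so a value that wraps onto the next line is still found, because `\s` also matches line breaks.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AfterMatch {
    // Hypothetical helper: finds the first match of labelRegex, skips any
    // whitespace after it (including line breaks), and captures everything
    // up to the next '<' or '"'.
    static Optional<String> valueAfter(String page, String labelRegex) {
        Pattern p = Pattern.compile(labelRegex + "\\s*([^<\"]+)");
        Matcher m = p.matcher(page);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Invented sample markup; the location value sits on the next line.
        String page = "<td>Item location: \n Austin, Texas</td><td>Price: US $12.50</td>";

        System.out.println(valueAfter(page, "Item location:").orElse("not found")); // Austin, Texas
        System.out.println(valueAfter(page, "Price:").orElse("not found"));         // US $12.50
    }
}
```

If the value is preceded by more tags rather than plain whitespace, the label regex would have to be extended to consume them, which is where an HTML parser may turn out to be less work than regexes.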
If you're parsing XML, then Java has some nice frameworks that help you parse it up. If you're working with a proprietary text format, then you're pretty much stuck with rolling your own solution. I'm not really clear on your specific questions though. Maybe you could give an example of what you're trying to do, and what's not working?
Thanks. It's proprietary - just source code for a webpage. However, it is unique (mostly). I've got it all done now. I just have to have a method for each item I want to find, match a section with a regex, then do a scanner class and string tokenizer to break things down and some if/then stuff to allow for small variances in the data. It was a pain, but it's working now. I was just hoping there was an easier way to do it, say grab the next x number of characters AFTER a regex match and then just work with that.
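For what it's worth, grabbing the next x characters after a regex match can be done with `Matcher.end()`, which returns the index just past the match, followed by a bounded `substring`. This is a rough sketch rather than a drop-in solution: the class `NextChars`, the method `nextChars`, and the sample string are hypothetical names and data chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NextChars {
    // Hypothetical helper: returns up to 'count' characters that follow the
    // first match of 'pattern' in 'text', or "" if there is no match.
    static String nextChars(String text, Pattern pattern, int count) {
        Matcher m = pattern.matcher(text);
        if (!m.find()) {
            return "";
        }
        int start = m.end();                               // index just past the match
        int end = Math.min(text.length(), start + count);  // avoid running off the end
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // Invented sample text, not real page source.
        String text = "Price: US $12.50</td>";
        System.out.println(nextChars(text, Pattern.compile("Price:\\s*"), 9)); // US $12.50
    }
}
```

The returned slice can then be handed to a Scanner or split on whitespace, so the if/then handling only has to deal with a short, known-length string.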
I’ve looked at a lot of different solutions, and in my humble opinion Aspose is the way to go. Here’s the link: http://aspose.com