scraping XML

Ciri Bhoy
Greenhorn

Joined: Oct 20, 2011
Posts: 16
Hi all,

I'm writing a small app that reads an XML file from a website as an InputStream, and I want to parse that stream so that it displays only certain values, which appear in the markup as follows:

<tr>
<td class="first">

<img id="ctl00_Content_ctl00_rptInfo_ctl16_Image2" alt="Inactive" src="../../images/t2.jpg" style="border-width:0px;" />
</td>
<td >
Brussels
</td>
<td>
Aer Lingus
</td>
<td>
EI639
</td>
<td>
12 Mar 21:50
</td>
<td class="last">
Arrived 21:39
</td>
</tr>

<tr>
<td class="first">
<img id="ctl00_Content_ctl00_rptInfo_ctl17_Image1" alt="Active" src="../../images/t1.jpg" style="border-width:0px;" />

</td>
<td >
..........................
..........................

I'm currently doing this by reading each line with readLine() and pulling out the relevant data, and it's working fine. The problem is that this seems far too easy. It's only a small project, so performance isn't really an issue; I'm just looking for the 'right' way of doing it, or a few 'right' ways. I hope I'm making myself clear enough; I'm afraid I'm not too well up on the jargon yet.

Any advice is very welcome and appreciated.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 39578
That doesn't look like XML; it looks like HTML. My first weapon of choice would be a library that can handle HTML, such as HtmlUnit, which also handles downloading the page.
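
To illustrate the general idea of letting an HTML parser find the cells instead of matching lines with readLine(), here is a minimal sketch. It does not use HtmlUnit (a separate dependency); it swaps in the HTML parser that ships with the JDK in javax.swing.text.html so that it runs without extra jars. The class name FlightRowScraper, the method extractRows, and the shortened sample markup are assumptions made for illustration, and downloading the page is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper; the name and structure are illustrative only.
final class FlightRowScraper {

    /** Returns, for every <tr> in the HTML, the trimmed text of its non-empty <td> cells. */
    static List<List<String>> extractRows(String html) {
        final List<List<String>> rows = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> row;    // cells of the <tr> being read, or null
            private StringBuilder cell;  // text of the <td> being read, or null

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    row = new ArrayList<>();
                } else if (tag == HTML.Tag.TD && row != null) {
                    cell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Only text that sits inside a table cell is of interest.
                if (cell != null) {
                    cell.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD && cell != null && row != null) {
                    String text = cell.toString().trim();
                    if (!text.isEmpty()) {   // the image-only first cell is skipped
                        row.add(text);
                    }
                    cell = null;
                } else if (tag == HTML.Tag.TR && row != null) {
                    if (!row.isEmpty()) {
                        rows.add(row);
                    }
                    row = null;
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Shortened version of the row from the question, wrapped in a table.
        String html = "<html><body><table><tr>"
                + "<td class=\"first\"><img alt=\"Inactive\" src=\"../../images/t2.jpg\" /></td>"
                + "<td>Brussels</td><td>Aer Lingus</td><td>EI639</td>"
                + "<td>12 Mar 21:50</td><td class=\"last\">Arrived 21:39</td>"
                + "</tr></table></body></html>";
        for (List<String> row : extractRows(html)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Because empty cells are skipped, column positions shift when a cell has no text; if the position matters, add the empty string instead. A dedicated HTML library is likely to cope better with messy real-world pages than the JDK's parser.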