reading a website data to build a dashboard

 
Ranch Hand
Posts: 122
Hi guys,
I want to build a dashboard from data fetched from a website.

That is, I want to read the content of a web page and produce an .xml file that stores the data from that page. That .xml file will then be used to generate a dashboard report.

Say a web page displays various information about movies, such as each movie's revenue and production cost.

I want to store the top 10 movies from that page (by some criterion, say revenue) in an .xml file. The file will be generated on my machine and then sent on to create the dashboard report.

Question

Is this possible? How can I read a web page's content and store it in an .xml file, and how would I create that file from what I read?
Please help me with a suitable approach.
 
Rancher
Posts: 43081
There's no natural mapping from an HTML page to XML, so you'll need to code that yourself. I'd approach this using a library like HtmlUnit that makes it easy to access a web site programmatically. It cleans the HTML so it becomes well-formed XML, and then presents a DOM and XPath interface that you can use to extract whichever parts of the page you're interested in.
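To illustrate the extraction step only, here is a minimal sketch that does not use HtmlUnit itself. It assumes the page has already been cleaned into well-formed markup and runs XPath over it with the JDK's built-in `javax.xml.xpath` API. The class name `XPathExtractSketch`, the sample markup, and the `id`/`class` values are all made up for the example; a real page needs its own expressions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathExtractSketch {

    // Hypothetical, already well-formed markup standing in for a cleaned page.
    static final String CLEANED_PAGE =
        "<html><body><table id=\"movies\">"
        + "<tr><td class=\"title\">Movie A</td><td class=\"revenue\">300</td></tr>"
        + "<tr><td class=\"title\">Movie B</td><td class=\"revenue\">450</td></tr>"
        + "</table></body></html>";

    // Parses the markup into a DOM and returns the text of every node
    // matched by the given XPath expression, in document order.
    static List<String> extract(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes =
            (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            extract(CLEANED_PAGE, "//table[@id='movies']//td[@class='title']"));
        System.out.println(
            extract(CLEANED_PAGE, "//table[@id='movies']//td[@class='revenue']"));
    }
}
```

With a library that fetches and cleans the page for you, only the source of the DOM changes; the XPath expressions stay the same idea.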
 
rammie singh
Ranch Hand
Posts: 122
Hi Ulf,

Thanks for your response.
You said we can use a library like HtmlUnit. Does this library already exist, or do we need to create it ourselves?
Or is there a tool for reading the contents of a web page?
 
Ulf Dittmer
Rancher
Posts: 43081
A quick search for "htmlunit" will answer that.
 
Ranch Hand
Posts: 263
You can fetch the HTML page.
Run regular expressions on it to extract the data.
And then create XML out of the data.

This was the approach that a few web-content aggregation products used to follow.
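A rough sketch of those steps, using only the JDK, might look like the following. Fetching is left out; the HTML is passed in as a string. The row layout matched by the regular expression, the class name `RegexToXmlSketch`, and the `<movies>`/`<movie>` element names are assumptions made for the example, not anything the target site or a dashboard tool prescribes. Regular expressions like this are brittle and break as soon as the page markup changes.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RegexToXmlSketch {

    // Simple value holder for one extracted row (hypothetical shape).
    static final class Movie {
        final String title;
        final long revenue;

        Movie(String title, long revenue) {
            this.title = title;
            this.revenue = revenue;
        }
    }

    // Assumed row layout: <tr><td>Title</td><td>12345</td></tr>
    private static final Pattern ROW = Pattern.compile(
        "<tr>\\s*<td>([^<]+)</td>\\s*<td>(\\d+)</td>\\s*</tr>");

    // Step 2: run the regular expression over the fetched HTML.
    static List<Movie> parse(String html) {
        List<Movie> movies = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            movies.add(new Movie(m.group(1).trim(), Long.parseLong(m.group(2))));
        }
        return movies;
    }

    // Step 3: keep the top `limit` movies by revenue and serialize them as XML.
    static String toXml(List<Movie> movies, int limit) throws Exception {
        List<Movie> sorted = new ArrayList<>(movies);
        sorted.sort(Comparator.comparingLong((Movie mv) -> mv.revenue).reversed());

        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();
        Element root = doc.createElement("movies");
        doc.appendChild(root);
        for (Movie mv : sorted.subList(0, Math.min(limit, sorted.size()))) {
            Element e = doc.createElement("movie");
            e.setAttribute("revenue", Long.toString(mv.revenue));
            e.setTextContent(mv.title);
            root.appendChild(e);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = "<table>"
            + "<tr><td>Movie A</td><td>300</td></tr>"
            + "<tr><td>Movie B</td><td>450</td></tr>"
            + "</table>";
        // Top 10 by revenue, as in the original question.
        System.out.println(toXml(parse(html), 10));
    }
}
```

Building the XML through the DOM API rather than string concatenation means titles containing characters such as `&` or `<` are escaped for you.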
 
Rancher
Posts: 377
Android Spring Java
Hey,

I have also used a library called JTidy to turn HTML into XML, which then let me extract the data.

Sean
 