
HTML Parser

 
Greenhorn
Posts: 28
Hello all
I am looking for a bit of inspiration here. I am doing a college project where I am creating a website where a user can type in the name of a CD and click search. When the user searches for a CD, I want my website to search two or three other music sites and return how much the CD costs on each of them.
E.g. One by One by the Foo Fighters might be $12 in one shop and $14 in the other two, so now the user has an idea where they can go to buy the cheapest CDs.
I was told I could use an HTML parser and Java Swing to search the other websites and display the information I'm looking for.
If you want to look at www.bizrate.com, you will find a website similar to what I am trying to create.
What I'm saying here might sound totally stupid to you, but it's totally new to me and I would appreciate some guidance as to where I should go to start this project.
Thanks
Brian Mulvany
 
Sheriff
Posts: 67746
One of the problems that you are going to run into, even once you have an HTML parser working (search around and you will find many packages that parse HTML), is that, unlike XML, HTML is presentation markup: it describes how a page looks, not what the data means. So it's difficult to tell under program control exactly which element contains the data you are after. You can hard-code such mining, but then you are subject to the whim of the other sites' developers: if they change the HTML around, your program breaks.
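
To make the fragility concrete, here is a minimal sketch of hard-coded mining. It uses a regular expression instead of a full HTML parser purely to keep the example short, and the markup it expects (a span with class "price") is an assumption for illustration, not something taken from any real shop's pages. The class and method names are hypothetical too.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HardCodedPriceMiner {
    // Assumed markup (hypothetical): <span class="price">$12.00</span>
    private static final Pattern PRICE =
            Pattern.compile("<span class=\"price\">\\$(\\d+\\.\\d{2})</span>");

    /** Returns the first price found, or empty if the expected markup is absent. */
    public static Optional<Double> findPrice(String html) {
        Matcher m = PRICE.matcher(html);
        if (m.find()) {
            return Optional.of(Double.parseDouble(m.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The page as it looks when the extractor is written.
        String today = "<div><span class=\"price\">$12.00</span></div>";
        // The same price after a hypothetical redesign of the page.
        String afterRedesign = "<div><b id=\"cost\">$12.00</b></div>";

        System.out.println(findPrice(today));         // Optional[12.0]
        System.out.println(findPrice(afterRedesign)); // Optional.empty
    }
}
```

The price is still on the page in the second case, but the extractor no longer finds it. A real parser gives you cleaner access to elements and attributes, yet the rule for which element holds the price is just as hard-coded.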

There are also the ethical considerations of mining other sites without the owners' permission. But I'll let you ponder that one.

If the sites in question provide XML-based feeds (RSS or otherwise), that's a much better bet: the structure of the data will be predictable, and since a feed is published for others to consume, the ethical considerations are moot.
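
As a rough sketch of the feed approach, the standard JAXP DOM parser that ships with Java can read prices out of an XML document. The feed layout below (catalog, item, title and price elements) is invented for illustration; a real shop's feed will have its own element names, so treat those and the class name as assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FeedPriceReader {
    /** Maps each item's title to its price, in document order. */
    public static Map<String, Double> readPrices(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, a common precaution for XML from other sites.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Map<String, Double> prices = new LinkedHashMap<>();
            // Hypothetical layout: <item><title>...</title><price>...</price></item>
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String title = item.getElementsByTagName("title").item(0).getTextContent().trim();
                String price = item.getElementsByTagName("price").item(0).getTextContent().trim();
                prices.put(title, Double.parseDouble(price));
            }
            return prices;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String feed = "<catalog>"
                + "<item><title>One by One</title><price>12.00</price></item>"
                + "</catalog>";
        System.out.println(readPrices(feed)); // {One by One=12.0}
    }
}
```

Because the elements are named for what they contain, the lookup keeps working however the shop restyles its web pages. Running something like this against each shop's feed and comparing the results would give the cheapest price for a title.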
 