
Searching other websites

 
Brian Mulvany
Greenhorn
Posts: 28
Hi
I am working on a project where users can type in the name of a CD and get back prices and details for that CD from several different shops. To start with, I would like the user to type in a search term and have the results from just two websites displayed in two frames at the bottom of the page: one frame for the first website and one for the second.
I would like to know how to go about doing this.
I would appreciate as much help as possible, as I'm kind of lost.
Bye now
Brian
bmulvany@gmail.com
 
danny liu
Ranch Hand
Posts: 185
Brian,

You may need an HTTP connection (for example java.net.HttpURLConnection) and an HTML parser for that task.

a. Use the HTTP connection to connect to the background websites, post the search terms and get the result back as HTML.

b. Parse the answer out of that result using a parser.

Hope it helps.

Dan
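Steps a and b above could be sketched roughly as follows. This is only a minimal illustration using the JDK's own HttpURLConnection: the "q" parameter name, the class="title" markup and the class and method names are all assumptions made up for the example, not anything a real shop is known to use, and it sends a GET request where a real site might expect a POST. The regular expression is a stand-in for a proper HTML parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShopSearchSketch {

    // Step a (part 1): build the search URL. The "q" parameter name is
    // hypothetical; look at the shop's search form for the real field name.
    static String buildSearchUrl(String baseUrl, String term) {
        return baseUrl + "?q=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Step a (part 2): send a GET request and return the response body.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Step b: collect the text of every element carrying class="title".
    // The markup is an assumption; real pages need a real HTML parser.
    static List<String> extractTitles(String html) {
        Pattern p = Pattern.compile("<[^>]*class=\"title\"[^>]*>([^<]*)<");
        Matcher m = p.matcher(html);
        List<String> titles = new ArrayList<>();
        while (m.find()) {
            titles.add(m.group(1).trim());
        }
        return titles;
    }
}
```

Keeping the URL building and the parsing apart from the network call means each shop can get its own pair of methods, and the parsing can be tried out on a saved copy of a results page before any live requests are made.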
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13045
The HttpClient toolkit (from the Apache Software Foundation's Commons project) is convenient for simulating a browser connection to a website.
Bill
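To give a feel for what "simulating a browser connection" involves, here is a small sketch. Note that it does not use the Apache Commons HttpClient mentioned above; it uses the java.net.http.HttpClient built into JDK 11 and later, purely so the example has no external dependencies. The User-Agent string and the class and method names are invented for the illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class BrowserLikeRequestSketch {

    // Build a GET request that carries the kind of headers a browser sends.
    // The User-Agent value is made up for this example.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", "Mozilla/5.0 (compatible; CdPriceSketch/0.1)")
                .header("Accept", "text/html")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Send the request, following ordinary redirects as a browser would,
    // and return the page body as text.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The string returned by fetch is the raw HTML the site would send to a browser, which is the input the parsing step works on.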
 
Brian Mulvany
Greenhorn
Posts: 28
Thanks for that
But I went to that site and, to be honest, I found it all a bit daunting. I was wondering if there is anywhere I could go to learn the basics of HttpClient. At least I know what I have to do in principle, but I don't know how to put it into practice. As I said previously, I want a user to be able to type in a CD they want and have the results returned unformatted from a website (cdwow.com, for example). So on my page I want the user to see exactly the same information they would see if they went to cdwow and searched for it on their site. I will worry about extracting the exact information I want once I get this first bit working, and that is where the parser comes in, as far as I am aware.
Thanks
Brian
 
Brian Mulvany
Greenhorn
Posts: 28
I have managed to create several Java classes that search http://www.cdwow.ie for an album and return the results as XML, but only within the Java program. I want the user to be able to use a JSP web page with a form to search for whatever CD they want. When they type in U2, for example, I want U2 to be passed into the Java program. I would also like the XML search results to be written to a brand new XML or HTML file.
Thanks
Brian Mulvany
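One possible way to handle the "brand new XML or HTML file" part is sketched below using only the JDK's XSLT support. The layout of the XML (a results element holding cd elements with title and price children) is an assumption, since the actual format is not shown in the thread, and the class, method and stylesheet are made up for the example. On the web side, the text typed into the form would normally reach the Java code through request.getParameter in a servlet or JSP, with the field name being whatever the form uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ResultFileSketch {

    // Hypothetical stylesheet: turns <results><cd><title/><price/></cd></results>
    // into a plain HTML table. Adjust the element names to the real XML.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/results'>"
        + "<html><body><table>"
        + "<xsl:for-each select='cd'>"
        + "<tr><td><xsl:value-of select='title'/></td>"
        + "<td><xsl:value-of select='price'/></td></tr>"
        + "</xsl:for-each>"
        + "</table></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Convert the XML search results into an HTML page held in a string.
    static String toHtml(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    // Write either the raw XML or the generated HTML to a new file.
    static void save(String content, Path file) throws IOException {
        Files.writeString(file, content, StandardCharsets.UTF_8);
    }
}
```

Calling save with the original XML gives the new XML file, and calling it with the output of toHtml gives the HTML version, so both options mentioned in the post are covered by the same two methods.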
 
Dharmanand Singh
Greenhorn
Posts: 13
All I could make out from your question is that you want to search a site, and that you probably have all the HTML and JSP files that make it up. There are two approaches to searching the documents of a site. The first is to index all the display data of the HTML and JSP files on the site directly; the second is to index the data after crawling the whole site.

I can give you some idea of how to use the first method. You will need to parse the HTML and JSP files and extract the display information from them by applying some logic. Once you have the display data, you need to analyse it, then index the content and store the index. You can then search these stored indices efficiently, and update them whenever something on the site changes.

There are many libraries that will help you analyse and index the content and search it. I have used one such library: Lucene. You can download a Lucene search web-application example from: Download. You can read about the details of this example here.

There are others that also include a crawler, which indexes the whole site by following the links (the second approach I mentioned). I haven't used any such library personally, but I know of one, Regain, whose documentation is not in English. I am sure we can find more and try them out, for example htdig, which probably runs on UNIX flavors.
[ December 02, 2004: Message edited by: Dharmanand Singh ]
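The analyse, index and search cycle described in the post above can be illustrated with a toy inverted index in plain Java. To be clear, this is not Lucene and does not use its API; it is only a sketch of the general idea, with invented class and method names and a very crude way of stripping tags.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TinyIndexSketch {

    // word -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Analyse: drop the tags, lower-case the text, split it into words.
    static List<String> analyse(String html) {
        String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Index: record, for every word, that this document contains it.
    void add(String docId, String html) {
        for (String token : analyse(html)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search: look up a single word and return the matching document ids.
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT),
                                     Collections.emptySet());
    }
}
```

A library such as Lucene does the same three jobs with far better analysis, ranking and on-disk storage, and it also handles updating the index when pages change, which this toy version leaves out.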
 