
Need the HTML source code

 
harish raghavan
Greenhorn
Posts: 8
Hello people... I am currently developing a page re-ranking algorithm that sorts Google's results for a user query. For that I need to grab the HTML source code of the Google search result page.
Can you please send me a program that grabs the HTML source of a Google search result page and saves it to a file? Please help me.
 
Bimal Patel
Ranch Hand
Posts: 130
Hi,

You mean you want to grab the source of Google's search engine? Please specify!
 
Svend Rost
Ranch Hand
Posts: 904
Hi..

If you're looking for a programmer, you can find one here:
www.rentacoder.com

If you, however, have specific problems, we'll gladly help. Please specify
your problem (e.g. "I'm not sure how to write data to a file; this code
doesn't work").

/Svend Rost
 
harish raghavan
Greenhorn
Posts: 8
Yeah, say for example:

http://www.google.co.in/search?hl=en&q=computer+networks&meta=

This is the Google result page for "computer networks". I need the HTML source of this page stored in a file. Please help me.
 
Christophe Verré
Sheriff
Posts: 14691
I'd try this:
http://www.google.com/apis/
 
Rusty Shackleford
Ranch Hand
Posts: 490
Just write a very basic web browser that fetches the page but doesn't display it. Then save it.

Or you can just right-click on the page and choose "View Source".
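
A minimal sketch of the "fetch but don't display, then save" idea, using only the standard Java library. The class name PageSaver and the method savePage are made up for illustration, and the URL is whatever page you are allowed to fetch automatically (see the terms-of-service note further down this thread). It copies the raw response bytes to a file and makes no attempt to handle cookies, redirects across protocols, or character encodings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: fetches a page and stores its raw source in a file.
public class PageSaver {

    /**
     * Opens the given URL, reads the response body and writes the bytes
     * unchanged to the target file, replacing the file if it already exists.
     * Returns the number of bytes written.
     */
    public static long savePage(String pageUrl, Path target) throws IOException {
        URLConnection connection = URI.create(pageUrl).toURL().openConnection();
        // Timeouts (in milliseconds) so a stalled server cannot block forever.
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(10_000);

        try (InputStream in = connection.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java PageSaver <url> <output-file>");
            return;
        }
        long bytes = savePage(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

Run it as `java PageSaver <url> <output-file>`; the saved file can then be fed to whatever parser you use for the re-ranking step.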
 
Tom Sullivan
Ranch Hand
Posts: 72
You might take a look at HTMLParser on SourceForge.

HTMLParser
 
Ilja Preuss
author
Sheriff
Posts: 14112
Important!

From http://www.google.com/terms_of_service.html:

"You may not send automated queries of any sort to Google's system without express permission in advance from Google."
 
harish raghavan
Greenhorn
Posts: 8
Oh god... thanks for helping me. Really, thanks.
 
Tom Sullivan
Ranch Hand
Posts: 72
I know how you must feel, Harish...

The last time I had to parse HTML pages I did it with HTMLParser. My program was pretty simple in that I only needed to grab values between <div> and <p> tags. HTMLParser allowed me to treat the HTML pages like an XML DOM. While what I was doing was pretty simple, the API has a number of methods that I think will solve your problem (at least as far as getting the HTML info goes, which you can then write to a file with standard I/O).
 