I'm trying to write a simple program that saves an HTML page given its URL. However, what I'm retrieving is not the same HTML that the browser (in my case Mozilla) uses. To see what I mean:
- run the following code
- open the embedded link (http://www.ranchhouseinn.com/ranch.html) in a browser, select "save page as...", and save it to c:/good.html (or whatever you wish)
- compare that file with c:/copytest.html, which was generated by the code
So my questions are:
- Why are these files different?
- How can I get the HTML in good.html using Java?
Here is a more specific example:
- run the following code
- open the embedded link (http://www.hikinglasvegas.com/peaks_of_the_sierra.htm) in a browser, select "save page as...", and save it to c:/good.html (or whatever you wish)
- open both c:/copytest.html and c:/good.html IN NOTEPAD
- search for 'Mallory' in both files
In good.html, you'll see there's a fully qualified URL for this link:
- href="http://www.hikinglasvegas.com/Mt_Malloryl_Photo_pg.htm"
In copytest.html, you'll see it's been shortened:
- href="Mt_Malloryl_Photo_pg.htm"
This is my problem. I'm trying to parse out individual URLs from the document, but when I fetch the page from code, the URLs come back shortened to relative form. Any ideas?
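
One possible direction, sketched here under the assumption that the shortened href is simply a relative URL as the server sent it, and that the browser's "save page as..." rewrote it to absolute form: resolve each href against the URL of the page it came from. The class name ResolveHref and the helper resolve() are made up for illustration; java.net.URI and its resolve() method are part of the standard library.

```java
import java.net.URI;

class ResolveHref {
    // Resolve an href taken from the raw HTML against the URL of the page
    // it was found on. An href that is already absolute is returned unchanged.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.hikinglasvegas.com/peaks_of_the_sierra.htm";

        // Relative href as it appears in copytest.html
        System.out.println(resolve(page, "Mt_Malloryl_Photo_pg.htm"));

        // Already fully qualified href, as it appears in good.html
        System.out.println(resolve(page, "http://www.hikinglasvegas.com/Mt_Malloryl_Photo_pg.htm"));
    }
}
```

If that assumption holds, both calls should print the fully qualified form seen in good.html. Note that URI.create() throws IllegalArgumentException on hrefs containing illegal characters such as spaces, so real pages may need some cleanup before resolving.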