Jeanne Boyarsky wrote:
Bill,
Yes, it is done via JavaScript. Facebook does something similar. When you scroll down (or click show posts), it uses AJAX to fetch more data and paint it at the bottom of the page.
Why do you want to scrape Yandex specifically? I ask because Google provides an API for programmatic searching.
Each search engine provides a slightly different set of results. I have found that if you do not use multiple search engines, you will miss out. Google has results that Bing does not have and Yandex has results that Google and Bing do not have.
Over time, I have gotten tired of copying content out of browsers and pasting it into word processors for my records. So I decided to write a program.
I have figured out how I can modify the URL for Yandex to go to different pages. I was able to navigate to a particular page by adding a "&p=" parameter to the URL.
The DuckDuckGo search engine does not work this way, though.
So far, I can programmatically get all the search results of 7 search engines. I want to get the content from DuckDuckGo next. But, like Yandex, it uses AJAX to load more content, so the URL stays the same as the original search results page.
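As a minimal sketch of that page-parameter approach in Python: the helper below sets or replaces a "p" query parameter on a search URL. The function name page_url, the example.com address, and the "text" parameter are placeholders of mine, not the real Yandex URL layout, and whether page numbering starts at 0 or 1 has to be checked against the actual site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(search_url: str, page: int, page_param: str = "p") -> str:
    """Return search_url with the page parameter set to the given page."""
    parts = urlsplit(search_url)
    # Keep every existing query parameter except an old page parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != page_param
    ]
    query.append((page_param, str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical base URL, used only for illustration.
    base = "https://example.com/search?text=java+ajax"
    for page in range(3):
        print(page_url(base, page))
```

Replacing an existing "p" value instead of appending a second one keeps the URL clean when the starting point was copied from a later results page.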
I have asked the DuckDuckGo people and they have not been helpful. I was surprised. They do not want people to have the ability to take content from their search results. Their response did not make a lot of sense. I asked them what sets them apart from other search engines that have a different business model and actually welcome users taking content programmatically. In fact, that is exactly what a meta search engine does. They have yet to respond.
But, as I think about this, their secretive approach is not in line with the spirit of the internet. All HTML pages can be read. And I think they are actually abusing what AJAX is intended to do. AJAX does not exist so that people can be secretive and hide things.
Either I will just pass on including DuckDuckGo in my list of search engines, or I will go ahead and find a way to decipher the AJAX from their HTML page. Since I am doing all this for my own private use, I may end up trying both.
This shouldn't be rocket science. I am able to see the HTML code. Shouldn't there be some sort of AJAX tag in the HTML that I can find to make use of?
So far, though, the HTML content of DuckDuckGo does not contain the markers that I expect from AJAX, namely XMLHttpRequest and XMLHTTP.
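One thing worth keeping in mind here is that XMLHttpRequest is a JavaScript object rather than an HTML tag, so it normally shows up inside script code, and often inside external .js files that the page only links to. A minimal sketch of how one might check a saved page for this in Python follows. The class name ScriptScanner, the function scan, and the list of hint strings are my own choices for illustration, and the sample HTML is made up rather than taken from DuckDuckGo.

```python
from html.parser import HTMLParser

# Strings that often indicate AJAX calls in JavaScript source (not exhaustive).
AJAX_HINTS = ("XMLHttpRequest", "XMLHTTP", "fetch(", "$.ajax")


class ScriptScanner(HTMLParser):
    """Collect external script URLs and AJAX hints found in inline scripts."""

    def __init__(self):
        super().__init__()
        self.external_scripts = []
        self.inline_hints = set()
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external_scripts.append(src)
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only look at text that sits inside a <script> element.
        if self._in_script:
            for hint in AJAX_HINTS:
                if hint in data:
                    self.inline_hints.add(hint)


def scan(html: str):
    """Return (external script URLs, sorted AJAX hints seen in inline scripts)."""
    scanner = ScriptScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.external_scripts, sorted(scanner.inline_hints)


if __name__ == "__main__":
    # Made-up sample page, for illustration only.
    sample = (
        '<html><head><script src="/app.js"></script>'
        "<script>var r = new XMLHttpRequest();</script></head>"
        "<body><p>results</p></body></html>"
    )
    print(scan(sample))
```

If the inline scripts show no hints, the external script URLs are the next place to look, since the request code may live entirely in those files. The browser's developer tools (the network panel) can also show which requests fire when more results load.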