Hi,
I am currently working on a project that uses the Google Search Appliance (GSA). Although its default usage is to act as a web crawler, it can also be fed so-called "Feeds", which can be anything from a simple list of URLs to crawl to a list of documents with their full content that it should index. These Feeds are in XML format and are pushed to the GSA with an HTTP POST. I have written the code that does the POST call, but now I have a few problems:
I have no easy access to the GSA server, and it is already used in a live environment, so I am not allowed to do "all sorts of testing". So for now I cannot know for sure that the code I have written (using HttpClient) performs the POST correctly so that the GSA will accept it. By using the DEBUG logging mode I managed to get HttpClient to print out various information about the request it sends (when I tried it against some random web server). But that information wasn't perfectly clear, so I still don't know if the request is in exactly the right format.
But even if I could confirm that it *is* in the right format, I would very much like to have a unit test that verifies that, instead of relying on analyzing some debug printouts. So I started looking around, but couldn't find anything really useful in this area. I found a lot of information about how to test a servlet and how to mock HTTP requests. But what I want is the exact opposite: I want to mock a servlet that can take real HTTP requests, and then verify that the requests are in a certain format. And I want this mocked servlet to be deployed on the fly in a temporary web server that listens on localhost for the duration of the unit test, so I don't have any external dependencies.
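To make the idea concrete, here is a rough sketch of the kind of throwaway server I have in mind. It uses the JDK's built-in com.sun.net.httpserver.HttpServer rather than Jetty, simply because it needs no extra jars; I assume the same pattern would carry over to an embedded Jetty. The class name CapturingFeedServer, the /xmlfeed path and the "ok" reply are placeholders of mine, and the post helper is only a stand-in for my real HttpClient code:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: a fake feed endpoint that records what it receives.
class CapturingFeedServer {
    // Content-Type header and raw body of the most recent request.
    public final AtomicReference<String> contentType = new AtomicReference<>();
    public final AtomicReference<String> body = new AtomicReference<>();
    private HttpServer server;

    /** Starts the server on a free localhost port and returns the URL to POST to. */
    public String start() throws IOException {
        // Port 0 lets the OS pick any free port, so tests do not collide.
        server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/xmlfeed", exchange -> {
            contentType.set(exchange.getRequestHeaders().getFirst("Content-Type"));
            // ISO-8859-1 maps every byte to one char, so the body is kept as sent.
            byte[] raw = exchange.getRequestBody().readAllBytes();
            body.set(new String(raw, StandardCharsets.ISO_8859_1));
            byte[] reply = "ok".getBytes(StandardCharsets.US_ASCII); // arbitrary reply
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return "http://localhost:" + server.getAddress().getPort() + "/xmlfeed";
    }

    public void stop() {
        server.stop(0);
    }

    /** Stand-in client; in the real test the HttpClient code would be called instead. */
    public static int post(String url, String contentType, String payload) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.ISO_8859_1));
        }
        return conn.getResponseCode();
    }
}
```

A test would call start(), point the feed-sending code at the returned URL, and then assert on contentType and body before calling stop().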
Does anyone have any tips on how to go about this? Can this be done with Jetty? Also, if someone is an expert in the HTTP protocol and HttpClient, then maybe he/she could tell me right away if my code seems about right:
Here is the documentation for the GSA Feeds:
http://code.google.com/intl/us-en/apis/searchappliance/documentation/64/feedsguide.html
And here is the section that describes what the HTTP POST should look like:
http://code.google.com/intl/us-en/apis/searchappliance/documentation/64/feedsguide.html#how_feed_pushes
Here is their example of what the post should look like:
Here is my code that sends the request:
And here is the relevant debug output:
What worries me is that it doesn't output any information about the "feedtype", "datasource" and "data" parts that are added. If I remove the line where I add the "data" part, then it prints out a lot more information, regarding "feedtype" and "datasource":
Is there no way of making it print out this information when I add the "data" part too? Is it because it sends it in binary form? If so, is there a way to send it in regular urlencoded form? Because that is what the GSA expects. And the file I send *is* a regular plain text XML file. On that note, I have no idea why the GSA wants it in urlencoded format, because they say:
You should post the feed using enctype="multipart/form-data". Although the search appliance supports uploads using enctype="application/x-www-form-urlencoded", this encoding type is not recommended for large amounts of data.
...and when sending so-called "Content Feeds" the data sent can be quite large, up to 1GB. Isn't it terribly inefficient to urlencode 1GB of text data? But still, that is what they seem to demand that I do, even though they don't want me to specify enctype="application/x-www-form-urlencoded", so that confuses me a bit.
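For comparison, this is how I understand a multipart/form-data body to be laid out: each field sits between boundary lines and its value is written as-is, without any URL encoding. The sketch below is hand-rolled with plain JDK classes, not HttpClient, and the class name MultipartSketch, the boundary and the field values are made up for illustration; only the field names "feedtype", "datasource" and "data" come from my request. Please correct me if this is not what the GSA expects:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that writes text fields as a multipart/form-data body.
class MultipartSketch {
    /** Builds the body, one part per map entry, in insertion order. */
    static String build(String boundary, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // Every part starts with "--" followed by the boundary.
            sb.append("--").append(boundary).append("\r\n");
            sb.append("Content-Disposition: form-data; name=\"")
              .append(e.getKey()).append("\"\r\n");
            // An empty line separates the part headers from the part content.
            sb.append("\r\n");
            // The value is appended verbatim; nothing is URL-encoded here.
            sb.append(e.getValue()).append("\r\n");
        }
        // The closing boundary carries a trailing "--".
        sb.append("--").append(boundary).append("--\r\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("feedtype", "full");        // placeholder value
        fields.put("datasource", "sample");    // placeholder value
        fields.put("data", "<gsafeed/>");      // placeholder for the feed XML
        System.out.print(build("example-boundary", fields));
    }
}
```

If that understanding is right, the matching request header would be Content-Type: multipart/form-data with the same boundary value, and the XML would travel unchanged inside the "data" part.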
Oh, and I use HttpComponents HttpClient 4.0.1, in case anyone wonders.
Regards
/Jimi