"The good news about computers is that they do what you tell them to do. The bad news is that they do what you tell them to do." -- Ted Nelson
Karthik Shiraly wrote: An HttpClient instance is created just once (when SparkResource is instantiated) and closed immediately after the first request.
Closing it also closes the connection manager, the socket factory, and any other closeable resource held by that HttpClient.
I doubt this will work correctly for any subsequent request. Have you deployed and tested it?
Karthik Shiraly wrote: Either that, or SparkResource is being instantiated for every request, presumably by GAE. A trace statement in its constructor would settle this.
Judging from the HttpClient code, I doubt the former.
Has it solved the socket quota problem?
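A trace statement like the one suggested above could be as simple as the sketch below. It assumes plain java.util.logging and a static counter; this SparkResource is a stripped-down stand-in, not the poster's actual class. The count is per JVM, so if GAE runs several instances of the app, each one logs its own sequence.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

// Stripped-down stand-in for the real SparkResource; only the tracing is shown.
class SparkResource {
    private static final Logger LOG = Logger.getLogger(SparkResource.class.getName());

    // Number of instances created in this JVM since the class was loaded.
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    private final int id;

    SparkResource() {
        id = INSTANCES.incrementAndGet();
        // One of these lines per request in the logs means the framework is
        // creating a new instance per request instead of reusing a singleton.
        LOG.info("SparkResource instance #" + id + " created");
    }

    int id() {
        return id;
    }

    static int instanceCount() {
        return INSTANCES.get();
    }
}
```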
Karthik Shiraly wrote: The observations in this discussion surprised me a bit, so I set about investigating the whole thing in depth today.
"The good news about computers is that they do what you tell them to do. The bad news is that they do what you tell them to do." -- Ted Nelson
Sanjeev Mehta wrote: You were right. App Engine was indeed creating a new instance for every request. I tried to make SparkResource a singleton using @Singleton, but it doesn't seem to work. All I did was add @Singleton above @Path. According to the documentation, that should make it treat SparkResource as a singleton. But after deploying and checking the logs, I saw it was still creating a new instance per request.
Karthik Shiraly wrote:
Both the documentation and search results indicate that you have to register your singleton with an Application or ResourceConfig. Have you missed that step?
Since I'm not familiar with Jersey myself, I can't give precise directions. Perhaps you could experiment by setting up a test Jetty server with Jersey and trying it there before deploying to GAE.
Does the above code look efficient?
Karthik Shiraly wrote: Sorry for the late reply; I was out of town for the last 3 days and didn't check my mail.
1. Are connManager.close() and httpclient.close() called from elsewhere, so that they can release any resources they created?
If not, you may want to find out which callback Wink invokes on a resource when its lifecycle ends, and call these two close() methods from that callback.
Since I haven't used Wink, I'm not sure which callback it invokes on the resource during shutdown; you'll have to find that out.
2. I think it's better to make connManager and httpclient instance members instead of static ones.
Then you can be sure that the lifetimes of connManager and httpclient fall within the lifetime of the (singleton) SparkResource object.
Currently, the lifetimes of connManager and httpclient start when the SparkResource class is loaded, not when it is instantiated.
And once you implement the SparkResource close callback, their lifetimes will end in that callback.
If Wink is designed in such a way that, for whatever reason, it creates a SparkResource singleton, closes it, abandons it and later creates a second SparkResource singleton, the second instance will find that the static shared httpclient is already closed and will fail.
Admittedly, this is a far-fetched scenario and your code with static members will probably work just fine, but since your entire problem centres on correct resource cleanup, it's better to be safe than sorry.
3. Add a log line to the SparkResource constructor so that you can verify there's only one instance.
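Points 1 and 2 above could look roughly like the sketch below. ConnManager and Client are hypothetical stand-ins for the real connection manager and HttpClient so the sketch runs without dependencies, and close() is the method you would call from whatever shutdown callback Wink provides (that callback is not shown, since its name is not known here).

```java
// Hypothetical stand-in for a pooled connection manager.
class ConnManager implements AutoCloseable {
    private volatile boolean closed;

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Hypothetical stand-in for an HTTP client that uses a ConnManager.
class Client implements AutoCloseable {
    private final ConnManager manager;
    private volatile boolean closed;

    Client(ConnManager manager) {
        this.manager = manager;
    }

    String execute(String url) {
        if (closed || manager.isClosed()) {
            throw new IllegalStateException("client already closed");
        }
        return "response from " + url;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Owns its manager and client as instance members, so their lifetime
// starts and ends with this object rather than with class loading.
class SparkResource implements AutoCloseable {
    private final ConnManager connManager = new ConnManager();
    private final Client httpclient = new Client(connManager);

    String fetch(String url) {
        return httpclient.execute(url);
    }

    // Call this from the framework's shutdown/destroy callback.
    @Override
    public void close() {
        try {
            httpclient.close();
        } finally {
            connManager.close(); // runs even if closing the client throws
        }
    }
}
```

Because both fields belong to the instance, a second SparkResource created after the first one was closed gets its own fresh client; with static fields it would share the already-closed one, which is the failure scenario described in point 2.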
Does the above code look efficient?
Currently, the response InputStream and the connection are kept open while Jsoup is parsing, and are closed in the finally block only after all parsing is done.
HTML parsing is time-consuming, and while it runs the connection can't be reused.
It's better to read the response InputStream quickly into a String using EntityUtils.toString(entity), close the response in the finally block, and parse the string with Jsoup only after the finally block.
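The read-then-close-then-parse order described above can be sketched as follows. To keep it runnable without dependencies, it reads a plain InputStream (with HttpClient, EntityUtils.toString(entity) plays this role), it assumes the body is UTF-8, and countLinks is a hypothetical placeholder for the slow Jsoup parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ResponseReader {
    // Reads the whole body into memory (assuming UTF-8) and closes the stream
    // immediately, so the connection is released before slow parsing begins.
    static String readBody(InputStream body) {
        try (InputStream in = body) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical placeholder for the slow Jsoup parse: counts "<a " tags
    // in HTML that has already been fully downloaded.
    static int countLinks(String html) {
        int count = 0;
        int idx = html.indexOf("<a ");
        while (idx >= 0) {
            count++;
            idx = html.indexOf("<a ", idx + 1);
        }
        return count;
    }
}
```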
The best way to be sure your code handles n requests/min is to test it under a slightly higher load using a tool like JMeter or The Grinder.
com.google.apphosting.api.ApiProxy$CancelledException: The API call remote_socket.Close() was cancelled because the overall HTTP request deadline was reached.
Do you know what could be the reason for that?
Karthik Shiraly wrote: Yes, the instantiation looks fine now.
Maybe it's better to first test whether a basic HttpClient solves your quota problem, without all the timeouts and max limits, just to keep the complexity low and have fewer unknowns.
You can configure the timeouts and max limits later on when the quota issue is solved.
com.google.apphosting.api.ApiProxy$CancelledException: The API call remote_socket.Close() was cancelled because the overall HTTP request deadline was reached.
Do you know what could be the reason for that?
Perhaps it's the same problem as your other topic: https://coderanch.com/t/657820/Web-Services/java/Google-App-Engine-jsoup-urlfetch
It sounds like some HTTP requests to your target site didn't return within a GAE deadline (apparently 60 seconds). Perhaps some GAE configuration needs to be changed?
Add logs before and after the .execute() call and see how long it usually takes.
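One way to add those before/after logs is a small timing wrapper like the hypothetical Timed.call below. It only assumes java.util.logging; with HttpClient you would pass something like () -> httpclient.execute(request) as the action.

```java
import java.util.concurrent.Callable;
import java.util.logging.Logger;

class Timed {
    private static final Logger LOG = Logger.getLogger(Timed.class.getName());

    // Logs before and after running the action and returns its result.
    // The elapsed time is logged in finally, so failed calls are timed too.
    static <T> T call(String label, Callable<T> action) throws Exception {
        LOG.info(label + ": starting");
        long start = System.nanoTime();
        try {
            return action.call();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            LOG.info(label + ": finished after " + elapsedMs + " ms");
        }
    }
}
```

Because the elapsed time is logged in a finally block, calls that fail with an exception still show how long they ran before failing.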
Out of curiosity: didn't the URL Fetch approach recommended by GAE support solve your quota problem?
Karthik Shiraly wrote: Do some requests go through, or does every request fail with a deadline error?
If every request is failing, then the target site is likely blocking your requests based on country of origin or other conditions.
Karthik Shiraly wrote: GAE docs say "The URL Fetch service uses Google's network infrastructure for efficiency and scaling purposes."
One possible explanation is that URL Fetch routes requests through addresses that are blacklisted by the target site.
On the other hand, when using HttpClient, the requests probably go out through your assigned GAE server IP address(es), which may not be blacklisted by the target site.
The failures you see with HttpClient may be because the target site drops some requests when too many requests per minute originate from the same address or address block.
Throttle the number of requests being sent to the target site and see if that helps.
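A simple way to throttle is to space requests a fixed interval apart, as in the hypothetical Throttle class below. The interval is an assumption you would tune; note that it only limits requests within one JVM, so if GAE runs several instances of the app, the combined rate to the target site is higher.

```java
import java.util.concurrent.TimeUnit;

// Spaces requests at least intervalMillis apart within this JVM.
class Throttle {
    private final long intervalNanos;
    private long nextSlot; // System.nanoTime() value of the next free slot

    Throttle(long intervalMillis) {
        this.intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        this.nextSlot = System.nanoTime();
    }

    // Reserves the next free slot, then sleeps (outside the lock) until it arrives.
    void acquire() throws InterruptedException {
        long slot;
        synchronized (this) {
            long now = System.nanoTime();
            slot = (nextSlot - now > 0) ? nextSlot : now;
            nextSlot = slot + intervalNanos;
        }
        long waitNanos = slot - System.nanoTime();
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

Call acquire() before each request to the target site; concurrent callers each reserve a distinct slot and block until it comes up.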
Sanjeev Mehta wrote: I read that the default number of connections on the PoolingHttpClientConnectionManager is 2. Wouldn't this cause a problem if my app gets multiple requests a second and I remove that line?
Also, what is an easy way for me to log how many socket connect calls my app makes?