Using CloseableHttpClient with Google App Engine leaking sockets?

 
Greenhorn
I am using CloseableHttpClient in my Google App Engine servlet. Recently, as traffic to my app has grown, I have started getting exceptions from Google App Engine saying that the sockets quota has been exceeded. This is what the exception says:



I reached out to the App Engine team and they wanted me to check if my app was leaking sockets. Can someone please help me identify the leak?



This is what my http connection code looks like. Am I doing something wrong which may be causing the sockets to leak? Can I do something better?
 
Rancher
If you get any exceptions from the execute() call then you won't close off the httpclient.
Presumably the execute should be inside the try block.
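
A minimal sketch of the point above, assuming a stand-in client instead of the real Apache classes: `FakeClient`, its `execute` method, and the URL are hypothetical, chosen so the example runs with only the JDK. It shows that when `execute` throws outside a `try`, `close()` is skipped, while try-with-resources closes the client either way.

```java
import java.io.IOException;

final class CloseOnFailureDemo {
    /** Hypothetical stand-in for a closeable HTTP client; not the Apache API. */
    static final class FakeClient implements AutoCloseable {
        boolean closed = false;

        String execute(String url) throws IOException {
            throw new IOException("simulated network failure for " + url);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** execute() is not guarded, so an exception skips close(). */
    static void leaky(FakeClient client) throws IOException {
        client.execute("http://example.invalid/"); // throws here
        client.close();                            // never reached
    }

    /** try-with-resources calls close() whether or not execute() throws. */
    static void safe(FakeClient client) throws IOException {
        try (FakeClient c = client) {
            c.execute("http://example.invalid/");
        }
    }

    public static void main(String[] args) {
        FakeClient a = new FakeClient();
        try {
            leaky(a);
        } catch (IOException e) {
            // expected: the simulated failure
        }
        FakeClient b = new FakeClient();
        try {
            safe(b);
        } catch (IOException e) {
            // expected: the simulated failure
        }
        System.out.println("leaky closed: " + a.closed + ", safe closed: " + b.closed);
    }
}
```

With the real library the same shape would apply to whatever closeable object the request opens; which object that should be depends on how long the client is meant to live.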
 
Bartender
And I strongly recommend that you add a catch block that logs your exceptions.
 
Bartender
It also looks like there's a potential race condition around the null check. This is a servlet which will receive requests on multiple threads but is initializing member variable httpclient without any lock.
If one thread is preempted just after the null check, another will end up creating a second httpclient instance. Depending on how GAE manages servlet and web app lifecycles, this may happen multiple times.
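
One way to avoid that race, sketched here with a hypothetical `FakeClient` standing in for the real httpclient (the original servlet code is not shown in the thread, so this is not a patch for it): the initialization-on-demand holder idiom lets the JVM's class initialization guarantee a single instance without an explicit lock or null check.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedClientHolder {
    /** Counts how many times the expensive object was constructed. */
    static final AtomicInteger CREATED = new AtomicInteger();

    /** Hypothetical stand-in for an HTTP client that should exist only once. */
    static final class FakeClient {
        FakeClient() {
            CREATED.incrementAndGet();
        }
    }

    // The JVM runs Holder's static initializer exactly once, on first access,
    // and class initialization provides the necessary synchronization.
    private static final class Holder {
        static final FakeClient INSTANCE = new FakeClient();
    }

    static FakeClient get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 16;
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await(); // release all threads at once
                    get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown();
        done.await();
        System.out.println("instances created: " + CREATED.get());
    }
}
```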
 
Sanjeev Mehta
Greenhorn
Thank you very much for your comments guys. I made some changes to my code after your suggestions. Here's what the code looks like now.



Does this look better?
 
Karthik Shiraly
Bartender
An HttpClient instance is created just once (when SparkResource is instantiated) and closed immediately after the first request.
Closing it closes the connection manager, socket factory and any other closeable resource held by that httpclient.
I doubt this'll work correctly for any subsequent request. Have you deployed this and tested?
 
Sanjeev Mehta
Greenhorn




Yeah I see what you're saying. I have deployed this code and surprisingly, it has been working fine. Maybe the execute statement establishes another socket connection?
 
Karthik Shiraly
Bartender
Either that...or SparkResource is getting instantiated for every request, presumably by GAE. A trace statement in its constructor can clarify this.
From httpclient code, I doubt the former.

Has it solved the socket quota problem?
 
Sanjeev Mehta
Greenhorn




I see just two instances of SparkResource instantiated by GAE in the last hour, and there have been more than 10,000 requests. So now even I'm confused about how the code keeps working after closing the httpclient. It hasn't solved my quota issue either, so I'm wondering if the httpclient is being closed at all.
 
Karthik Shiraly
Bartender
That's very surprising, but logically it indeed looks like execute() works even after a close(), that is assuming close() is actually happening. I think you should add some more traces, especially in the finally block, to make sure it's closing.

As for quotas, I know the GAE team advised looking for socket leaks, but have you checked the possibility that your app is simply receiving too many requests and actually exceeding the quotas given on this page? 864000 socket operations/day translates to an average of 36000 ops/hour, and your traffic in just the last hour is about 10000 requests. Perhaps over the course of a day there are peak hours where it goes close to or over the hourly average, and the daily total ends up above the daily quota. Does GAE provide some dashboard of how many operations have occurred?
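
The arithmetic above can be checked with a few lines of Java. The daily quota and the hourly traffic are the figures quoted in this thread; the number of socket operations each request costs is not known at this point, so the values tried below are assumptions for illustration only.

```java
final class QuotaMath {
    /** Average operations per hour that a daily quota allows. */
    static long hourlyAverage(long dailyQuota) {
        return dailyQuota / 24;
    }

    /** Socket operations per day for a steady request rate. */
    static long projectedDailyOps(long requestsPerHour, long opsPerRequest) {
        return requestsPerHour * 24 * opsPerRequest;
    }

    public static void main(String[] args) {
        long dailyQuota = 864_000L;     // figure quoted in the post above
        long requestsPerHour = 10_000L; // traffic reported in the thread
        System.out.println("hourly average allowed: " + hourlyAverage(dailyQuota));
        // opsPerRequest is an assumed value, varied to see where the quota breaks.
        for (long opsPerRequest = 1; opsPerRequest <= 4; opsPerRequest++) {
            long projected = projectedDailyOps(requestsPerHour, opsPerRequest);
            System.out.println(opsPerRequest + " op(s)/request -> " + projected
                    + " ops/day, over quota: " + (projected > dailyQuota));
        }
    }
}
```

At a steady 10000 requests/hour, the daily total stays under 864000 as long as each request costs 3 operations or fewer, and goes over at 4, so the cost per request matters as much as the request count.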
 
Sanjeev Mehta
Greenhorn
Yeah, that's a possibility. The dashboard does provide that information. There have been ~300,000 requests in the past 15 hours and I think there are times in the day when the requests tend to spike. I also heard back from the App Engine team and they want me to explore using URLFetch instead of the Apache library, so I'll look into that as well.
 
Karthik Shiraly
Bartender
The observations in this discussion kind of surprised me, and I set about investigating this whole thing in depth today.

HttpClient close confusion
The first confusion was over how httpclient execute works even after close. With a trivial command line test application,
I find that it doesn't actually work, as suspected already. It throws an illegal state exception if execute is tried after close.

So the other possibility - that one SparkResource instance and one httpclient instance are being created with every request - is the correct explanation.
This is confirmed by Jersey's documentation on resource lifecycles (https://jersey.java.net/documentation/latest/jaxrs-resources.html#d0e2611)
which says it's 1 instance per request by default.

I can't explain why you saw only 2 instances, but there is some anomaly in the counting or tracing.

Quota problem
Now coming to the actual quota problem.
The default HttpClient instance is a pooling, caching one.
So if 10 requests are sent to the same URL in succession, it doesn't open 10 sockets but tries to reuse existing sockets that are idle.
I was able to confirm this by stracing my test application. I saw only 2 socket connect() calls for 10 requests, each from a different thread.
It had reused those 2 sockets to service all 10 requests.

I also observed that for each socket connect, httpclient sends at a minimum 3 setsockopt calls - to set SO_REUSEADDR, TCP_NODELAY, and SO_KEEPALIVE.
This is important because if you see the GAE error message, it's actually very very specific - "API call remote_socket.SetSocketOptions() required more quota".
Not connect, not bind, but setsockopts (or its GAE proxy implementation).
It seems apart from the documented limits, GAE also has undocumented limits on individual APIs like setsockopts.

Explanation
Combining all those observations, here's my hypothesis for this problem:

1. On every request, Jersey is creating a new SparkResource instance, which is in turn creating a new HttpClient instance.
So if there are 10000 requests in the last hour, there are 10000 SparkResource and HttpClient instances.

2. Since every request is creating a new HttpClient, it can't take advantage of its connection pooling and caching.
So 10000 requests actually result in 10000 separate socket connects.

3. Each socket connect is associated with at least 3 setsockopts. So 10000 connects => 30000 setsockopts at a minimum.
If the daily quota on setsockopts is the same as that on connects, this is uncomfortably close to the 36000 operations/hour hourly average.
I can see the setsockopts daily quota being breached easily even if the connects themselves reach just a third of the quota.

4. None of this takes into account the lifecycle management of the servlet container itself, because that's in GAE's hands and kind of hidden from us.
In some posts, I read GAE is very aggressive with shutting down the entire webapp when idle and restarting it only when new request arrives.
If true, it'll just amplify this entire problem of too many socket operations.
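
Points 1 to 3 can be illustrated with a small simulation. `FakePool` below is a hypothetical stand-in that only counts "connects"; it is not the Apache connection manager, and the requests run sequentially, so the shared case needs a single connection here rather than the 2 seen in the strace test. The factor of 3 setsockopt calls per connect is the figure observed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PoolReuseSketch {
    /** Hypothetical stand-in for a pooling connection manager; it only counts connects. */
    static final class FakePool {
        private final Deque<Integer> idle = new ArrayDeque<>();
        int connects = 0;

        int lease() {
            Integer conn = idle.pollFirst();
            if (conn != null) {
                return conn; // reuse an idle connection
            }
            connects++;      // nothing idle: open a new socket
            return connects;
        }

        void release(int conn) {
            idle.addFirst(conn);
        }
    }

    /** One pool shared by all requests, as with a single long-lived client. */
    static int connectsWithSharedPool(int requests) {
        FakePool pool = new FakePool();
        for (int i = 0; i < requests; i++) {
            int conn = pool.lease();
            pool.release(conn);
        }
        return pool.connects;
    }

    /** A fresh pool for every request, as with one client per resource instance. */
    static int connectsWithPoolPerRequest(int requests) {
        int total = 0;
        for (int i = 0; i < requests; i++) {
            FakePool pool = new FakePool(); // new client, empty pool
            int conn = pool.lease();
            pool.release(conn);
            total += pool.connects;
        }
        return total;
    }

    public static void main(String[] args) {
        int requests = 10_000;
        int setsockoptPerConnect = 3; // minimum observed in the post above
        int shared = connectsWithSharedPool(requests);
        int perRequest = connectsWithPoolPerRequest(requests);
        System.out.println("shared pool: " + shared + " connects, "
                + shared * setsockoptPerConnect + " setsockopt calls");
        System.out.println("pool per request: " + perRequest + " connects, "
                + perRequest * setsockoptPerConnect + " setsockopt calls");
    }
}
```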

Recommendations:
Since GAE has already advised you to use URLFetch, follow that advice.

But if you want to use HttpClient for whatever reason in this project or some other, these would be my recommendations:

- Make sure to reuse just one HttpClient instance between requests and threads, to benefit from its connection pooling and caching capabilities.

- With Jersey, one way to do that is make that SparkResource a singleton with @Singleton (as mentioned in that lifecycle documentation).

- Another, more general way is to keep the httpclient in the servlet context and close it only in the contextDestroyed hook, by implementing a ServletContextListener.
 
J. Kevin Robbins
Bartender

Karthik Shiraly wrote:The observations in this discussion kind of surprised me, and I set about investigating this whole thing in depth today.


Wow. You're hired. That deserves a cow.
 
Karthik Shiraly
Bartender
LOL! Kevin, thanks for the cow, but I think you jumped the gun. This is only a theory, and when theory runs up against reality, reality wins every time! We should wait for the OP to find out if it's solved his problem.
 
Sanjeev Mehta
Greenhorn
Thank you very much for your detailed post Karthik! Really appreciate the effort you have put into figuring out this issue!

I did not test the number of SparkResource instances created with log statements. But the App Engine UI showed the number of instances as 2, so I assumed that was correct. I will add log statements and verify for myself now though. I will also try to make it a singleton (which I always thought I was doing). I'll make these changes now and update the post with my observations. Thank you very much once again!
 
Sanjeev Mehta
Greenhorn
You were right. App Engine was indeed creating a new instance for every request. I tried to make the SparkResource a singleton using @Singleton, but it seems like it doesn't work. All I did was add @Singleton above @Path. According to the documentation, it should start treating SparkResource as a singleton after this. But after deploying and looking at the logs, it was still creating a copy per request.
 
Karthik Shiraly
Bartender



Both the documentation and search results point towards having to register your singleton with an application or resourceconfig. Have you missed that step?
Since I'm not familiar with jersey myself, I'm unable to give precise directions. Perhaps you can experiment by setting up a test jetty server with Jersey and trying on it before deploying to GAE.
 
Sanjeev Mehta
Greenhorn




Hello Karthik,

I finally made time to read about WINK and set up my app as a singleton. It was as simple as writing:


I also tried to set up connection pooling so my HttpClient connection would be reused. Below is my code -



How does the code look? Is my usage of PoolingHttpClientConnectionManager correct? My app may receive about 200 requests/min in peak hours. Does the above code look efficient enough to handle this?

Thank you
 
Karthik Shiraly
Bartender
Sorry for the late reply; I was not in town last 3 days and didn't check my mails.

1. Are connManager.close() and httpclient.close() called from elsewhere, so that they can release any resources they created?
If not, you may want to find out what callback Wink gives a resource during resource lifecycle close, and call these two close() methods from that callback.
Since I've not used Wink, I'm not sure what callback it sends to the resource during shutdown. You'll have to find it out.

2. I think it's better to make connmanager and httpclient as instance members instead of static.
Then you can be sure that the lifetimes of connmanager and httpclient are within lifetime of the (singleton) SparkResource object.

Currently, lifetimes of connmanager and httpclient start from classloading of SparkResource class, not from its instantiation.
And when you implement SparkResource close callback, their lifetimes will end in that callback.
If Wink is designed in such a way that for whatever reason, it creates a SparkResource singleton, closes it, abandons it and recreates
a second SparkResource singleton later on, the second instance will find that the static shared httpclient is already closed and fail.

Admittedly, this is a far-fetched scenario and your code with static members will probably work just fine, but
since your entire problem is centred around correct resource cleanup, it's better to be safe than sorry.

3. Add a log line to SparkResource constructor, so that you can verify for sure there's only 1 instance.

Does the above code look efficient?


Currently, the response input stream and connection are kept open while Jsoup is parsing, and closed in the finally block only after all parsing is done.
HTML parsing is a time-consuming activity, and it prevents the connection from being reused while it's going on.
It's better to read the response input stream quickly into a String using EntityUtils.toString(entity), close the response in the finally block, and
parse the string using Jsoup only after the finally block.
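
The ordering described above, sketched with JDK classes only. `FakeBody` is a hypothetical stand-in for the response stream, and `countLinks` stands in for the slow Jsoup parsing step; with the real library, the read would be the `EntityUtils.toString(entity)` call mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadThenParse {
    /** Hypothetical stand-in for an HTTP response body; records when it is closed. */
    static final class FakeBody extends ByteArrayInputStream {
        boolean closed = false;

        FakeBody(String html) {
            super(html.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Drains the body into a String and closes it before any parsing happens. */
    static String fetch(FakeBody body) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = body) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Stand-in for the slow parsing step: counts occurrences of "<a ". */
    static int countLinks(String html) {
        int count = 0;
        for (int i = html.indexOf("<a "); i >= 0; i = html.indexOf("<a ", i + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        FakeBody body = new FakeBody("<a href='x'>x</a> <a href='y'>y</a>");
        String html = fetch(body);     // the connection could be reused from here on
        int links = countLinks(html);  // parsing no longer holds the connection
        System.out.println("closed before parsing: " + body.closed + ", links: " + links);
    }
}
```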

The best way to be sure your code handles n requests/min is to test it under a slightly higher load using a tool like jmeter or grinder.
 
Sanjeev Mehta
Greenhorn




Hello Karthik,

Thank you once again for your detailed comments. I tried to incorporate some of your suggestions in my code. How does the below code look?



I verified that only one instance of the class is being created, using the log inside the constructor. connManager.close() and httpclient.close() are currently not being called at all. I'm new to both Java and WINK, but I'm reading the documentation on WINK to figure out the best place to call them.

Also, for whatever reason, I get a lot of HTTP exceptions like



Do you know what could be the reason for that?
 
Karthik Shiraly
Bartender
Yes, the instantiation looks fine now.
Maybe it's better to first test whether a basic httpclient solves your quota problem, without all the timeouts and max limits, just to keep the complexity low and have fewer unknowns.
You can configure the timeouts and max limits later on when the quota issue is solved.

com.google.apphosting.api.ApiProxy$CancelledException: The API call remote_socket.Close() was cancelled because the overall HTTP request deadline was reached.
Do you know what could be the reason for that?


Perhaps it's the same problem as your other topic: https://coderanch.com/t/657820/Web-Services/java/Google-App-Engine-jsoup-urlfetch
It sounds like some HTTP requests to your target didn't return within some GAE deadline (seems to be 60 secs). Perhaps some GAE configuration should be changed?
Add logs before and after the .execute() and see how much time they usually take.
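
A minimal sketch of such timing logs, using only the JDK. The `time` helper and the `Timed` holder are hypothetical names, and the sleeping lambda stands in for the real `.execute()` call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

final class TimedCall {
    private static final Logger LOG = Logger.getLogger(TimedCall.class.getName());

    /** A call's result together with its duration in milliseconds. */
    static final class Timed<T> {
        final T value;
        final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the call, logging before it starts and after it finishes or fails. */
    static <T> Timed<T> time(String label, Callable<T> call) throws Exception {
        LOG.info(label + ": starting");
        long start = System.nanoTime();
        try {
            T value = call.call();
            long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            LOG.info(label + ": finished in " + millis + " ms");
            return new Timed<>(value, millis);
        } catch (Exception e) {
            long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            LOG.warning(label + ": failed after " + millis + " ms (" + e + ")");
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        Timed<String> result = time("fake execute", () -> {
            Thread.sleep(50); // stands in for httpclient.execute(request)
            return "response body";
        });
        System.out.println(result.value + " after " + result.millis + " ms");
    }
}
```

Logging the failure path as well matters here, because the calls of interest are exactly the ones that run into the deadline.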

Out of curiosity: didn't the urlfetch approach recommended by GAE support solve your quota problem?
 
Sanjeev Mehta
Greenhorn




Ok. I'll remove the timeouts and max limits for now and add logs to check the average response time. I never got URLFetch to work. It worked fine when I deployed it on my local server but always failed when I deployed to GAE. So, I stuck to using Apache.
 
Karthik Shiraly
Bartender
Do some requests go through, or does every request give a deadline error?
If every request is failing, then it's likely the target site is blocking some requests based on country of origin or other conditions.

Update: It's also possible the target site is dropping some requests if too many requests / minute are originating from the same address or address block.
 
Sanjeev Mehta
Greenhorn




Yes, every request gives a deadline error. But what's confusing to me is: if GAE URLFetch requests are blocked, why do GAE Apache requests work?
 
Karthik Shiraly
Bartender
GAE docs say "The URL Fetch service uses Google's network infrastructure for efficiency and scaling purposes."
One possible explanation is that URL fetch routes the request through some addresses that are blacklisted by target site.

On the other hand, when using httpclient, probably the requests go through your assigned GAE server IP address(es), which may not be blacklisted by target site.

The failures you see with httpclient may be because the target site is dropping some requests if too many requests / minute are originating from the same address or address block.
Throttle down the number of requests being sent to target site and see if that helps.
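
One simple way to throttle, sketched under assumptions: `MinIntervalThrottle` is a hypothetical helper that enforces a minimum gap between outgoing requests, and the 200 ms interval in `main` is an arbitrary value, not a limit known for the target site. The caller passes in the current time, which keeps the logic easy to check.

```java
final class MinIntervalThrottle {
    private final long minIntervalMillis;
    private long nextAllowedMillis = Long.MIN_VALUE;

    MinIntervalThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Reserves the next send slot and returns how many milliseconds the
     * caller should wait, measured from nowMillis, before sending.
     */
    synchronized long reserve(long nowMillis) {
        long sendAt = Math.max(nowMillis, nextAllowedMillis);
        nextAllowedMillis = sendAt + minIntervalMillis;
        return sendAt - nowMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        MinIntervalThrottle throttle = new MinIntervalThrottle(200); // assumed interval
        for (int i = 1; i <= 3; i++) {
            long wait = throttle.reserve(System.currentTimeMillis());
            Thread.sleep(wait);
            System.out.println("request " + i + " sent after waiting " + wait + " ms");
        }
    }
}
```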
 
Sanjeev Mehta
Greenhorn




Yeah, I believe that is the case. GAE IP addresses might be blocked, because even after throttling down to 1 request/min, it does not go through.
 
Sanjeev Mehta
Greenhorn
Hey Karthik,

I read that the default number of connections on the PoolingHttpClientConnectionManager is 2. Wouldn't this cause a problem if my app is getting multiple requests a second and I remove the line?

Also, what is an easy way for me to log how many socket connect calls my app makes?

Thanks
 
Karthik Shiraly
Bartender

Sanjeev Mehta wrote: I read that the default number of connections on the PoolingHttpClientConnectionManager is 2. Wouldn't this cause a problem if my app is getting multiple requests a second and I remove the line?


Definitely tuning is required and you may need to increase the connections. The point I was making was to do it step by step. Turn only one knob at a time, measure its impact, and only then proceed to turning the next knob.
Right now, you are checking that singleton http connection pooling is working and doesn't give you a quota or timeout error. If that basic functionality works, then turn the next knob, and so on.

Also, what is an easy way for me to log how many socket connect calls my app makes?


The best way I think is to first run your rest service on your own linux testing server or VM, and log socket calls using "strace -f -o strace.log java -jar jetty.jar", and simulate client connections to that test server.

On GAE, I'm not sure; as far as I know, it's only a platform and does not provide you tools like strace (but confirm this).
If so, the only way I can think of is to download httpclient source, add a couple of logs, and build your own httpclient.jar.
 