This week's book giveaway is in the HTML Pages with CSS and JavaScript forum.
We're giving away four copies of Testing JavaScript Applications and have Lucas da Costa on-line!
See this thread for details.
Win a copy of Testing JavaScript Applications this week in the HTML Pages with CSS and JavaScript forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
  • Campbell Ritchie
  • Bear Bibeault
  • Ron McLeod
  • Jeanne Boyarsky
  • Paul Clapham
  • Tim Cooke
  • Liutauras Vilda
  • Junilu Lacar
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • fred rosenberger
  • salvin francis
  • Piet Souris
  • Frits Walraven
  • Carey Brown

TCP: out of memory — consider tuning tcp_mem

Ranch Hand
Posts: 52
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Recently we experienced an interesting production problem. This application was running on multiple AWS EC2 instances behind Elastic Load Balancer. The application was running on GNU/Linux OS, Java 8, Tomcat 8 application server. All of sudden one of the application instances became unresponsive. All other application instances were handling the traffic properly. Whenever the HTTP request was sent to this application instance from the browser, we were getting following response to be printed on the browser.

We used our APM (Application Performance Monitoring) tool to examine the problem. From the APM tool, we could observe CPU, memory utilization to be perfect. On the other hand, from the APM tool, we could observe that traffic wasn’t coming into this particular application instance. It was really puzzling. Why traffic wasn’t coming in?

We logged in to this problematic AWS EC2 instance. We executed vmstat, iostat, netstat, top, df commands to see whether we can uncover any anomaly. To our surprise, all these great tools didn’t report any issue.

As the next step, we restarted the Tomcat application server in which this application was running. It didn’t make any difference either. Still, this application instance wasn’t responding at all.

DMESG command
Then we issued ‘dmesg’ command on this EC2 instance.  This command prints the message buffer of the kernel. The output of this command typically contains the messages produced by the device drivers. In the output generated by this command, we noticed the following interesting messages to be printed repeatedly:

We were intrigued to see this error message: “TCP: out of memory — consider tuning tcp_mem”. It means out of memory error is happening at the TCP level. We had always taught out of memory error happens only at the application level and never at the TCP level.

Problem was intriguing because we breathe this OutOfMemoryError problem day in and out. We have built troubleshooting tools like GCeasy, HeapHero to facilitate engineers to debug OutOfMemoryError that happens at the application level (Java, Android, Scala, Jython… applications). We have written several blogs on this OutOfMemoryError topic. But we were stumped to see OutOfMemory happening at the device driver level. We never thought there would be a problem at the device driver level, that too in, stable Linux operating system. Being stumped by this problem, we weren’t sure how to proceed further.

Thus, we resorted to Google god’s help 😊. Googling for the search term: “TCP: out of memory — consider tuning tcp_mem”, showed only 12 search results. But for one article, none of them had much content  ☹. Even that one article was written in a foreign language that we couldn’t understand. So, we aren’t sure how to troubleshoot this problem.

Now left with no other solutions, we went ahead and implemented universal solution i.e. “restart”. We restarted the EC2 instance to put-off immediate burning fire. Hurray!! Restarting the server cleared the problem immediately. Apparently, this server wasn’t restarted for several days (like more than 70+ days), maybe due to that application might have saturated TCP memory limits.

We reached out to one of our intelligent friends who works for a world-class technology company for help. This friend asked us the values that we are setting for the below kernel properties:


Honestly, this is the first time, we are hearing about these properties. We found that below are the values set for these properties in the server:

Our friend suggested to change values as given below:

He mentioned setting these values will eliminate the problem we had faced. Sharing the values with you (as it may be of help to you). Apparently, our values have been very low when compared to the values he has provided.

Here are a few conclusions that we would like to draw:

* Even the modern industry-standard APM (Application Performance Monitoring) tools aren’t completely answering the application performance problems that we are facing today.
* ‘dmesg’ command is your friend. You might want to execute this command when your application becomes unresponsive, it may point you out valuable information
* Memory problems doesn’t have to happen in the code that we write 😊, it can happen even at the TCP/Kernel level.

    Bookmark Topic Watch Topic
  • New Topic