
jim terry

Ranch Hand
since Nov 18, 2018

Recent posts by jim terry


Even in this modern world, garbage collection logs are still analyzed in a tedious, manual way: you get hold of the DevOps engineer who has access to the production servers, he mails you the application's GC logs, you upload the logs to a GC analysis tool, and then you apply your intelligence to analyze them. There is no programmatic way to analyze garbage collection logs proactively. To eliminate this hassle, gceasy.io is introducing a RESTful API to analyze garbage collection logs. With one line of code you can get your GC logs analyzed instantly.

Here are a few use cases where this API can be extremely useful.

Use case 1: Automatic root cause analysis
Most DevOps teams use a simple HTTP ping or APM tools to monitor application health. A ping is good for detecting whether the application is alive or not. APM tools are great at reporting that the application's CPU spiked by 'x%', memory utilization increased by 'y%', or response time dropped by 'z' milliseconds. They won't tell you what caused the CPU to spike, what caused memory utilization to increase, or what caused the response time to degrade. If you configure a cron job to capture thread dumps/GC logs at periodic intervals and invoke our REST API (a sketch of such a cron entry is shown below), we apply our intelligent patterns & machine learning algorithms to instantly identify the root cause of the problem.
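A minimal sketch of such a cron entry; the schedule, log path, and GCEASY_API_KEY value are illustrative placeholders, and the endpoint is the analyzeGC API described below:

GCEASY_API_KEY=your-api-key
# Every 30 minutes, POST the latest GC log to the gceasy.io analysis API
*/30 * * * * curl -s -X POST --data-binary @/var/log/myapp/gc.log "https://api.gceasy.io/analyzeGC?apiKey=$GCEASY_API_KEY" >> /var/log/myapp/gc-analysis.json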

Advantage 1: When these sorts of production problems happen, in the heat of the moment the DevOps team often recycles the servers without capturing the thread dumps and GC logs. But you need to capture thread dumps and GC logs at the moment the problem is happening in order to diagnose it. With this new strategy you don't have to worry about that: because your cron job captures thread dumps/GC logs at periodic intervals and invokes the REST API, all your thread dumps/GC logs are archived on our servers.

Advantage 2: APM tools claim to add less than 3% overhead, whereas in reality it can be several times more. The beauty of this strategy is that it adds no overhead (or negligible overhead), because the entire analysis of the thread dumps/GC logs is done on our servers and not on your production servers.

Use case 2: Performance Tests
When you conduct performance tests, you might want to take thread dumps/GC logs on a periodic basis and get them analyzed through the API. If the thread count goes beyond a threshold, too many threads are WAITING, any threads are BLOCKED for a prolonged period, a lock isn't getting released, frequent full GC activity is happening, or GC pause times exceed thresholds, you need visibility right then and there. It should be analyzed before the code hits production. In such circumstances this API becomes very handy.

Use case 3: Continuous Integration
As part of continuous integration it's highly encouraged to execute performance tests. Thread dumps/GC logs should be captured and can be analyzed using the API. If the API reports any problems, the build can be failed (a sketch of such a gate is shown below). This way, you catch performance degradation at code-commit time instead of catching it in performance labs or production.
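A minimal sketch of such a gate in a shell build step, assuming curl and jq are available; the log path and API-key variable are placeholders, and 'isProblem'/'problem' are the response elements described in step 4 below:

# Fail the build if the GC log analysis reports a problem
result=$(curl -s -X POST --data-binary @gc.log "https://api.gceasy.io/analyzeGC?apiKey=$GCEASY_API_KEY")
if [ "$(echo "$result" | jq -r '.isProblem')" = "true" ]; then
    echo "GC analysis reported a problem: $(echo "$result" | jq -r '.problem')"
    exit 1
fi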

How to invoke the Garbage Collection log analysis API?

Invoking the Garbage Collection log analysis API is very simple:

1) Register with us. We will email you the API key. This is a one-time setup process. Note: if you have purchased the enterprise version with API, you don't have to register; the API key will be provided to you as part of the installation instructions.
2) POST an HTTP request to https://api.gceasy.io/analyzeGC?apiKey={API_KEY_SENT_IN_EMAIL}
3) The body of the HTTP request should contain the garbage collection log that needs to be analyzed.
4) The HTTP response will be sent back in JSON format. The JSON contains several important stats about the GC log. The primary element to look for in the JSON response is "isProblem". It will be "true" if any memory/performance problem has been discovered, and the "problem" element will contain a detailed description of the problem.

CURL command

Assuming your GC log file is located at './my-app-gc.log', the CURL command to invoke the API is:
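A minimal sketch; '--data-binary' is used so that the line breaks in the log are preserved:

curl -X POST --data-binary @./my-app-gc.log "https://api.gceasy.io/analyzeGC?apiKey={API_KEY_SENT_IN_EMAIL}"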


It can't get any simpler than that, can it?

How to invoke Java Garbage Collection log analysis API
3 weeks ago
Hi

The source is: https://www.youtube.com/watch?v=uJLOlCuOR4k&t=26s and it's permitted to post it under copyright regulations.
3 weeks ago

Java Thread Dump Analyzer,  Troubleshoot JVM crashes, slowdowns, memory leaks, freezes, CPU Spikes
https://community.atlassian.com/t5/Marketplace-Apps-Integrations/How-do-you-analyze-GC-logs-thread-dumps-and-head-dumps/ba-p/985787

3 weeks ago
Depending on the JVM version (1.4, 5, 6, 7, 8, 9), JVM vendor (Oracle, IBM, HP, Azul, Android), GC algorithm (Serial, Parallel, CMS, G1, Shenandoah), and a few other settings, the GC log format changes. Thus, the world has ended up with several GC log formats today.

‘GC Log standardization API’ normalizes GC Logs and provides a standardized JSON format as shown below.

The graphs provided by GCeasy are great, but some engineers would like to study every event of the GC log in detail. This standardized JSON format gives them that flexibility. Besides that, engineers can import this data into Excel, Tableau, or any other visualization tool.

How to invoke the Garbage Collection log standardization API?

Invoking the Garbage Collection log standardization API is very simple:

1) Register with us. We will email you the API key. This is a one-time setup process. Note: if you have purchased the enterprise version, you don't have to register; the API key will be provided to you as part of the installation instructions.

2) POST an HTTP request to http://api.gceasy.io/std-format-api?apiKey={API_KEY_SENT_IN_EMAIL}

3) The body of the HTTP request should contain the garbage collection log that needs to be analyzed.

4) The HTTP response will be sent back in JSON format. The JSON contains several important stats about the GC log. The primary element to look for in the JSON response is "isProblem". It will be "true" if any memory/performance problem has been discovered, and the "problem" element will contain a detailed description of the problem.

CURL command

Assuming your GC log file is located at './my-app-gc.log', the CURL command to invoke the API is:
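A minimal sketch, again using '--data-binary' (see the note below):

curl -X POST --data-binary @./my-app-gc.log "http://api.gceasy.io/std-format-api?apiKey={API_KEY_SENT_IN_EMAIL}"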



It can't get any simpler than that, can it?

Note: use the '--data-binary' option in CURL instead of the '--data' option. With '--data', new-line breaks are not preserved in the request, and line breaks must be preserved for correct parsing.

Other Tools

You can also invoke the API using any web service client tool, such as SoapUI, the Postman browser plugin, etc.

Fig: POSTing GC logs through the Postman plugin

Sample Response
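A minimal illustrative response; every value here is invented for the example, and the fields follow the element descriptions in the table below (perm-gen fields would appear instead of the Metaspace fields on older JVMs):

{
  "gcEvents": [
    {
      "timeStamp": "2019-11-14T10:15:23.456",
      "gcType": "YOUNG",
      "durationInSecs": 0.048,
      "reclaimedBytes": 104857600,
      "heapSizeBeforeGC": 536870912,
      "heapSizeAfterGC": 432013312,
      "youngGenSizeBeforeGC": 268435456,
      "youngGenSizeAfterGC": 163577856,
      "oldGenSizeBeforeGC": 268435456,
      "oldGenSizeAfterGC": 268435456,
      "metaSpaceSizeBeforeGC": 52428800,
      "metaSpaceSizeAfterGC": 52428800
    }
  ]
}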




JSON Response Elements

Element: Description
gcEvents: This is the top-level root element. It contains an array of GC events. For every GC event reported in the GC log, you will see an element in this array.
timeStamp: Timestamp at which the particular GC event ran
gcType: YOUNG if it's a young GC event type; FULL if it's a full GC event type
durationInSecs: Duration for which the GC event ran
reclaimedBytes: Amount of bytes reclaimed in this GC event
heapSizeBeforeGC: Overall heap size before this GC event ran
heapSizeAfterGC: Overall heap size after this GC event ran
youngGenSizeBeforeGC: Young generation size before this GC event ran
youngGenSizeAfterGC: Young generation size after this GC event ran
oldGenSizeBeforeGC: Old generation size before this GC event ran
oldGenSizeAfterGC: Old generation size after this GC event ran
permGenSizeBeforeGC: Perm generation size before this GC event ran
permGenSizeAfterGC: Perm generation size after this GC event ran
metaSpaceSizeBeforeGC: Metaspace size before this GC event ran
metaSpaceSizeAfterGC: Metaspace size after this GC event ran
1 month ago
Thus, from Java 9 onward, if you launch the application with -XX:+UseConcMarkSweepGC (the argument that activates the CMS GC algorithm), you are going to see the WARNING message below.
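On a HotSpot JVM the warning reads like this (exact wording may vary slightly by version):

Java HotSpot(TM) 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.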



Why is CMS deprecated?
1 month ago

jim terry wrote: Hello Stephen

Some of the GCeasy tutorials are given below:

What is Garbage collection log? How to enable and analyze?
How to enable Java 9 GC Logs?
Key sections on GCeasy Report
OutOfMemoryError
GCeasy Tutorials

1 month ago

In this article, we have attempted to answer the most common questions about the System.gc() API call. We hope it helps.

What is System.gc()?

System.gc() is an API provided in Java, Android, C#, and other popular languages. When invoked, it will make its best effort to clear accumulated unreferenced objects (i.e. garbage) from memory.
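A minimal Java sketch of an explicit call (the class and the allocation are invented for illustration):

public class GcDemo {
    public static void main(String[] args) {
        byte[] data = new byte[100_000_000]; // ~100 MB that will become garbage
        data = null;                         // drop the only reference
        System.gc();                         // request (not force) a full collection
    }
}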

Who invokes System.gc()?

System.gc() calls can be invoked from various parts of your application stack:

a. Your own application developers might be explicitly calling the System.gc() method.
b. Sometimes System.gc() can be triggered by your third-party libraries, frameworks, or even your application servers.
c. It could be triggered from external tools (like VisualVM) through the use of JMX.
d. If your application is using RMI, then RMI invokes System.gc() at periodic intervals.


What are the downsides of invoking System.gc()?

When the System.gc() or Runtime.getRuntime().gc() API is invoked from your application, stop-the-world full GC events are triggered. During a stop-the-world full GC, the entire JVM freezes (i.e. all customer transactions that are in motion are paused). These full GCs typically take a long time to complete. Thus, they have the potential to result in poor user experience and missed SLAs at times when a GC wasn't even required.

The JVM has sophisticated algorithms working in the background all the time, computing when to trigger GC. When you invoke System.gc(), all those computations go for a toss. What if the JVM triggered a GC event just a millisecond ago and your application invokes System.gc() again? Your application has no way of knowing when GC last ran.

Are there any good/valid reasons to invoke System.gc()?

We haven't encountered many good reasons to invoke System.gc() from an application. But here is an interesting use case we saw in a major airline's application. This application uses 1 TB of memory, and its full GC pauses take around 5 minutes to complete. Yes, don't be shocked, it's 5 minutes (and we have seen cases of 23-minute GC pauses as well). To avoid any customer impact from this pause time, the airline implemented a clever solution: on a nightly basis, they take one JVM instance at a time out of their load-balancer pool, explicitly trigger a System.gc() call through JMX on that JVM, and once the GC event is complete and the garbage has been evicted from memory, they put the JVM back into the load-balancer pool. Through this clever solution they have minimized the customer impact of the 5-minute GC pause.

How to detect whether System.gc() calls are made from your application?

As you can see in the 'Who invokes System.gc()?' section, System.gc() calls can be made from multiple sources, not just your application source code. Thus, searching your application code for the 'System.gc()' string isn't enough to tell whether your application makes System.gc() calls. This poses a challenge: how do you detect whether System.gc() is invoked anywhere in your entire application stack?

This is where GC logs come in handy. Enable GC logging in your application; in fact, it's advisable to keep GC logging enabled all the time on all your production servers, as it helps you troubleshoot and optimize application performance and adds negligible (if at all observable) overhead. Then upload your GC log to a garbage collection log analyzer tool such as GCeasy or HP JMeter. These tools generate a rich garbage collection analysis report.
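For example, on HotSpot you can enable GC logging with flags like these (the log path and jar name are placeholders; the first form is for Java 8 and earlier, the second for the unified logging of Java 9+):

java -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/myapp/gc.log -jar my-app.jar
java -Xlog:gc*:file=/var/log/myapp/gc.log -jar my-app.jar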


  Fig: GC Causes reported by GCeasy.io tool

The figure above is an excerpt from the 'GC Causes' section of the report generated by GCeasy. You can see that 'System.gc()' was invoked 304 times, accounting for 52.42% of the GC pause time.

How to remove System.gc() calls?

You can remove explicit System.gc() calls through the following solutions:

a. Search & Replace

This might be a traditional method :-), but it works. Search your application code base for 'System.gc()' and 'Runtime.getRuntime().gc()'. If you see a match, remove it. This solution works if 'System.gc()' is invoked from your own application source code. If 'System.gc()' is invoked from third-party libraries, frameworks, or external sources, it will not work. In such circumstances, consider the option outlined in #b.

b. -XX:+DisableExplicitGC

You can forcefully disable System.gc() calls by passing the JVM argument '-XX:+DisableExplicitGC' when you launch the application. This option silences every 'System.gc()' call invoked anywhere in your application stack, as in the sketch below.
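A minimal sketch of such a launch (the jar name is a placeholder):

java -XX:+DisableExplicitGC -jar my-app.jar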

c. RMI

If your application is using RMI, you can control the frequency at which 'System.gc()' calls are made. This frequency can be configured using the following JVM arguments when you launch the application:

-Dsun.rmi.dgc.server.gcInterval=n

-Dsun.rmi.dgc.client.gcInterval=n

The default value for these properties is:

JDK 1.4.2 and 5.0: 60000 milliseconds (i.e. 60 seconds)
JDK 6 and later releases: 3600000 milliseconds (i.e. 60 minutes)

You might want to set these properties to a very high value to minimize the impact, as in the sketch below.
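A minimal sketch, setting both intervals to 24 hours (the interval value and jar name are illustrative):

java -Dsun.rmi.dgc.server.gcInterval=86400000 -Dsun.rmi.dgc.client.gcInterval=86400000 -jar my-app.jar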



1 month ago



Should I run my application on a few instances (i.e. machines) with a large memory size, or on a lot of instances with a small memory size? Which strategy is optimal? This question comes up often. After building applications for two decades and building JVM performance engineering/troubleshooting tools (GCeasy, fastThread, HeapHero), I still don't know the right answer to this question. At the same time, I believe there is no binary answer to it either. In this article, I would like to share my observations and experiences on this topic.

A story of two multi-billion-dollar enterprises

Since our JVM performance engineering/troubleshooting tools are widely used in major enterprises, I have had the opportunity to see world-class enterprise application implementations in action. Recently I had the chance to observe two hyper-growth technology companies (if I said their names, everyone reading this article would know them). Both companies are headquartered in Silicon Valley. Their business is technology, so they know what they are doing when it comes to engineering. They are Wall Street darlings enjoying great valuations, with market caps in the magnitude of several billions of dollars. They are the poster children of modern, thriving enterprises. For our conversation, let's call these two enterprises company-A and company-B.

It immensely surprised me to see that the two enterprises have adopted *two extremes* when it comes to memory size. Company-A has set its heap size (i.e. -Xmx) to 250 GB, whereas company-B has set its heap size to 2 GB; company-A's heap size is 125 times larger than company-B's. Both enterprises are confident about their memory size settings. As they say, 'the proof is in the pudding': both enterprises are scaling and handling billions of business-critical transactions.

It was a great experience to see two companies in the same business, with more or less the same revenue and market cap, located in the same geographical region, at the same point in time, adopting two extremes of memory size. Given this real-life experience, what is the right answer: large memory size or small? My conclusion is: you can succeed with either strategy if you have a good team in place.

Large memory size can be expensive

A large memory size with few instances (i.e. machines) tends to be more expensive than a small memory size with a greater number of instances. Here is the simple math, based on the cost of AWS EC2 instances in the US East (N. Virginia) region:

m4.16xlarge (256 GB RAM) Linux on-demand instance cost: $3.20/hour

t3a.small (2 GB RAM) Linux on-demand instance cost: $0.0188/hour

So, to match a capacity of 256 GB RAM, we would have to get 128 't3a.small' instances (128 instances x 2 GB = 256 GB).

128 x t3a.small (2 GB RAM) Linux on-demand instances cost: $2.4064/hour (i.e. 128 x $0.0188/hour)

It means the large-memory, few-instances option costs $0.7936/hour (i.e. $3.20 - $2.4064) more than the small-memory, many-instances option. In other words, the 'large memory size with few instances' strategy is 33% more expensive.

Of course, a counter-argument can be made: you might need fewer engineers, less electricity, and less real estate if you have a smaller number of machines, and patching and upgrading servers might be easier as well.

Business Demands

In some cases, the nature of your business itself dictates the memory size of your application. Here is a real-life case that we faced: when we built HeapHero (a heap dump analysis tool), the tool's memory size had to be larger than the heap dump file it parses. Suppose a heap dump file is 100 GB; then the HeapHero tool's memory size must be more than 100 GB. There is no choice.

Suppose you are caching a large amount of data (say 200 GB) to maximize your application's performance; then your heap size must be more than 200 GB. You will not have a choice. Thus, in some cases, the business requirement dictates your memory size.

Performance & Troubleshooting

If your memory size is large, then typically garbage collection pause times will also be high. Garbage collection is a process that runs in your application to clean up unreferenced objects in memory. If your memory size is large, the amount of garbage in memory will also be large, and thus the amount of time taken to clean it up will be high. When garbage collection runs, it pauses your application. There are, however, solutions to this problem:

* You can use a pauseless JVM (like Azul's)
* Proper GC tuning can be done to reduce pause times

Similarly, if you need to troubleshoot any memory problem, you will have to capture heap dumps from the application. A heap dump is basically a file that contains information about your application's memory: what objects were present, what their references are, how much memory each object occupies, and so on. Heap dumps of large-memory applications tend to be very large themselves, and analyzing large heap dumps is difficult as well. Even the world's best heap dump tools, like Eclipse MAT and HeapHero, have challenges parsing heap dumps larger than 100 GB. Reproducing these problems in a test lab, storing these heap dump files, and sharing them are all challenges.

Emotions come first, rationale next

After reading books like 'How We Decide' by Jonah Lehrer, I am fairly convinced that prior experience and emotions play a key role in deciding your application's memory size. I used to work for a major financial institution. The chief architect of that institution suggested we run our JVMs with a very large memory size; the rationale he gave was: "We used to run mainframes with very large memory sizes" 😊.

Conclusion

If you are working for a very large corporation, there is a 99.99% chance that you will not have a say in what your application's memory size should be, because that decision has already been made by the elites/demi-gods sitting in the ivory tower 😊. It may be hard to reverse or change that decision.

But if you do have the option to make that decision, your choice of memory size will most likely be influenced by your prior experience and emotions. Either way, you can't go wrong (whether with a few instances with a large memory size or a lot of instances with a small memory size), provided you have the right team in place.





1 month ago


The JVM provides helpful arguments to deal with OutOfMemoryError. In this article we would like to highlight those JVM arguments; they might come in handy when you are troubleshooting OutOfMemoryError. The arguments are:

1. -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath
2. -XX:OnOutOfMemoryError
3. -XX:+CrashOnOutOfMemoryError
4. -XX:+ExitOnOutOfMemoryError


Let’s discuss these JVM arguments in detail in this article.

1. -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath

A heap dump is basically a snapshot of memory. It contains details about the objects present in memory, the actual data within those objects, and the references originating from those objects. A heap dump is a vital artifact for troubleshooting memory problems.

In order to diagnose OutOfMemoryError or any memory-related problem, one would have to capture a heap dump right at the moment, or a few moments before, the application starts to experience the OutOfMemoryError. It's hard to capture a heap dump at the right moment manually, because we don't know when an OutOfMemoryError will be thrown. However, capturing heap dumps can be automated by passing the following JVM arguments when you launch the application from the command line:
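The pair of arguments (the dump path is a placeholder you choose):

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-path>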



Example:
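A sketch of a complete launch command (the jar name and dump path are placeholders):

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/tmp/heapdump.hprof -jar my-app.jar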


In '-XX:HeapDumpPath' you specify the file path where the heap dump should be stored.

When you pass these two JVM arguments, a heap dump will be automatically captured and written to the specified file path when an OutOfMemoryError is thrown.

Once heap dumps are captured, you can use tools like HeapHero, Eclipse MAT to analyze heap dumps.

2. -XX:OnOutOfMemoryError

You can configure the JVM to invoke any script when an OutOfMemoryError is thrown. Most of the time, an OutOfMemoryError doesn't crash the application; however, it's better to restart the application once an OutOfMemoryError happens, because it can potentially leave the application in an unstable state, and requests served by an unstable application instance can lead to erroneous results.

Example:
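A sketch of the launch command (the script path matches the description below; the jar name is a placeholder):

java -XX:OnOutOfMemoryError="/scripts/restart-myapp.sh" -jar my-app.jar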


When you pass this argument, the JVM will invoke the "/scripts/restart-myapp.sh" script whenever an OutOfMemoryError is thrown, and in this script you can write code to restart your application gracefully.

3. -XX:+CrashOnOutOfMemoryError

When you pass this argument, the JVM will exit as soon as an OutOfMemoryError is thrown, and besides exiting it produces text and binary crash files (if core files are enabled). Personally, I wouldn't prefer configuring this argument, because we should always aim for a graceful exit; an abrupt exit can jeopardize in-flight transactions.

I ran an application that generates an OutOfMemoryError with the '-XX:+CrashOnOutOfMemoryError' argument, and I could see the JVM exiting immediately when the OutOfMemoryError was thrown. Below is the message from the standard output stream:
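An abridged reconstruction of that output (exact wording varies by JVM version; the report path is the one referenced below):

Aborting due to java.lang.OutOfMemoryError: Java heap space
# An error report file with more information is saved as:
# C:\workspace\tier1app-svn\trunk\buggyapp\hs_err_pid26064.log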



From the message, you can see that an hs_err_pid file was generated at 'C:\workspace\tier1app-svn\trunk\buggyapp\hs_err_pid26064.log'. The hs_err_pid file contains information about the crash. You can use tools like fastThread to analyze hs_err_pid files, but most of the time the information present in them is very basic and not sufficient to troubleshoot an OutOfMemoryError.

4. -XX:+ExitOnOutOfMemoryError

When you pass this argument, the JVM will exit as soon as an OutOfMemoryError is thrown. You may pass this argument if you would like to terminate the application on OutOfMemoryError. Personally, I wouldn't prefer configuring this argument either, because we should always aim for a graceful exit; an abrupt exit can jeopardize in-flight transactions.

I ran the same memory-leak program with the '-XX:+ExitOnOutOfMemoryError' JVM argument. Unlike '-XX:+CrashOnOutOfMemoryError', this argument did not generate any text/binary file; the JVM just exited.






Troubleshooting OutOfMemoryError or any memory-related problem is still done manually, even in 2019. Yet troubleshooting and identifying the root cause of an OutOfMemoryError can be automated, by following the 3 steps below:

1. Capture heap dump
2. Restart application
3. Problem diagnosis

Let’s discuss these steps in detail.

1. Capture heap dump

A heap dump is basically a snapshot of memory. It contains details about the objects present in memory, the actual data within those objects, the references originating from those objects, and so on. A heap dump is a vital artifact for troubleshooting memory problems.

In order to diagnose OutOfMemoryError or any memory-related problem, one would have to capture a heap dump right at the moment, or a few moments before, the application starts to experience the OutOfMemoryError. It's hard to capture a heap dump at the right moment manually, because we don't know when an OutOfMemoryError will be thrown. However, capturing heap dumps can be automated by passing the following JVM arguments when you launch the application from the command line:
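The pair of arguments (the dump path is a placeholder you choose):

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file-path>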



Example:
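A sketch of a complete launch command (the jar name and dump path are placeholders):

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/tmp/heapdump.hprof -jar my-app.jar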



In '-XX:HeapDumpPath' you specify the file path where the heap dump should be stored.

When you pass these two JVM arguments, a heap dump will be automatically captured and written to the specified file path when an OutOfMemoryError is thrown.

2. Restart application

Most of the time, an OutOfMemoryError doesn't crash the application. However, it's better to restart the application once an OutOfMemoryError is thrown, because it can potentially leave the application in an unstable state, and requests served by an unstable application instance can lead to erroneous results.

You can automate this restart process as well. Write a "restart-myapp.sh" script that shuts down and restarts your application gracefully, then specify this script's path in the '-XX:OnOutOfMemoryError' JVM argument.

Example:
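A sketch of the launch command (the script path matches the description below; the jar name is a placeholder):

java -XX:OnOutOfMemoryError="/scripts/restart-myapp.sh" -jar my-app.jar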



When you pass this argument, the JVM will invoke the "/scripts/restart-myapp.sh" script whenever an OutOfMemoryError is thrown. Thus, your application will be automatically restarted right after it experiences an OutOfMemoryError.

3. Problem Diagnosis

Now we have captured the heap dump (needed to troubleshoot the problem) and restarted the application (to reduce the outage impact). The next step is troubleshooting. This can be a little tricky 😊, but it can be achieved with the right tools. You can use tools like Eclipse MAT or HeapHero to analyze heap dumps. These tools generate a good memory analysis report, highlighting the objects that are causing the memory leak. However, most organizations still do this step manually.

Even this step can be automated by invoking the HeapHero REST API. This API analyzes the heap dump and returns an excellent analysis report. You may invoke this API right after the 'restart-myapp.sh' script runs. Thus, you will be able to automate OutOfMemoryError troubleshooting end-to-end.

Happy Troubleshooting.
2 months ago