Garbage Collection and Virtual Memory

David Kwok
Greenhorn

Joined: Dec 28, 2011
Posts: 5
Dear Developers,

I have a question about how the working memory set interacts with Java GC. In a program with explicit memory management, the active working set depends on which parts of memory the application accesses: the more areas it touches, the larger the set.

In Java, however, automatic memory management means that even when the application is idle, the GC still runs, accessing memory to perform tracing and other GC work.

Would anyone agree that this activity is counterproductive for the VMM, which could otherwise page idle pages out to disk to make room for other active processes when memory is constrained?

For a long-running Java application that occupies a large amount of memory and holds a large data structure, the working set will stay large even while the application is idle, waiting for input.

If the same program were written in C, its active working set would generally be small, but a Java program's would be large because the GC periodically accesses live objects.

Is my understanding incorrect, or does the JVM implementation cope with this using some technique? Please kindly advise.


Thanks
Henry Wong
author
Sheriff

Joined: Sep 28, 2004
Posts: 19064
    

First, Java memory management, via the GC, sits on top of malloc()/free(), as the JVM is written in C/C++. So, they are very similar -- arguably.

You can argue that the Java memory footprint is larger, on average, because garbage is not reclaimed until the next GC cycle. However, I don't understand your point about using more memory. An algorithm in one language should be comparable to the same algorithm in any other language. What would make a program coded in Java more likely to use more memory? And I don't mean uncollected garbage.

IMO, as long as the JVM is configured correctly, via -Xms and -Xmx, it should work very well.
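The heap bounds mentioned here are set with -Xms (initial heap size) and -Xmx (maximum heap size). As a minimal sketch (the class name HeapSettings is just illustrative), the standard Runtime methods can report from inside the application what the JVM actually received. maxMemory() roughly tracks -Xmx, although the exact figure can come out slightly lower depending on the collector.

```java
// HeapSettings.java: report the heap limits the running JVM actually received.
// Example: java -Xms512m -Xmx512m HeapSettings
public class HeapSettings {
    private static final long MB = 1024L * 1024L;

    // Upper bound the heap may grow to; roughly corresponds to -Xmx.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap currently committed by the JVM; starts around -Xms and may grow toward -Xmx.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Part of the committed heap currently occupied (live objects plus not-yet-collected garbage).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        System.out.printf("max (approx. -Xmx): %d MB%n", maxHeapBytes() / MB);
        System.out.printf("committed:          %d MB%n", committedHeapBytes() / MB);
        System.out.printf("used:               %d MB%n", usedHeapBytes() / MB);
    }
}
```

If "max" differs noticeably from what was passed on the command line, the options probably did not reach the JVM (for example, they were set in the wrong startup script).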

Henry

Books: Java Threads, 3rd Edition, Jini in a Nutshell, and Java Gems (contributor)
David Kwok
Dear Henry,

You may have misunderstood me, or perhaps I didn't state it clearly. The active working set is not the same as the amount of memory the application requires. The vmstat utility found on most Unixes (manual at http://linuxcommand.org/man_pages/vmstat8.html) reports "active memory": the amount of memory in pages that have been recently used, that is, active since the last scan.
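For comparison on Linux: vmstat -a prints active and inactive memory columns, and the kernel exposes the same counters as the Active: and Inactive: lines of /proc/meminfo (values in kB). Below is a minimal Linux-only sketch (the class name MemInfo is illustrative) for sampling those counters from Java, for instance before and after a process goes idle. Note that these are system-wide figures, not per-process ones.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// MemInfo.java: Linux-only sketch that reads the system-wide active/inactive
// memory counters (the figures `vmstat -a` shows) from /proc/meminfo.
public class MemInfo {
    // Parse lines such as "Active:          1234567 kB" into key -> kilobytes.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            try {
                result.put(key, Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // Skip entries whose value is not a number.
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (!Files.exists(meminfo)) {
            System.out.println("/proc/meminfo not found; this sketch assumes Linux.");
            return;
        }
        Map<String, Long> info = parse(Files.readAllLines(meminfo));
        System.out.println("Active:   " + info.get("Active") + " kB");
        System.out.println("Inactive: " + info.get("Inactive") + " kB");
    }
}
```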

I was looking at the memory usage pattern of a Java process on HP-UX. I noticed that if you start a C program that mallocs 2GB of memory, fills it with data, and then goes idle for 10 minutes, either waiting for I/O or just sleeping, its active memory will be small. The memory is allocated but not accessed. If another process then needs more memory while memory is already mostly in use, those idle pages will be evicted to disk by the VMM.

Now suppose the same program is written in Java and holds 2GB of live objects. After some time they will end up in the tenured generation. The GC will still periodically touch them, because it needs to mark and sweep the objects that have been dereferenced. Marking and sweeping means reading the objects, which brings their pages back into memory and increases the system's active memory.

This is what concerns me. On a system with perhaps only 3GB of memory and two Java processes using 2GB each, the two would keep swapping each other out unnecessarily, even while both are idle, because the GC is actively working. Is this not true?

If both were C programs, they would sit there quietly, never touching the idle pages that have been swapped out to disk.
Henry Wong

The Sun/Oracle JVM does not do GC for no reason. An idle application means an idle GC too. As for other JVMs, the only one I know of that runs GC based on time is the Azul JVM; however, with that JVM, you'll likely be running on specialized hardware or under a VM hypervisor (VMware, which will be managing the memory).
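One way to check this for a particular JVM and collector is to sample the collection counts exposed by java.lang.management.GarbageCollectorMXBean before and after an idle period; -verbose:gc output would show the same events. The sketch below (IdleGcCheck is an illustrative name) sleeps for a configurable number of milliseconds and reports how many collections happened in the meantime. A non-zero result would suggest that something else, such as another thread allocating or a time-based collection setting, is triggering the collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// IdleGcCheck.java: sketch for observing whether any collections occur while idle.
// Usage: java IdleGcCheck [idleMillis]   (defaults to 10 seconds)
public class IdleGcCheck {
    // Sum of collection counts over all collectors; a count of -1 means "undefined" and is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long idleMillis = args.length > 0 ? Long.parseLong(args[0]) : 10_000L;
        long before = totalCollections();
        Thread.sleep(idleMillis); // this thread does no work and allocates nothing while sleeping
        long after = totalCollections();
        System.out.println("Collections during idle period: " + (after - before));
    }
}
```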

Henry

David Kwok
Okay, going by your information that GC does not take place while the application is idle: suppose it is not idle, but busy in another part of the application, and allocation happens. Eventually a GC is initiated. Would it then trace the earlier 2GB data structure looking for live objects? It may seem hypothetical, but unless we as programmers monitor the system 24x7, this could happen under the hood without our knowing it, at least unless we are dumping GC debug logs all the time.

Starting the GC would then mean that the earlier 2GB data structure, some of whose pages may have been swapped out to disk, has to be swapped back in for the GC to scan, even though the allocation has nothing to do with it. Am I right?
Henry Wong
With the Sun/Oracle JVM, if you do an allocation and memory isn't available, it will trigger a GC: initially only a new (young) generation GC, but it may also trigger a tenured generation GC if more memory is needed.
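Which kind of collection actually ran can be observed per collector. HotSpot registers one GarbageCollectorMXBean per collector, with names that depend on the collector in use (for example "PS Scavenge" and "PS MarkSweep" for the parallel collector, or "G1 Young Generation" and "G1 Old Generation" for G1). The sketch below (class and method names are illustrative) prints which memory pools each collector manages, allocates a burst of short-lived garbage, and reports how many collections each collector performed. The counts depend on heap size and collector, so treat the output as an observation, not a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// GcBreakdown.java: per-collector collection counts before and after an allocation burst.
public class GcBreakdown {
    // Snapshot of collector name -> collection count (-1 if the collector does not report it).
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    // Allocate short-lived garbage: each array becomes unreachable on the next iteration.
    static long churn(int iterations, int bytesPerArray) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[bytesPerArray];
            garbage[i % bytesPerArray] = 1;
            checksum += garbage[i % bytesPerArray];
        }
        return checksum; // returned so the loop is not treated as dead code
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " manages " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        Map<String, Long> before = snapshot();
        churn(20_000, 64 * 1024); // about 1.3 GB of short-lived allocation in total
        Map<String, Long> after = snapshot();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            System.out.println(e.getKey() + ": +" + delta + " collections");
        }
    }
}
```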

So, to answer your question: if you paint yourself into a corner and then need to get out of that corner, you are in trouble, right? Yes. Absolutely.

Henry

David Kwok
Well, I wouldn't say I'm trying to make the situation look bad enough to take a dive and then expect a miracle out of it. I'm trying to understand whether my assumption is correct. I would like to understand more about how GC interacts with virtual memory and what kind of impact we are looking at. Customers sometimes don't understand memory management as well as developers do, and it takes tons of effort to get even a simple concept across. For example, low available memory on a Linux system is perfectly fine: unless we see paging happening all the time, it's just the file and kernel caches occupying that memory for better performance. The customer just wants to see 60% utilization across the board, which in my opinion is plain silly: why install a memory stick and only use 60% of it?

I'm looking at how to account for and address memory utilization in a large Java web application. Memory is used all over the place: for requests and responses, internal object caching, the data abstraction layer, and so forth. Outside the JVM, though, we see only resident memory, virtual memory, and other figures such as active memory. I would like a clearer picture of what can be done, and how, to optimize a web application. Simple options like using less memory or reusing objects make sense, but they are not always feasible when a project is built by many developers who may or may not be top-notch, so we also have to look at which less-than-ideal optimizations can salvage the situation.

I need to find out how much I can squeeze out of the GC's behavior. Is there anything not yet done that could be done? Back to the earlier topic: it seems I'm correct that, among all the other uses of memory, the GC does affect the ACTIVE working set of a typical Java application. In some ways it is counterproductive for the VMM: while the VMM wants to page least recently used pages out to disk, the GC, in trying to free memory, can make matters worse by bringing those pages back into active memory, forcing the VMM to page out other parts of memory. That adds unnecessary disk I/O.

As I'm no guru when it comes to the innards of the JVM and GC, I'd like to know whether my understanding is sound.
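To put the JVM's own view next to the OS figures mentioned above, the sketch below (JvmVsOsMemory is an illustrative name; the OS part assumes Linux) prints heap and non-heap usage from MemoryMXBean alongside the VmRSS (resident) and VmSize (virtual) fields of /proc/self/status. The resident figure will generally differ from heap usage: it also covers thread stacks, JIT-compiled code, native libraries and other non-heap memory, and it excludes heap pages that were never touched or have been swapped out.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// JvmVsOsMemory.java: compare the JVM's heap accounting with the kernel's per-process figures.
public class JvmVsOsMemory {
    // Value in kB of a "Key:   123 kB" line from /proc/self/status, or -1 if the key is absent.
    static long statusFieldKb(List<String> lines, String key) {
        String prefix = key + ":";
        for (String line : lines) {
            if (line.startsWith(prefix)) {
                String[] parts = line.substring(prefix.length()).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();
        System.out.printf("heap used/committed:     %d / %d kB%n",
                heap.getUsed() / 1024, heap.getCommitted() / 1024);
        System.out.printf("non-heap used/committed: %d / %d kB%n",
                nonHeap.getUsed() / 1024, nonHeap.getCommitted() / 1024);

        Path status = Paths.get("/proc/self/status");
        if (Files.exists(status)) { // Linux only
            List<String> lines = Files.readAllLines(status);
            System.out.println("VmRSS (resident): " + statusFieldKb(lines, "VmRSS") + " kB");
            System.out.println("VmSize (virtual): " + statusFieldKb(lines, "VmSize") + " kB");
        }
    }
}
```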
Henry Wong
This is opinion only. If others have a different opinion, please feel free to add to the debate.

In my opinion, which is formed after many years of doing nothing but optimizing GC issues, there is no way to safely swap a JVM. A JVM should never be swapped. Period. I am pretty sure that others will chime in to help you find the safe point, but IMO... there is no safe point. Never swap out a JVM. Period.

A GC cycle that takes less than a second can take minutes when it has to go to disk. This will break the application, via network timeouts and other failures. Never swap out a JVM. Period.
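On Linux, the kernel reports how much of a process currently sits in swap in the VmSwap field of /proc/self/status. The sketch below (SwapCheck is an illustrative name; it assumes a kernel that provides VmSwap) lets a JVM check itself and prints the total GC time accumulated so far, since long GC times are one symptom of a swapped heap.

```java
import java.io.IOException;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// SwapCheck.java: Linux-only sketch reporting how much of this JVM is swapped out.
public class SwapCheck {
    // VmSwap value (kB) from /proc/self/status lines; -1 if the field is missing.
    static long vmSwapKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmSwap:")) {
                String value = line.substring("VmSwap:".length()).trim();
                return Long.parseLong(value.split("\\s+")[0]);
            }
        }
        return -1;
    }

    // Accumulated GC time in milliseconds over all collectors (-1 "undefined" values are skipped).
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.exists(status)) {
            System.out.println("/proc/self/status not found; this sketch assumes Linux.");
            return;
        }
        long swappedKb = vmSwapKb(Files.readAllLines(status));
        System.out.println("VmSwap: " + swappedKb + " kB, total GC time: " + totalGcMillis() + " ms");
        if (swappedKb > 0) {
            System.out.println("WARNING: part of this JVM is in swap; the next GC may have to page it back in.");
        }
    }
}
```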

Henry
David Kwok
Okay, sounds like a good solution. How do you keep a process from being swapped out in Linux? Is there any approach you have used to advise the kernel not to swap out the JVM? Providing more than enough memory to the system would be the general idea, but if I want to be certain that swapping is out of the question for the kernel, how can it be done? Advice, please?
Henry Wong
The easiest way is probably to disable swap altogether. Or, as you mentioned, to add a boatload of memory to the system.

As for how to disable swap for a particular process, you will have to look into it for your version of Linux. I don't think there is a technique that works for all versions of Linux; not one that I know of, anyway.
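On Linux, disabling swap altogether is usually done as root with swapoff -a, plus removing or commenting out the swap entries in /etc/fstab so it stays off after a reboot. The sketch below (SwapStatus is an illustrative name; Linux-only) verifies the result from Java by counting the active swap areas listed in /proc/swaps and printing vm.swappiness. Lowering swappiness only makes the kernel less eager to swap; it does not prevent swapping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// SwapStatus.java: Linux-only sketch that checks whether any swap area is active.
public class SwapStatus {
    // /proc/swaps starts with a header line, followed by one line per active swap area.
    static int activeSwapAreas(List<String> procSwapsLines) {
        int count = 0;
        for (int i = 1; i < procSwapsLines.size(); i++) {
            if (!procSwapsLines.get(i).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path swaps = Paths.get("/proc/swaps");
        if (!Files.exists(swaps)) {
            System.out.println("/proc/swaps not found; this sketch assumes Linux.");
            return;
        }
        int areas = activeSwapAreas(Files.readAllLines(swaps));
        System.out.println(areas == 0 ? "Swap is disabled." : "Active swap areas: " + areas);

        Path swappiness = Paths.get("/proc/sys/vm/swappiness");
        if (Files.exists(swappiness)) {
            // Lower values make the kernel less eager to swap, but even 0 is not a "never swap" guarantee.
            System.out.println("vm.swappiness = " + Files.readAllLines(swappiness).get(0).trim());
        }
    }
}
```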

Henry
 