Memory-mapped file: fragmentation?

surlac surlacovich
Ranch Hand

Joined: Mar 12, 2013
Posts: 296

Hello Folks!
I wanted to know what the use cases for memory-mapped files (MMF) are and stumbled upon an article which says that mapping small files wastes RAM because of page-size alignment (4 KB): if you map 5 KB, it will reserve 8 KB and waste 3 KB.
The question is: should we even be concerned about this when we use MMF in Java? Does the JVM provide some kind of defragmentation feature?
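
The arithmetic from the article can be checked with a few lines of Java. This is only a sketch: the 4 KB page size is the figure quoted above and is assumed here (the real page size depends on the OS and hardware), and `PageRounding` and `roundUpToPage` are made-up names for illustration.

```java
final class PageRounding {
    // Assumption: 4 KB pages, as quoted in the article; the real value is OS-dependent.
    static final long PAGE_SIZE = 4096;

    // Rounds a byte count up to the next multiple of PAGE_SIZE.
    static long roundUpToPage(long bytes) {
        return ((bytes + PAGE_SIZE - 1) / PAGE_SIZE) * PAGE_SIZE;
    }

    public static void main(String[] args) {
        long requested = 5 * 1024;                 // 5 KB
        long reserved = roundUpToPage(requested);  // 8 KB
        // prints: reserved=8192 wasted=3072
        System.out.println("reserved=" + reserved + " wasted=" + (reserved - requested));
    }
}
```

The waste per mapping is always less than one page, so it only adds up when many small files are mapped at once.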
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Hi,

in my opinion there is no need at all to use memory-mapped files when you only have to process relatively small files. The reason to use memory-mapped files is to speed up file access or to access large files as if they were in-memory data. But even with small files the waste of memory is only big in contrast to the (small) file size. If you don't process thousands of these files simultaneously the wasted few KB of RAM won't be a big problem on most modern systems. Of course it's more of a problem on systems with only small RAM size.

Actually I don't know how memory-mapped files are implemented by the JVM. I guess it depends on the underlying operating system. On systems which support memory-mapped files natively the JVM will probably just use this feature of the OS. If it were implemented in the JVM itself for some systems, it would probably work similarly, because it's usually more efficient to manage memory in whole blocks instead of single bytes, which inevitably leads to some wasted memory under certain conditions. After all, the wasted memory is just a trade-off between memory consumption and performance. Memory-mapped files are used to speed things up at the cost of some wasted memory.

Marco
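
For readers who have not used the API yet, here is a minimal sketch of mapping a file from Java with `FileChannel.map`. The class and method names (`MapFileSketch`, `readMapped`) and the temp-file demo are assumptions for illustration only, and the sketch assumes the file is small enough to fit in a single mapping and a single byte array.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MapFileSketch {
    // Maps the whole file read-only and decodes its content as UTF-8.
    static String readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes); // copies the mapped bytes onto the heap
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmf-demo", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello mapped file".getBytes(StandardCharsets.UTF_8));
        System.out.println(readMapped(tmp));
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage-collected, which is one reason mapping lots of tiny files is rarely worth it.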
surlac surlacovich
Ranch Hand

Joined: Mar 12, 2013
Posts: 296

Awesome! Thanks Marco.
Can an MMF be an alternative to other kinds of IPC within one computer? It looks like it's very fast, and the file can be shared between different processes.
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

Depending on your requirements memory-mapped files probably could be an alternative for IPC on a single node. But in my opinion there are many other considerably more elegant solutions for IPC in the Java world. Of course it depends on your needs what would be a really good solution.

Marco
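
To make the idea concrete, here is a rough sketch of the sharing mechanism: two independent mappings of the same file, where a value written through one is read back through the other. In a real setup the two mappings would live in different processes; both are kept in one JVM here only so the example is self-contained. All names (`SharedMappingSketch`, `writeThenReadThroughSecondMapping`) are hypothetical, there is no synchronization, and the Javadoc leaves visibility of changes between mappings OS-dependent, so treat this as an illustration rather than a reliable IPC layer.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class SharedMappingSketch {
    private static final int INT_SIZE = 4;

    // Writes an int through one mapping and reads it through a second, separate mapping.
    static int writeThenReadThroughSecondMapping(Path file, int value) throws IOException {
        try (FileChannel writer = FileChannel.open(file,
                     StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
             FileChannel reader = FileChannel.open(file, StandardOpenOption.READ)) {
            // A READ_WRITE mapping grows the file to the mapped size if needed.
            MappedByteBuffer out = writer.map(FileChannel.MapMode.READ_WRITE, 0, INT_SIZE);
            MappedByteBuffer in = reader.map(FileChannel.MapMode.READ_ONLY, 0, INT_SIZE);
            out.putInt(0, value);
            return in.getInt(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmf-ipc", ".bin");
        tmp.toFile().deleteOnExit();
        System.out.println(writeThenReadThroughSecondMapping(tmp, 42));
    }
}
```

What the sketch leaves out is exactly the hard part: telling the other process when data is ready and keeping readers away from half-written records.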
surlac surlacovich
Ranch Hand

Joined: Mar 12, 2013
Posts: 296

Marco Ehrentreich wrote:it depends on your needs what would be a really good solution.

I would say performance is the only need in this situation.

Marco Ehrentreich wrote:But in my opinion there are many other considerably more elegant solutions for IPC in the Java world.

Do you mean message-based IPC? That should introduce some overhead, shouldn't it?
Marco Ehrentreich
best scout
Bartender

Joined: Mar 07, 2007
Posts: 1282

surlac surlacovich wrote:I would say performance is the only need in this situation.

Another advantage of memory-mapped files is surely that they let you access a very large chunk of data.

surlac surlacovich wrote:Do you mean message-based IPC? That should introduce some overhead, shouldn't it?

Yes, a solution based on messaging comes to mind (of course, only if it fits your needs). Many MOM (message-oriented middleware) products are optimized for very high message volumes, and popular products like ActiveMQ allow for a lot of tuning regarding performance, reliability, availability, guaranteed delivery etc. Additionally these tools are often capable of running in a distributed mode in a cluster of message brokers. This enables high availability and allows you to scale horizontally, i.e. to simply add more nodes to increase throughput when necessary. Of course messaging isn't a perfect fit for all situations. For example you wouldn't use it to share gigabytes of data in a single message. Another thing to consider is that messaging is inherently asynchronous and your application has to be designed accordingly, which may or may not be an option but at least requires a different way of thinking about design.
A nice thing about ActiveMQ is that you can easily use it in embedded mode inside your application as long as you don't really need a separate message broker or a cluster of brokers.

Regarding overhead I don't see real disadvantages for a messaging-based solution. As I said, these tools are highly optimized for speed and throughput and can be further tuned through configuration. If you implemented your own kind of IPC with memory-mapped files you'd still have to read from and write to the file and manage concurrent access to it from different processes. This is surely no easy task and will bring some overhead, too. That doesn't mean that it is impossible to outperform existing, proven solutions with a custom one, but it may not be that easy to do, and you should ask yourself if you think you can achieve this and if it's worth the trouble.

That said, I want to add that messaging of course is not the only solution to integrate different applications/processes. You should choose well depending on your needs and requirements!

Marco
surlac surlacovich
Ranch Hand

Joined: Mar 12, 2013
Posts: 296

Thanks a lot, Marco, for your very thorough answer. I will consider MOM if I need to integrate different subsystems (maybe not Java-based).
Marco Ehrentreich wrote: ...manage how to access the file concurrently from different processes.

Just a quick note: that can be solved with NIO FileLock, which can lock just a subset of the bytes in a file. That is a good way to synchronize multiple processes.
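
A minimal sketch of that idea, assuming a hypothetical helper `writeUnderLock` that takes an exclusive lock on just the byte range it is about to write. `FileChannel.lock(position, size, shared)` and `FileLock` are the real NIO API; everything else is made up for illustration. Keep in mind that file locks are held on behalf of the whole JVM, so they coordinate separate processes but not threads inside one JVM, and on some operating systems they are only advisory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RegionLockSketch {
    // Locks only [position, position + data.length) exclusively, writes, then releases.
    static void writeUnderLock(Path file, long position, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                     StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE);
             FileLock lock = ch.lock(position, data.length, false)) { // blocks until acquired
            ByteBuffer buf = ByteBuffer.wrap(data);
            long pos = position;
            while (buf.hasRemaining()) {
                pos += ch.write(buf, pos); // positional write, may be partial
            }
        } // the lock is released before the channel is closed
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lock-demo", ".bin");
        tmp.toFile().deleteOnExit();
        writeUnderLock(tmp, 0, new byte[] {1, 2, 3, 4});
        System.out.println(Files.size(tmp));
    }
}
```

Other processes that lock a different, non-overlapping range of the same file are not blocked, which is what makes region locks attractive for a shared mapped file.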
 