Large Memory Requirements

 
Neil Barton
Ranch Hand
Posts: 146
I have a program written using the Netbeans IDE. It runs fine. However, when I run it outside of the IDE it fails:
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
I have tried using the -Xmx flag to increase the amount of space, but setting it to 1.7m isn't enough, and anything more than that refuses to start:
Could not reserve enough space for 1740800KB object heap

I have put some feedback in the program so that it tells me what memory it is using:
Before loading data:
Total Memory (in bytes): 403177472 (Runtime.getRuntime().totalMemory())
Free Memory (in bytes): 365469408 (Runtime.getRuntime().freeMemory())
Max Memory (in bytes): 7616856064 (Runtime.getRuntime().maxMemory())
After loading data:
Total Memory (in bytes): 2131755008
Free Memory (in bytes): 787647272
Max Memory (in bytes): 7616856064
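(For reference, a minimal sketch of the reporting behind those figures; the method calls are the standard java.lang.Runtime ones named above, while the surrounding class and labels are assumed.)

class MemoryReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Total Memory (in bytes): " + rt.totalMemory()); // heap currently committed to the JVM
        System.out.println("Free Memory (in bytes): " + rt.freeMemory());   // unused portion of the committed heap
        System.out.println("Max Memory (in bytes): " + rt.maxMemory());     // ceiling the heap may grow to (-Xmx)
    }
}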

My question is: why does it run in the NetBeans IDE but not when I run it from the command line (using java -jar program.jar)? And how do I make it run?


 
Campbell Ritchie
Marshal
Posts: 79239
Any idea why you need so much memory? What are you loading?
If you look up the documentation for the java tool (search the page for "Xmx"), you will find that 1.7m is 1000× too small for what you want, and that the values must be whole multiples of the unit. So 1.7m won't work; maybe you wanted 1.7g.
The figures you gave show about 2GB of heap already committed after loading (roughly 1.3GB of it in use), so 1.7GB would not have been enough anyway.
If I remember correctly, the amount of memory used for the heap space defaults to 25% of available RAM. What command line arguments did you supply to NetBeans?
I think we need lots more information before we can help, but you probably won't be able to distribute your app with such a problem, and I think you would do well to reduce its memory requirements.

[edit] Does the Xmx option take fractional arguments? The documentation only shows integer arguments.
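For what it's worth, a couple of illustrative command lines using whole-number sizes (the figures are examples only; the third line prints the maximum heap the JVM actually picks by default, and findstr assumes a Windows command prompt):

java -Xmx1700m -jar program.jar
java -Xmx2g -jar program.jar
java -XX:+PrintFlagsFinal -version | findstr MaxHeapSize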
 
Neil Barton
Ranch Hand
Posts: 146
Thanks for the response. It is a deliberate policy to load the data into memory to make the program run as fast as possible. With 32GB of RAM I should be able to do that! I investigated what else was different about the runtime environments by putting some debug into the program. It came out like this:
Netbeans Env:
Version:1.8.0_161
java.class.path:F:\Jrun\lib\HikariCP-3.4.2.jar;F:\Javadev\Horsey\Horsey\build\classes;F:\Jrun\lib\mysql-connector-java-8.0.13.jar;F:\Jrun\lib\log4j-1.2.17.jar;F:\Jrun\lib\slf4j-api-1.7.25.jar;F:\Jrun\lib\slf4j-log4j12-1.7.25.jar;F:\Javadev\Horsey\Genie2\build\classes
java.home:C:\Program Files\Java\jdk1.8.0_161\jre
Cmd Env:
Version:1.8.0_241
java.class.path:dist\Genie2.jar
java.home:C:\Program Files (x86)\Java\jre1.8.0_241

My conclusion is that I am running different versions of Java; the command-line one is 32-bit (x86 in the path). I am currently uninstalling those versions and bringing Java completely up to date. Hopefully, with only one runnable version (the 64-bit one), the problem will disappear. I am concerned that once I have updated Java I will be left with a smaller memory size; we'll see.
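(For anyone wanting to reproduce that check, a minimal sketch; the property keys are standard System properties, and os.arch is included because it reports the 32-bit vs 64-bit difference directly.)

class EnvReport {
    public static void main(String[] args) {
        System.out.println("Version:" + System.getProperty("java.version"));
        System.out.println("java.class.path:" + System.getProperty("java.class.path"));
        System.out.println("java.home:" + System.getProperty("java.home"));
        System.out.println("os.arch:" + System.getProperty("os.arch")); // e.g. "x86" for a 32-bit JVM, "amd64" for 64-bit
    }
}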
 
Stephan van Hulst
Saloon Keeper
Posts: 15524
There's a difference between using more memory to increase performance, and keeping everything in memory.

You really ought to use memory sparingly, then run a profiler, and then only cache the results of hot paths.
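A minimal sketch of that idea: keep a small cache keyed by whatever the hot path looks up, and populate it only on demand (loadFromStore below is a hypothetical stand-in for the real disk or database read).

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HotPathCache {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();

    String lookup(Integer id) {
        // only values that are actually requested stay in memory
        return cache.computeIfAbsent(id, this::loadFromStore);
    }

    private String loadFromStore(Integer id) {
        return "row-" + id; // hypothetical stand-in for the real disk/database read
    }
}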
 
Campbell Ritchie
Marshal
Posts: 79239
I can't see what difference a new version of Java® would make, nor x86 vs x86_64. Your NetBeans ought to be using the same version as your command line, however. I can't see that it can do any harm to uninstall an old version. Why have you got so many XXX.jars in your CLASSPATH? Have you set a system CLASSPATH? A system CLASSPATH is usually more trouble than it is worth.
And I think Stephan is correct.
 
Matthew Bendford
Rancher
Posts: 326
@Neil Your conclusion seems correct: your dev environment is a 64-bit version, which can address up to about 16EiB (that's exbibytes; 1 EiB = 1024 PiB). The runtime is only a 32-bit version, which is limited to 4GiB, and in my experience such old 32-bit Java runtimes had a hard time even starting up with anything higher than about -Xmx1.5G.
Unless you have some specific reason to keep those old versions, you really should consider updating to at least Java 11, the latest LTS.

@Campbell Yes, -Xmx does take fractions: one can set -Xmx1.5G, which in my personal experience seems to be roughly the hard limit for a 32-bit VM; at least I was never able to start a VM with it set any higher. So although a 32-bit VM should in theory be able to max out at 4GiB, in practice it gives up at around 1.5GiB, for reasons I'm not really sure of. I'm also not sure whether this is an issue specific to those old 1.8 versions that got improved in later releases, but to my knowledge 32-bit support was dropped altogether around Java 11, so only 64-bit runtimes have been provided since; I might be wrong about that.
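(If it helps confirm the diagnosis, a small sketch that reports which kind of JVM is actually running; os.arch and maxMemory() are standard, while sun.arch.data.model is HotSpot-specific and may be absent on other JVMs.)

class JvmBits {
    public static void main(String[] args) {
        System.out.println("os.arch: " + System.getProperty("os.arch"));                // "x86" vs "amd64"
        System.out.println("data model: " + System.getProperty("sun.arch.data.model")); // "32" or "64" on HotSpot
        System.out.println("max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}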
 
fred rosenberger
lowercase baba
Posts: 13089

Neil Barton wrote: It is a deliberate policy to load the data into memory to make the program run as fast as possible.


Do you KNOW this makes it run faster? I have no idea what your program is doing, but if it takes 10 seconds to load all that data into memory and then only saves 0.5 seconds during execution, that's a net loss.

Never assume that doing X will make your program automatically run faster.  Most of the time, you'll be wrong.

Maybe you have done the work, and loading it all into memory does save time; but if you haven't, I'd suggest putting in the work to prove it. And this assumes that time really, truly does matter. I've seen teams spend weeks working on performance improvements that save the user about a tenth of a second... which they never notice. So all that time, effort, and money went on something that, in the end, didn't really matter.
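A minimal sketch of the kind of measurement that settles it (loadEverythingIntoMemory and runJob are hypothetical placeholders for the real steps):

class TimingCheck {
    public static void main(String[] args) {
        long t0 = System.nanoTime();
        loadEverythingIntoMemory();   // hypothetical: the up-front load
        long t1 = System.nanoTime();
        runJob();                     // hypothetical: the actual work
        long t2 = System.nanoTime();
        System.out.printf("load: %.1f s, work: %.1f s%n", (t1 - t0) / 1e9, (t2 - t1) / 1e9);
    }

    static void loadEverythingIntoMemory() { /* placeholder */ }
    static void runJob() { /* placeholder */ }
}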
 
Tim Holloway
Saloon Keeper
Posts: 27807
I can give you a WONDERFUL example (from experience) where loading everything into memory definitely did NOT make things "as fast as possible".

It was an application using a very large sparse array written in BASIC and running under OS/2. The app was doing random access to array elements.

OS/2, like most modern OS's, ran by default using virtual memory. And because this array was SO very large, even though so little of it actually contained data, virtually every time an array element was read or written a page fault would occur: something would have to give up memory and get written to the pagefile, and the page containing the desired element had to be read back in from the pagefile.

I don't know what modern performance is, but back then, the rule of thumb was that disk I/O was about 1000 times slower than RAM access.

It CRAWLED. In fact, about the only thing I know of that did worse was a cross-product outer join on FoxPro with sorted results done over a LAN and that one we killed after 2 days.

There is no one-size-fits-all solution. Sometimes a bubble sort will outdo a QuickSort. It all depends on the data and how you manipulate it.
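(To make the contrast concrete, a hedged sketch of the alternative: a map keyed by index holds only the populated elements of a logically huge but mostly-empty array, so there is far less resident memory for the pager to thrash over.)

import java.util.HashMap;
import java.util.Map;

class SparseArray {
    private final Map<Long, Double> cells = new HashMap<>(); // only non-empty cells consume memory

    void set(long index, double value) {
        cells.put(index, value);
    }

    double get(long index) {
        return cells.getOrDefault(index, 0.0); // untouched cells read as zero
    }
}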
 
Marshal
Posts: 28226

Matthew Bendford wrote: So although a 32-bit VM should in theory be able to max out at 4GiB, in practice it gives up at around 1.5GiB, for reasons I'm not really sure of.



That corresponds to my experience from last year, when I finally upgraded from Java 8. I was using a 32-bit version of Java 8 and it wouldn't accept even -Xmx2G to specify the heap size. But when I upgraded to 64-bit versions of Java 11 and onwards, I could specify much, much larger heap sizes with no problem, until the heap size exceeded the capacity of the virtual memory swap file.
 
Marshal
Posts: 4510
According to this SAP blog post, the maximum heap size varies by operating system.


 
Jesse Silverman
Bartender
Posts: 1737
The operating system always takes its cut of the 32-bit memory space.  You can never have all of it allocated to the user.

There was a "Large Address Aware" flag that you would have to link with to get anywhere close to what you thought you were going to get on Windows in 32-bit mode.
Doing so took memory away from the OS that it normally would have to work with and could easily have serious negative performance implications, but when you really needed it, you really needed it.

Of course, in real-world programs you would wind up dying from fatal heap fragmentation long before you got to that limit; there were (and are) programs like MicroQuill's SmartHeap to stave that off.

What a flashback to those days...it sent a chill up my spine.

I agree with the people who feel that saying "Good, now I have 64-bits, I can basically get along with using nearly infinite memory" is a very dangerous mindset to stray into.
If you have some dedicated box running just your application, you can possibly get away with using almost all the physical memory.
In most real-world situations the performance implications are huge. I've seen memory leaks in 64-bit Windows programs blow up the memory footprint to 80GB, and a run that should have taken less than an hour would take days, killing the disk the whole time as well.
 
Tim Holloway
Saloon Keeper
Posts: 27807
There are 3 "virtual memory" schemes that I know of.

The first, and most primitive, was employed in early time-sharing systems and required no page translation hardware. In this scheme, the OS and its data were permanently resident and each user had a workspace (often a whopping 4K of RAM). Since in time-sharing the user spends a lot of time thinking ("think time"), the idle memory would be swapped out in its entirety and another user's workspace would swap in. Obviously, as user RAM grew, this would take progressively longer and delays would become noticeable, so it was most popular on minicomputer systems. So it was virtual smoke and mirrors rather than "true" virtual memory.

The second scheme was to provide multiple virtual address spaces. Here, each process would have a complete virtual address space with addresses running from 0 to whatever the system supported. Address spaces could also swap, but generally there were one or more shared memory areas, including key non-pageable stuff like OS components. Rather than swap the entire address space each time you did a process switch, however, the address spaces all shared pagetable space. This scheme was what IBM used in their 1970s-era mainframe OS's. Since physical addresses were limited to 24 bits, this allowed larger per-process (user) address spaces. Since certain systems, such as CICS, needed to be able to talk to other processes in other address spaces, the instruction sets for those machines included cross-memory data movement and procedure calling.

IBM also had "transient areas" where less-frequently-used OS modules could swap in. The ones I remember most fondly were the open/close service modules. Their DOS/VS OS had two transient areas, A and B, and each transient module was limited to one or the other. The B transients had higher-level services in them and the A transients did low-level stuff. The transient areas were in real memory, so the swapping was explicitly done under OS control. Most OS code and data was in low RAM and nonpageable, but OS/MVS had a "link pack" area, which was high memory shared by all address spaces.

Finally, there's the Big Shared Address Space model. In this architecture, only one address space is used, but it's large enough to hold all processes without the overhead of an address space switch (although regular paging still applies). I'm not 100% sure, but I think that Linux runs on this model. MS Windows is even more puzzling, since in Windows 95 the windowing system ran in its own address space, but command shells were actually DOS VMs, each with their own address space. So one not-so-virtual machine for Windows apps, and multiple DOS VMs, one per command window. Windows NT and its descendants were on a different scheme where VM was actually integrated into the OS instead of bolted on, but I can't recall what model it uses.
 
Jesse Silverman
Bartender
Posts: 1737
Such mixed feelings reading that stuff. I am simultaneously impressed that you remember that level of detail about the historical stuff and thinking about how much I try to focus my brain cells on stuff I may still use. [note: if you actually remember enough z/OS, ISPF, CICS- and TSO-related stuff to code things effectively NOW, there are still jobs they are always trying to fill]

The excellent "Windows Internals" series covers all modern Windows versions very well.
One interesting cross-over tidbit is that programmers kept shooting themselves in the groin repeatedly in the following way.

When there were only 24 significant address bits on the 360 (and the Atari ST and the Amiga, for that matter) but 32-bit address registers, many "clever" programmers decided not to waste them, and stored all kinds of stuff in those unused bits!! Presumably not the same programmers, since there wasn't that much overlap -- present company excepted... Of course, when the 370 line got 31-bit addressing, all those programs instantly crashed. The same thing happened when the "clever" programs got run on a true 32-bit-address 68020-based Amiga/Atari ST. Ugh!

To head this problem off at the pass, on 64-bit Windows (I remember for sure) and probably most of the others, there is a required canonical form for addresses: all the "unused" bits of the 64-bit value (only 43 or 45 or 48 or however many bits are actually used) must be identically 0 or 1 (following, I think, the top bit really in use, like sign extension). If not, you will crash NOW, instead of 8 or 11 years from now when those bits start actually meaning addresses.

Don't know how useful that is for Java performance, but the "Large Memory Requirements" thread had already gone off the rails to "32-bit vs. 64-bit, and various implementations of virtual memory across hardware and OS flavors".
 
Tim Holloway
Saloon Keeper
Posts: 27807
There is an emulator (Project Hercules) that can pretend to be virtually any IBM System/360-370, etc. up to the latest zSeries. So if you really want to run any of those old OS's, it takes about 15 minutes to set up and use. Generally at a higher rate of speed than the originals ran. But then, an Apple Watch runs faster than those old mainframes.

The more important thing is that old ideas often get recycled, so they're worth remembering.

The IBM machines had 24-bit addresses, but unless I've gone completely senile, the Motorola MC68000-series systems used at least 31 bits. I've got an extensive Amiga collection (partly because I peddled Amiga software development tools), and I think I'd remember that. Of course I could just crack open one of my old books...

IBM stored 8 bits of the machine's Program Status Word (PSW) as the first byte of a saved address when you executed a subroutine call instruction (Branch-and-Link), so even when hardware address buses got bigger, software was hobbled and they ended up having to kludge around that. That wasn't really my problem, though, since I left that line of work just before XA came to town.

One thing I'm thankful to have missed (mostly) was segmented-mode addressing. That was Intel's way of dealing with physical (and later virtual) memory spaces greater than 1MB. The closest I came was the old mainframe base registers, which only covered 4KB.
 
Jesse Silverman
Bartender
Posts: 1737
The Atari's interface to physical memory was 24 bits only, so although the A0 through A7 registers of course had 32 bits, only 24 (at most) meant anything about an actual address.
Some programmers got clever there and used the other 8 for other stuff.

I thought the Amiga was similar, but it doesn't matter that much either way; the important thing was that converting our IBM MF code from 24 bits to 31 bits clean was a nightmare, for reasons similar to what I saw on the Atari ST platform.

Yes, the segmented memory on the 8086/80286 was horrific: the compilers supported all kinds of options for different sizes of different data and pointers -- normal, huge, aggravated, aggrandized, ugh. Add on top of that the problem that many people still considered prototypes optional, so instead of compile errors you would get runtime crashes: of the entire computer if you were in real mode, or of your "DOS window" if you were on OS/2.

I thought the biggest mistake was targeting OS/2 to be the last crappy 16-bit OS to work on the 286 memory model and ISA; they went pure 386 later with Warp, but by then it was too late.
Linux lucked out by going 386-only and skipping all that garbage. You can still see Tanenbaum complaining that Linus forsook the 286 way back then, but it was a horrible model to try to code for.

I always said that even the crappiest Atari ST you got for $149 at Toys R' Us gave you true 32-bit programming; the exception was that the memory controller was 24 bits, and other limitations kept most machines down to 4MB, but all your code was nice flat 32-bit everywhere. People thought that would bloat code; it didn't, because you had relative branches and jumps and what-not for when things actually were close by, which they often were.

Some of this might be mildly helpful working on embedded systems, but it is hard to get those jobs unless you have recent work experience on current platforms and RTOS's, so....meh!
 
Tim Holloway
Saloon Keeper
Posts: 27807
Actually, I do most of my work these days on AVR systems, so real-time isn't that hard to get into. Granted, most of the Arduino-like systems are hardly running what you'd call an "operating system", and anything for which I'd need real OS-like services I just outsource to a Raspberry Pi, but still...

The Amiga was a curious beastie. The OS was developed on a cross-compiled Unix system running Green Hills C. Except for the "DOS" parts which were written in BCPL - which addressed by words, not bytes. And the core Exec, which was written in object-oriented assembly language. Green Hills, and later Lattice/SAS C (and thus my C++ system) ran the CPU as a full-fledged 32-bit system. The other popular C compiler, however, came over from the Macintosh world and the Mac operated the MC68000 as a 16-bit system. Note that here "bits" refers to (sizeof int), as the memory model was always flat 32 bits with no other (shudder) options.

The Amiga was the first PC that really acted like a grown-up computer. You played by the OS rules or you regretted it. The kind of hacker tricks that the pre-multi-tasking systems used could easily take down an Amiga, especially since no MMU was available for the original processors. And incidentally, the Amiga OS was in fact a true Real-Time Operating System. Linux only operates as a true RTOS if you do a custom build. The difference between multi-tasking and a true RTOS is in latency, and the multi-level interrupt stack makes all the difference. Linux, so I understand, blocks all other interrupts when an interrupt service routine is running. In Amiga's Exec there were priorities, so that, for example, video ISRs could interrupt audio ISRs.

The best thing about running a VM-based language like Java, therefore, is that all of these addressability horrors are safely hidden from the application programmer, and the only effect any of them have is on the allocatable amount of JVM memory.

The main point - returning to the original question - is: don't mindlessly throw hardware at all your problems. There are plenty of other options.
 
Neil Barton
Ranch Hand
Posts: 146
Thanks Fred, I seem to have started an interesting debate with my 'off the cuff' statement about running my program in memory. It wasn't all that glib: the program originally worked through a disk file and took about an hour. Now that I run it all in memory it takes about 1.5 minutes (there are some other tweaks that sped it up as well). The machine is totally dedicated to this program and does nothing else (it can't; the processor is at 100% most of the time), so it doesn't have to co-exist with anything else. I agree with all the comments on here: just assuming that it will run faster in memory can be something that catches you out later. My approach is to start simple (using the disk and a single process) and then try to speed it up by running in memory and multi-threading. It still goes wrong sometimes though!
It's good to read the comments from everyone else and I'm pleased to see that programmers still care about not killing the machine they are delivering to. A lot of programs these days seem to take oodles of resources to do not very much.
You'll be pleased to know that after sorting out the different versions of Java my program is running at full throttle, just a few more tweaks...
 