This week's book giveaway is in the OO, Patterns, UML and Refactoring forum. We're giving away four copies of Refactoring for Software Design Smells: Managing Technical Debt and have Girish Suryanarayana, Ganesh Samarthyam & Tushar Sharma on-line! See this thread for details.
Since objs is a local variable, that really seems impossible. I think the most likely scenario is that you're accidentally executing an older version of the code, and the line number is incorrect. Otherwise, it seems like it would have to be a JVM bug of some sort. You could try looking for a bug report, but I suspect there are many NullPointerException reports that are not related to this problem.
It may be possible to put in a workaround, catching the NPE, logging it, and retrying the failed code - but I'd be very reluctant to do something like that without understanding what the problem really is.
What kind of heavy stress are you dealing with? Lots of threads? Sheer CPU load? IO? Or something else?
I agree that NPE may be thrown from lines 322 or 324 if some of arguments are null, but I can't see how it could be thrown from line 321. It looks like line 320 was not executed or at least not completed before line 321 was called.
So far the most important for me is to understand how it happened.
Joined: Sep 17, 2006
Well the only thing that makes any sese to me is JLS - 17.3 Incorrectly Synchronized Programs Exhibit Surprising Behaviors, which I gravitate towards because the problem description is it happened only few times and can't be reproduced therefore I go on the board that we may never find it on the basis that the choice of words by working engineers - adjuncted by real-world constraints - suggests that if multiple threads can access the code - a reasonable assumption given your problem definition - then all of the fixes suggested here should be applied.
What I meant was if at any line, some thread other than the one that entered, does something with any of the ref's, not just the one under suspicion, then we get an npe. The fact that this occurs only under heavy load complicates isolation and recovery, and I sought to generalize this. So if the problem is to be found, I suggest wrapping the code in some sort of harness. An app that runs reliably is usually better than one that runs fast but fails occasionally. [ December 03, 2007: Message edited by: Nicholas Jordan ]
Umm, yeah, no not so much. They're local variables. Nothing else can access them. Stale memory caches of other variables can't affect anything here.
Jacek, tell us something about the environment this code is running in. Who is the JVM vendor (is it Sun's or something else?)
Are you using any native code (that you wrote, or third party?) Threads can hit each other with stray pointer accesses, since they're in the same memory space; perhaps native code on some other thread is smashing your stack?
I wonder if it's just a bug in reporting which line the error is coming from. It could be that some JIT optimizations are confusing the line number info. I haven't seen that happen, but it's the most plausible explanation I can think of. Perhaps you should just check for null arguments, and throw a clear exception if you find them:
Of course, maybe it's OK for some of those to be null. If you're using classes from java.util.logging, then the Level is the only one that should throw an NPE if it's null. At least when using the SimpleFormatter. Anyway, try testing what happens when different inputs are null.
Joined: Feb 09, 2007
This code runs as standalone process on sun jdk 1.5.0_11 on RedHat Linux. As you can suspect shown method is a part of logging utility. Application uses third party native code via JNI but as far as I know we never had any problems with invalid stack traces.
As for JLS 17.3 suggestion - I agree that it looks like there is some synchronization issue but I can't find what it could be. Class that thrown NPE is a log4j Logger wrapper and it only has static field logger. Classes using this wrapper follow log4j usage pattern - they declare static field logger.
author and iconoclast
Well, this is very significant because given this, "impossible" stuff now becomes possible. Thread-safety or JNI bugs in that native code could cause pretty much anything to happen. If the JNI code itself is third-party, have you talked to that vendor at all? Otherwise, you need to look hard at the code yourself. Are you using local vs. global references correctly? Are you pinning things when needed, and unpinning them afterwards? JNI is full of dragons.