aspose file tools*
The moose likes Java in General and the fly likes Not so simple NPE Big Moose Saloon
  Search | Java FAQ | Recent Topics | Flagged Topics | Hot Topics | Zero Replies
Register / Login
JavaRanch » Java Forums » Java » Java in General
Bookmark "Not so simple NPE" Watch "Not so simple NPE" New topic
Author

Not so simple NPE

Jacek Ostrowski
Greenhorn

Joined: Feb 09, 2007
Posts: 23
Howdy ranchers, I found NullPointerException that I can't explain.
I'm going mad because of it so maybe you will help me.

Here is method that throws NPE, line numbers are in comments


NPE was thrown from line 321 during stress tests under heavy load, it happened only few times and can't be reproduced, code is multi-threaded.

Any ideas ?

Jacek


SCEA, SCWCD, SCJP, OCA AS10g
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4655
    
    5

Originally posted by Jacek Ostrowski:

Any ideas ?


Nothing obvious, but I'd wrap the code with a Mutex and see if that improved it.

Can you tell if its is 'param' that is null, I would expect it is your allocated Object array. I'd add some if ( x== null) printStackTrace code, if you can't set breakpoints.
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Since objs is a local variable, that really seems impossible. I think the most likely scenario is that you're accidentally executing an older version of the code, and the line number is incorrect. Otherwise, it seems like it would have to be a JVM bug of some sort. You could try looking for a bug report, but I suspect there are many NullPointerException reports that are not related to this problem.

It may be possible to put in a workaround, catching the NPE, logging it, and retrying the failed code - but I'd be very reluctant to do something like that without understanding what the problem really is.

What kind of heavy stress are you dealing with? Lots of threads? Sheer CPU load? IO? Or something else?


"I'm not back." - Bill Harding, Twister
Rob Spoor
Sheriff

Joined: Oct 27, 2005
Posts: 19693
    
  20

Does the NPE also occur if you change it into the following:


SCJP 1.4 - SCJP 6 - SCWCD 5 - OCEEJBD 6
How To Ask Questions How To Answer Questions
Jacek Ostrowski
Greenhorn

Joined: Feb 09, 2007
Posts: 23
Can you tell if its is 'param' that is null


Yes, there is another log method that calls this one with a null param.
But it should have no meaning and it should not throw NPE.

Since objs is a local variable, that really seems impossible

Exactly what I'm thinking.

I think the most likely scenario is that you're accidentally executing an older version of the code

That is the first what I've checked. Line numbers are correct.

it seems like it would have to be a JVM bug of some sort. You could try looking for a bug report, but I suspect there are many NullPointerException reports that are not related to this problem.

It is my conclusion too. There are many NPEs in bug database and so far I haven't found anything close to mine.

What kind of heavy stress are you dealing with ?

We do load tests and this means that we have many threads and high CPU load.


Thanks for those suggestions. I'm going back to java bug database

Jacek
Nicholas Jordan
Ranch Hand

Joined: Sep 17, 2006
Posts: 1282
Originally posted by Jacek Ostrowski:
Yes, there is another log method that calls this one with a null param.
But it should have no meaning and it should not throw NPE.

Why not ? if(level == null || msg == null || params == null)result == npe;// duh

Since objs is a local variable, but not the objs[0] ? ( no ? ) - so I'd take P.F.'s approach.


"The differential equations that describe dynamic interactions of power generators are similar to that of the gravitational interplay among celestial bodies, which is chaotic in nature."
Jacek Ostrowski
Greenhorn

Joined: Feb 09, 2007
Posts: 23

Why not ? if(level == null || msg == null || params == null)result == npe;// duh


I agree that NPE may be thrown from lines 322 or 324 if some of arguments are null, but I can't see how it could be thrown from line 321. It looks like line 320 was not executed or at least not completed before line 321 was called.

So far the most important for me is to understand how it happened.

Jacek
Nicholas Jordan
Ranch Hand

Joined: Sep 17, 2006
Posts: 1282
Well the only thing that makes any sese to me is JLS - 17.3 Incorrectly Synchronized Programs Exhibit Surprising Behaviors, which I gravitate towards because the problem description is it happened only few times and can't be reproduced therefore I go on the board that we may never find it on the basis that the choice of words by working engineers - adjuncted by real-world constraints - suggests that if multiple threads can access the code - a reasonable assumption given your problem definition - then all of the fixes suggested here should be applied.

What I meant was if at any line, some thread other than the one that entered, does something with any of the ref's, not just the one under suspicion, then we get an npe. The fact that this occurs only under heavy load complicates isolation and recovery, and I sought to generalize this. So if the problem is to be found, I suggest wrapping the code in some sort of harness. An app that runs reliably is usually better than one that runs fast but fails occasionally.
[ December 03, 2007: Message edited by: Nicholas Jordan ]
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24184
    
  34

Umm, yeah, no not so much. They're local variables. Nothing else can access them. Stale memory caches of other variables can't affect anything here.

Jacek, tell us something about the environment this code is running in. Who is the JVM vendor (is it Sun's or something else?)

Are you using any native code (that you wrote, or third party?) Threads can hit each other with stray pointer accesses, since they're in the same memory space; perhaps native code on some other thread is smashing your stack?


[Jess in Action][AskingGoodQuestions]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
I wonder if it's just a bug in reporting which line the error is coming from. It could be that some JIT optimizations are confusing the line number info. I haven't seen that happen, but it's the most plausible explanation I can think of. Perhaps you should just check for null arguments, and throw a clear exception if you find them:

Of course, maybe it's OK for some of those to be null. If you're using classes from java.util.logging, then the Level is the only one that should throw an NPE if it's null. At least when using the SimpleFormatter. Anyway, try testing what happens when different inputs are null.
Jacek Ostrowski
Greenhorn

Joined: Feb 09, 2007
Posts: 23
This code runs as standalone process on sun jdk 1.5.0_11 on RedHat Linux.
As you can suspect shown method is a part of logging utility.
Application uses third party native code via JNI but as far as I know we never had any problems with invalid stack traces.

As for JLS 17.3 suggestion - I agree that it looks like there is some synchronization issue but I can't find what it could be.
Class that thrown NPE is a log4j Logger wrapper and it only has static field logger. Classes using this wrapper follow log4j usage pattern - they declare static field logger.


Jacek
Ernest Friedman-Hill
author and iconoclast
Marshal

Joined: Jul 08, 2003
Posts: 24184
    
  34

Originally posted by jaceko:

Application uses third party native code via JNI


Well, this is very significant because given this, "impossible" stuff now becomes possible. Thread-safety or JNI bugs in that native code could cause pretty much anything to happen. If the JNI code itself is third-party, have you talked to that vendor at all? Otherwise, you need to look hard at the code yourself. Are you using local vs. global references correctly? Are you pinning things when needed, and unpinning them afterwards? JNI is full of dragons.
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: Not so simple NPE