Runaway Process

Migrated From Jforum.net
Ranch Hand

Joined: Apr 22, 2012
Posts: 17424
I am trying to track this down, but it seems to be difficult.

It seems there is a runaway process somewhere that just keeps getting worse as time goes on. I have all scheduling of indexing turned off at this point. At first I thought it was the Quartz scheduler, but even after taking that out completely, the runaway process is still there.

It starts out small and keeps building, as if it is caused or made worse by some sort of user action (but that is only a guess at this point).

Rafael, can you think of any loops that would cause such a problem?

Brakker


[originally posted on jforum.net by coolbreeze]
Rafael,

I am going to turn off all caching that I currently have on to try and eliminate that from being a possibility.

The background process setting that defaults to "true" in SystemGlobals...what is that used for?

Thanks,

Chad
[originally posted on jforum.net by coolbreeze]
It is used to send emails and index message contents, for search, basically.

Rafael
[originally posted on jforum.net by Rafael Steil]
Rafael,

I have forum.cache.enabled=false, and it seems the forum data or answers/posts on the forum index page are still cached. I searched the source for FORUM_CACHE_ENABLED and only found it in ConfigKeys, where it is originally set.

Is it being used?


[originally posted on jforum.net by coolbreeze]
Well, it turns out the runaway process(es) had to do with the connection pool. I switched back to the basic connection for now (will go to a datasource) and everything has been solid for 3 days. With the connection pool enabled, the problem would show up within a day.

B
[originally posted on jforum.net by Anonymous]
brakker wrote:
Rafael,

I have forum.cache.enabled=false, and it seems the forum data or answers/posts on the forum index page are still cached. I searched the source for FORUM_CACHE_ENABLED and only found it in ConfigKeys, where it is originally set.

Is it being used?



It was an experiment, and I'm not sure I will finish the implementation. From my point of view, this kind of cache should always be enabled.

Rafael
[originally posted on jforum.net by Rafael Steil]
Anonymous wrote:
Well, it turns out the runaway process(es) had to do with the connection pool. I switched back to the basic connection for now (will go to a datasource) and everything has been solid for 3 days. With the connection pool enabled, the problem would show up within a day.

B


Ok, I guess we should drop the current PooledConnection and use another one by default, like C3P0.

Rafael
[originally posted on jforum.net by Rafael Steil]
Rafael,

Forget what I said.

The runaway process had nothing to do with the connection pool...it was just a very well coordinated coincidence. The process was hard to track because I couldn't hook up jprofiler on a production system. Anyway, it had to do with image processing in a totally different area of the app....people posting pictures of upwards of 10 MB...ouch!

B
[originally posted on jforum.net by coolbreeze]
LOL!!

Well, that's good news for all of us!!

Thanks for reporting it.

Rafael
[originally posted on jforum.net by Rafael Steil]
The problem is back with a vengeance. It is stalking us in a way that makes it almost impossible to locate the source. The large image processing turned out not to be the issue. That problem with images was fixed last week, and our dual-processor machine has spiked to 100% twice since then. When it does spike, it does not come down.

I do not know where to look anymore, or even how to find the root cause, as this problem only happens in a production environment with lots of traffic. All I can say for sure is that the problem started when we added the new forum code. I have actually gone through and added log statements to every while loop in the application (including our app and jforum) to try and catch it. Still, I find no answers. Every time we think we have found the answer, it comes back as if someone or a bot is causing it....very random. When it happens and we reroute traffic to a different location, the processor never comes back down, as if it's stuck in a loop or several loops.

One thing that I have noticed recently is that our application is taking more memory than it ever needed before. We set the jvm to start up with a maximum of 1 GB in the heap. It does seem that the problem kicks off when the memory gets to that level.

I have spent more than 2 solid weeks trying to locate the root of this evil. The only thing I can think of trying is running the app in the latest jvm (1.5 instead of 1.4.x) to see if we can get some type of profiling through it. I think I read somewhere that the newest jvm combined with tomcat 5.5 supports some type of embedded profiling. Anyway, that's where I am at with this. This is probably one of the toughest bugs I have come across in my 15 years of programming.
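
For anyone trying the same loop-logging approach, here is a minimal sketch of what such a guard could look like. LoopGuard and tick() are hypothetical names, not part of JForum or the app discussed here; the only idea is to log every N iterations rather than on every pass, so the logging itself does not swamp a busy production box.

```java
import java.util.logging.Logger;

/** Hypothetical helper: counts loop iterations and logs each time a threshold is crossed. */
final class LoopGuard {
    private static final Logger LOG = Logger.getLogger(LoopGuard.class.getName());

    private final String name;
    private final long warnEvery;
    private long iterations;

    LoopGuard(String name, long warnEvery) {
        if (warnEvery <= 0) {
            throw new IllegalArgumentException("warnEvery must be positive");
        }
        this.name = name;
        this.warnEvery = warnEvery;
    }

    /** Call once per loop iteration; returns true when a warning was logged. */
    boolean tick() {
        iterations++;
        if (iterations % warnEvery == 0) {
            LOG.warning("Loop '" + name + "' has run " + iterations
                    + " iterations on thread " + Thread.currentThread().getName());
            return true;
        }
        return false;
    }

    long iterations() {
        return iterations;
    }

    public static void main(String[] args) {
        // Assumed usage: one guard per suspicious loop, created just before the loop starts.
        LoopGuard guard = new LoopGuard("demo", 1000);
        int i = 0;
        while (i < 2500) {
            guard.tick();
            i++;
        }
        System.out.println("iterations seen: " + guard.iterations());
    }
}
```

A guard like this only catches loops that actually spin; it says nothing about time spent in garbage collection or inside the container.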
[originally posted on jforum.net by coolbreeze]
Can you start the Tomcat server with the Java debugger jdb (it comes with the JDK) instead of java, and then, when the problem occurs, look at what threads are running and what they are doing?
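
If attaching jdb is not an option, a rough alternative is to have the application dump its own threads. The sketch below is only an illustration: HotThreadDump and report() are made-up names, and it relies on the java.lang.management API, which arrived with Java 5, so it would not run on a 1.4 JVM. It lists each live thread with its state, its CPU time so far (where the JVM supports that), and the top few stack frames.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: in-process thread report for spotting threads that burn CPU. */
final class HotThreadDump {

    /** Builds a text report of every live thread: state, CPU time, and up to maxFrames stack frames. */
    static String report(int maxFrames) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = threads.isThreadCpuTimeSupported() && threads.isThreadCpuTimeEnabled();
        StringBuilder out = new StringBuilder();

        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds(), maxFrames)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            // getThreadCpuTime returns -1 when the thread is no longer alive.
            long cpuNanos = cpuAvailable ? threads.getThreadCpuTime(info.getThreadId()) : -1L;
            out.append(info.getThreadName())
               .append(" state=").append(info.getThreadState())
               .append(" cpuMs=").append(cpuNanos < 0 ? "n/a" : String.valueOf(cpuNanos / 1000000L))
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(report(8));
    }
}
```

Taking two reports a few seconds apart and comparing the cpuMs values should point at the threads that are actually spinning.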
[originally posted on jforum.net by Anonymous]
I wish I could use jdb, but in production, any profiler/debugger slows down the container to the point where it just crashes when you add traffic.

I am noticing ParameterParser and VariableExpander taking up most of the CPU...according to my logging.

Since we are not allowing users to upload images in the forums or update their profiles, what would be using ParameterParser? The reason I ask is because it is stored in 'util/legacy/commons/fileupload/'.




[originally posted on jforum.net by coolbreeze]
VariableExpander? That's interesting. I accepted the code as its author wrote it and never gave it much attention, as the code looked ok (it was doing its job).

ParameterParser? Hm, I don't remember if I call it directly or if it's used internally by commons-fileupload...

Anyway, I put it under util/legacy to avoid versioning problems, as JForum uses version 1.1-dev, while almost everyone has version 1.0 (outdated and buggy).

I'm going to check VariableExpander right now. Also, today (Sunday 27) I committed code to CVS that improves the message reading page (posts/list) quite a bit. You may want to take a look at it.

Rafael
[originally posted on jforum.net by Rafael Steil]
Rafael,

Could you tell me which specific classes you updated for the message reading enhancement(s)? Since I have had to try and debug this problem, I have changed a few things to narrow down where the problem is. If you can tell me the specific files that were updated, I will be sure not to overwrite.

Is it PostAction mostly?

Thanks.
[originally posted on jforum.net by Anonymous]
Here you go:

generic_queries.sql:
https://jforum.dev.java.net/source/browse/jforum/WEB-INF/config/database/generic/generic_queries.sql?r1=1.104&r2=1.105

PostAction.java
https://jforum.dev.java.net/source/browse/jforum/src/net/jforum/view/forum/PostAction.java?r1=1.93&r2=1.94

TopicDAO
https://jforum.dev.java.net/source/browse/jforum/src/net/jforum/dao/TopicDAO.java?r1=1.5&r2=1.6

GenericTopicDAO
https://jforum.dev.java.net/source/browse/jforum/src/net/jforum/dao/generic/GenericTopicModelDAO.java?r1=1.5&r2=1.6

I guess that's all.

Also, I have refactored the VariableExpander class - it should run much faster now. The diff is here:

https://jforum.dev.java.net/source/browse/jforum/src/net/jforum/util/preferences/VariableExpander.java?r1=1.4&r2=1.5
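
For readers who cannot reach the diff, here is a rough idea of what a single-pass expander can look like. This is not the code from the diff above: SimpleVariableExpander, its constructor, and the ${key} placeholder syntax are assumptions made for illustration. The point is only that one left-to-right scan with a StringBuilder avoids re-scanning and re-allocating the whole string for each variable.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a ${key} expander; not JForum's actual VariableExpander. */
final class SimpleVariableExpander {
    private final Map<String, String> values;

    SimpleVariableExpander(Map<String, String> values) {
        this.values = values;
    }

    /** Replaces each ${key} in one left-to-right pass; unknown keys are left untouched. */
    String expand(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int pos = 0;
        while (pos < source.length()) {
            int start = source.indexOf("${", pos);
            if (start < 0) {
                break; // no more placeholders
            }
            int end = source.indexOf('}', start + 2);
            if (end < 0) {
                break; // unterminated placeholder: the rest is copied verbatim below
            }
            out.append(source, pos, start);
            String key = source.substring(start + 2, end);
            String value = values.get(key);
            out.append(value != null ? value : source.substring(start, end + 1));
            pos = end + 1; // always moves forward, so the loop terminates
        }
        out.append(source, pos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("home", "/opt/jforum"); // made-up key and value
        SimpleVariableExpander expander = new SimpleVariableExpander(values);
        System.out.println(expander.expand("${home}/config"));
    }
}
```

Note that this sketch does not expand placeholders that appear inside the substituted values themselves.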

Search indexing is much faster as well, but I haven't committed that to CVS yet.

Rafael
[originally posted on jforum.net by Rafael Steil]
Ah, there is PostCommon.java too:

https://jforum.dev.java.net/source/browse/jforum/src/net/jforum/view/forum/common/PostCommon.java?r1=1.21&r2=1.22

Rafael
[originally posted on jforum.net by Rafael Steil]
I am starting to lean toward the theory that this issue is rooted in memory, with the garbage collector consuming the processor, for the following reasons:

1) It's totally random
2) The heap is consuming large amounts of memory and the -verbose:gc output shows the collector working hard to keep up
3) I have put debug statements in all loops of jforum and our app and nothing leads me to believe that an infinite loop exists

I still have no idea how I am going to debug this as it's near impossible to attach a running profiler in production. I am going to try jvm 1.5 with its new profiling tools as our app now needs to be shut down twice a day to free resources.
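
One low-overhead way to test the GC theory from inside the app, once on a 1.5 JVM, is to sample the management beans periodically. This is only a sketch under assumptions: GcSnapshot and its methods are made-up names, and it needs the java.lang.management API from Java 5. It records heap usage plus the accumulated GC count and time, and estimates what share of an interval went to collection.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: point-in-time view of heap usage and accumulated GC work. */
final class GcSnapshot {
    final long heapUsedBytes;
    final long heapMaxBytes;   // -1 when the JVM reports no defined maximum
    final long gcCount;
    final long gcTimeMillis;

    private GcSnapshot(long heapUsedBytes, long heapMaxBytes, long gcCount, long gcTimeMillis) {
        this.heapUsedBytes = heapUsedBytes;
        this.heapMaxBytes = heapMaxBytes;
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
    }

    static GcSnapshot take() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        return new GcSnapshot(heap.getUsed(), heap.getMax(), count, time);
    }

    /** Rough share of an interval spent collecting, given two snapshots taken elapsedMillis apart. */
    static double gcFraction(GcSnapshot before, GcSnapshot after, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (after.gcTimeMillis - before.gcTimeMillis) / (double) elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        GcSnapshot before = take();
        Thread.sleep(1000);
        GcSnapshot after = take();
        System.out.println("heap used (MB): " + after.heapUsedBytes / (1024 * 1024));
        System.out.println("collections in interval: " + (after.gcCount - before.gcCount));
        System.out.println("approx. GC share: " + gcFraction(before, after, 1000));
    }
}
```

The GC times are approximate and are summed across collectors, so the result is a rough indicator only. A share that stays high while the heap sits near its maximum would support the memory theory; a low share while the CPU is pegged would point back toward application or container threads.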
[originally posted on jforum.net by coolbreeze]
Ok, I know that what I'm going to suggest may offend you, but: how about disabling jforum for a couple of hours? It's a rather desperate attempt, but it may help to isolate the problem.

Anyway, the VariableExpander patch may improve performance and memory usage a bit, as may the other ones.

Rafael
[originally posted on jforum.net by Rafael Steil]
coolbreeze, are you sure that the -Xdebug option slows the JVM down a lot?
http://java.sun.com/products/jpda/doc/soljdb.html
http://debuggercore.netbeans.org/docs/VM-options.html
I tried it before and the slowdown was not that significant.
But maybe my load was not high enough.

But you are right that the JDK 1.5 profiler is the easiest option to start with.
[originally posted on jforum.net by Anonymous]
Thanks for all of your replies.

GreenEyed, I think you are right on the money with your message. I believe this issue to be something very similar, although I am now testing different jvm options (once again) to help reduce the cpu expense. I was using the concurrent class unloading attribute along with the multithreaded gc for tenured objects. The reason I was using them was to try and get around the "stop the world" full gc and take advantage of dual processors. I am hoping this is what was causing these cpu spikes, because I have searched just about everywhere, including our app and jforum. I have thought about trying different hardware, but unfortunately, the new xeon machines have already been purchased. I am also going to go back through and start setting variables to null so the gc has an easier time cleaning stuff up. I honestly think the problem here is class unloading though. We are getting roughly a million hits a day, which means if one small piece isn't working correctly, the entire app could fail or become problematic. I will let you all know the result of taking out the jvm attributes that we are currently using.
[originally posted on jforum.net by coolbreeze]
Hey there,

Looks like a weird problem :?. A similar problem we ran across was when running some of our applications under a Digital Unix Node (8 CPUs). When the traffic would go up to a certain point, the JVM would simply get stuck consuming CPU and no matter what we did, it would remain there until the process was ruthlessly killed.

The problem seems to be, as it is still "unsolved", that the Digital Unix Java implementation does not handle traffic very well in a multiprocessor environment, as we discovered when we copied our application onto a simple monoprocessor PC with WindowsNT and just 352MB RAM. It ran fine! :roll: So we moved that application to this simple box and everything went back to normal. It lasted a couple of years until the traffic was too much for this "super-pc" so this year we moved it to a Solaris Box, multiprocessor, and it's running fine.

So don't rule out a JDK/platform/multiprocessor bug. If you have the possibility of running it on another host with a different hardware/JDK setup, I would recommend a quick test to eliminate that option.

Good luck!
[originally posted on jforum.net by GreenEyed]
Problem finally solved.

The issue surfaced after adding the forums to Tomcat 5.0.27. This version of Tomcat has a serious bug, so STAY AWAY!!!

http://www.junlu.com/msg/107548.html
http://forum.java.sun.com/thread.jspa?threadID=542672&messageID=2880153

I'm still not sure what sets it off, as it was a totally random issue that seemed to be sparked at any given time. We switched web servers and everything has been stable for 3 days now.

Thanks for all your suggestions in trying to solve this, but I'm pretty sure at this point that this case is closed....thank god!



[originally posted on jforum.net by coolbreeze]
Absolutely great! I'm *that* happy for you. In the end it was good for JForum too, as it made me improve some parts of the code.

Rafael
[originally posted on jforum.net by Rafael Steil]
Yeah man, I'm glad we all win in this one.

The forums are running fast and smooth on our site.
[originally posted on jforum.net by coolbreeze]
Great!

Good to know you were able to solve it. Apart from the JDK, one of the pieces to swap out to see whether it is related is the servlet container, which I forgot to mention :roll:

As you said, when you have so much traffic everything has to work like an F1 engine.

We use Resin, but it is still good to know that that version of Tomcat is buggy, just in case. I've just worked on one application that had about the same or more traffic, and we moved away from Tomcat to Resin earlier. It has to be said that when we did, Tomcat was at version 3.2.X, which was nowhere near what it is now.

Great.

One question though: are the million hits a day you are getting for the forum application or for the whole server? Given the "optimistic" approach of JForum regarding multithreading, I'm curious to know how it would behave under so much traffic.
[originally posted on jforum.net by GreenEyed]
The 1 million hits is site wide. The forums don't get nearly as much traffic at this point.

Tomcat has been good and not so good for us. Back when we were using version 3.1.x, we had serious memory issues. We switched to Jetty alongside jboss and things went much smoother. With the onset of the latest tomcat versions, however, there are some nice features that brought our interest back, so we made the switch after testing in a contained environment. I have to say, nothing has been as good as the latest jboss build though. It's very solid and packed with features. It made for an easy transition. I don't know much about Resin, but if you are working with ejbs at all, I highly recommend the tomcat/jboss combination. We were originally running the servlet container separately, outside of jboss. Now we have switched to the embedded version and it rocks. It's very fast, and I believe tomcat is tested much more thoroughly when bundled inside of jboss.

Our app is built using the mvc model, much like the jforum code is. Up until now, it's been very hard to see the performance of the forums due to this processor issue we've been having. Currently, I see no bottlenecks. When it comes to performance, I am one of those people who has always strongly believed that keeping to the basics far outweighs any complications associated with trying to get too extravagant with code. Having a site that has received a fair share of traffic really opens one's eyes as to what works and what does not. We all start out having dreams of building the next NASA project in our code. What ends up happening, however, is just the opposite in most cases, as we have to trim everything down to become more sensible and reliable.

To answer your question, so far I have had no problem with the optimistic approach. Everything seems to be "bump free" for the end user and consistent in the database.

Thanks guys.
[originally posted on jforum.net by coolbreeze]
Hello there,

Resin is a servlet container that also includes an EJB container, so everything fits together quite nicely, and if you are going to put both pieces on the same machine, local calls are also very performant, as with Tomcat bundled with JBoss, I guess. I found it easier to install and to manage, but anyway everybody has their own tools and as long as it works.... ;).

The problem with EJB was that porting things was not very easy, so we moved away from it and we are now using Hibernate and a basic servlet container. Still Resin, as it is also quite good on its own. But I have some of my apps with a web provider that uses Tomcat and they also have no problems... well, they do, but that's because it is a shared host with many other people and they bust the JVM every now and then, hehehe :roll:

And yes, I'm also a friend of the KISS principle, which is much more important to follow when traffic rises, as the fewer things that can go wrong, the better. I don't really like putting layers over layers of software just for the sake of it, as you lose a bit of control with every layer you add.

We also use the servlet controller approach with the presentation separated from the business logic, even though we use XSL for the view and Hibernate and session POJOs for the logic. The framework has been working nicely since 1999, it started with different technologies for the logic of course, so we have it quite debugged ;).

I'm glad to hear the optimistic approach is working fine. Unfortunately for me my users would kill me if this approach failed and a message was assigned to some other user, for example, so I had to opt for the conservative approach (foreign keys + synchronization). Well, I always do so that was not a problem :P.

Nice talking to you and again, glad to hear the problem was solved. Lord knows those kinds of problems can turn your hair gray quite quickly ;)
[originally posted on jforum.net by GreenEyed]
 