We have an RMI-based application with the server piece running as a Java app on Solaris. After a few hours of use, with 100-150 users, the server app appears to hang from the client's point of view. If I do a kill -3 on the server process, I see a whole buttload of threads (over 100) stuck in the DriverManager.getConnection() method (see below). The trace doesn't show anything deeper than that, but the Oracle thin JDBC driver is what's being used. Our DBAs don't see any connections being attempted in the listener log, and in fact the number of connections I see to the DB box via netstat jibes exactly with the number of sessions I see on the Oracle side (in V$SESSION). While this is going on, I can connect to the DB box from the app server box all day long using a small class I wrote that just opens and closes a connection.
This app was working fairly flawlessly until about a week ago. Nothing in terms of drivers or JVM version has changed, but there has been slightly more user load lately. It seems that every new user request that comes in after a certain point ends up with a thread blocked like this. What is the thread stopped in getConnection() indicative of? Has anyone seen anything like this?
Solaris 5.8, Sun JDK 1.3.1-b24
Much appreciated, Mike
Tons of these show in the thread dump (kill -3):

    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut:"RMI TCP Connection(11307)-10.160.96.220" daemon prio=5 tid=0x4946d0 nid=0x2e7b waiting for monitor entry [0xd4480000..0xd44819e0]
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at java.sql.DriverManager.getConnection(DriverManager.java:193)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at sched_MT.rmiSched_MTImpl.getLocation(rmiSched_MTImpl.java:556)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at java.lang.reflect.Method.invoke(Native Method)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:241)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at sun.rmi.transport.Transport$1.run(Transport.java:152)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at java.security.AccessController.doPrivileged(Native Method)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at sun.rmi.transport.Transport.serviceCall(Transport.java:148)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:465)
    Wed May 19 11:40:04 EDT 2004:ExecGroup-0 ut: at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:706)
Have you checked the oracle server to see how many connections are open? A complete stab in the dark is that somewhere in your code, you have a single point where connections are made but are not closed, and this is causing a resource leak.
I believe the Oracle database server is configured to accept a maximum number of connections. Over time this rogue piece of code would slowly take up connections, leaving fewer available to the rest of the application and slowly choking it until they all run out.
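To make the leak theory concrete, here is a minimal sketch that simulates it without a real database. `LimitedDatabase`, `LeakDemo` and their methods are hypothetical names: a `Semaphore` stands in for Oracle's connection limit, and the only difference between the two request methods is whether `close()` sits in a `finally` block. With real JDBC the same rule applies to `Connection.close()`.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class LeakDemo {
    // Leaky pattern: close() is skipped whenever the work throws.
    static void leakyRequest(LimitedDatabase db, boolean fail) {
        if (!db.open(10)) throw new IllegalStateException("no connection available");
        if (fail) throw new RuntimeException("query failed");
        db.close();
    }

    // Safe pattern: once open() succeeds, close() runs on every path.
    static void safeRequest(LimitedDatabase db, boolean fail) {
        if (!db.open(10)) throw new IllegalStateException("no connection available");
        try {
            if (fail) throw new RuntimeException("query failed");
        } finally {
            db.close();
        }
    }

    // Issues `requests` calls, every third one failing; returns free connections left.
    static int remainingAfter(boolean safe, int maxConnections, int requests) {
        LimitedDatabase db = new LimitedDatabase(maxConnections);
        for (int i = 0; i < requests; i++) {
            boolean fail = (i % 3 == 0);
            try {
                if (safe) safeRequest(db, fail); else leakyRequest(db, fail);
            } catch (RuntimeException e) {
                // swallowed, as a careless remote method might do
            }
        }
        return db.available();
    }

    public static void main(String[] args) {
        System.out.println("leaky: " + remainingAfter(false, 5, 12) + " free"); // leaky: 1 free
        System.out.println("safe:  " + remainingAfter(true, 5, 12) + " free");  // safe:  5 free
    }
}

// Hypothetical stand-in for a database with a hard connection limit;
// no real JDBC driver is involved.
class LimitedDatabase {
    private final Semaphore slots;

    LimitedDatabase(int maxConnections) { slots = new Semaphore(maxConnections); }

    // "Opens" a connection; returns false if none frees up within the timeout.
    boolean open(long timeoutMillis) {
        try {
            return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    void close() { slots.release(); }

    int available() { return slots.availablePermits(); }
}
```

Each failed request in the leaky version permanently loses one slot, so with enough traffic the pool of five drains to zero, while the `finally` version always ends with all five free.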
Suggestions: a connection pool would help heaps. In fact, abstracting database operations (DataSource, Entity-Relationship mapping, pooling, db manager classes) in any way would probably help.
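As a rough illustration of what a pool buys you, here is a minimal sketch, not a replacement for a tested pooling library or a driver-supplied `DataSource`. `SimplePool`, `borrow` and `giveBack` are hypothetical names; the assumption is that connections are opened once up front and then borrowed with a bounded wait, so request threads never call `DriverManager.getConnection()` themselves.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; in the real server T would be java.sql.Connection
// and the factory would call DriverManager.getConnection(...) once per slot.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // open every connection up front, at startup
        }
    }

    // Waits up to timeoutMillis for a free item; returns null on timeout
    // instead of parking the calling RMI thread indefinitely.
    T borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Callers must hand back what they borrowed, ideally from a finally block.
    void giveBack(T item) {
        idle.offer(item);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        int[] opened = {0};
        SimplePool<String> pool = new SimplePool<>(2, () -> "conn-" + (++opened[0]));
        String c = pool.borrow(100);
        try {
            System.out.println("using " + c + ", idle now " + pool.idleCount()); // using conn-1, idle now 1
        } finally {
            pool.giveBack(c);
        }
        System.out.println("idle after return: " + pool.idleCount()); // idle after return: 2
    }
}
```

A production pool would also validate connections and replace broken ones; this sketch only shows the borrow/return discipline and the bounded wait.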
As I said, just a guess. If you pass database resources (Connections, etc.) between classes, I can only see it getting harder to fix...
According to Oracle, we have lots of room. At the time of failure there have been between 10 and 100 connections open (according to V$SESSION and netstat), out of a configured maximum of 1000. For kicks, I tested running out of connections to see what would happen, and the failure mode is different: the client side dies an ugly death with an exception like "max number of processes reached".
Starting to think DriverManager's class-synchronized methods are a place to look.
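If `getConnection()` does synchronize on the `DriverManager` class, as suspected here, then one thread stuck inside the driver's connect (for example on a network read with no timeout) keeps the class lock, and every later caller piles up as "waiting for monitor entry", exactly as in the dump. The sketch below reproduces that mechanism with a hypothetical `ClassLockDemo.connect()` instead of real JDBC. It needs Java 8 or later (lambdas, `Thread.getState()`), so it illustrates the mechanism rather than running on JDK 1.3.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a class-synchronized factory method such as the
// one suspected inside DriverManager.getConnection(); no JDBC involved.
class ClassLockDemo {
    // static synchronized: the whole call holds the ClassLockDemo.class monitor.
    static synchronized void connect(CountDownLatch entered, CountDownLatch release)
            throws InterruptedException {
        if (entered != null) {
            entered.countDown();   // we now own the class monitor
            release.await();       // simulate a connect that never returns (no timeout)
        }
    }

    private static void callConnect(CountDownLatch entered, CountDownLatch release) {
        try {
            connect(entered, release);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the state of a second caller while the first caller is stuck inside connect().
    static Thread.State observeSecondCaller() {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> callConnect(entered, release), "holder");
        Thread waiter = new Thread(() -> callConnect(null, null), "waiter");
        try {
            holder.start();
            entered.await();           // holder is now inside the synchronized method
            waiter.start();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (waiter.getState() != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
            return waiter.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();       // let the "hung" call finish so both threads exit
            try {
                holder.join();
                waiter.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("second caller is " + observeSecondCaller());
    }
}
```

If this is what is happening, the dump should also contain one RMI thread that is not "waiting for monitor entry" but is further down inside the Oracle driver classes; that thread is the real culprit. `DriverManager.setLoginTimeout(int seconds)` exists as a standard knob, but whether the thin driver of that era honours it is worth checking in its documentation.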
If you are not using a connection pool and your Oracle database is not MTS, then Oracle connections are not multi-threaded. If two threads attempt to access the same connection at the same time, one will block.
Each RMI invocation can run on its own thread (note the separate "RMI TCP Connection" threads in your dump). Even though the thread dump shows the threads stuck in getConnection(), the root cause may lie elsewhere. Perhaps your RMI code is not thread-safe? Jack Shirazi, in his book Java Performance Tuning, says that a common problem with multithreaded programs is that they can work well under small loads but become problematic as the load increases. This seems applicable to your situation: you say the application worked well until the load increased.
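A minimal sketch of the "fine at low load, broken at high load" effect, assuming a single remote object whose fields are shared by every dispatch thread. `RequestCounterServant` and its methods are hypothetical; an `ExecutorService` stands in for the RMI runtime's threads. With one worker thread both counters agree; with many, the unsynchronized one may fall short by a different amount on each run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical remote-object implementation: one instance serves every client,
// so its fields are shared by all threads the RMI runtime dispatches calls on.
class RequestCounterServant {
    private int unsafeCount = 0;                                 // shared, unsynchronized
    private final AtomicInteger safeCount = new AtomicInteger(); // shared, atomic

    void handleRequest() {
        unsafeCount++;               // read-modify-write race: concurrent updates can be lost
        safeCount.incrementAndGet(); // atomic increment: no update is lost
    }

    int unsafeTotal() { return unsafeCount; }
    int safeTotal() { return safeCount.get(); }

    // Simulates `calls` invocations spread over `threads` worker threads.
    static RequestCounterServant simulateLoad(int threads, int calls) {
        RequestCounterServant servant = new RequestCounterServant();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < calls; i++) {
            workers.execute(servant::handleRequest);
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("load simulation timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servant;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 16}) {
            RequestCounterServant s = simulateLoad(threads, 100_000);
            System.out.println(threads + " thread(s): safe=" + s.safeTotal()
                    + " unsafe=" + s.unsafeTotal());
        }
    }
}
```

The same reasoning applies to any per-request state kept in an instance field of the remote object, such as a Connection or Statement: keeping it in local variables, or guarding it properly, removes the race.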
Have you checked with Oracle's MetaLink Web site to see if there are any known issues regarding your combination of platform, database version and/or JDBC driver?