We have an application we're testing under load and it's starting to fail in strange (non-application) ways. At the moment I'm brainstorming, but I thought I'd drop the config here in case others had any pointers.
The front end of the application is a SOAP service running on WebLogic 7.0.2 and (currently) WinXP SP2. This server runs on a local machine. The test harness is a Java app running on an externally hosted 4-CPU box. We need that much grunt to run the tests. The test harness can be configured to run a number of concurrent users. With up to 12 concurrent users at about 25 transactions per second we have no problems. Beyond this we start to get what look like network issues (more on this in a sec).
Communication is also over HTTPS.
We're pretty sure it's not WebLogic-specific, since we run two WL domains: the external one with 100 threads, and an internal one which gets hit harder and has 400 threads. The internal server doesn't show any of the problems seen on the external domain. This leads me to believe it's something between the test harness and the WebLogic server.
The test harness is, as I mentioned, hosted externally, in the UK. I'm going to send them a query about anything they may be doing to choke connections. DoS prevention? The local server is running in our office in OZ, behind a firewall, router etc. Nothing we can think of that would cause issues here.
Here are the types of errors we're seeing:
* Read timeouts while accepting requests
* Connection timeout reported on the test harness - WL isn't even seeing these attempts so it can't be
* Socket closed - reported by the test harness
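One way to see which of these failure types dominates as the load rises is to tally failures by exception type inside the harness. The following is only a minimal sketch, assuming the harness is plain Java and surfaces the standard `java.net` exceptions. The class name `FailureClassifier`, the category labels, and the message-based split between connect and read timeouts are all hypothetical and not taken from the actual harness. The exception message text also varies between JVM versions, so treat that split as a heuristic.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: buckets network failures so the three symptoms
// (read timeout, connect timeout, socket closed) can be counted separately.
public class FailureClassifier {
    private final Map<String, Integer> counts = new TreeMap<>();

    // Walks the cause chain (bounded, to guard against cycles) and returns
    // a label for the first network-related exception it finds.
    public static String classify(Throwable t) {
        int depth = 0;
        for (Throwable c = t; c != null && depth < 20; c = c.getCause(), depth++) {
            if (c instanceof SocketTimeoutException) {
                // Assumption: connect timeouts mention "connect" in the message.
                String m = c.getMessage();
                boolean connect = m != null && m.toLowerCase().contains("connect");
                return connect ? "CONNECT_TIMEOUT" : "READ_TIMEOUT";
            }
            // ConnectException is a subclass of SocketException, so test it first.
            if (c instanceof ConnectException) {
                return "CONNECT_FAILED";
            }
            if (c instanceof SocketException) {
                return "SOCKET_ERROR"; // covers "Socket closed" and resets
            }
        }
        return "OTHER";
    }

    // Classifies the failure, increments its counter and returns the label.
    public String record(Throwable t) {
        String label = classify(t);
        counts.merge(label, 1, Integer::sum);
        return label;
    }

    public int count(String label) {
        return counts.getOrDefault(label, 0);
    }

    public static void main(String[] args) {
        FailureClassifier tally = new FailureClassifier();
        // Stand-ins for exceptions caught around each harness request.
        tally.record(new SocketTimeoutException("Read timed out"));
        tally.record(new SocketTimeoutException("connect timed out"));
        tally.record(new IOException("request failed", new SocketException("Socket closed")));
        System.out.println(tally.counts);
    }
}
```

Logging the label together with a timestamp and the concurrent-user count would make it easier to line the harness failures up against the WL access log, and to see whether the connect timeouts cluster the way a rate limiter or DoS filter would produce.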
Any thoughts on other possible causes, or ways to highlight or rule out factors? It's only causing about a 1% failure rate and we can adjust the results for this, but it'd be better if we either knew the problem so we could say "It's not us", or could fix it. I prefer the second one.