Hi All, We are planning to design a very heavy-load application where there could be hundreds of hits every second. What design considerations do we need to take care of, such as server capacity, application server, etc.? We are planning to use Java and the JBoss server.
The major criterion here is that there should not be any downtime due to heavy load and hits. Please provide your suggestions and, if possible, point me to any available resources.
We're going to need more details. What type of application is it? A Web app, an e-commerce app? What technology are you thinking of building this in? Is there going to be heavy usage of state? Etc.
I have just read Scalable Internet Architectures by Theo Schlossnagle which is a pretty good introduction on design considerations that need to be made.
For example, you say
there should not be any downtime due to heavy load and hits
but is it OK to have downtime for other reasons? (Probably not.) So you may need a high availability or failover strategy for your site, independent of the overall scale.
Further, the class of server that you use may differ depending on whether you go for horizontal scaling (adding more servers to a cluster) or vertical scaling (increasing the capacity of a single server).
Then there are concurrency issues and the like, especially if databases are involved.
So there are a lot of design decisions that will depend on your context. The Schlossnagle book helped me get my head around some of these issues, and I would recommend it if this is an important question for you.
when in doubt put it in parenthesis and stick a dollar sign in front of it, only good can come from this.
Thanks for the response. This is basically an online survey kind of application. Yes, there will be heavy use of state.
We thought of using J2EE and the JBoss app server. Regarding load sharing, should we scale horizontally or vertically? Is there any rule of thumb for doing this?
OK, firstly yes, JBoss and J2EE (I hope you mean JEE now?) are perfectly valid technologies to use for this type of application. It's _how_ you use them which is the killer.
Above all else:
**Premature optimisation is the root of all evil.** In other words, don't spend days tinkering with optimising your system architecture. Do spend the time to get a quick prototype of your app working on a JBoss server and then _hammer_ it with an automated tool (something like JMeter). Find out where the real performance issues are and then plan your system architecture accordingly.
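To make the "hammer it and measure" idea concrete, here is a minimal sketch of a load harness in plain Java. It is not a replacement for JMeter, and the class name `MiniLoadTest`, the `fakeRequest` task and the thread and request counts are all made up for illustration. In a real test the task would call your prototype over HTTP. Note that it only times the task itself, not the time a request spends queued waiting for a worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MiniLoadTest {

    /** Runs the task totalRequests times on a fixed pool; returns latencies in ns, sorted ascending. */
    public static long[] run(Runnable task, int threads, int totalRequests) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Wrap the task so each execution reports how long it took.
        Callable<Long> timed = () -> {
            long start = System.nanoTime();
            task.run();
            return System.nanoTime() - start;
        };
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < totalRequests; i++) {
                futures.add(pool.submit(timed));
            }
            long[] latencies = new long[totalRequests];
            for (int i = 0; i < totalRequests; i++) {
                latencies[i] = futures.get(i).get(); // blocks until that request is done
            }
            Arrays.sort(latencies);
            return latencies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("request task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Nearest-rank percentile of a non-empty ascending array (p in 0..100). */
    public static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one request to the prototype.
        Runnable fakeRequest = () -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long[] latencies = run(fakeRequest, 20, 500);
        System.out.printf("median=%d us, p95=%d us%n",
                percentile(latencies, 50) / 1000, percentile(latencies, 95) / 1000);
    }
}
```

Looking at the p95 rather than the average is usually what shows you where the system starts to fall over.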
Some random thoughts:
1.) Databases and the interactions with them (especially involving transactions) are often a bottleneck. You may find that you have to use a denormalised schema for higher performance; see your DBA for more details on other DB tuning that you may need. Same goes for pooling, connection timeouts, yadda yadda.
2.) If you're talking about having lots and lots of hits at the front end then you may want to consider putting an Apache Httpd server as a proxy in front of your JBoss application server(s). The Httpd server can serve up the static content as well as load balance requests to multiple JBoss servers (which you may or may not need).
3.) You probably need to think about a lightweight software design, e.g. not 8 tiers of architecture. So look at and evaluate the lightweight frameworks that are out there and see what people say about them in terms of filling your requirement. For example, will you really need to go through that session facade layer and a CMP entity bean to store a survey result?
4.) You probably want to strip out unnecessary J2EE/JEE services running in JBoss. Start with the minimal configuration set that they supply out of the box and add services as you need them.
5.) Make sure your hardware is something decent with capacity to deal with spikes of traffic. I personally wouldn't recommend Windows.
6.) Transactions and transaction management are great things, but they can be slow. Think twice before you go wrapping all of your logic in transactions; do you _really_ need to?
7.) If you do need to use JBoss clustering, it is (like any clustering) non-trivial. You need to be _very_ aware of what is automatically shared and what you need to manually serialise. Also don't forget about clustering your DB, OS, and Web Server if needed.
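On the pooling and timeout point in 1.), the idea can be sketched in a few lines. This is a toy, assuming a fixed-size pool of some resource `T`. The class `BoundedPool` and its method names are invented for illustration, and in a real JBoss deployment you would normally configure the server's own datasource pool rather than write one.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    /** Creates all resources up front, so load spikes cannot open more than `size` of them. */
    public BoundedPool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Waits up to timeoutMillis for a free resource, then fails fast instead of queueing forever. */
    public T borrow(long timeoutMillis) {
        try {
            T resource = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (resource == null) {
                throw new IllegalStateException("no resource free within " + timeoutMillis + " ms");
            }
            return resource;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
    }

    /** Returns a borrowed resource to the pool. */
    public void giveBack(T resource) {
        idle.offer(resource);
    }

    public int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        // "conn" strings stand in for real database connections.
        BoundedPool<String> pool = new BoundedPool<>(2, () -> "conn");
        String c = pool.borrow(100);
        try {
            System.out.println("working with " + c + ", free now: " + pool.available());
        } finally {
            pool.giveBack(c); // always return it, even if the work throws
        }
    }
}
```

The timeout is the important bit under heavy load: a request that cannot get a connection quickly is better rejected than left hanging on to a thread.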
There's so much more to this topic that I'm sure you could get novels worth of info. But remember the top point I made, it'll save you a lot of wasted time.
Apologies that this is stream of thought, I'm busy trying to investigate why my DB is deadlocking for seemingly no reason <sigh>.
Or rather "Java EE" now, since Sun wanted to drop the abbreviation to "J" as well as the unnecessary 2...
My main concern is that this question is so very vague... it's the kind of question consultants get paid heaps of cash to answer, given certain criteria. There are so many considerations and specifications which either you haven't listed for brevity, or haven't thought about at all.
From what you've said, it seems performance is of the utmost priority... so why did you choose a Java EE server? Was the choice of language in fact of more importance than performance after all? For example, I've heard some fantastic things about the lightweight FastCGI servers (like lighttpd and LiteSpeed) with C/C++ response generators---killer speeds and low overheads. YouTube and MySpace both use this approach for segments of their sites which have high traffic throughput (over 1000 requests/second). For sites with static content and simple request processors (e.g. doing POST-to-database), Java is unlikely to be the best performing solution. With the right hardware, it might be adequate for your needs though. But you'd need to determine that. As another example, Caucho did the same with the Resin Java EE server; they found that Java sockets and NIO are both slower than using native socket code over JNI, so went with the latter to obtain better throughput. Experiment to see what works best for you.
The problem my colleagues and I always find with Java EE is the huge memory usage by the JVM---that is the first reason why a server regularly fails under high load. And of course, if the OS chooses to terminate the JVM process, the whole Java EE server goes with it. With a FastCGI server spawning separate processes to delegate requests to, the main server process stays up even in the event that one request processor crashes. A new request processor can then be spawned. So Java EE isn't something to run on a cheap server if you have high capacity needs. Indeed, you may wish to consider the load balancing/failover approach to cover such incidents.
You also have other considerations. Like what is your bandwidth, and how large is each of your web pages? Say you have 100 requests/second and 100Mbps per server connectivity. If you're using that full-out with no packet loss, that's at most 125KB you can transfer per request. Realistically, more like 100KB. A high-graphics page may be more than that, and add packet loss and routing delays, then your users are going to start stacking up as their requests time out. Your bottleneck is now the network and not the software. So either you'll need to 10x the capacity with a 1Gbps network upstream, or balance the traffic load over several servers with LAN switches of suitable capacity.
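The arithmetic above is easy to replay for your own numbers. A small sketch follows: the class name `BandwidthBudget` is invented, and the 80% "usable" figure is my assumption to reproduce the "realistically, more like 100KB" estimate, not a measured value.

```java
public class BandwidthBudget {

    /** Upper bound on bytes per request if the link is fully used with no loss. */
    public static long maxBytesPerRequest(long linkBitsPerSecond, long requestsPerSecond) {
        long bytesPerSecond = linkBitsPerSecond / 8; // 8 bits per byte
        return bytesPerSecond / requestsPerSecond;
    }

    /** Same budget scaled by an assumed usable percentage of the link. */
    public static long usableBytesPerRequest(long linkBitsPerSecond, long requestsPerSecond, int usablePercent) {
        return maxBytesPerRequest(linkBitsPerSecond, requestsPerSecond) * usablePercent / 100;
    }

    public static void main(String[] args) {
        long hundredMbps = 100_000_000L;
        long oneGbps = 1_000_000_000L;
        // 100 Mbps at 100 requests/second
        System.out.println(maxBytesPerRequest(hundredMbps, 100));        // 125000 bytes = 125 KB
        System.out.println(usableBytesPerRequest(hundredMbps, 100, 80)); // 100000 bytes = 100 KB
        // 1 Gbps at the same request rate: ten times the budget
        System.out.println(maxBytesPerRequest(oneGbps, 100));            // 1250000 bytes
    }
}
```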
So you really need to create a whole plan, working through the requirements in order of priority with the client. It should not only cover the software, but the hardware and data centre too. Then use the right tools for the job, whatever software and hardware they may be.
To discuss these performance issues further, I'd suggest a nice moderator move this to the Performance forum where others might have more ideas.
Charles Lyons (SCJP 1.4, April 2003; SCJP 5, Dec 2006; SCWCD 1.4b, April 2004)
Author of OCEJWCD Study Companion for Oracle Exam 1Z0-899 (ISBN 0955160340)
Actually, 100 hits per second is not considered heavy load.
Much more important is what each hit does. Displaying a web page at that volume is trivial with just Apache. Doing a complete credit card auth/capture is another scale.
Tomcat, Resin or JBoss can handle the load, and can provide scaling and load balancing. This starts with the assumption that you have a decent computer/server. Since you can trivially get dual CPUs, each quad-core, and 12 or 24 GB of RAM talking to RAID disks, you can get a lot of processing on one 2U rack mount box.
When you run out of power on one box (actually typically much sooner) you want to go to hardware load-balancing network boxes such as a ServerIron or BIG-IP.
Hi All, Thanks a ton for all the suggestions. As Pat rightly mentioned, 100 hits per sec is not a heavy load. This is a banking application for processing cards, and we have to go with Java. With this added info, any more suggestions are appreciated.
Red Hat (the Linux company) has been working on state-free systems. They're easier to scale, since the lack of state means you can front a cluster of servers with a load-balancer and not have to bounce back-end session info around. They also own JBoss, so their advice is likely to be worth something.
Of course, there are limits to what you can do state-free, especially in a secured transaction environment, but it's a good high-level consideration.
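To illustrate why state-free back ends are easier to front with a load balancer: if no server holds session data, the balancer can hand each request to whichever server is next, with no stickiness or session replication. A minimal round-robin sketch, where the class name and the backend names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        this.backends = new ArrayList<>(backends); // defensive copy
    }

    /** Picks the next backend in turn; safe to call from many threads. */
    public String pick() {
        // floorMod keeps the index valid even after the counter overflows to negative.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical backend names; any of them can serve any request.
        RoundRobinBalancer lb = new RoundRobinBalancer(Arrays.asList("app1", "app2", "app3"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.pick()); // app1, app2, app3, app1
        }
    }
}
```

With stateful sessions the same balancer would need sticky routing or shared session storage, which is exactly the back-end traffic the state-free approach avoids.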
Micro-optimization is your enemy. It's a strategy that tends to lock you into a particular way of doing things, and often you'll discover that the real bottlenecks are elsewhere.
So is custom DIY stuff. While it may seem counter-intuitive, when you scale up, low-level optimizations lose their efficacy. A JVM is horribly hungry, weighing in at over 100MB of RAM on most platforms. But when you put it in charge of a complex workload, the overhead gets diluted to the point where the advantages offered outweigh the disadvantages. It would be punitive to write a "Hello, World!" system to monitor code execution and adjust the code generation accordingly on the fly, but when you've got a system that's running 100s of transactions 24 hours a day, the expense of a load-sensitive subsystem is outweighed by the benefits of what it can do.
Nor is this something unique to JVMs. I've seen benchmarks for ORM systems that indicated double the performance of JDBC.
If you need to optimize something small and important, like an interrupt service routine, micro-optimization pays off. But business systems aren't generally so finely focused, so while in theory you could get really killer performance by hand-coding everything in assembly language (or C) and cobbling together your own custom DBMS, it would likely take you your whole life to do one complex online business app that way.
I gave up on assembly language around 1984 when I started seeing compilers generate code that was so finely-tuned that trying to track register usage could leave your head spinning - and realized that unlike hand-optimization, the compiler could completely re-optimize that register usage for each and every source change, no matter how trivial.
I likewise pretty much gave up on C when Java began meeting or exceeding it on performance tests and I started seeing programs like ArgoUML that belied the old adage that Java GUI programs had to be sluggish.
I do remain committed to keeping things flexible and investing in tunable (and self-tuning) technologies. These are strategies that have paid off.
An IDE is no substitute for an Intelligent Developer.
A JVM is horribly hungry, weighing in at over 100MB of RAM on most platforms. But when you put it in charge of a complex workload, the overhead gets diluted to the point where the advantages offered outweigh the disadvantages.... I likewise pretty much gave up on C when Java began meeting or exceeding it on performance tests...
As you say, everything is about trade-offs. In general I agree that Java, PHP, ASP etc. are all going to be fine for programming websites. I use Java EE for the "more-than-average" Web site because the plethora of Java APIs makes it a breeze to do difficult things. On the desktop the benefits of platform independent bytecode make Java easy to deploy, especially via the Web as applets or through JNLP.
However, there are still some cases where knowing a low-level language like C/C++ is crucial, particularly where performance is concerned (handy we're in this forum then!). It probably isn't wise to throw it away. Just recently I've been heavily involved in a pro sound application, and no matter how hard we tried to get JavaSound to perform, it just couldn't work cross-platform efficiently enough. To get the low latency processing necessary, we need to use C/C++ to interface with low-level operating system calls directly. Better still, some of the DSP routines can be optimised for specific architectures (32/64 bit, MMX, P4 etc.). Unfortunately the JNI overheads on Windows (which strangely are far reduced on some, but not all, Linux distros) meant doing regular processing of small audio chunks was also intolerable. So now the entire audio layer has to be rewritten in C++, with portability layers applied so it can still be compiled (almost) everywhere. This does also have one advantage though: plenty of stable and mature audio decoders in native libraries. Fully-featured pure Java decoders for MP3, Ogg Vorbis and the like are hard to find and typically buggy at best.
I also recall a thread on this forum about FFT routines and someone commented that the routine in C++ was comparable to Java. IIRC, in C it was 2 times faster, and in assembly was 8 times faster. In DSP, those speeds matter, so using the assembly version is ideal.
So native programming can definitely have its advantages in performance wherever Java is lacking. Don't throw it away! As always, choose the right tool for the right job!
Originally posted by Charles Lyons: I also recall a thread on this forum about FFT routines and someone commented that the routine in C++ was comparable to Java. IIRC, in C it was 2 times faster, and in assembly was 8 times faster. In DSP, those speeds matter, so using the assembly version is ideal.
This is fairly far OT. If you are doing DSP, you need to use a DSP language for all the multiply and accumulates.
I have no problem with using specialized hardware and software for specialized stuff. But the OP seems to be talking about a standard boring web app. For them, especially since memory is free and CPU cycles are very cheap, it's the software engineering time that is expensive. So focus on that, and do not micro-optimize until you run into performance issues when the full system functionality is being tested.
When comparing C/C++ to Java in performance terms it's important to note that Java has a deliberate handicap. Because portability is more important than performance in Java - including the ability to precisely reproduce results on all platforms, Java is not permitted by default to use the native floating-point hardware unless that hardware is implementing the IEEE floating-point standard. Thus, while recent benchmarks have shown Java meeting or exceeding C for most purposes, C ran away with the floating-point metrics because Java had to do everything in software.
YMMV - I'm more than a little suspicious that the (fairly) recent addition of IEEE floating-point to the IBM zSeries hardware in addition to their old, fairly quirky FP capabilities was because IBM has been pushing Java-on-the-mainframe.
There is a feature in Java to override portability and use the native FP capabilities, but it has to be explicitly requested, since "100% Java" is supposed to be "write once, run anywhere".
Most business apps, online or not, are data-intensive, not compute-intensive. Furthermore, floating point of ANY stripe lacks the precision for 100% accurate dollars-and-cents calculations, so that's not normally an issue.
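A quick illustration of the dollars-and-cents point, using nothing but the standard library. The class name `MoneyPrecision` is just for the example. Binary floating point cannot represent 0.10 or 0.20 exactly, so the sum drifts, while `BigDecimal` built from strings keeps exact decimal digits.

```java
import java.math.BigDecimal;

public class MoneyPrecision {

    /** True only if adding two doubles gives exactly the double literal 0.30 (it does not). */
    public static boolean doubleSumIsExact() {
        return 0.10 + 0.20 == 0.30;
    }

    /** Exact decimal addition; construct from strings, not from doubles. */
    public static BigDecimal decimalSum() {
        return new BigDecimal("0.10").add(new BigDecimal("0.20"));
    }

    public static void main(String[] args) {
        System.out.println(0.10 + 0.20);        // 0.30000000000000004
        System.out.println(doubleSumIsExact()); // false
        System.out.println(decimalSum());       // 0.30
    }
}
```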
Business use of floating-point arithmetic is primarily restricted to modeling and statistics, where the occasional fudging of a penny is generally immaterial. For really heavy tasks of that nature, C might be more appropriate. Then again, you might be better off doing it in FORTRAN on a supercomputer.
Which brings us around full circle. The most effective optimizations are generally those done in the large, not those done at the micro-level.