i write the best practices column for software test and performance magazine. http://stpmag.com. i'm back to the subject of post-deployment tuning and am looking for folks willing to share their thoughts, experience, horror/success stories, etc. my deadline is saturday sometime.
this time around (i last wrote about this one year ago) i'd like to address performance issues associated with the move towards applications-in-a-browser. here's my thesis, which i hope might elicit some response: in the old shrink-wrap days, it was extra important to do all tuning as early as possible in the development (or system integration) process. not so anymore. today we live in an era of componentized software, often released in beta form intentionally and designed to run in something as universal as a browser. given this shift, who cares about a thoughtful approach to post-deployment tuning? if there's a problem with functionality, the users of your app will just tell you what's wrong and you can issue a quick release or have the community do it if your code is open source.
obviously this is simplistic and outright wrong on many fronts, but i'm curious as to your thoughts. and i should remind folks that i'm willing to abide by good journalistic practices and negotiate all attributions that might eventually appear in the column. that is, if it's best for me to write "according to one long-time programmer who works at a large midwest manufacturing company" instead of "according to steve smith who works in IT for ford motor co." i'm happy to do so.
all the best to the smart folks here at javaranch. it's good to be back after many months away.
<<given this shift, who cares about a thoughtful approach to post-deployment tuning? if there's a problem with functionality, the users of your app will just tell you what's wrong and you can issue a quick release or have the community do it if your code is open source.>> In my opinion this premise is wrong on many fronts. This is like saying that the days of well thought out software, analysis, design, and requirements are dead because we now have browsers. In many ways software development process hasn't changed much through the years.
In my opinion poorly performing software is just the tip of the iceberg. If it doesn't perform well it is usually sloppy on other fronts too.
I don't remember the stats but the earlier you find problems in the development cycle the cheaper it is to fix (finding problems at requirements time can be something like 100 times cheaper to fix).
As an example, recently we had one very slow query bring our whole site down. If the developer had done more due diligence and performance tested his code, we would have avoided site downtime and the many, many hours of site administrators' time spent trying to track down this needle-in-a-haystack performance problem.
Also, web apps are much harder to tune than client-server apps. Many applications now have multiple hosts, multiple web servers, multiple app servers, multiple network connections, and many other pieces, all managed by different parties/groups. In the slow query example I gave above, the cause turned out to be a missing index, and there was no arrow pointing at it to help us find it. Every subsystem manager had to drop what they were doing and see if the problem was in their area.
Also users may not be so understanding when you tell them "It's ok that your site is non-responsive now that we use browsers".
Note these are the same arguments that were used years back to eliminate bugs as soon as possible in the development cycle. The technology changes but software development is still software development, and users are still users.
I should add that I work for the infrastructure group. Often we are the first line of attack for application performance problems. We are the ones that are paged at 3 in the morning when application performance goes south, while developers are getting their beauty sleep. We very rarely have the ability to dive into the code to figure the problem out yet we are the ones responsible for getting the system back up. Often developers take a while to figure out where the problem lies even when they get involved. Meanwhile users see the hourglass.
Also any code release is a bit expensive in our environment as it requires an SCR, approvals, documentation, testing, deployment and yes even post deployment testing.
thanks a bunch, steve. i'm also skeptical of all these claims that apps-in-a-browser changes everything. still, i'm curious: do you see any ways in which the rise of web apps makes it easier to do post-deployment tuning? is it 100% hype or 95% hype and 5% substance?
and perhaps on a more practical front: it does seem that folks who do sys admin work are invariably going to bear the brunt of post-deployment tuning. other than just being more careful, producing cleaner code, etc., is there anything specifically that developers can do to make life easier (when it comes to perf tuning) on sys admins? and could you say just a bit more about how you did troubleshooting of the missing index issue?
Geoff, My first thought on reading this is that there is a big difference between doing some post-deployment (or "it's almost time to deploy") tuning and not lifting a finger to look at performance until after deployment. While it is easy enough to add an index after deployment, some changes affect the design. Suppose my requirement says I need to process 1 billion transactions in a minute. If my solution is to process them linearly in a loop, I'm going to have a problem. It's significantly more work to add parallelism than to simply tune the initial approach.
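To make the loop-versus-parallel point concrete, here is a minimal Python sketch. The process function, the sample data, and the worker count are all made up for illustration, and a thread pool only helps when the per-transaction work spends its time waiting on I/O, so treat this as a picture of how the shape of the code changes and not as a recipe.

```python
from concurrent.futures import ThreadPoolExecutor


def process(txn):
    # Hypothetical stand-in for the real per-transaction work.
    return txn * 2


def process_linear(txns):
    # The "process them linearly in a loop" design.
    return [process(t) for t in txns]


def process_parallel(txns, workers=4):
    # The same work fanned out across a thread pool.
    # pool.map returns results in the same order as the input.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, txns))


transactions = list(range(1000))
linear_result = process_linear(transactions)
parallel_result = process_parallel(transactions)
```

Even in a toy like this, the parallel version raises questions the loop never had to answer (ordering, shared state, what happens when one item fails), which is why it is a design change and not a tuning tweak.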
"the users of your app will just tell you what's wrong and you can issue a quick release " - And then you can break something else and fix it and wait for the users to tell you and ... This sounds like a waste of time. And when did it become the PAYING users job to test your product anyway?
Another important thing is that some performance problems are functional problems. Internet Explorer times out after 5 minutes by default. If my query takes 6 minutes, it might as well not exist from a user's point of view.
Some performance problems also expose your site to denial of service attacks. If one large query can use up enough processing time, it doesn't take that many of them to bring down your site. And again, this means the site might as well not exist from the user's point of view.
"do you see any ways in which the rise of web apps makes it easier to do post-deployment tuning?" - Sure. It's easier to push a new version of the application and know everyone gets it. This could be used in tuning cache settings based on unforeseen demand. This could be useful even if an application has done thorough performance testing. Suppose setting X is best if we have an average of 100 hits per day and setting Y is best if we have an average of 100,000 hits per day. If demand isn't known or estimated accurately, it's nice not to have to redeploy to adjust things. Similarly, grid computing allows you to add resources after the fact. Useful if you get Slashdotted.
When I read your initial post, I interpreted "performance tuning" to mean testing and seeing where the problems reside. Based on the last question, I realize you are talking more about last minute tuning than testing. These are quite different concepts!
"is there anything specifically that developers can do to make life easier (when it comes to perf tuning) on sys admins? and could you say just a bit more about how you did troubleshooting of the missing index issue?" - Externalizing settings (so you can change things without redeployment) and documenting the expected configuration (indexes and the like).
"do you see any ways in which the rise of web apps makes it easier to do post-deployment tuning? is it 100% hype or 95% hype and 5% substance?"
As often is the case - it depends. Those that say that you can more easily do post deployment tuning in the web world are probably saying that you can more easily deploy code fixes. That can be, but is not always the case. For example if I have a client server application with 10 users in my building it is often very easy to deploy the new code. If you have a wide user community geographically dispersed it is not.
Likewise if you have one web application on one server, deployment is typically easy, but if you have an application that spans many servers and software products it often isn't. Still overall I would agree that deploying web apps is easier, though it is easiest to not have to redeploy at all.
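One way to avoid a redeploy for pure tuning changes is to keep the tuning knobs outside the code, as suggested earlier in the thread. A minimal Python sketch, assuming settings are read from environment variables; the variable names and default values are hypothetical.

```python
import os


def load_cache_settings(env=os.environ):
    # Read tuning knobs from the environment, falling back to defaults.
    # APP_CACHE_MAX_ENTRIES and APP_CACHE_TTL_SECONDS are made-up names.
    return {
        "max_entries": int(env.get("APP_CACHE_MAX_ENTRIES", "1000")),
        "ttl_seconds": int(env.get("APP_CACHE_TTL_SECONDS", "300")),
    }


# Nothing set: the defaults sized for light traffic apply.
low_traffic = load_cache_settings({})

# An operator raises the cache size for heavy traffic without touching code.
high_traffic = load_cache_settings({"APP_CACHE_MAX_ENTRIES": "50000"})
```

The same idea works with a properties file or a database table. The point is only that whoever is paged at 3 in the morning can change the value without a code release.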
"other than just being more careful, producing cleaner code, etc., is there anything specifically that developers can do to make life easier (when it comes to perf tuning) on sys admins?"
Deploy solid, well tested code. Have a production environment in which you can easily determine where performance problems exist. This means monitoring tools that can peer inside an application and can look at the software servers (db servers, app servers, web servers etc), as well as the hosts. If you don't have such an infrastructure it is almost impossible to figure out where that needle in a haystack is.
"and could you say just a bit more about how you did troubleshooting of the missing index issue?"
Often when performance is bad, the reports that trickle in are quite vague. People can't quantify what is slow (is it the database? the OS? application code? one piece of sql or one page? the app server? the web server? file IO?...) or when it became slow.
One of the most important things is that you need to have some knowledge about what normal is. What is normal can drift over time so you must keep this knowledge up to date. This includes knowledge about every subsystem. If you don't have this information it becomes very difficult to determine if a performance problem exists, where it is occurring, and often it turns into a pointing match.
So the key is to monitor and know your Production performance metrics. If one piece of the infrastructure or application degrades you should know it. If you tune it and it improves you should know it. (I have seen people tune and tune for weeks and not make any performance improvements as they randomly tune anything within flailing distance.)
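As a rough illustration of "know what normal is", here is a small Python sketch that times an operation and compares recent timings against a recorded baseline. The baseline value, the sample numbers, and the 1.5x tolerance are invented for the example; a real monitoring tool does far more than this.

```python
import statistics
import time


def timed(fn, *args):
    # Run fn and return its result together with the elapsed seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def is_degraded(samples, baseline_median, tolerance=1.5):
    # Flag degradation when the median of the recent samples exceeds
    # the known-normal median by more than the tolerance factor.
    return statistics.median(samples) > baseline_median * tolerance


baseline = 0.200  # seconds; "normal" for a hypothetical page
normal_day = [0.19, 0.21, 0.20, 0.22]
bad_day = [0.45, 0.50, 0.48, 0.40]

normal_flag = is_degraded(normal_day, baseline)
bad_flag = is_degraded(bad_day, baseline)
```

With numbers like these per subsystem, "the site feels slow" becomes "this page went from 0.2 to 0.45 seconds on Tuesday", and you can also tell whether a tuning change actually helped.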
"is there anything specifically that developers can do to make life easier (when it comes to perf tuning) on sys admins?"
One of the most important things you can do, whether you are an administrator or a developer, is assume that the performance problem is yours until you prove otherwise. I have seen everyone sit around and point fingers and not take the 10 minutes it would take to prove that it was or was not their problem.
"could you say just a bit more about how you did troubleshooting of the missing index issue?"
For such a simple problem I am embarrassed to say how many things we did before we figured this one out, and even more embarrassed to say that it is like Groundhog Day in that we repeat this procedure far too often. People often think the only effort in performance tuning is fixing the problem, but in many cases (like the one below) the fix took an hour while diagnosing the problem took weeks!
Here is just a sample of the things we did:
* Looked to see if any cron jobs were recently scheduled
* Reviewed recent deployments, checked OS and db patches
* Collected OS and db server metrics (cpu/memory/IO) and reviewed them daily to look for clues
* Changed locking schemes on tables
* Reviewed logs in db/app server/web server/more
* Brought down the primary db server and failed over to the replicate
* Wrote custom tests to test performance of the OS/DB/more
* Tuned sql
* Opened tech support calls
* Disabled scheduled jobs
* Numerous meetings and con calls
* Lots of finger pointing as the system kept going down
All of this to find the relatively simple problem of having a missing index.
This is typical of what happens when you don't have diagnostic tools and performance metrics for the blackboxes we call applications and servers.
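For the missing index itself, most databases will tell you how they plan to run a query if you ask. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (it is not the database we run, and the table, column, and index names are invented) to show the query plan for the same query before and after the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.0) for i in range(1000)],
)


def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose fourth column describes each step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT total FROM orders WHERE customer_id = 42"

before = plan(query)  # without an index the plan is a scan of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # with the index the plan names idx_orders_customer

print(before)
print(after)
```

The exact wording of the plan varies by database and version, but the difference between a full scan and an index lookup is visible in seconds once someone thinks to look at the slow statement.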
See the performance faq below for more info. [ April 12, 2007: Message edited by: steve souza ]
Wow, thanks Steve. This is hugely generous of you and really great information. Is it okay if I quote from these replies? Also, in my column should I still refer to you the same way I did in '06: "...says Steve Souza, a Washington, D.C.-based Sybase consultant with 20 years of experience..."?
I appreciate your perspective.
Yes, quote away. Just post in this forum when the article is released. You can refer to me the same as in '06, although I guess I now have 21 years of experience.
will do steve, thanks very much. here's the column from last year, just fyi. yipes, the challenge is coming up with something new to say on the subject: http://stpmag.com/issues/stp-2006-06.pdf. see p. 36. -g
thanks, steve, for posting the article. i really appreciate the ongoing help from everyone in the forum in my reporting/writing. it's a great community and folks are patient with my questions. have a great weekend, all. p.s. if there are any thoughts on the column, as always i'd love to hear them.