[Refactoring project] How do I begin this ?

Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761

Hi everyone,
I just found out that I've been told to lead a project to refactor an existing web application. The crunch is that this will be my first time leading a team (yikes, what does it take to be a leader?) and I need some guidance on how to do this.
The webapp works fine but was built under pretty tight schedules. There's a lot of code duplication (more like everywhere) in both the Java code and the DB schema, so the code is very brittle when faced with change requests. Some context: we're using Struts/Tiles, SLSBs & Hibernate. Note that the deliverable is a new source tree, normalized and with duplication removed.
I've read Fowler's refactoring book and understand that you need to write tests up front before beginning refactoring. I do test-first design myself, but that's when I'm actually writing new code! So my questions are:
1) What should be the first things to do before beginning the refactoring?
2) If you've been faced with my situation, I'd be glad if you could share some "technical" and "management" tips.
Regards,
Pho


Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
In my opinion, the most important thing to do is to build a strong suite of acceptance tests. You will need the tests to make sure that you don't break something, and the code probably isn't very unit-testing friendly yet.
I would probably use http://fitnesse.org/ for this.


Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761

Some quarters have suggested that I build tests against the old codebase, while others say the new codebase. Note that the new codebase does not exist yet. One acquaintance suggested that I start from the business rules and write tests based on those, rather than on the old code.
Jeanne Boyarsky
author & internet detective
Marshal

Joined: May 26, 2003
Posts: 30777
    

As Ilja mentioned, you should definitely have acceptance tests. Some integration-type tests could be useful too, if you have any high-level interfaces. Try to avoid creating tests that are too coupled to the implementation, as that will change.


Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Looks to me like you have two technical jobs: (1) normalizing the database, and (2) refactoring the Java code. You also seem to realize that the following is not part of your job: (9) improving the application.
I agree with others that tests will help when refactoring the code. If your business logic is at all complex, I think unit tests as well as acceptance tests will be appropriate; without unit tests, an exhaustive set of acceptance tests can end up being unreasonably complex, and a less than exhaustive set can let bugs slip through. Note that the 'unit' for a unit test can be more than a class - it can be a group of related classes, preferably a group that will neither be split up much nor be added to much during the refactoring. Unit tests will be more important for code that was more complex and difficult to write.
As for whether the tests should target the existing code or the new code - they should definitely target the existing code. The acceptance tests should be the same for both, anyway, since you aren't changing functionality, just refactoring. Since refactoring is best done incrementally, you'll want the unit tests to target the existing code, too, so they will be useful throughout the refactoring process: you want to be able to catch the bugs as they are introduced, rather than get overwhelmed by all of them at once at the end. Besides, if you write the tests for nonexistent final code first, how will you test the tests?
Normalizing the database will not work like code refactoring. Instead, you probably want to have someone who really understands database issues well to come up with a normalized schema before you actually do any of the changes. This can be done, for example, while other people are writing tests. Once you have the redesigned schema, you can actually implement it incrementally, adjusting the code as you go along.
To the extent that your project lead job involves management as well as technical leadership, remember that the job of the manager is to help the workers do the work. That means getting out of their way as much as possible, helping get rid of anything, especially administrative things, that get in their way, and only actually redirecting them if what they're doing is clearly unproductive. Be available when they need you, but I wouldn't initiate contact more than once a day on average at most - though I'd try to talk to everyone at least once a week. (If you're doing technical work, that's in addition to any technical discussions that are needed; if you're the technical lead but someone else is the manager, you can ignore this paragraph.)
Since you're willing to read books, I'll recommend one on software project management and leadership: "The Mythical Man Month", by Frederick P. Brooks Jr. Unlike the fad-of-the-year books, this one provides a lot of useful empirical data, helping you draw your own conclusions about what you need to do.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Warren Dew:
Looks to me like you have two technical jobs: (1) normalizing the database, and (2) refactoring the Java code. You also seem to realize that the following is not part of your job: (9) improving the application.

Notice though that often not being allowed to improve the application can make refactoring much harder. In a badly designed system, there often is much duplicated code, and some of it will contain bugs. To remove the duplication you will often *have to* remove the bugs...
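A minimal sketch of that situation, assuming two hypothetical copies of a bulk-discount rule (none of these names come from the actual application). The copies cannot be merged into one method until someone decides which boundary is the intended behavior, which is exactly the kind of "bug or feature" question to take to the domain expert:

```java
public class DuplicatedRuleExample {

    // Hypothetical copy 1, used by checkout: 100 or more items earn the bulk discount.
    static boolean qualifiesInCheckout(int quantity) {
        return quantity >= 100;
    }

    // Hypothetical copy 2, pasted into invoicing and subtly altered:
    // exactly 100 items no longer qualify.
    static boolean qualifiesInInvoicing(int quantity) {
        return quantity > 100;
    }

    // Returns the first quantity in [from, to] on which the two copies disagree, or -1 if none.
    static int firstDisagreement(int from, int to) {
        for (int q = from; q <= to; q++) {
            if (qualifiesInCheckout(q) != qualifiesInInvoicing(q)) {
                return q;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Copies disagree at quantity " + firstDisagreement(0, 1000));
    }
}
```

Whichever boundary the single merged method ends up with, one of the two callers changes behavior, so the duplication cannot be removed without also "fixing" something.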
I agree with others that tests will help when refactoring the code. If your business logic is at all complex, I think unit tests as well as acceptance tests will be appropriate; without unit tests, an exhaustive set of acceptance tests can end up being unreasonably complex, and a less than exhaustive set can let bugs slip through. Note that the 'unit' for a unit test can be more than a class - it can be a group of related classes, preferably a group that will neither be split up much nor be added to much during the refactoring. Unit tests will be more important for code that was more complex and difficult to write.

Unfortunately, such code often also resists being unit tested.
Normalizing the database will not work like code refactoring. Instead, you probably want to have someone who really understands database issues well to come up with a normalized schema before you actually do any of the changes.

I'm not sure this is true - I think I have heard from people who did database refactorings as incrementally as code refactorings.
Be available when they need you, but I wouldn't initiate contact more than once a day on average at most - though I'd try to talk to everyone at least once a week.

I would probably try daily Stand Up Meetings: http://c2.com/cgi/wiki?StandUpMeeting
Since you're willing to read books, I'll recommend one on software project management and leadership: "The Mythical Man Month", by Frederick P. Brooks Jr. Unlike the fad-of-the-year books, this one provides a lot of useful empirical data, helping you draw your own conclusions about what you need to do.

Another good book on the topic is "Peopleware" by DeMarco and Lister.
"Agile Database Techniques" by Scott Ambler has some chapters on normalization and database refactoring.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Pho Tek:
Some quarters have suggested that I build tests against the old codebase, while others say the new codebase. Note that the new codebase does not exist yet. One acquaintance suggested that I start from the business rules and write tests based on those, rather than on the old code.

I would definitely write tests against the old codebase as much as possible. Even if the old codebase has bugs which should be fixed on the way, you will need to know what you fixed and discuss with the domain expert whether it really should be "fixed" (remember, sometimes it *is* a feature, not a bug...).
I also wouldn't expect some written down business rules to be correct or complete in any sense, let alone as unambiguous as code..
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Originally posted by Ilja Preuss:
Notice though that often not being allowed to improve the application can make refactoring much harder. In a badly designed system, there often is much duplicated code, and some of it will contain bugs. To remove the duplication you will often *have to* remove the bugs...

Sure ... as long as I realize that I'm removing the bugs to make my own refactoring job easier, not because it's something I'm getting paid to do. If I make the mistake of thinking that what those managers really want is to improve the code rather than refactor it, though, I'm likely to decide, mistakenly, that what I really ought to do is throw out the code base and rewrite it from scratch - especially if I'm inexperienced enough that I haven't done much work with someone else's code. I'll then end up taking three times as long as the refactoring effort would have taken, while not actually accomplishing my objective (since differences I view as "improvements" the end user may view as "pointless annoying changes").
Hm, that brings up one of the things I forgot to mention - someone else is probably actually getting paid to actually improve the existing application (that is, make changes the end users actually asked for) while the refactoring team is making a refactored version of it. If you refactor very incrementally, you might be able to do this in the same source tree, but more likely they'll be working on a branch. You'll want to keep good track of the feature changes so you can add them to the refactored code base at the end.
Regarding complex code:
Unfortunately, such code often also resists being unit tested.

Can you expand on this? It's true that unit tests for complex code are themselves often more complex than the typical unit test, but I've always found the cost-benefit ratio to be at least as good in these situations.
Actually, in this particular situation, there's a kind of cheaty way to write unit tests of complex units, given that the existing application 'works fine' - simply package up the existing code into a unit test, and compare the result to the code being tested.
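A rough sketch of this "cheaty" comparison test, under the assumption that the unit can be called as a plain function. The shipping-cost routine, its refactored twin, and all names here are hypothetical stand-ins, not code from the application being discussed:

```java
import java.util.OptionalInt;
import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class CharacterizationCheck {

    // Hypothetical legacy routine: clumsy, but assumed to "work fine" in production.
    static int legacyShippingCents(int weightGrams) {
        int cost = 0;
        if (weightGrams <= 500) {
            cost = 300;
        } else {
            if (weightGrams <= 2000) {
                cost = 300 + ((weightGrams - 500 + 99) / 100) * 20;
            } else {
                cost = 300 + 15 * 20 + ((weightGrams - 2000 + 99) / 100) * 10;
            }
        }
        return cost;
    }

    // Hypothetical refactored version that is supposed to behave identically.
    static int refactoredShippingCents(int weightGrams) {
        int midBlocks = blocksOf100(Math.min(weightGrams, 2000) - 500);
        int heavyBlocks = blocksOf100(weightGrams - 2000);
        return 300 + midBlocks * 20 + heavyBlocks * 10;
    }

    // Number of started 100-gram blocks; zero for non-positive amounts.
    static int blocksOf100(int grams) {
        return grams <= 0 ? 0 : (grams + 99) / 100;
    }

    // Uses the legacy code as the oracle: returns the first input in [from, to]
    // where the candidate's result differs from the legacy result, if any.
    static OptionalInt firstMismatch(IntUnaryOperator legacy, IntUnaryOperator candidate,
                                     int from, int to) {
        return IntStream.rangeClosed(from, to)
                .filter(w -> legacy.applyAsInt(w) != candidate.applyAsInt(w))
                .findFirst();
    }

    public static void main(String[] args) {
        OptionalInt mismatch = firstMismatch(
                CharacterizationCheck::legacyShippingCents,
                CharacterizationCheck::refactoredShippingCents, 0, 5000);
        System.out.println(mismatch.isPresent()
                ? "Behaviour differs at input " + mismatch.getAsInt()
                : "Refactored code matches legacy code on all sampled inputs");
    }
}
```

The test knows nothing about what the correct shipping cost is; it only pins the new code to the old behavior, bugs included, which fits a project whose goal is refactoring rather than improvement.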
I think I have heard from people who did database refactorings as incrementally as code refactorings.

Implementing the revised schema can certainly be done incrementally.
Developing the revised schema can be done incrementally as well, but unlike in code refactoring, I don't think it's advantageous to do so. The problem is that if one only normalizes part of the database at a time, one is likely to end up making changes that are different from the changes one would have made had the whole database been considered at once. Then when one considers more of the database, some of those things get changed a second time - which tends to be costly in terms of the associated code changes. Sometimes one has no choice - for example, when sneaking in some database cleanup in the course of other duties - but in this case, as part of a major refactoring project, he's got the luxury of cleaning up the database all at once.
I would probably try daily Stand Up Meetings

Hm, someone's XP tendencies are showing!
Actually, that's probably a good idea - for technical meetings. For managerial issues in a non-XP context, I've found that employees are often willing to be more open in private, with only the manager present.
I'll second the recommendation for "Peopleware".
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112

Hm, that brings up one of the things I forgot to mention - someone else is probably actually getting paid to actually improve the existing application (that is, make changes the end users actually asked for) while the refactoring team is making a refactored version of it. If you refactor very incrementally, you might be able to do this in the same source tree, but more likely they'll be working on a branch. You'll want to keep good track of the feature changes so you can add them to the refactored code base at the end.

Ouch - I hope you are not doing this - refactoring while someone else is adding functionality, that is. In my not so humble opinion, in that case it would be far better to join forces and build one team which adds functionality and refactors on its way - both on the same code base.


--------------------------------------------------------------------------------
Regarding complex code:
Unfortunately, such code often also resists being unit tested.
--------------------------------------------------------------------------------

Can you expand on this? It's true that unit tests for complex code are themselves often more complex than the typical unit test, but I've always found the cost-benefit ratio to be at least as good in these situations.
Actually, in this particular situation, there's a kind of cheaty way to write unit tests of complex units, given that the existing application 'works fine' - simply package up the existing code into a unit test, and compare the result to the code being tested.

Well, in my experience such code was always so coupled that you couldn't help but write system level tests - it wasn't even possible to bypass the GUI. You might be more lucky with your code, of course...

Implementing the revised schema can certainly be done incrementally.
Developing the revised schema can be done incrementally as well, but unlike in code refactoring, I don't think it's advantageous to do so. The problem is that if one only normalizes part of the database at a time, one is likely to end up making changes that are different from the changes one would have made had the whole database been considered at once. Then when one considers more of the database, some of those things get changed a second time - which tends to be costly in terms of the associated code changes. Sometimes one has no choice - for example, when sneaking in some database cleanup in the course of other duties - but in this case, as part of a major refactoring project, he's got the luxury of cleaning up the database all at once.

Mhh, I am not sure (and granted, I don't have any experience with this). How long would it take to "fully" revise the schema? Especially with such a project, which doesn't give immediate benefit to the customer, I would always anticipate a sudden stop of the project in favor of something with more immediate ROI. And when that happens, I'd rather have some of the normalizations applied to the database than have all of them applied only to a document. Your mileage may vary, of course...


--------------------------------------------------------------------------------
I would probably try daily Stand Up Meetings
--------------------------------------------------------------------------------

Hm, someone's XP tendencies are showing!

Definitely! Although Kent Beck just borrowed the idea from Scrum, as far as I know.
Actually, that's probably a good idea - for technical meetings. For managerial issues in a non-XP context, I've found that employees are often willing to be more open in private, with only the manager present.

I am not sure I follow you - what managerial issues are you thinking of?
Warren Dew
blacksmith
Ranch Hand

Joined: Mar 04, 2004
Posts: 1332
    
Originally posted by Ilja Preuss:
Well, in my experience such code was always so coupled that you couldn't help but write system level tests - it wasn't even possible to bypass the GUI. You might be more lucky with your code, of course...

I was thinking of a different kind of complex code - code that's internally complex because it implements complex mathematical algorithms, for example, even though the interfaces are simple. I agree spaghetti code is difficult to write good tests for.
Mhh, I am not sure (and granted, I don't have any experience with this). How long would it take to "fully" revise the schema?

Probably somewhat less than one tenth the time to implement those revisions in an existing system - and having such a document to guide one would, I suspect, make the subsequent work go about twice as fast.
Regarding issues better handled privately:
I am not sure I follow you - what managerial issues are you thinking of?

For example, it might be easier to say things like, "I know my performance has been lagging lately, but my father is dying of AIDS and I'm spending a lot of time at the hospital - I'll do the best I can and in any case it will be over in a couple months", to just the manager rather than to the whole group at once.
Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761

From all the posts, it seems clear (in my mind, at least) that I need to start writing UAT-type tests. They have the least dependence on the current code.
Some questions:
Is FitNesse beneficial in my case? Has anyone used it successfully in a production project?

Looks to me like you have two technical jobs: (1) normalizing the database, and (2) refactoring the Java code. You also seem to realize that the following is not part of your job: (9) improving the application.

You are correct on all counts. I've actually completed the DB normalization work (our DB schema is auto-generated from our model code). The next phase is to make sure the schema is correct and complete. I'm currently doing a review with my DBA.
I also wouldn't expect some written down business rules to be correct or complete in any sense, let alone as unambiguous as code..

I concur with you on that.

Some possible tasks I brainstormed on my own. Can anyone offer additional preliminary tasks to kick off a refactoring project?
1) Identify all functionality to test (gleaned from code, Struts Actions, functional specs...).
2) Identify and explore some testing tools. I found an interesting list of test tools at this roller weblogger blog item
3) The most important thing to note is that the core goal of the refactoring exercise is to remove duplication. Need to look out for some software metrics that will identify dependencies.
4) Identify how to measure the completion of this refactoring exercise.
Q) Should there be one test server or should individual developers install a copy of the app locally ? I'm leaning towards a local install as it'll be easier to divide the work up. What do you think ?
P/S I'm a techie by nature. So forgive me if I'm focussing too much on the details. I hope to learn, albeit slowly, how to differentiate the forest from the trees.
Regards,
Pho
Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761

I've been rethinking the granularity of my UAT tests.
At what level should the tests run?
- Should I write the tests at the Action level (http://strutstestcase.sourceforge.net/) with Cactus, i.e. against in-container (web) code?
or
- Should I write them at the level of URLs, e.g. with HttpUnit?
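To make the second option concrete, here is a minimal sketch of a URL-level check. It is an assumption-laden illustration: it uses the JDK's own HTTP client (Java 11+) rather than HttpUnit, and a tiny in-process stub server stands in for the deployed webapp. The path /orders/list.do and the page content are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class UrlLevelCheck {

    // Hypothetical stand-in for the deployed webapp; a real suite would target the actual server.
    static HttpServer startStubApp() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/orders/list.do", exchange -> {
            byte[] body = "<html><title>Order List</title></html>".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Fetches a page purely over HTTP, with no knowledge of Actions or other internals.
    static HttpResponse<String> fetch(String baseUrl, String path)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
        return HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }

    // Runs one URL-level check end to end against the stub and returns the response body.
    static String fetchOrderListFromStub() {
        HttpServer server = null;
        try {
            server = startStubApp();
            String baseUrl = "http://127.0.0.1:" + server.getAddress().getPort();
            HttpResponse<String> response = fetch(baseUrl, "/orders/list.do");
            if (response.statusCode() != 200) {
                throw new IllegalStateException("Unexpected status " + response.statusCode());
            }
            return response.body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        String body = fetchOrderListFromStub();
        System.out.println(body.contains("<title>Order List</title>")
                ? "Order list page looks as expected"
                : "Order list page changed");
    }
}
```

Because a check like this sees only URLs and response content, it should survive any reshuffling of Actions and session beans behind them. The trade-off is that it needs a running server and is slower than an Action-level test.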
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Pho Tek:
Is FitNesse beneficial in my case? Has anyone used it successfully in a production project?

Yes, I currently use it for a Swing application I need to refactor heavily, and it has saved my skin several times.


3) The most important thing to note is that the core goal of the refactoring exercise is to remove duplication. Need to look out for some software metrics that will identify dependencies.

Take a look at http://www.redhillconsulting.com.au/products/simian/
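For a feel of what duplication detection involves, here is a deliberately naive sketch. It is not how Simian works internally (that is not described in this thread); it simply assumes that "duplication" means the same run of whitespace-trimmed lines appearing more than once. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateBlockFinder {

    // Maps every block of `window` consecutive trimmed lines that occurs at least twice
    // to the 1-based line numbers at which it starts.
    static Map<String, List<Integer>> findRepeatedBlocks(List<String> lines, int window) {
        Map<String, List<Integer>> starts = new LinkedHashMap<>();
        for (int i = 0; i + window <= lines.size(); i++) {
            StringBuilder key = new StringBuilder();
            for (int j = i; j < i + window; j++) {
                key.append(lines.get(j).trim()).append('\n');
            }
            starts.computeIfAbsent(key.toString(), k -> new ArrayList<>()).add(i + 1);
        }
        // Keep only blocks that were seen more than once.
        starts.values().removeIf(positions -> positions.size() < 2);
        return starts;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "total = price * qty;",
                "total = total - discount;",
                "log(total);",
                "  total = price * qty;",
                "  total = total - discount;",
                "save(total);");
        findRepeatedBlocks(source, 2).forEach((block, positions) ->
                System.out.println("Repeated at lines " + positions + ":\n" + block));
    }
}
```

Counting such repeated blocks before and after each round of work is one possible way to approach task 4, measuring how far the refactoring has progressed.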

Q) Should there be one test server or should individual developers install a copy of the app locally ? I'm leaning towards a local install as it'll be easier to divide the work up. What do you think ?

Developers will need to run the tests very frequently, and need to know whether it was their own changes or someone else's that broke a test. So, yes, they should be able to run the tests in their own personal sandboxes.
BTW, what version control system do you plan to use?
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Warren Dew:
Probably somewhat less than one tenth the time to implement those revisions in an existing system - and having such a document to guide one would, I suspect, make the subsequent work go about twice as fast.

Well, if it's cheap to do and used as a guide instead of a definitive specification, I wouldn't disagree, I guess...

For example, it might be easier to say things like, "I know my performance has been lagging lately, but my father is dying of AIDS and I'm spending a lot of time at the hospital - I'll do the best I can and in any case it will be over in a couple months", to just the manager rather than to the whole group at once.

Ah, I fully agree, of course! I'd actually think that a manager shouldn't even want to know that much detail about a personal situation.
Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761


BTW, what version control system do you plan to use?

We are using CVS right now.
Regards,
Pho
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Pho Tek:
We are using CVS right now.

Then you should be aware of the fact that CVS isn't very refactoring-friendly. Every time you move or rename a class, you will lose its history information. Merging changes can also become nearly impossible in this case.
If you can, I would think about using a better version control system for this, for example Subversion.
And if you can't do that, you should integrate very frequently and communicate the refactorings very openly. Well, that would be a good thing to do, anyway.
BTW, I always found that the best way to get good team communication is to put the whole team together in one room.
Oh, and another XP technique comes to mind: Pair Programming. That would also foster the creativity of your developers and make sure that they are acting in concert.
 