Test Data Strategies

Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
I'm interested in hearing how others manage test data. Let me be more specific...

Given a set of test data, I want this data to be used both for populating the database and for populating object instances. In the past I have created the test data in XML format and then written a framework that parses the XML with Digester, creates object instances which can be retrieved for unit testing, and provides facilities to insert the objects' data into the database and remove it again. This has allowed a common set of data for testing the persistence layer as well as the other layers that don't require database access.
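A minimal sketch of that idea, not the framework itself: it uses the JDK's built-in DOM parser instead of Digester to stay dependency-free, and the XML layout, the `Customer` class, and `TestDataLoader` are all hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical domain object; a real one would mirror your own model.
class Customer {
    final String id;
    final String name;

    Customer(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical loader. Assumed XML layout:
// <dataset><customer id="c1" name="Ada"/>...</dataset>
class TestDataLoader {
    private final Map<String, Customer> customers = new LinkedHashMap<>();

    static TestDataLoader fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            TestDataLoader loader = new TestDataLoader();
            NodeList nodes = doc.getElementsByTagName("customer");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                Customer c = new Customer(e.getAttribute("id"), e.getAttribute("name"));
                loader.customers.put(c.id, c);
            }
            return loader;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse test data", e);
        }
    }

    // Unit tests fetch a populated instance without touching the database.
    Customer customer(String id) {
        return customers.get(id);
    }

    // Persistence tests bind each row to a PreparedStatement such as
    // INSERT INTO customer (id, name) VALUES (?, ?).
    List<String[]> rows() {
        List<String[]> rows = new ArrayList<>();
        for (Customer c : customers.values()) {
            rows.add(new String[] {c.id, c.name});
        }
        return rows;
    }
}
```

A unit test would call `TestDataLoader.fromXml(...)` once in its setup and ask for `customer("c1")`, while a persistence test would feed `rows()` to its insert and delete statements, so both draw on the same data file.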

Now I'm aware that DbUnit can be used to manage populating and removing data from the database, but I'm unaware of anything that can be used in conjunction with it to populate objects. Given that, it's always seemed preferable to write my own code to handle the whole thing. I like the idea of having the data in an XML document because it is easier to reference when writing tests, and existing data is easier to work with than data hardcoded in an object somewhere.

Is there anyone who does something similar, or maybe has a better solution?
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Why do you want to instantiate a bunch of objects from the data file? What kind of tests are you talking about? If it's integration testing, wouldn't it make sense to let the persistence layer read those objects from the database?


Author of Test Driven (2007) and Effective Unit Testing (2013)
Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
Why do you want to instantiate a bunch of objects from the data file? What kind of tests are you talking about?

I'm talking about unit testing here. It's just convenient to have populated objects for some of the unit tests. I know typically people just hardcode these as part of the setUp(), but particularly when dealing with complex object graphs, this takes out some of the tedium and chance for making errors, as well as making the tests more compact. Having done it in the past, it's something that I've found to be generally helpful.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Originally posted by Jason Menard:
I know typically people just hardcode these as part of the setUp(), but particularly when dealing with complex object graphs, ...

Here's something to think about: (I'm not saying it's what you should do, just that you might benefit from considering what I'm describing)

When you have a unit test that deals with a complex object graph, what exactly are you testing -- what is the unit? If my setUp() method gets too complex, it's often because I'm trying to bite off too big a piece. Instead, I've often found that it is better to split "the unit" more aggressively so that the object graph is not that complex. Ideally, there is no object graph anymore -- just the class under test and the interfaces it depends on.

Instead of testing the objects that make up this graph, you might be better off testing one of those objects at a time, making sure it collaborates with its surroundings like you expect it to. I.e. verifying correct behavior versus verifying correct state. With simple object hierarchies a state-based test is by far the simpler option. As complexity grows, though, behavior-based tests gain the advantage.
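To make the behavior-versus-state distinction concrete, here is a small sketch with a hand-rolled test double. `Notifier`, `ShippingDesk`, and `RecordingNotifier` are hypothetical names invented for the illustration; the point is only that the test checks which call the class under test made on its collaborator, not the state of an object graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator interface the class under test depends on.
interface Notifier {
    void send(String recipient, String message);
}

// Hypothetical class under test: it should notify the requester when an order ships.
class ShippingDesk {
    private final Notifier notifier;

    ShippingDesk(Notifier notifier) {
        this.notifier = notifier;
    }

    void ship(String orderId, String requester) {
        notifier.send(requester, "Order " + orderId + " shipped");
    }
}

// Hand-rolled test double that records interactions instead of doing real work.
class RecordingNotifier implements Notifier {
    final List<String> calls = new ArrayList<>();

    @Override
    public void send(String recipient, String message) {
        calls.add(recipient + ": " + message);
    }
}
```

A test constructs `ShippingDesk` with a `RecordingNotifier`, calls `ship(...)`, and asserts on the recorded calls. No other object needs to be populated.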
Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
Still, the complete object is needed in some cases. Example:



Here I might be testing PRAssembler's assembleForm() method. I need to pass in the PurchaseRequest the data will be coming from as well as the instance of PRForm the data needs to be set to.

Let's say that individual Products on my PR are filled individually. I have a business rule that states I can ship some items on the PR if 50% of the individual requests for products on the PR have been filled. Design aesthetics aside, if I have a separate object that examines a PR to determine whether or not it may be shipped, again we're dealing with a case where we need an object graph that must exist in a particular state.

Now I simplified this for this example, but you get the idea. The object graphs I deal with can be more complex than this example which I just made up. Again though, the object graph is not what's under test. The object under test is another object that operates on the object graph.
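The shipping rule above could be sketched roughly like this. All class names are hypothetical, and the sketch assumes that "50% filled" means at least half of the product requests and that an empty PR cannot ship; neither detail is stated in the post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: one requested product on a purchase request (PR).
class ProductRequest {
    final String product;
    final boolean filled;

    ProductRequest(String product, boolean filled) {
        this.product = product;
        this.filled = filled;
    }
}

// Hypothetical PR holding the individual product requests.
class PurchaseRequest {
    final List<ProductRequest> requests = new ArrayList<>();

    // Returns this so a test can chain calls and build the graph compactly.
    PurchaseRequest add(String product, boolean filled) {
        requests.add(new ProductRequest(product, filled));
        return this;
    }
}

// The object under test: it only reads the graph, it does not own it.
class ShippingPolicy {
    // Assumption: at least half of the requests must be filled.
    boolean mayShip(PurchaseRequest pr) {
        if (pr.requests.isEmpty()) {
            return false;
        }
        long filled = pr.requests.stream().filter(r -> r.filled).count();
        return filled * 2 >= pr.requests.size();
    }
}
```

Even in this reduced form, the test for `ShippingPolicy` needs a `PurchaseRequest` in a particular state (for example one of two requests filled), which is the kind of input a shared test data set would supply.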

I'm certainly not disagreeing with anything you're saying, I'm just trying to point out that certain circumstances require certain things. I guess the short answer though is that you haven't run across this yourself?
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Hmm. Would it help if you got rid of, say, arrays and used logical entities (first-class types such as "com.foobar.ProductList") instead to represent collections?

I'd love to get a glimpse of your domain model (do you have one?). I'm pretty sure I've seen a codebase similar to what you're working on and I have to admit that back then I just bit my lip and didn't write unit tests around the parts that pushed back too hard.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Originally posted by Jason Menard:
Let's say that individual Products on my PR are filled individually. I have a business rule that states I can ship some items on the PR if 50% of the individual requests for products on the PR have been filled. Design aesthetics aside, if I have a separate object that examines a PR to determine whether or not it may be shipped, again we're dealing with a case where we need an object graph that must exist in a particular state.

What if you forget about all those business rules and just assume they're implemented correctly? Would that make it easier to set up the object graph you need for testing the assembler (as an example)? It might be a helpful exercise to strip down the scenario until you have something that's easy enough to set up, take a step back, and look at what you've got -- whether it tests the component you've got thoroughly enough (assuming that you'll write separate unit tests for those other things such as the business rules).
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    

Lasse, in most cases I agree with you: when the setup gets too big, it should be broken into smaller chunks.

But sometimes the Data Graph has to be big, and there is no way around that.

We have really big Data Graphs (aka Hydrations), and if we had code to create the DTO and fill it with data, it would be too much code. I prefer SQL scripts to actually load the data, and use the DAO objects to access the data from the Database and return DTOs filled in with that data. We have Factories here already built that do this for us so it is only a few lines of code.

Mark


Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
What if you forget about all those business rules and just assume they're implemented correctly. ...

That doesn't really work if for example that's the behavior you're testing. The object graph provides the data input needed to test the behavior of the unit under test. The object graph itself is not what's being tested. We just need to make sure that other objects which need to operate on that data behave correctly.

Again, not so much of a problem if your objects are simple, but when your object is relatively complex it is painful to recreate the data constantly. Then what typically would happen is you just create a test instance or mock of the object to input to the classes that require it. Not a big deal. That said, it's nice to use the same set of test data throughout the unit tests, whether it be data that needs to be inserted into the database, or data that is instantiated within some object. If you want to accomplish such a goal, one way to do it is to generate objects dynamically from a set of XML data and provide facilities to insert that data into the database and remove it as well.

Another example, assuming the same complex object graph. The guy working on the persistence end of things wants to test that his DAO is inserting the data correctly into the database. Naturally at some point you will need a populated object in order to make this happen.

Would that make it easier to set up the object graph you need for testing the assembler (as an example)? It might be a helpful exercise to strip down the scenario until you have something that's easy enough to set up, take a step back, and look at what you've got -- whether it tests the component you've got thoroughly enough (assuming that you'll write separate unit tests for those other things such as the business rules).

I know what you're saying here, and I certainly agree with the idea that keeping the object graph as simple as it needs to be is a good thing. That said, with a team of several developers working on a project, providing something like what I'm talking about seems to make things easier for everyone involved once it's set up. It's so easy to modify and add new test objects if you write up such a framework for your app, although it does take a bit of effort in the beginning.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Could you use DbUnit to dump the XML test data into an in-memory HSQLDB and read the object graph from HSQLDB using your DAOs? There are some significant issues with that approach, though, including having your tests depend on the DAOs working correctly, which may or may not be a problem depending on your project's status. Then again, having a set of XSLT stylesheets to transform your XML test data into a format some tool (Commons Digester?) understands so that it can generate the object graph for you... Somehow that doesn't sound attractive either.
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Originally posted by Jason Menard:
The object graphs I deal with can be more complex than this example which I just made up. Again though, the object graph is not what's under test. The object under test is another object that operates on the object graph.


Is the object under test complex, too? Could it do things in separate steps? Could those steps work on just parts of the object graph?

I encountered those complex object graphs in the past, too. What I've learned is that almost always with some creativity you *can* find a design that is more modular, and therefore easier to test, often also easier to understand and more flexible. It's not always easy to find that design, though...


The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus
Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
There are some significant issues with that approach, though, including having your tests depend on the DAOs working correctly, which may or may not be a problem depending on your project's status.

You are absolutely correct about the DAOs. I really try to avoid these kinds of dependencies. I think this is something similar to what Mark said he was doing, but aside from not wanting to have these dependencies, it's not really feasible for projects where everybody is off working on different parts of the design and the DAOs may not even be finished yet. Nope, I avoid that like the plague.

Then again, having a set of XSLT stylesheets to transform your XML test data into a format some tool (Commons Digester?) understands so that it can generate the object graph for you... Somehow that doesn't sound attractive either.

I was thinking about using Digester to parse a dbUnit XML file to create the objects, but it's a bit of a pain to create complex objects that way. It wouldn't be bad where one table equals one object, but a bit of a headache for something with some complexity to it. Another alternative, which I'm not all that crazy about but would be a bit easier than generating objects from dbUnit XML, is to write something to take the object graphs and create an XML document suitable for dbUnit.
Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
Warning, long post...

Is the object under test complex, too? Could it do things in separate steps? Could those steps work on just parts of the object graph?

In many cases things are broken down to do just this, yes.

What I've learned is that almost always with some creativity you *can* find a design that is more modular, and therefore easier to test, often also easier to understand and more flexible.

Yeah, we certainly try and are generally pretty successful. Let me give you an example though where it would seem to me that even though you are doing this, when it comes down to it you still need the entire object.

The domain is kind of specialized so it may not all make sense, but let me try to illustrate a real-world example of one of these complex graphs. It's a product catalog for a specialized domain. This catalog is a physical entity and doesn't only exist in the ether, so there's a very specific format that it follows. The "products" are bought off of contracts from various subcontractors. A product, aside from various other attributes, has multiple contracts which can be in different states. Each contract has a plethora of (relatively) simple attributes, some of which are collections of other (relatively) simple attributes such as images to name one, but more importantly has contract line items. The line item describes something that can be purchased and contains attributes such as product description, a part object (while each line item may only have one part, a part may exist on multiple line items), a collection of price ranges describing pricing information (pricing changes based on quantity ordered), and some other info. The line items are what is being purchased.

For a concrete example illustrating the above, my product is a Widget. I have a few different contracts that can be used to purchase Widget-related things. I might have one line item on the contract that allows me to purchase the Widget itself (which references a Part), another line item which allows me to purchase cables for the Widget (another Part referenced), and another line item that allows me to purchase tech support for the Widget (no part referenced). Each contract line item also has price range information giving me pricing based on the quantity I'm purchasing. So if I need two cables it might cost me $100/cable, but if I need 20 cables it might only cost me $90/cable. In other words, what you're buying is a line item off of a given contract.

Here's what the object graph (still simplified) might look like:

Product
-------
1..* contracts

Contract
--------
0..* images
1..* lineItems

Image
-----
byte[] imageData
String caption

LineItem
--------
String description
Part part
1..* priceRanges

PriceRange
----------
int startQty
int endQty
double price
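As a rough illustration of the bottom of this graph, here is a sketch of `LineItem` and `PriceRange` with the quantity-based price lookup described in the cable example. The field names follow the outline above; the constructor shapes, the `unitPriceFor` method, and the inclusive range boundaries are assumptions made for the sketch.

```java
import java.util.List;

// Mirrors the PriceRange outline above; both quantity bounds are assumed inclusive.
class PriceRange {
    final int startQty;
    final int endQty;
    final double price;

    PriceRange(int startQty, int endQty, double price) {
        this.startQty = startQty;
        this.endQty = endQty;
        this.price = price;
    }

    boolean covers(int qty) {
        return qty >= startQty && qty <= endQty;
    }
}

// Reduced LineItem: the Part reference is left out to keep the sketch short.
class LineItem {
    final String description;
    final List<PriceRange> priceRanges;

    LineItem(String description, List<PriceRange> priceRanges) {
        this.description = description;
        this.priceRanges = priceRanges;
    }

    // Hypothetical helper: finds the per-unit price for the ordered quantity.
    double unitPriceFor(int qty) {
        for (PriceRange range : priceRanges) {
            if (range.covers(qty)) {
                return range.price;
            }
        }
        throw new IllegalArgumentException("No price range covers quantity " + qty);
    }
}
```

With hypothetical ranges of 1 to 9 at $100 and 10 to 99 at $90, `unitPriceFor(2)` gives the $100 price and `unitPriceFor(20)` gives the $90 price from the cable example.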

So, let's say we want to test that our DAO can insert a Product into the database. Certainly we will break this out into its component parts and have code that simply inserts a price range for example. We will have code that just inserts a line item, but also needs to make sure that each of the line item's price ranges are also inserted. We work our way up the graph like this. At the end of the day though, in order to test that our product is completely inserted correctly, we're still going to need the entire object graph. I can be certain that I write code that inserts a contract and test this in isolation, but I also need to be sure that when I want to insert the whole product, I am also inserting all contracts associated with that product.

I've found that maintaining my test data in XML and generating objects off of it works pretty well for this, as well as writing code to insert the same data into the database and remove it as needed. I would have thought that others have run across this and come up with a similar solution, but that doesn't seem to be as much the case as I thought it would be.

It seems intuitive to me that given that dbUnit is fairly popular, that folks would be doing similar things with populating objects (granted populating the db is a bit more work), and following that assumption I had assumed that more people might have had the desire to use the same set of test data to populate both the db and objects. None of you guys spend a significant amount of time with test data? Maybe I'm putting too much emphasis on it, but having a good set of test data for these large projects always seemed like a pretty good idea to me.
Lasse Koskela
author
Sheriff

Joined: Jan 23, 2002
Posts: 11962
    
Originally posted by Jason Menard:
So, let's say we want to test that our DAO can insert a Product into the database. Certainly we will break this out into its component parts and have code that simply inserts a price range for example. We will have code that just inserts a line item, but also needs to make sure that each of the line item's price ranges are also inserted. We work our way up the graph like this. At the end of the day though, in order to test that our product is completely inserted correctly, we're still going to need the entire object graph. I can be certain that I write code that inserts a contract and test this in isolation, but I also need to be sure that when I want to insert the whole product, I am also inserting all contracts associated with that product.

So you've got something like this, for example:

Couldn't you now test the insert(Product) method piece by piece as follows:
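One way this could look, as a hypothetical sketch rather than the code from the post: `insert(Product)` delegates to smaller insert methods, and a test subclass overrides those methods to record the calls so that only the delegation is under test. All class and method names here are invented for illustration, and the JDBC work is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Reduced, hypothetical domain classes.
class Contract {
    final String number;

    Contract(String number) {
        this.number = number;
    }
}

class Product {
    final String name;
    final List<Contract> contracts = new ArrayList<>();

    Product(String name) {
        this.name = name;
    }
}

// Hypothetical DAO: inserting a product means inserting its own row,
// then delegating each contract to insert(Contract).
class ProductDao {
    void insert(Product product) {
        insertProductRow(product);
        for (Contract contract : product.contracts) {
            insert(contract);
        }
    }

    void insertProductRow(Product product) {
        // JDBC code omitted in this sketch.
    }

    void insert(Contract contract) {
        // JDBC code omitted; assumed to be covered by its own tests.
    }
}

// Test subclass that records the delegated calls instead of touching a database.
class RecordingProductDao extends ProductDao {
    final List<String> log = new ArrayList<>();

    @Override
    void insertProductRow(Product product) {
        log.add("product:" + product.name);
    }

    @Override
    void insert(Contract contract) {
        log.add("contract:" + contract.number);
    }
}
```

The test for `insert(Product)` then needs only a product with a couple of bare contracts, because the full contents of each contract are exercised by the tests for `insert(Contract)`.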

Is this at all relevant to what you have?
Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112
Thanks for the elaboration - I think I'm getting close to understanding what you try to do, and why...

I think we have similar situations with having to persist complex object graphs to XML, or having to serialize them for RMI etc.

Like you, we are doing fine-grained tests for subparts, but still want to have integration tests to see whether the parts work well together (if I understand you correctly). What we typically do is, for example, build the object graph in Java, write it to XML, read it again, and compare the resulting object graph to the original one.
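A sketch of that round-trip style of test. To stay dependency-free it swaps in plain Java serialization for the XML step, and `Catalog`, `CatalogEntry`, and `RoundTrip` are hypothetical names. The comparison relies on `equals` being defined on every class in the graph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical leaf of the object graph.
class CatalogEntry implements Serializable {
    private static final long serialVersionUID = 1L;
    final String description;
    final int quantity;

    CatalogEntry(String description, int quantity) {
        this.description = description;
        this.quantity = quantity;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CatalogEntry)) return false;
        CatalogEntry other = (CatalogEntry) o;
        return quantity == other.quantity && description.equals(other.description);
    }

    @Override
    public int hashCode() {
        return Objects.hash(description, quantity);
    }
}

// Hypothetical root of the object graph.
class Catalog implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<CatalogEntry> entries = new ArrayList<>();

    @Override
    public boolean equals(Object o) {
        return o instanceof Catalog && entries.equals(((Catalog) o).entries);
    }

    @Override
    public int hashCode() {
        return entries.hashCode();
    }
}

class RoundTrip {
    // Writes the graph out and reads it back, returning an independent copy.
    static <T extends Serializable> T copyViaSerialization(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

The test builds a `Catalog` in code, passes it through `RoundTrip.copyViaSerialization`, and asserts that the copy is a different instance that is `equals` to the original.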
Jason Menard
Sheriff

Joined: Nov 09, 2000
Posts: 6450
Thanks for the comments guys!
 
 