How do you generate data with the right selectivity

Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2271
    

I've frequently run into performance problems when the selectivity of the test data doesn't match production. For example, let's say I have employer, employee, and phone number tables. An employer has one or more employees. An employee has one or more phone numbers.

The first thing that happens is that whoever creates the test data might put just one phone number for every employee. So the database thinks the employee id column on the phone number table is highly selective. All your load tests run great, but when you go into production, your employees have 5 phone numbers each, and a spurious join with the phone number table brings your performance down.

The second thing that happens is that you might assume each employer will have between 10 and 1 million employees, and each employee might have 5 phone numbers. Not a bad assumption to make, but for some reason the client decides to put 1000 phone numbers on each employee. Again, the selectivity of your column has changed, and that affects how the database performs.
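To make the selectivity point concrete, here is a minimal sketch, assuming a simplified model in which selectivity is distinct values divided by total rows and an equality lookup is expected to return total rows divided by distinct values (real optimizers keep richer statistics, such as histograms and most-common values). The class and method names are hypothetical. It shows the same employee_id column going from one row per lookup to 1000 as the phones-per-employee ratio changes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class Selectivity {
    // Builds the employee_id column of a hypothetical phone_number table
    // in which every employee has exactly phonesPerEmployee rows.
    static List<Integer> employeeIdColumn(int employees, int phonesPerEmployee) {
        List<Integer> column = new ArrayList<>();
        for (int emp = 1; emp <= employees; emp++) {
            for (int p = 0; p < phonesPerEmployee; p++) {
                column.add(emp);
            }
        }
        return column;
    }

    // Simplified selectivity: distinct values / total rows (1.0 = every value unique).
    static double selectivity(List<Integer> column) {
        Set<Integer> distinct = new HashSet<>(column);
        return (double) distinct.size() / column.size();
    }

    // Average number of rows an equality lookup on one key returns.
    static double rowsPerKey(List<Integer> column) {
        Set<Integer> distinct = new HashSet<>(column);
        return (double) column.size() / distinct.size();
    }

    public static void main(String[] args) {
        for (int phones : new int[] {1, 5, 1000}) {
            List<Integer> col = employeeIdColumn(100, phones);
            System.out.printf("%4d phones/employee -> selectivity %.4f, %.0f rows per lookup%n",
                    phones, selectivity(col), rowsPerKey(col));
        }
    }
}
```

Test data built with one phone per employee makes every lookup look like a single-row fetch; the production ratio of 5 or 1000 is what the planner actually has to cope with.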

So, does GenRocket give an easy way to generate data like this? Like, I want to say that for each employee, generate 10 phone numbers, and then easily change it to 100 or 1000 phone numbers.
Gregg Bolinger
GenRocket Founder
Ranch Hand

Joined: Jul 11, 2001
Posts: 15299
    

So, does GenRocket give an easy way to generate data like this? Like, I want to say that for each employee, generate 10 phone numbers, and then easily change it to 100 or 1000 phone numbers.


Yes! The story you just described is very common and one that GenRocket was designed to solve. The solution in GenRocket is as easy as changing the Loop Count. When you create a Scenario in GenRocket, you decide how many of each Domain gets generated. Look at the following screenshots:



In this Scenario we've added a User and Address domain and also told GenRocket that Address is a Child of User. 10000 Users will be created and for each User, 100 Addresses will be generated.
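Outside GenRocket, a fixed Loop Count amounts to the bound of an inner loop. A minimal Java sketch, assuming hypothetical Address rows that record only their own id and their parent user id (this is not GenRocket's API):

```java
import java.util.ArrayList;
import java.util.List;

final class FixedLoopCount {
    // Hypothetical child row: its own id and the id of the parent User.
    record Address(int id, int userId) {}

    // For each of userCount parents, emit exactly addressesPerUser children.
    // The inner bound plays the role of a hard-coded loop count.
    static List<Address> generateAddresses(int userCount, int addressesPerUser) {
        List<Address> addresses = new ArrayList<>(userCount * addressesPerUser);
        int nextId = 1;
        for (int userId = 1; userId <= userCount; userId++) {
            for (int i = 0; i < addressesPerUser; i++) {
                addresses.add(new Address(nextId++, userId));
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        // 10,000 users x 100 addresses each, as in the scenario above.
        System.out.println(generateAddresses(10_000, 100).size()); // prints 1000000
    }
}
```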



The Loop Counts are editable. Click on them, put in the value you need, and download your scenario again. You can even reference Domain Attributes to dynamically change the Loop Count instead of hard coding the value.
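Referencing an attribute instead of hard-coding the value comes down to letting each parent carry its own child count and having the child loop read it. A sketch under the same kind of assumptions (hypothetical User and Address records, not GenRocket's API), where changing one lambda switches between 10, 100, or 1000 children per parent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

final class AttributeLoopCount {
    // Hypothetical rows: each User carries its own numAddresses attribute.
    record User(int id, int numAddresses) {}
    record Address(int userId, int seq) {}

    // Builds users whose numAddresses is chosen by a pluggable rule.
    static List<User> users(int count, IntUnaryOperator numAddressesFor) {
        List<User> users = new ArrayList<>();
        for (int id = 1; id <= count; id++) {
            users.add(new User(id, numAddressesFor.applyAsInt(id)));
        }
        return users;
    }

    // The child loop reads the parent's attribute instead of a constant.
    static List<Address> addresses(List<User> users) {
        List<Address> out = new ArrayList<>();
        for (User u : users) {
            for (int seq = 1; seq <= u.numAddresses(); seq++) {
                out.add(new Address(u.id(), seq));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int perUser : new int[] {10, 100, 1000}) {
            List<Address> a = addresses(users(100, id -> perUser));
            System.out.println(perUser + " per user -> " + a.size() + " addresses");
        }
    }
}
```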


Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2271
    

Awesome! Is there a way to add some randomization to the loop counters? One thing I like to do, rather than have every parent get the same number of child records, is vary the number of children using a Gaussian randomizer. So, if I did that in your example, I would give the address count a mean of 100 and a standard deviation of, let's say, 30. The number of addresses would then average 100 per user, with about 68% of users having between 70 and 130 addresses (and about 95% between 40 and 160). However, you will also get the odd outlier, like one user with 0 addresses or another user with 10K addresses. It's an easy way of putting edge cases in your performance test data.
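A minimal sketch of a Gaussian child count, assuming java.util.Random's nextGaussian() (a standard normal draw) scaled by the mean and standard deviation, rounded, and clamped at zero since a negative count is meaningless. The method names are hypothetical. Its main method measures how many draws land within one and two standard deviations; rounding nudges those shares slightly above 68% and 95%. Note that with a standard deviation of 30, a count of 0 sits more than three standard deviations below the mean and so appears only rarely, and a count anywhere near 10,000 will essentially never appear from this distribution alone; extremes like that have to be injected explicitly.

```java
import java.util.Random;

final class GaussianChildCount {
    // Draws a child count from a normal distribution, rounded to the nearest
    // integer and clamped at 0 because a negative number of children is meaningless.
    static int childCount(Random rng, double mean, double stdDev) {
        long n = Math.round(mean + stdDev * rng.nextGaussian());
        return (int) Math.max(0, n);
    }

    // Fraction of draws whose count falls in [lo, hi].
    static double fractionBetween(Random rng, double mean, double stdDev,
                                  int lo, int hi, int draws) {
        int hits = 0;
        for (int i = 0; i < draws; i++) {
            int c = childCount(rng, mean, stdDev);
            if (c >= lo && c <= hi) {
                hits++;
            }
        }
        return (double) hits / draws;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed so runs are repeatable
        System.out.printf("within 1 sd (70..130): %.3f%n",
                fractionBetween(rng, 100, 30, 70, 130, 100_000));
        System.out.printf("within 2 sd (40..160): %.3f%n",
                fractionBetween(rng, 100, 30, 40, 160, 100_000));
    }
}
```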
Hycel Taylor Iii
GenRocket Founder
Greenhorn

Joined: Feb 24, 2014
Posts: 10
Hey Jan,

We'll break down your problem domain in a Test Scenario using Generators to solve each task of your equation.



We want a counter so that we can control specific conditions of the equations; so we'll use the GenRocket RangeGen generator and start the count from 1.



We want to define our mean of 100; so we'll use the GenRocket ConstantGen.



We want a random value between 70 and 130 (the mean plus or minus one standard deviation of 30); so we'll use the GenRocket RandomGen.



We want one user to have 0 addresses; so we'll use the GenRocket EvalGen, to return true on the first User.



We want one user to have 10,000 addresses; so we'll again use the GenRocket EvalGen, to return true on the second User.



For the rest of the users, 5% should have exactly 100 addresses (the mean) and 95% should have between 70 and 130 addresses (the standard-deviation range); so we'll use the GenRocket WeightGen.



Now we can use the GenRocket CaseActionGen to determine the number of addresses to generate for each user.
if (count == 1) then 0 is returned
if (count == 2) then 10000 is returned
else a count from the average user is returned (default)
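For readers who want to see this decision logic outside the GenRocket UI, here it is as plain Java. This is a sketch, not GenRocket's implementation: numAddresses is a hypothetical name, and the default branch assumes the 5%/95% split and the uniform 70 to 130 range from the steps above. The counter starts at 1, like the RangeGen, and the total number of addresses is only known once the loop has run.

```java
import java.util.Random;

final class CaseActionSketch {
    // Hypothetical stand-in for the case logic above:
    // count 1 -> 0 addresses, count 2 -> 10,000, otherwise an "average user" value.
    static int numAddresses(int count, Random rng) {
        if (count == 1) {
            return 0;
        }
        if (count == 2) {
            return 10_000;
        }
        // Default: 5% get exactly the mean (100), 95% a uniform value in 70..130.
        // nextInt(61) returns 0..60 (bound is exclusive), so 70 + it spans 70..130.
        return rng.nextDouble() < 0.05 ? 100 : 70 + rng.nextInt(61);
    }

    public static void main(String[] args) {
        Random rng = new Random(2014); // fixed seed so runs are repeatable
        long total = 0;
        for (int count = 1; count <= 10; count++) {
            int n = numAddresses(count, rng);
            total += n;
            System.out.println("user " + count + " -> " + n + " addresses");
        }
        // Only known after every user's count has been drawn.
        System.out.println("total addresses: " + total);
    }
}
```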



Here's a data preview from the CaseActionGen:



If you remember, we made the Address Scenario Domain a child of the User Scenario Domain:



Thus, Address can now reference the User's numAddresses Attribute as its loop count.



The summary tab will look a little different now, because the number of addresses is determined only while the Scenario is running, on each iteration of the User Scenario Domain.


Jayesh A Lalwani
Bartender

Joined: Jan 17, 2008
Posts: 2271
    

Cool! That's way more flexibility than other tools I have seen.

Personally, I like to use Gaussian probability while generating test data because a Gaussian distribution is more "natural". Things in life usually fall on a bell curve, and using Gaussian probability makes you generate data that fall on a bell curve. For example, say you are randomly generating names and need to decide on the length of each name. You could pick a length uniformly between 0 and 1000, but that's not realistic: in real data, you will generally have a lot more names of 20 characters than names of 200 characters, because name lengths usually follow a bell curve. Using a uniform random generator will give you skewed data. That's why I use Gaussian probability for generating all random data.
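A small sketch contrasting the two approaches for name lengths, assuming a mean of 20 characters and a standard deviation of 5 (both illustrative values, not from the post), with lengths clamped to a 1 to 1000 range. The helper names are hypothetical. Counting how many lengths exceed 100 shows the difference: about 90% of the uniform draws do, while the Gaussian draws essentially never do.

```java
import java.util.Random;

final class NameLengths {
    // Uniform: every length in [min, max] is equally likely.
    static int uniformLength(Random rng, int min, int max) {
        return min + rng.nextInt(max - min + 1);
    }

    // Gaussian: lengths cluster around the mean; the result is clamped to [min, max].
    static int gaussianLength(Random rng, double mean, double stdDev, int min, int max) {
        long len = Math.round(mean + stdDev * rng.nextGaussian());
        return (int) Math.min(max, Math.max(min, len));
    }

    // Placeholder name of the given length, made of random lowercase letters.
    static String randomName(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rng = new Random(1); // fixed seed so runs are repeatable
        int draws = 100_000, uniformOver100 = 0, gaussianOver100 = 0;
        for (int i = 0; i < draws; i++) {
            if (uniformLength(rng, 1, 1000) > 100) uniformOver100++;
            if (gaussianLength(rng, 20, 5, 1, 1000) > 100) gaussianOver100++;
        }
        System.out.println("uniform lengths > 100:  " + uniformOver100);
        System.out.println("gaussian lengths > 100: " + gaussianOver100);
        System.out.println("sample name: " + randomName(rng, gaussianLength(rng, 20, 5, 1, 1000)));
    }
}
```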

However, I'm probably unique; I've never seen anyone be so nitpicky about their test data. It might be a good thing for you to add, but I wouldn't fault you for not adding something that only one person in this world uses. You already have a lot that will be useful to lots of people.
Hycel Taylor Iii
GenRocket Founder
Greenhorn

Joined: Feb 24, 2014
Posts: 10
Hey Jan,

Here at GenRocket, we believe that if one person wants a specific type of Generator, then it is likely that other people will want to use that same Generator.

More than a few of our Generators have been created because of customer requests. We are currently up to 74 Generators and we believe that there is no limit to the number of Generators that can be thought up, created and added to the library of GenRocket Generators.

Thank you for your suggestion.
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 37890
    
Does that mean the customers could create their own test generators and make them available to everybody else?
Hycel Taylor Iii
GenRocket Founder
Greenhorn

Joined: Feb 24, 2014
Posts: 10
Hey Campbell,

All GenRocket Generators are simple, short Groovy blocks of code. We originally gave customers the ability to create and add their own Generators, but our customers prefer that we write the Generators because we can create, test and deploy new Generators faster. We are working to improve the interface around having customers create their own custom Generators and Receivers. One of the items on our road map is to eventually allow customers to share or sell their custom Generators and Receivers to other GenRocket users.
 