Question: Given a small subset of production data, can GenRocket extrapolate that data into a much larger set? And if so, is there a way to configure the rules that govern the extrapolation?
Hey Henry. GenRocket's stance on extrapolating production data into test data is "Don't Do It". We've found that scrubbing production data to remove or mask sensitive data so that it is "safe" for test purposes is an anti-pattern. It usually isn't any quicker or easier than creating synthetic test data. Especially with GenRocket.
I guess that makes my intended follow up question (regarding data masking) moot...
However, I would like to know why it is an anti-pattern. Is it merely because GenRocket has really good tools that make extrapolating production data unnecessary? Or are there other issues that I should be concerned about?
When you want test data, or want to generate it, the first question to ask is: what are you trying to test? Are you trying to do load testing, functional testing, integration testing, unit testing, negative testing? It is important to know because each of these requires a specific type of data.
If you're load testing, then you don't really care what the data values are, as long as the data consists of the right data types, has referential integrity, and is generated in massive quantities.
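To make the load-testing case concrete, here is a minimal sketch in plain Python. It is not GenRocket's own configuration or API; the `customers` and `orders` tables, their fields, and the `generate_load_data` function are all hypothetical names chosen for illustration. It only shows the three properties mentioned above: correct types, referential integrity, and volume controlled by a parameter.

```python
import random
from datetime import date, timedelta


def generate_load_data(num_customers, orders_per_customer, seed=0):
    """Generate synthetic parent/child rows for load testing.

    Hypothetical schema: every order row points at an existing
    customer row, so referential integrity holds by construction.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced

    # Parent table: sequential integer keys, string names.
    customers = [
        {"id": cid, "name": f"customer_{cid}"}
        for cid in range(1, num_customers + 1)
    ]

    # Child table: each row carries a foreign key to a real customer.
    orders = []
    order_id = 1
    for customer in customers:
        for _ in range(orders_per_customer):
            orders.append({
                "id": order_id,
                "customer_id": customer["id"],
                "amount_cents": rng.randint(100, 100_000),
                "placed_on": date(2020, 1, 1) + timedelta(days=rng.randint(0, 364)),
            })
            order_id += 1

    return customers, orders


# Volume is just a matter of turning the two numbers up.
customers, orders = generate_load_data(num_customers=1000, orders_per_customer=50)
```

The values themselves are meaningless, which is fine for load testing; what matters is that every `customer_id` resolves to a real customer and that the row count can be scaled freely.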
If you want to perform functional testing, then the data needs to look and feel like the real data that would be consumed by your application.
If you want to do integration, unit or negative testing, then you may want to condition the data so that you can test for specific results and you want to generate a unique set of data for each test.
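As a rough illustration of conditioning data for negative tests, the sketch below (again plain Python, not GenRocket functionality) starts from a valid record and overrides one field per test case. The record fields, the case names, and the helpers `make_customer` and `negative_records` are assumptions made up for this example.

```python
import uuid


def make_customer(**overrides):
    """Build a valid synthetic customer, then apply per-test overrides.

    Each call gets a fresh UUID, so every test works on its own
    unique record. The fields are hypothetical.
    """
    record = {
        "id": str(uuid.uuid4()),
        "email": "test.user@example.com",
        "age": 30,
    }
    record.update(overrides)
    return record


# Each negative case breaks exactly one field, so a failing test
# points at one specific validation rule.
NEGATIVE_CASES = {
    "missing_email": {"email": None},
    "malformed_email": {"email": "not-an-email"},
    "negative_age": {"age": -1},
}


def negative_records():
    """Return one freshly generated bad record per named case."""
    return {
        name: make_customer(**overrides)
        for name, overrides in NEGATIVE_CASES.items()
    }
```

Because the conditions are written down explicitly, the expected result of each test is known in advance, which is hard to guarantee with scrubbed production data.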
In all of the above instances you want to control the data that you are creating: you decide what test data you need and you get it by generating synthetic test data.
None of the instances above should be using data from a production environment. No time should be spent or wasted in pruning production data. This is why we see scrubbing production data to remove or mask sensitive data so that it is "safe" for test purposes as an anti-pattern.
Scrubbing data is done in the absence of a good test data generation platform. GenRocket is designed to allow you to model and generate the specific test data you need to perform tests on the challenges you need to solve.
Thanks. I guess I had a unique situation. It was indeed load testing, but it only failed with one customer's data. So, it was really debugging with a specific (and large) set of data.
In the end, it was important enough, and along with my promise that I would be the only one seeing the data (and will delete it right afterwards), I got the data. I did feel uncomfortable having it though -- especially, since it was not scrubbed.
Regardless, I am not even sure if extrapolating (or scrubbing) would have helped here -- as at the time, we didn't know what in that mass of data was triggering the issue.