I've been using the following problem to try out different ways of structuring Java code. I get data from two sources in the same format: a directory of CSV files. I want to see whether the sources produce the same results, so I need to compare the data in the files. In one type of file, I use a key to find the right line and a column id to get the right data. In another, there is no key and I need to do a line-by-line comparison of a column, possibly skipping a header. There are other cases as well. I may want to run some or all of the cases, so I made an interface with the method compareData. That way, any analysis methods don't need to worry about specifics. For each type of comparison I made an abstract class. My concrete classes provide the information about specific file names, key names and so on.
This exercise got me wondering about a few things.
1. I like the idea of using the same functions loadData() and testData() in all the implementations of the interface. But they are not public. So rather than expose them, would it make sense to have an extra layer where I have another abstract class that defines these as abstract methods which is then extended for the individual types (see SECOND TRY below)?
2. In the methods loadData and testData, I just use the instance variables I set when I create the concrete class. I don't pass an argument list. That allows me to have one abstract class that would be extended by all the test classes (I just show one example). For the same reason, loadData doesn't even have a return value (I might be loading numbers or arrays). Is that reasonable or sloppy?
3. I am a little worried about the parameters. I think I am set up well, if for example, the name of a key changes. I am willing to believe someday someone will decide to slip in another parameter and that would require changing a lot of code. I can create a class for the parameters, but it will be different for each basic type of test. I'm not seeing what that gets me in terms of maintainability. I had a look at generics, but didn't come up with any ideas. It seems that loadData() contains the bulk of the variability. I'm not quite seeing how to decouple it from the classes it will be in. Once I make MultivalueTest and ListTest, etc., each will implement their own loadData() because what they look for and what they load are different.
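To make point 1 concrete, here is a minimal sketch of the extra abstract layer, written as a template method. It is only an illustration under assumptions: the names AbstractComparison, ColumnComparison and PriceColumnComparison are hypothetical, in-memory lists of lines stand in for the CSV files, and the comma split is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;

// The only thing callers see.
interface DataComparison {
    boolean compareData();
}

// Hypothetical middle layer: fixes the load-then-test order once and keeps
// loadData() and testData() out of the public interface.
abstract class AbstractComparison implements DataComparison {
    @Override
    public final boolean compareData() {
        loadData();
        return testData();
    }

    protected abstract void loadData();

    protected abstract boolean testData();
}

// One comparison type: compare a single column line by line,
// optionally skipping a header row. State lives in fields, as in point 2.
abstract class ColumnComparison extends AbstractComparison {
    private final List<String> firstLines;   // stand-in for the first source's file
    private final List<String> secondLines;  // stand-in for the second source's file
    private final int column;
    private final boolean skipHeader;
    private List<String> firstValues = new ArrayList<>();
    private List<String> secondValues = new ArrayList<>();

    protected ColumnComparison(List<String> firstLines, List<String> secondLines,
                               int column, boolean skipHeader) {
        this.firstLines = firstLines;
        this.secondLines = secondLines;
        this.column = column;
        this.skipHeader = skipHeader;
    }

    @Override
    protected void loadData() {
        firstValues = extract(firstLines);
        secondValues = extract(secondLines);
    }

    // Naive CSV handling: assumes no quoted fields or embedded commas.
    private List<String> extract(List<String> lines) {
        List<String> values = new ArrayList<>();
        for (int i = skipHeader ? 1 : 0; i < lines.size(); i++) {
            values.add(lines.get(i).split(",")[column].trim());
        }
        return values;
    }

    @Override
    protected boolean testData() {
        return firstValues.equals(secondValues);
    }
}

// A concrete class supplies only the specifics (here: column 1, with a header).
class PriceColumnComparison extends ColumnComparison {
    PriceColumnComparison(List<String> firstLines, List<String> secondLines) {
        super(firstLines, secondLines, 1, true);
    }
}
```

With this layering, code that runs the tests depends only on DataComparison, and a new comparison type has to implement just the two protected steps.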
Any comments about my concerns or suggestions for the code are welcome. Thanks in advance.
and these bits so it all works
What is missing in this design are some proper abstractions and the encapsulation of some behaviors.
The design can be broken down into the interfaces and classes shown in this class diagram.
Here are a few points to back this design.
Data - What if you need to compare datasets obtained from a web service, or two files instead of directories? Abstract the data.
DataLoader - Might you need a different strategy to load the data in the future? Encapsulate the data-loading concern so the class has a single responsibility. The same applies to DataTester.
DataLoaderImpl - The constructor takes the config params, possibly supplied by an injector. This spares you from changing parameters all over the code: the only change would be in this DataLoaderImpl class.
ValueComparator - Uses the Loader and Tester, both configured in the constructor, and performs the comparison logic. If the comparison logic can vary, it may also be a good idea to move it to another abstraction and encapsulate it. In that case, you will have one class that works only as a facade, providing the high-level functionality of comparing two directories.
From your Main, construct all the dependencies required by ValueComparator, or use an IoC container to manage the wiring.
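The points above could be sketched roughly as follows. This is not taken from the class diagram: every method signature here is an assumption, EqualityTester is a hypothetical DataTester implementation, and lists of lines stand in for files so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Abstraction over whatever is being compared (assumed shape).
interface Data {
    List<String> values();
}

// Encapsulates how data is obtained.
interface DataLoader {
    Data load();
}

// Encapsulates how two data sets are judged.
interface DataTester {
    boolean test(Data first, Data second);
}

// Config params arrive through the constructor, so adding a parameter
// later only touches this class and the place that constructs it.
class DataLoaderImpl implements DataLoader {
    private final List<String> lines;  // stand-in for a file's contents
    private final int column;

    DataLoaderImpl(List<String> lines, int column) {
        this.lines = lines;
        this.column = column;
    }

    @Override
    public Data load() {
        List<String> values = new ArrayList<>();
        for (String line : lines) {
            // Naive split: assumes no quoted fields.
            values.add(line.split(",")[column].trim());
        }
        return () -> values;  // Data has one method, so a lambda suffices
    }
}

// Hypothetical tester: plain element-wise equality.
class EqualityTester implements DataTester {
    @Override
    public boolean test(Data first, Data second) {
        return first.values().equals(second.values());
    }
}

// Facade: knows nothing about files, keys or columns.
class ValueComparator {
    private final DataLoader firstLoader;
    private final DataLoader secondLoader;
    private final DataTester tester;

    ValueComparator(DataLoader firstLoader, DataLoader secondLoader, DataTester tester) {
        this.firstLoader = firstLoader;
        this.secondLoader = secondLoader;
        this.tester = tester;
    }

    boolean compare() {
        return tester.test(firstLoader.load(), secondLoader.load());
    }
}

// Manual wiring, as would be done from Main when no IoC container is used.
class ComparisonDemo {
    public static void main(String[] args) {
        List<String> first = Arrays.asList("k1,10", "k2,20");
        List<String> second = Arrays.asList("a,10", "b,20");
        ValueComparator comparator = new ValueComparator(
                new DataLoaderImpl(first, 1),
                new DataLoaderImpl(second, 1),
                new EqualityTester());
        System.out.println(comparator.compare());
    }
}
```

Using one loader per source is a choice made for this sketch; a single loader that returns both data sets would fit the description equally well.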
I appreciate the UML diagram; I find that diagrams like this help clarify the concepts for me.
Unfortunately, I am having a real problem putting things into practice, so I think I am still not quite getting the concepts.
Looking at what you pointed me to, I think I might want to change a few things. At the core of my thinking is that I would like to have a consistent way to create a lot of tests structured so when someone else adds more tests later, it won't be too much work for them and will be consistent with the other tests.
So at some point, I thought I would be doing something like this:
Which tests need to be run at any given time will vary.
Beyond this, I have this fuzzy concept in my mind that I will call 'Data.' There are certain operations that I need to support with regards to Data:
I see that rather than make these methods of a Data class, you have encapsulated them as separate classes to accommodate future changes. When I have looked up examples of implementing this strategy, the form of the data is always constant. For example, the data might always be a String. That is not true in my case. One of the key points of variability is the form that the data takes. It is much less likely that the data will come from a source other than a file, and much more likely that the business logic for extracting the data from the file, and precisely what is extracted, will change.
A simplified example would be that one test might be comparing Strings and another test might be comparing Doubles. This is where my understanding falters. I end up making things like this:
Since DataLoader needs to return the data that it loads, it needs to return an instance of Data.
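One way the String-versus-Double case might be handled is with a type parameter on the loader and tester. The following is a sketch under assumptions, not the design from the answer: List<T> plays the role of Data, and ColumnLoader, ExactTester, ToleranceTester and TypedComparison are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// The loader is generic in the element type it produces.
interface DataLoader<T> {
    List<T> load();
}

// The tester is generic in the element type it compares.
interface DataTester<T> {
    boolean test(List<T> first, List<T> second);
}

// Extracts one column and converts each cell with a caller-supplied parser,
// so the same class serves String, Double, or any other element type.
class ColumnLoader<T> implements DataLoader<T> {
    private final List<String> lines;  // stand-in for a file's contents
    private final int column;
    private final Function<String, T> parser;

    ColumnLoader(List<String> lines, int column, Function<String, T> parser) {
        this.lines = lines;
        this.column = column;
        this.parser = parser;
    }

    @Override
    public List<T> load() {
        List<T> values = new ArrayList<>();
        for (String line : lines) {
            // Naive split: assumes no quoted fields.
            values.add(parser.apply(line.split(",")[column].trim()));
        }
        return values;
    }
}

// Works for any type with a sensible equals().
class ExactTester<T> implements DataTester<T> {
    @Override
    public boolean test(List<T> first, List<T> second) {
        return first.equals(second);
    }
}

// Only makes sense for numbers, so it fixes T to Double.
class ToleranceTester implements DataTester<Double> {
    private final double tolerance;

    ToleranceTester(double tolerance) {
        this.tolerance = tolerance;
    }

    @Override
    public boolean test(List<Double> first, List<Double> second) {
        if (first.size() != second.size()) {
            return false;
        }
        for (int i = 0; i < first.size(); i++) {
            if (Math.abs(first.get(i) - second.get(i)) > tolerance) {
                return false;
            }
        }
        return true;
    }
}

// Non-generic interface, so tests of different element types fit in one list.
interface DataComparison {
    boolean compareData();
}

// Ties a pair of loaders to a tester of the same element type.
class TypedComparison<T> implements DataComparison {
    private final DataLoader<T> first;
    private final DataLoader<T> second;
    private final DataTester<T> tester;

    TypedComparison(DataLoader<T> first, DataLoader<T> second, DataTester<T> tester) {
        this.first = first;
        this.second = second;
        this.tester = tester;
    }

    @Override
    public boolean compareData() {
        return tester.test(first.load(), second.load());
    }
}
```

The type parameter stays inside TypedComparison, so code that runs a mixed list of tests only ever sees DataComparison; the compiler still rejects pairing, say, a String loader with ToleranceTester.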
Am I understanding what you are trying to tell me?