
Unit testing class that deals with file I/O

Sean Keane
Ranch Hand

Joined: Nov 03, 2010
Posts: 581

I'm guessing that most people create a single class that deals with I/O. How do you go about Unit testing this? In particular, how do you test the methods that write to a file?

I have a single class called DataManager that handles all file I/O. So it has methods like this:
Now, when it comes to Unit testing the methods that read from a file, this is nice enough. I just create input files in a resources folder, then load these in from the class path, and pass them into the methods of my DataManager class. So I will have something like this:

But when I come to Unit testing my methods that write to the data file, i.e. writeRecords, I end up with something quite messy. I need to write to a file and then read back to ensure it worked correctly.

What I am toying with is the idea of removing the dependency between the DataManager class and the file system. But I'm wondering what this will actually give me? And if other people have done this? There is a thread here that covers this same problem.

One of the ideas mentioned in the thread is to code to an InputStream rather than a concrete file or path - in effect using injection. Now at the moment I am using a RandomAccessFile in my DataManager class to interact with the file. So I could change my DataManager class to be coded to the interfaces DataOutput and DataInput.

But what does this actually give me? It just pushes the creation of the RandomAccessFile object up to my Data class and up to all my Unit test code too (my Data class is the only class that interacts with the DataManager class). So whilst it will remove the dependency on concrete files from the DataManager, it will make higher up layers harder to test and more verbose. It just seems like I'm pushing the problem up a layer. It also feels like I'm removing commonality that was isolated to the DataManager, i.e. opening\closing files, to now have this commonality all over my Data class where I will have to open and close the file in many methods.

Another idea suggested in that page is to code everything to an interface. So create an interface for the file system itself, which can then be mocked up in testing. This seems like a bit of an overkill though and adding additional complexity to the design purely for the purposes of testing.

What do you guys think? How did you implement your class that reads\writes records to the data file? How did you Unit test this class?

SCJP (1.4 | 5.0), OCJP (6.0), OCMJD
Roel De Nijs
Bartender

Joined: Jul 19, 2004
Posts: 5286

First of all, having a class with nothing but static methods is not really OO, but that's a completely different discussion.

I only have the Data class, but that makes no difference for testing the file I/O methods. And to be honest completely unit testing the Data class consumed in the end a lot of time, because I had to change my little testing framework a few times. But in the end I was able to test all possible situations for every method, so I was really happy with the little framework I created.

Because I want to start every test with the same file (so you know exactly what should happen and you are not dependent on the order in which your tests are executed), I made a (temporary) copy of the existing database file for every test. And then I just executed the method(s) needed for that particular test.

So to test the "write all records back to file" method I have the following test:
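A minimal sketch of this copy-per-test idea, not Roel's actual test or framework. The class name `TempDatabaseCopy`, the method `copyOf`, and the four-byte stand-in "database" are all made up for illustration, and overwriting one byte stands in for the real write method under test.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical helper: gives each test its own throwaway copy of a database file.
class TempDatabaseCopy {

    // Copies the original to a fresh temp file that is deleted when the JVM exits.
    static Path copyOf(Path original) throws IOException {
        Path copy = Files.createTempFile("db-test-", ".db");
        Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
        copy.toFile().deleteOnExit();
        return copy;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real database file, which would normally already exist.
        Path original = Files.createTempFile("db-original-", ".db");
        original.toFile().deleteOnExit();
        Files.write(original, new byte[] {1, 2, 3, 4});

        Path copy = copyOf(original);

        // The method under test would run here; overwriting one byte stands in for it.
        try (RandomAccessFile raf = new RandomAccessFile(copy.toFile(), "rw")) {
            raf.seek(2);
            raf.writeByte(9);
        }

        // Read back: only the copy should have changed.
        System.out.println(Arrays.toString(Files.readAllBytes(copy)));
        System.out.println(Arrays.toString(Files.readAllBytes(original)));
    }
}
```

Because every test starts from a fresh copy, the expected file content after the write is known in advance and test order does not matter.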


SCJA, SCJP (1.4 | 5.0 | 6.0), SCJD
http://www.javaroe.be/
Sean Keane
Ranch Hand

Joined: Nov 03, 2010
Posts: 581

Thanks Roel. By the sounds of it you have the same set up as me in that the class that reads and writes to the file has a direct dependency on the file system.

The methods in my class require a path to the file, you probably have a reference to the file in your Data class. But the end result is the same. Both solutions have a direct dependency on the file system. That is why you have to work with an actual file when testing your Data class - precisely the same as me having to work with an actual file when testing my DataManager class.

I'd be interested if anyone has implemented this differently using some of the ways I mentioned in my previous post. For example, making your class that reads\writes to the file depend on interfaces instead and then were able to unit test their Data or DataManager class without the need for an actual file.

That is the main point I am concentrating on here - unit testing your Data or DataManager class without the need for an actual file. Of course at some point in your code you will have to write to the file.

I didn't mind my current solution so much when testing methods that read from the file. Because I just created a resources folder within which to store files that are read in during unit test from the classpath so this is quite nice. I have a set-up like:

But when it comes to testing the methods in my DataManager class that write records I have to actually write to a file. This means I will have to create a temporary file somewhere outside of my tests folder (as it's never a good idea to create or modify files from your source folders).

This just leaves me with creating the file in some temporary location - for example using createTempFile. I don't think this is a nice solution though!

Roel, where did you create the file that was modified when testing your methods that wrote records?
Sean Keane
Ranch Hand

Joined: Nov 03, 2010
Posts: 581

There is some good information for dealing with file I/O and unit testing here. In particular the reply that has the following options for dealing with file I/O when it comes to Unit testing seems like a good summary:

  • Option 1: Live with it.
  • Option 2: Create a slight abstraction where required.
  • Option 3: Wrap the whole file system.

    I started playing about with this myself. So here is an example where the class has a direct dependency on the file system (the fact that I'm using static methods here makes no odds - the same problem exists when the class has state and you store the file reference as state):

    Now when it comes to Unit testing this code, there is no way to Unit test it other than to create a file. To get around this I can abstract away the file access by coding to interfaces. Here's an example:

    Now when using this class in the code for my assignment I would use it as follows:

    And when Unit testing, I can now mock up the file by using ByteArrayOutputStream and ByteArrayInputStream. For example:
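A rough sketch of what coding to `DataInput` and `DataOutput` might look like, not the code from the post. The class name `RecordCodec` and the record format (a count followed by UTF strings) are invented for illustration and have nothing to do with the assignment's real file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class that depends only on DataInput/DataOutput, never on a file.
class RecordCodec {

    // Writes a record count followed by each record as a UTF string (made-up format).
    static void writeRecords(DataOutput out, List<String> records) throws IOException {
        out.writeInt(records.size());
        for (String record : records) {
            out.writeUTF(record);
        }
    }

    // Reads back what writeRecords produced.
    static List<String> readRecords(DataInput in) throws IOException {
        int count = in.readInt();
        List<String> records = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            records.add(in.readUTF());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // In a test, a byte array stands in for the file.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeRecords(new DataOutputStream(buffer), Arrays.asList("Palace", "Castle"));

        DataInput in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(readRecords(in));
    }
}
```

In production code the same two methods could be handed a `RandomAccessFile`, since it implements both `DataInput` and `DataOutput`.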
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5286

    For completeness: I don't have a reference to my file; I just store the database file path in the Data class.

    Sean Keane wrote:Roel, where did you create the file that was modified when testing your methods that wrote records?

    For every test a copy of the database file is created in a given directory. Each file will be deleted on exit (which could be turned off of course if you want to inspect the file when the test fails).

    Why would creating temporary files (with createTempFile) not be a nice solution?
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    I'd rather avoid writing to a file during my Unit test if possible. It's a nicer approach. I'm just looking into how much I'll need to modify my code to make this possible. So I'm wondering if anyone else has taken this approach - but it seems none so far.

    I'd like to be able to set up my DataManager class so that it just relies on a stream and then to pass an instance of my DataManager into my Data class. But I'm hitting a problem! A stream is just for reading or writing, not for both. So I don't see how I can use streams.

    Ideally I want to be able to do this:
    The whole idea of this approach is that:

    1. The Data class has no dependency on the file system - so I can Unit test it without reading\writing to files.
    2. The DataManager class has no dependency on the file system - so I can Unit test it without reading\writing to files.

    But the DataManager needs to be able to read and write to the stream. So I don't see how I can setup the DataManager to do this before injecting into my Data class.

    I can set up the DataManager to rely on the file system as follows:

    Then in the DataManager class I can construct streams within each method, but this has me back to the DataManager having a dependency on the file system.

    The only way I can see this working for me, by passing in streams, is if I pass the stream into the actual method of the DataManager class. So if the method is writing records, then it should get an output stream, otherwise get an input stream. But then I am back to a utility class with static methods and I can't inject this into my Data class.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    If I go with the approach of passing the database file path into the constructor of my DataManager class then my Data class will have an indirect dependency on the file system through the DataManager reference.

    However, when Unit testing my Data class I can get around this indirect dependency by creating an interface, IDataManager, for my DataManager class. So make my Data class refer to the interface. Then during testing I can create another implementation of IDataManager where it doesn't rely on the file system.

    But this only solves the problem for the Data class. I'd still like to be able to remove the dependency that my DataManager class has on the file system so that I can Unit test it without writing to files.
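As a sketch of the interface idea under stated assumptions: the method names on `IDataManager`, the `InMemoryDataManager` class, and the stripped-down `Data` class below are all hypothetical, and records are plain strings purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface; the real method names are not shown in this thread.
interface IDataManager {
    List<String> readRecords();
    void writeRecords(List<String> records);
}

// Test-only implementation that keeps the records in memory instead of in a file.
class InMemoryDataManager implements IDataManager {
    private List<String> stored = new ArrayList<>();

    @Override
    public List<String> readRecords() {
        return new ArrayList<>(stored);
    }

    @Override
    public void writeRecords(List<String> records) {
        stored = new ArrayList<>(records);
    }
}

// Simplified stand-in for the Data class: it only knows about the interface.
class Data {
    private final IDataManager dataManager;
    private final List<String> records;

    Data(IDataManager dataManager) {
        this.dataManager = dataManager;
        this.records = dataManager.readRecords();
    }

    // Adds a record, persists the full list, and returns the new record's number.
    int create(String record) {
        records.add(record);
        dataManager.writeRecords(records);
        return records.size() - 1;
    }
}
```

A test can then construct `new Data(new InMemoryDataManager())` and check what was "written" by calling `readRecords()` on the fake, with no file involved. The file-backed implementation of the interface still has to be tested some other way.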
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    I guess another approach, if I decide to pass the data file path into my DataManager, would be to create two package protected methods in the DataManager class that retrieve the streams for input and output. All access to input\output streams within the DataManager class will use the package protected methods.

    Then when it comes to Unit testing I can simply subclass the DataManager and override the package protected methods to ignore the path to the data file and to instead use streams I have set up in testing.
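A minimal sketch of that subclass-and-override approach. The names `StreamDataManager`, `openInput`, `openOutput`, and `InMemoryStreamDataManager` are made up, and the single-string "record" is only there to keep the example small.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical DataManager whose only contact with the file system is two overridable methods.
class StreamDataManager {
    private final String path;

    StreamDataManager(String path) {
        this.path = path;
    }

    // Package-private seams: production code opens the real file here.
    InputStream openInput() throws IOException {
        return new FileInputStream(path);
    }

    OutputStream openOutput() throws IOException {
        return new FileOutputStream(path);
    }

    void writeRecord(String record) throws IOException {
        try (DataOutputStream out = new DataOutputStream(openOutput())) {
            out.writeUTF(record);
        }
    }

    String readRecord() throws IOException {
        try (DataInputStream in = new DataInputStream(openInput())) {
            return in.readUTF();
        }
    }
}

// Test subclass: ignores the path and uses an in-memory buffer instead.
class InMemoryStreamDataManager extends StreamDataManager {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    InMemoryStreamDataManager() {
        super("ignored");
    }

    @Override
    InputStream openInput() {
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    @Override
    OutputStream openOutput() {
        buffer.reset(); // mimic FileOutputStream truncating the file
        return buffer;
    }
}
```

The read and write logic is exercised unchanged; only the two methods that open the file are replaced, so those two methods remain the untested part.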
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5286

    Having an interface like IDataManager to decouple your Data class from the used implementation of your datastore (file, byte[], rdbms,...) is from a design perspective always an excellent idea. You'll program against an interface and your classes will be loosely coupled. That could have been another improvement I could have made to my solution. On the downside you don't want to end up with 100+ classes and interfaces for such a little assignment, because that would make it hard to get a grasp on things.

    And why are you not eager to read/write to a file when you make your unit tests? I notice this quote in your initial post:
    But when I come to Unit testing my methods that write to the data file, i.e. writeRecords, I end up with something quite messy. I need to write to a file and then read back to ensure it worked correctly.

    If you used, for example, a byte[] stream, you would also need to read back the byte[] to ensure it worked correctly.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Yeah, I agree, I don't want to pollute a simple project\design with lots of complexity just for the sake of Unit testing. I've come across very small projects myself where people have coded almost everything to an interface, and I'd be left scratching my head for a while trying to figure out why...only to realise they did it just to substitute the implementation for Unit testing.

    I'm still playing around with the idea, trying to come up with a way that I can change my design to Unit test it without writing to files, but also keep the design nice and simple. I'm not making much progress though!

    On why I don't want to write to files: just as a general rule I try to avoid writing to files in my Unit tests where possible, or at least keep it to a minimum. You can end up with a nicer design at times when you follow this approach because you remove the dependency on the file system from a lot of your code and just have the code that depends on the file system in one particular class that is doing nothing more than writing to disk (so very little to Unit test there). Also you don't have to worry about file access restrictions etc. if someone were to take your code and run it on another machine.

    My problem with my current approach is that I have a DataManager that takes a file name, as it needs to both read and write to the file. Ideally I'd like to replace the file name with a stream. But this is not possible as a stream is one way communication - whereas the DataManager needs two way.

    So of the two approaches I have identified, (1) using streams and (2) using interfaces, the first does not seem workable and the second will only remove the dependency on the file system from my Data class but not from my DataManager class...grrr.

    There was a third option to have package protected methods in my DataManager class that will allow me to set\get the stream, and I could then override these when testing to ignore the actual file that the DataManager receives as an argument. But this feels like I'm leaving a "hole" in my design purely for testing.
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5286

    It seems you are putting in a lot of time just for testing the writeRecords-method, because that's the only method writing back to file.

    Will it be worth the effort and time? Everybody is rushing to submit before the mandatory course deadline and you seem to have lots and lots of time. And thanks to all the R&D you already did, you'll gain a lot of (extra) knowledge thanks to this certification.

    Good luck! (seems you need a bit to get this solved adequately)
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    In terms of this certification, no it probably will not be worth it (like many of my posts hehehe!)

    If I modify my design\code in a way that makes the design harder to understand then it could actually negatively impact me! Which is why I'm trying to come up with a nice solution that looks good even if you never knew I did it to accommodate Unit testing. Because the assessors won't have my Unit tests and I'm not sure if this is really something I could document in my choices.txt i.e. that I made a design decision in order to facilitate Unit testing.

    But in terms of learning I think it's a good exercise to come up with a solution. This is something I'll likely encounter in the future. So if I come up with a solution now, then it will be useful to have in the back of my head when designing something else in the future i.e. I will have a "design pattern" in mind when designing my solution to a future problem rather than trying to refactor when I come to Unit testing my solution.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Having failed to come up quickly enough with a good solution for avoiding writing to files when testing my DataManager and Data class, I have thrown together something to ensure any files I write to are in my build folder rather than anywhere in my source or test folders. The code is below; it may be of use to others.

    I am working in Eclipse and the structure of my project is below. There are three source folders in my Eclipse project (1) src (2) tests\java (3) tests\resources. So you will see in my Java code below I find my build folder by assuming it is on the same level as my src folder.



    You can see how I use this class from the main method in the code below. My basic set up when testing methods that write to the data file is to:

    1) Find a template data file to use from the classpath.
    2) Create my temporary data file that will contain the contents of the template data file. This is the file that will be written to in my Unit test.

    I get my template file from the classpath as I store all my input files for my Unit tests in a resources folder which is on the classpath. That is this location from my folder structure above:

    So this template may contain zero records, many records, etc. Depending on the state I want the data file in before I run my tests. So for example, if I want to test that everything works correctly when writing records back to an empty data file, then I will create a template file that only contains the schema information and store this in my resources folder.

    Once I have my template I then create my temp data file. The code below creates this temp data file for me and returns the path to the file. It creates it in the build folder of my project.

    I also pass in a base path. The idea of the base path is that this folder structure will be created within the build folder and then my temp data file will be created within this. So, for example, if my base path is "a\b\c" and my template file is called "db-1x1.db", then my temp file will be created in "build\a\b\c\db-1x1.db".
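A minimal sketch of the idea described above, not the class from the post. `BuildFolderFiles` and its `createFile` signature are assumptions; in particular the build folder is passed in as a parameter here rather than located relative to the src folder, and the template arrives as an `InputStream` (for example one obtained from the classpath).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper that generates writable test files under a build folder.
class BuildFolderFiles {

    // Copies the template into <buildDir>/<basePath>/<fileName> and returns the new path.
    static Path createFile(Path buildDir, String basePath, String fileName, InputStream template)
            throws IOException {
        Path targetDir = buildDir.resolve(basePath);
        Files.createDirectories(targetDir); // creates a/b/c as needed
        Path target = targetDir.resolve(fileName);
        Files.copy(template, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for the project's build folder,
        // and a byte array stands in for the template file.
        Path buildDir = Files.createTempDirectory("build");
        InputStream template = new ByteArrayInputStream(new byte[] {1, 2, 3});

        Path dataFile = createFile(buildDir, "a/b/c", "db-1x1.db", template);
        System.out.println(dataFile);
    }
}
```

Nothing is deleted afterwards, so a failed test leaves the generated file in place for inspection without touching the source or test folders.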


    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5286

    Why not simply make a separate test-project? That's what I did.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    I don't see what having a separate Eclipse test project would give me - it's the same problem, just a different project, no?

    If I am writing to files in my unit tests then I want the files to be in somewhere like the build folder, not in the tests folder. In general, it's not good practice to write to project folders. For example, many source control systems will see these as new files that need to be added to source control.
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5286

    Sean Keane wrote:I don't see what having a separate Eclipse test project would give me - it's the same problem, just a different project, no?

    Because you'll only package your scjd-project, you won't risk having files of your test project being packaged in your submission jar.

    Sean Keane wrote:If I am writing to files in my unit tests then I want the files to be in somewhere like the build folder, not in the tests folder. In general, it's not good practice to write to project folders. For example, many source control systems will see these as new files that need to be added to source control.

    I deleted the files on exit (of the jvm the tests are running in), so my source control system never noticed these files. And if the files are noticed by the source control system (for any reason), you can always override and update them. Seems to me an easier approach than the class you created.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    My project set up is quite a commonly used one (src, test, resources, build, dist folders etc.). If you are familiar with Maven you will notice the similarity. There is zero risk of my test code getting packaged up with my distribution (assignment). So making my test code a separate project makes no sense - there's no benefit for me.

    I'm not sure what you see difficult about my approach? My approach is very easy - generate the files into the build folder. It couldn't be simpler.

    Of course if you delete the files they won't be noticed by an SCM system. But if a test fails, I don't want the files deleted, I want them left there where I can inspect them. In such a scenario I don't want them in my project folders. It's not good practice to generate files into project folders.

    It makes no odds for this tiny project of course. But if you were to start generating files into source controlled folders in a development team, you may have a lot of annoyed developers.
    Roel De Nijs
    Bartender

    Joined: Jul 19, 2004
    Posts: 5286

    Sean Keane wrote:I'm not sure what you see difficult about my approach?

    Maybe "difficult" was not the right word here. You just have the added flexibility (and complexity) of being able to give the location where each temp file will be created. So for each temp file you have to provide the base path (which could of course simply be bypassed by having a 1-parameter version of the createFile-method).
    As a minor remark: why not use the FileChannels for copying your files?

    Sean Keane wrote:But if you were to start generating files into source controlled folders in a development team, you may have a lot of annoyed developers.

    Let's hope someone from the team is not "too tired" to roll back the check-in of these generated files.
    Sean Keane
    Ranch Hand

    Joined: Nov 03, 2010
    Posts: 581

    Roel De Nijs wrote:Maybe "difficult" was not the right word here. You just have the added flexibility (and complexity) of being able to give location where each temp file will be created.

    The base path is just really to follow my convention. You could just take the template and append a random string to the file name and generate into the build folder without any folder structure, or you could use some other means to create the folder structure.

    The convention I am talking about is how I store files read in during unit tests from my resource folder. Say I have a class suncertify.db.DataManager and a method called readRecords in this class. I will store my input files for testing this method in a package called suncertify.db.datamanager.readrecords - i.e. <package-name>.<class-name>.<method-name>.

    So I follow this same convention when generating my files that I write to. That is the only reason I have the base path as a parameter.
    Roel De Nijs wrote:As a minor remark: why not use the FileChannels for copying your files?

    Good idea!!!! I just hacked this together as quickly as possible. I knew there was a way of copying files without resorting to bytes, but couldn't remember what it was off the top of my head. Another thing learned\relearned as part of this project. Thanks.
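For reference, copying with a `FileChannel` could look roughly like this. It is only a sketch; the class name `ChannelCopy` and the string-path signature are assumptions, not code from either poster.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Hypothetical helper that copies a file using FileChannel.transferTo.
class ChannelCopy {

    static void copy(String source, String target) throws IOException {
        // Closing a channel also closes the stream it was obtained from.
        try (FileChannel in = new FileInputStream(source).getChannel();
             FileChannel out = new FileOutputStream(target).getChannel()) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }
}
```

On Java 7 and later, `java.nio.file.Files.copy` is another way to copy a file without handling byte buffers by hand.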
    Phil Crow
    Greenhorn

    Joined: May 16, 2011
    Posts: 2

    Let's hope someone from the team is not "too tired" to roll back the check-in of these generated files


    I always like it when someone clearly articulates why to follow a convention or to use a pattern.

    I've been baffled by unit testing with IO for a long time, and I still am after this discussion, but it's a big help to see someone else trying to get the testing to be a discrete unit as much as I want it to be.
     