Version Control for binary files - any good system available?

Timo Patuelli
Greenhorn

Joined: Apr 08, 2009
Posts: 6
CVS and SVN are not intended for binary files. Storing such files in a file system with an appropriate directory structure leads to other sorts of problems.

Could you suggest any version control system that could reasonably deal with thousands of binary files like DOC, ODT, XLS, PDF, JPG, PNG, ZIP, JAR?
Mourouganandame Arunachalam
Ranch Hand

Joined: Oct 29, 2008
Posts: 396
You can try Mercurial.

Get more info @ http://en.wikipedia.org/wiki/Mercurial_(software)


Mourougan
Open Source leads to Open Mind
Angus Edison
Greenhorn

Joined: Apr 08, 2009
Posts: 3
I think SVN is good enough. We are using SVN to store and share more than 5000 Word files and more than 10000 PDF files. It works perfectly. Just for your information.
Tiho Beretous
Greenhorn

Joined: May 28, 2003
Posts: 8
I'll chime in with support for SVN in this respect. It works fine for us - we put all of our help files in it (and associated images) and it works quite well.
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 16305

SVN allegedly has the ability to efficiently difference binary files, reducing the amount of disk storage space required. Many VCS's allow custom differencers to be plugged in, however.

I've stored binaries in SVN and CVS for years without any problems. The only "gotcha" is that you do have to make sure that the binary is stored AS a binary, since VCS's these days are fond of altering the end-of-line characters in what they think are text files. In other words, the same sort of trouble you'd get when trying to download a ZIP file from a Windows FTP host if you forgot to tell the server that you wanted binary file transfer.
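
As a rough illustration of the "gotcha" described above, here is a minimal Python sketch of what an end-of-line translation does to binary data. The function name `to_crlf` is made up for this example and only imitates a text-mode checkout on Windows; it is not how any particular VCS implements the conversion. In Subversion the `svn:mime-type` property, and in CVS the `-kb` option of `cvs add`, are the usual ways to mark a file as binary so that no such translation happens.

```python
def to_crlf(data: bytes) -> bytes:
    """Imitate a text-mode checkout on Windows: every line ending becomes CR LF."""
    # Collapse existing CR LF pairs first so they are not doubled, then expand.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# The first eight bytes of every PNG file. The signature contains both a
# CR LF pair and a lone LF precisely so that this kind of damage is detectable.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

mangled = to_crlf(PNG_SIGNATURE)
print(len(PNG_SIGNATURE), PNG_SIGNATURE)
print(len(mangled), mangled)
print("still a valid PNG signature:", mangled == PNG_SIGNATURE)
```

The translated header is one byte longer than the original and no longer matches, so an image viewer would reject the file even though only "line endings" were touched.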


Customer surveys are for companies who didn't pay proper attention to begin with.
Timo Patuelli
Greenhorn

Joined: Apr 08, 2009
Posts: 6
to Tim Holloway:
"I've stored binaries in SVN and CVS for years without any problems"

Maybe you have used small files. If you used 20 to 30 MB binaries, you would notice horrible problems. CVS loads all the versions of such a binary file as a single piece. Each month we had about 20 versions (committed on a daily basis), each about 30 MB. A request from a single developer led to 600 MB of RAM being occupied at once. When several developers accessed the CVS repository, the server was incredibly slow! In one year the size of all versions for a single file could be as much as 4-5 GB. Disk was not a problem, but RAM...

We couldn't work without versioning those files. Using the file system for version control was not an option. We decided to delete some versions from the repository on a regular basis, according to a rule connected with the build process. Yes, we really did delete them in the file system of the CVS server, regularly. Not a graceful solution. But the others were even worse.

SVN is a bit better, but also far from perfect.
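
The 600 MB figure above follows directly from the numbers given, if one assumes (as described) that every stored version is held in memory in full for a single request. A tiny Python sketch of that arithmetic, with a hypothetical helper name:

```python
def memory_for_full_history_mb(versions: int, size_mb: int) -> int:
    """RAM needed if every stored version is loaded in full (assumption from the post)."""
    return versions * size_mb


# About 20 versions in a month, each about 30 MB.
per_request = memory_for_full_history_mb(20, 30)
print(per_request, "MB for one request")

# Several developers hitting the server at once multiply that figure.
for developers in (1, 3, 5):
    print(developers, "concurrent requests:", developers * per_request, "MB")
```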


Now, for a new and somewhat similar large project, I'm looking for a smart and fast version control system. It can be commercial, but it should meet our needs. So far I've found no suitable system.
Timo Patuelli
Greenhorn

Joined: Apr 08, 2009
Posts: 6
Mourouganandame Arunachalam wrote: You can try Mercurial.

A distributed system is not an option. The core "con" is that it leads to enormous network traffic, even though many developers don't need those files.
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 16305

Actually, I frequently archive production modules of that size. Maybe my network's just better tuned, but I can live with the time it takes.

However, I exclude files of any size from Version Control unless there's a reasonable expectation I'm going to retrieve them. Work files and directories are on the project's ignore list.

It sounds like what you want is a client smart enough to difference binaries BEFORE they're sent to the server, and I don't know of any. If you're talking JAR/WAR/EAR files or compiled code, there's an extra challenge. Today's optimizing compilers are smart enough to make significant global changes to the resulting code based on fairly minor alterations to source code, including moving whole blocks of code in and out of line, hoisting loop code and much more. In simpler times, you could use a tool like IBM's ZAP utility and modify a few bytes and the difference file would be minuscule. No longer. Not unless you're coding in assembler or have one really dumb compiler like those in common use back when zapping was standard procedure.

I can think of 2 things that might alleviate your issues, assuming that you can't simply add the offending files to cvsignore or its equivalent:

1. Limit people to nightly checkins. Personally I'm not good at that, but allegedly some teams are.

2. Teach people to do partial checkins. This is a bit risky, since it's easy to do an incomplete checkin when a complete one is needed and vice versa. A variation on this is to target your builds so that the build results go into a separate directory and that directory is only checked in at end-of-day.

I actually did something like option #2 at a previous employer. All production modules had to be placed in CVS for the operations staff to retrieve. For their convenience (and ours) we had a special project that held the deployables. In order to avoid having everything in one large directory, we made subdirectories for the various products based on their top-level package qualifiers (for Java projects) or their equivalents (for non-Java projects).
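
A minimal sketch of that kind of layout, in Python. Everything here is hypothetical: the root name `deployables`, the function name, and the choice of using the first two package segments are assumptions made for illustration, not the scheme actually used at that employer.

```python
from pathlib import PurePosixPath


def deployable_dir(root: str, package: str, depth: int = 2) -> PurePosixPath:
    """Map a Java package name to a subdirectory of the deployables project.

    Only the first `depth` segments (the top-level qualifiers) are used,
    so all modules of one product end up in the same subdirectory.
    """
    qualifiers = package.split(".")[:depth]
    return PurePosixPath(root, *qualifiers)


# Two modules of the same (made-up) product share one directory.
print(deployable_dir("deployables", "com.example.billing.web"))
print(deployable_dir("deployables", "com.example.billing.batch"))
print(deployable_dir("deployables", "org.sample.reports"))
```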
Angus Edison
Greenhorn

Joined: Apr 08, 2009
Posts: 3
I forgot to mention that there is one disadvantage of using Subversion to host binary files. You need to have a working copy (your files) plus a .svn folder that contains a mirror of your working copy. The advantage is that you can do some operations locally (e.g. check status and revert a file). The disadvantage is that your local working copy doubles in size.

Although hard disks are very cheap today, keeping and copying such a large folder is not easy. I vote for Subversion to offer an option for "NO" mirror of the local copy.
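
To check how much of a given working copy is taken up by that mirror, something like the following Python sketch could be used. The function name is made up for this example; it simply adds up file sizes, counting anything that sits below a directory named `.svn` as mirror data, so it should work whether the metadata lives in every directory or only at the top of the working copy.

```python
import os


def working_copy_sizes(root: str) -> tuple[int, int]:
    """Return (bytes in your own files, bytes stored under .svn directories)."""
    files_bytes = 0
    svn_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        # A file belongs to the mirror if any directory on its path is ".svn".
        in_svn = ".svn" in os.path.relpath(dirpath, root).split(os.sep)
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if in_svn:
                svn_bytes += size
            else:
                files_bytes += size
    return files_bytes, svn_bytes


if __name__ == "__main__":
    own, mirror = working_copy_sizes(".")
    print("working files:", own, "bytes")
    print(".svn mirror:  ", mirror, "bytes")
```

Run from the top of a checkout, the two numbers show directly whether the mirror roughly doubles the footprint for your mix of files.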
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61766

"Timo Timo", please check your private messages for an important administrative matter.

