file compression and hard disk block sizes

paul wheaton

Joined: Dec 14, 1998
Posts: 20488

I read on wikipedia that NTFS has LZ77 compression. Is it always on? Or is it usually off?

Further, while NTFS allows block sizes of up to 64k, I thought most installations used block sizes of about 8k. But some reading today seemed to suggest that a lot of people are using the 64k max. True?

These wacky questions are tied to the idea of having, say, 10000 text files ranging in size from 5k to 100k, averaging around 40k. The design in front of me says that each text file is to be compressed into its own file, and that these text files are getting an amazing 98% compression! I suspect that while the file system reports each file as smaller, the amount of disk space actually used is probably about the same.
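To make the worry concrete, here is a rough sketch of the arithmetic. It assumes every file is exactly the 40k average, that each file occupies a whole number of blocks, and that 98% compression leaves about 819 bytes per file. The function names and the three block sizes tried (4k, 8k, 64k) are just illustrations taken from the numbers in this thread; real file systems have extra wrinkles (for example, very small files may be stored inside the file table itself), so treat the result as an estimate only.

```python
def size_on_disk(size_bytes: int, block_size: int) -> int:
    """Round a file size up to a whole number of blocks."""
    blocks = (size_bytes + block_size - 1) // block_size
    return blocks * block_size


FILE_COUNT = 10_000
AVERAGE_SIZE = 40 * 1024   # 40k average from the post (assumed uniform)
COMPRESSION = 0.98         # the claimed 98% compression
compressed_size = round(AVERAGE_SIZE * (1 - COMPRESSION))


def total_on_disk(per_file_size: int, block_size: int, count: int = FILE_COUNT) -> int:
    """Total space used by `count` files of the same size."""
    return size_on_disk(per_file_size, block_size) * count


for block in (4096, 8192, 65536):
    raw = total_on_disk(AVERAGE_SIZE, block)
    packed = total_on_disk(compressed_size, block)
    saved = 100 * (1 - packed / raw)
    print(f"block {block:>6}: raw {raw:>12,}  compressed {packed:>12,}  saved {saved:.0f}%")
```

Under these assumptions the reported 98% never shows up on disk: each compressed file still takes one full block, so with 64k blocks nothing is saved at all, and with smaller blocks the saving is real but noticeably less than advertised.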

I'm thinking I want to advocate putting all of the files into one zip file instead of the current approach. But I want to get my facts straight first.

Anybody have much knowledge about industry norms with NTFS compression or NTFS block size?

Balasubramanian Chandrasekaran
Ranch Hand

Joined: Nov 28, 2007
Posts: 215

Sorry, I am not an expert in this area, but I think this link will help you get your answers.
Tim Holloway
Saloon Keeper

Joined: Jun 25, 2001
Posts: 15950

Compression is off by default, though IIRC you can turn it on for a directory subtree (all the way up to the root, if desired). Compressed files/directories also have their names displayed differently in the GUI, I believe. The Mac equivalent was to italicize, but I think NT just used an alternate font color if the standard preferences were in use.

There are two different space-saving mechanisms available here. One is sparse files, such as when you create a 10GB file and write 6 bytes at one end and 4 bytes at the other. NTFS won't allocate any of the intervening 9.999(?) GB until there's actually data for it. I think the file has to be flagged as sparse for that to happen, rather than the feature being always on (I'd have to check the create file function defaults to be sure). The other is actual data compression (LZ77-based or otherwise). Depending on the data, you may see huge space savings or, in the worst case, files larger than they would be uncompressed.
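The "depends on the data" point is easy to see with a small experiment. This sketch uses Python's zlib (DEFLATE, which is also LZ77-based) purely as a stand-in; it is not the algorithm NTFS uses, and the sample log line is made up for illustration.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive text, like a log file (invented sample line).
repetitive = b"2008-05-01 12:00:00 INFO request handled OK\n" * 1000

# Pseudo-random bytes of the same length; a fixed seed keeps runs repeatable.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

print(f"repetitive text: {ratio(repetitive):.3f}")
print(f"random bytes:    {ratio(noise):.3f}")
```

The repetitive text shrinks to a tiny fraction of its size, while the random bytes come out slightly larger than they went in, which is the worst-case scenario mentioned above.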

For the ultimate in compression, a ZIP file is still better, since even if you create a compressed directory, the system overhead for the directory and its files is still more than for a single file containing a ZIP directory. Plus the compression is preserved when you copy the ZIP.
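Here is a minimal sketch of the difference between one ZIP holding many files and one ZIP per file, built in memory with Python's zipfile module. The file names and contents are invented, and the 4k block size is an assumption (it matches the XP machines mentioned in this thread). ZIP compresses each member separately, so the byte counts come out close; the gap opens up once every archive is rounded up to whole blocks.

```python
import io
import zipfile

BLOCK_SIZE = 4096  # assumed block size


def zip_size(files: dict[str, bytes]) -> int:
    """Size in bytes of a ZIP archive containing the given files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())


def size_on_disk(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Round a size up to a whole number of blocks."""
    return (size_bytes + block_size - 1) // block_size * block_size


# 100 made-up, highly compressible text files of roughly 40k each.
files = {
    f"report_{i:05d}.txt": (f"record {i}: status OK\n" * 2000).encode()
    for i in range(100)
}

one_archive = zip_size(files)
separate_archives = [zip_size({name: data}) for name, data in files.items()]

print("bytes, one zip:       ", one_archive)
print("bytes, zip per file:  ", sum(separate_archives))
print("on disk, one zip:     ", size_on_disk(one_archive))
print("on disk, zip per file:", sum(size_on_disk(s) for s in separate_archives))
```

With files this compressible, each single-file archive fits inside one block but still occupies all of it, so the one-per-file layout uses far more disk than the combined archive even though the raw byte totals are nearly the same.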

Customer surveys are for companies who didn't pay proper attention to begin with.
paul wheaton

Joined: Dec 14, 1998
Posts: 20488

Balasubramanian, thanks for the link. I've read a dozen pages like that one. The big question right now is: what's the norm?

Tim, you had me at "off by default".

When you advocate zip, I take it you advocate many files in one zip, not one file per zip?

And Tim: I love that sig!

... I'm currently running XP, and when I look at the properties on a tiny file, it shows "size" and "size on disk". I take it that the difference has to do with the block size. On my machine it is 4k.

Anybody care to share what their block size is?
Paul Clapham

Joined: Oct 14, 2005
Posts: 18541

I'm using XP too, and my block size is 4KB as well.

But I'm not sure you need to know what's the normal value unless you plan to put those text files on many different computers. Don't you just need to know the value for the computer you plan to put them on?

Or to put it another way, what would you do differently if you found that 20% of disks had an 8KB block size?
Marilyn de Queiroz

Joined: Jul 22, 2000
Posts: 9044
AIX has 512K blocks. It seems like you would need to know the block size on an NTFS system.

"Yesterday is history, tomorrow is a mystery, and today is a gift; that's why they call it the present." Eleanor Roosevelt
paul wheaton

Joined: Dec 14, 1998
Posts: 20488

Originally posted by Marilyn de Queiroz:
AIX has 512K blocks. It seems like you would need to know the block size on an NTFS system.

512K? Half a meg?
Marilyn de Queiroz

Joined: Jul 22, 2000
Posts: 9044
GPFS offers five block sizes for file systems: 16KB, 64KB, 256KB, 512KB, and 1024KB. You should choose the block size based on the application set that you plan to support:

* The 256KB block size is the default block size and normally is the best block size for file systems that contain large files accessed in large reads and writes.
* The 16KB block size optimizes use of disk storage at the expense of large data transfers.
* The 64KB block size offers a compromise. It makes more efficient use of disk space than 256KB while allowing faster I/O operations than 16KB.
* The 512KB and 1024KB block size may be more efficient if data accesses are larger than 256KB. You may also consider using these block sizes if your RAID (Redundant Arrays of Independent Disks) hardware works optimally with either size.

On the other hand, I read this:
The smallest file extension is 4KB. If a user creates or extends a file anywhere from 0-4096 bytes, a 4K block will be allocated from the free list to accommodate that request.

So I guess I would have to say that it depends.