I read on wikipedia that NTFS has LZ77 compression. Is it always on? Or is it usually off?
Further, while NTFS allows block sizes of up to 64k, I thought most installations used block sizes of about 8k. But some reading today seemed to suggest that a lot of people are using the 64k max. True?
These wacky questions are tied to a design in front of me: roughly 10,000 text files ranging in size from 5K to 100K, averaging around 40K. The design says each file is to be compressed individually into its own file, and that these text files are getting an amazing 98% compression. I suspect that while the file system reports each file as smaller, the amount of disk space actually used is probably about the same.
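A quick back-of-envelope sketch of that suspicion, using the post's hypothetical numbers (40K average, 98% compression) and a deliberately simple model where every file occupies whole clusters. (Real NTFS compression actually allocates in compression units of 16 clusters, so this understates the effect if anything.)

```python
def allocated(size: int, cluster: int) -> int:
    """Round a file's logical size up to whole clusters (ceiling division)."""
    return -(-size // cluster) * cluster

avg = 40 * 1024                  # ~40 KB average file, per the post
compressed = int(avg * 0.02)     # the claimed 98% compression -> ~0.8 KB

for cluster in (4096, 8192, 65536):
    print(f"{cluster // 1024:>2} KB clusters: "
          f"uncompressed uses {allocated(avg, cluster)} bytes, "
          f"compressed still uses {allocated(compressed, cluster)} bytes")
```

With 4K clusters the 98% compression is real on disk (40,960 bytes down to 4,096), but with 64K clusters both the compressed and uncompressed file occupy exactly one 65,536-byte cluster, so the compression saves nothing at all.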
I'm thinking I want to advocate putting all of the files into one zip file instead of the current approach. But I want to get my facts straight first.
Anybody have much knowledge about industry norms with NTFS compression or NTFS block size?
Compression is off by default, though IIRC you can turn it on for a directory subtree (all the way up to the root, if desired). Compressed files and directories also display differently in the GUI: the Mac equivalent was to italicize the name, but I believe NT/Explorer just uses an alternate font color (blue, with the standard preferences).
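You can check that per-file compression state programmatically. A minimal sketch: `os.stat` exposes the NTFS attribute bits on Windows as `st_file_attributes` (the helper name `is_ntfs_compressed` is mine, not a standard API; on non-Windows platforms the attribute simply isn't there, so we report False):

```python
import os
import stat

def is_ntfs_compressed(path: str) -> bool:
    """True if the file carries the NTFS 'compressed' attribute.

    st_file_attributes is only populated on Windows; elsewhere we
    just report False rather than raise.
    """
    st = os.stat(path)
    attrs = getattr(st, "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_COMPRESSED)
```

From a command prompt, `compact /q <file>` reports the same thing.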
There are two different space-saving mechanisms available here. One is sparse files: create a 10GB file, write 6 bytes at one end and 4 bytes at the other, and NTFS won't allocate any of the intervening ~10GB until there's actually data for it. Note that this is opt-in per file — the file has to be marked sparse (via FSCTL_SET_SPARSE) before NTFS will skip allocating the unwritten ranges. The other is actual data compression (NTFS uses LZNT1, an LZ77 variant). Depending on the data, you may see huge space savings, or files that come out larger than they would uncompressed (the worst-case scenario).
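That worst case is easy to demonstrate. Here I'm using zlib as a stand-in for LZNT1 (a different LZ77-family codec, but the behavior is the same in kind): repetitive text shrinks dramatically, while already-random bytes come out slightly larger because the codec's framing overhead has nothing to win back.

```python
import random
import zlib

# Highly repetitive text compresses extremely well...
text = b"the quick brown fox jumps over the lazy dog\n" * 1000
packed_text = zlib.compress(text)

# ...while already-random bytes come out slightly *larger* than the
# input -- the worst-case scenario mentioned above.
rng = random.Random(42)          # fixed seed so the demo is repeatable
noise = rng.randbytes(10_000)
packed_noise = zlib.compress(noise)

print(len(text), "->", len(packed_text))      # big savings
print(len(noise), "->", len(packed_noise))    # slight growth
```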
For the ultimate in compression, a ZIP file is still better: even if you create a compressed directory, the per-file system overhead for the directory and all its files is more than for a single file containing a ZIP archive. Plus the compression travels with the ZIP when you copy it — NTFS compression is silently undone when a file is copied to a non-NTFS volume.
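Packing many small text files into one ZIP takes only a few lines with the standard library. A sketch — the file names and contents here are made up purely for the demo:

```python
import os
import tempfile
import zipfile

# Create some hypothetical small text files to stand in for the 10,000.
workdir = tempfile.mkdtemp()
for i in range(20):
    with open(os.path.join(workdir, f"note{i}.txt"), "w") as f:
        f.write("some highly repetitive log text\n" * 200)

# Pack them all into a single compressed archive.
archive = os.path.join(workdir, "notes.zip")
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name in sorted(os.listdir(workdir)):
        if name.endswith(".txt"):
            zf.write(os.path.join(workdir, name), arcname=name)

total = sum(os.path.getsize(os.path.join(workdir, n))
            for n in os.listdir(workdir) if n.endswith(".txt"))
print("originals:", total, "bytes; archive:", os.path.getsize(archive))
```

Because the archive is one file, it pays the cluster-rounding tax once instead of once per file, and the compressed bytes survive copies to any filesystem.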
I'm using XP too, and my block size is 4KB as well.
But I'm not sure you need to know what the normal value is unless you plan to put those text files on many different computers. Don't you just need to know the value for the computer you plan to put them on?
Or to put it another way, what would you do differently if you found that 20% of disks had an 8KB block size?
Originally posted by Marilyn de Queiroz: AIX has 512K blocks. It seems like you would need to know the block size on an NTFS system.
512K? Half a meg?
GPFS offers five block sizes for file systems: 16KB, 64KB, 256KB, 512KB, and 1024KB. You should choose the block size based on the application set that you plan to support:
* The 256KB block size is the default and normally the best block size for file systems that contain large files accessed in large reads and writes.
* The 16KB block size optimizes use of disk storage at the expense of large data transfers.
* The 64KB block size offers a compromise: it makes more efficient use of disk space than 256KB while allowing faster I/O operations than 16KB.
* The 512KB and 1024KB block sizes may be more efficient if data accesses are larger than 256KB. You may also consider these sizes if your RAID (Redundant Array of Independent Disks) hardware works optimally with either one.

Reference
On the other hand, I read this: "The smallest file extent is 4KB. If a user creates or extends a file anywhere from 0-4096 bytes, a 4KB block will be allocated from the free list to accommodate that request."
So I guess I would have to say that it depends.