Practical DiskDiff in C#: Increased Accuracy

The code still has one problem. Your goal is to keep track of disk space usage, but the program is tracking the size of the files, not the disk space used by those files. The difference has to do with the way file systems work.

Each disk on your computer has what’s known as a cluster size, which is the unit of allocation for the files on that disk. When space for a file is allocated, enough clusters are allocated to hold the contents of the file. This means that if you have a disk with a cluster size of 4,096 bytes, a file with a single byte in it still occupies a full 4,096 bytes of space.

For the purposes of this application, the effect of this can be considerable. If you have a directory that holds a thousand 300-byte files, the current implementation will total 300,000 bytes. Depending on the cluster size, however, the actual usage could be quite a bit more. The cluster size on our system is 512 bytes, so the actual usage is 512,000 bytes. This cluster size is pretty small; a typical cluster size on an NTFS system is 4KB, which means you’d be using about 4MB of space to store 300KB of data.
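The arithmetic above can be reproduced with a short sketch that compares the sum of the file lengths (what the program currently reports) with the space allocated when each file is rounded up to whole clusters. UsageComparison, TotalUsed, and TotalAllocated are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: compares bytes stored in a set of files with
// the bytes the file system allocates for them.
public static class UsageComparison
{
    // Sum of the file lengths, which is what the program currently totals.
    public static long TotalUsed(IEnumerable<long> fileSizes)
    {
        long total = 0;
        foreach (long size in fileSizes)
            total += size;
        return total;
    }

    // Sum of the allocated space when each file occupies whole clusters.
    public static long TotalAllocated(IEnumerable<long> fileSizes, long clusterSize)
    {
        if (clusterSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(clusterSize));
        long total = 0;
        foreach (long size in fileSizes)
        {
            long clusters = size / clusterSize;
            if (size % clusterSize != 0)
                clusters++;          // a partial cluster still costs a full one
            total += clusters * clusterSize;
        }
        return total;
    }
}
```

For a thousand 300-byte files, TotalUsed returns 300,000, while TotalAllocated returns 512,000 with 512-byte clusters and 4,096,000 with 4KB clusters.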

It can get worse than this if you’re running the FAT16 file system. A FAT16 volume can have only about 64K (65,536) clusters, so if your disk size is 2GB, your cluster size is 32KB. In the previous example, this means you’d be using about 32MB of space to store 300KB of files.
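The 32KB figure follows from the cluster-count limit: the cluster size has to be large enough for the whole disk to fit in the available clusters. The sketch below assumes exactly 65,536 clusters and power-of-two cluster sizes starting at 512 bytes; real FAT16 volumes reserve a few cluster numbers, and formatting tools choose their own defaults. Fat16Estimate and MinClusterSize are hypothetical names.

```csharp
using System;

// Simplified model of FAT16 cluster sizing (an assumption for illustration).
public static class Fat16Estimate
{
    // 2^16 clusters; real FAT16 volumes have slightly fewer usable ones.
    const long MaxClusters = 65536;

    // Smallest power-of-two cluster size, starting at 512 bytes, that lets
    // a disk of 'diskBytes' fit in MaxClusters clusters. This sketch does
    // not enforce FAT16's maximum cluster or volume size.
    public static long MinClusterSize(long diskBytes)
    {
        long clusterSize = 512;
        while (clusterSize * MaxClusters < diskBytes)
            clusterSize *= 2;
        return clusterSize;
    }
}
```

For a 2GB disk this yields 32,768 bytes, so a thousand small files take 32,768,000 bytes, the roughly 32MB quoted above.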

It’s therefore useful to factor the cluster size into determining the actual amount of space used by the files in a directory. The first thing to do is to figure out how to get the cluster size for a disk.

You can get the cluster size for a disk by calling the GetDiskFreeSpace() function. This is a Win32 function, so you’ll need to use platform invoke to call it. It’s nicest to encapsulate this function in a class, as follows:

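using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public class ClusterSize {

    private ClusterSize() {}

    // Returns the cluster size, in bytes, of the disk whose root
    // directory is passed in (for example, @"C:\").
    public static int GetClusterSize(string root) {
        int sectorsPerCluster = 0;
        int bytesPerSector = 0;
        int numberOfFreeClusters = 0;
        int totalNumberOfClusters = 0;

        // Diagnostic output
        Console.WriteLine("GetFreeSpace: {0}", root);

        bool result = GetDiskFreeSpace(
            root,
            ref sectorsPerCluster,
            ref bytesPerSector,
            ref numberOfFreeClusters,
            ref totalNumberOfClusters);

        // A failed call would otherwise return 0, which causes a
        // divide-by-zero later when file sizes are rounded to clusters.
        if (!result)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        return sectorsPerCluster * bytesPerSector;
    }

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
    static extern bool GetDiskFreeSpace(
        string rootPathName,
        ref int sectorsPerCluster,
        ref int bytesPerSector,
        ref int numberOfFreeClusters,
        ref int totalNumberOfClusters);
}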

The declaration for the function is at the end of the class, and it follows the usual PInvoke format. GetClusterSize() calls it and multiplies the sectors per cluster by the bytes per sector to get the cluster size of the disk in bytes.

This function works, but it has a few problems. First, it would be nicer to pass in a full directory path rather than a disk name. Second, it’s convenient for every node in the tree to be able to call this function, but you don’t want to make the Win32 call every time. In other words, you need to cache the value. You do this by keeping a static hash table, sizeCache, that stores the cluster size for each disk, and checking it before calling the function. Add the following lines to the GetClusterSize() function:

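// Class-level field (requires using System.Collections;):
// static Hashtable sizeCache = new Hashtable();

// At the start of GetClusterSize(), reduce the path to its disk root:
string diskName = root.Substring(0, 1) + @":\";

object lookup = sizeCache[diskName];
if (lookup != null)
    return (int) lookup;

// Then pass diskName (not root) to GetDiskFreeSpace(), and cache the result
// before returning it: sizeCache[diskName] = sectorsPerCluster * bytesPerSector;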
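The caching described above can be pulled together into a testable sketch. It assumes the real lookup (for example, a wrapper around GetDiskFreeSpace()) is passed in as a delegate, so the cache logic can be exercised without calling Win32. CachedClusterSize, DiskRoot, and the delegate parameter are hypothetical; the sketch swaps the Hashtable for a Dictionary<string, int>, which avoids the cast.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache of cluster sizes keyed by drive root.
public class CachedClusterSize
{
    private readonly Func<string, int> query;

    // Drive letters are case-insensitive, so "c:\" and "C:\" share an entry.
    private readonly Dictionary<string, int> cache =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public CachedClusterSize(Func<string, int> query)
    {
        this.query = query;
    }

    // Turns a full path such as @"C:\Temp\a.txt" into its drive root @"C:\".
    // UNC paths (\\server\share) are rejected rather than guessed at.
    public static string DiskRoot(string path)
    {
        if (path == null || path.Length < 2 || path[1] != ':' || !char.IsLetter(path[0]))
            throw new ArgumentException("Expected a drive-letter path", nameof(path));
        return path.Substring(0, 1) + @":\";
    }

    public int GetClusterSize(string path)
    {
        string diskName = DiskRoot(path);
        int size;
        if (!cache.TryGetValue(diskName, out size))
        {
            size = query(diskName);   // only the first request per disk runs the query
            cache[diskName] = size;
        }
        return size;
    }
}
```

Like the static Hashtable version, this cache is not synchronized; if the scan were ever run on several threads, access to it would need a lock.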

Switching to Use Cluster Size

Now that you have a way to get the cluster size for a disk, you can modify the main program to use this function. The code will support both the allocated and used sizes, so you’ll have the option of (somehow) displaying both.

The first change is to the FileNode class. It will now store both sizes and determine their values in the constructor:

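// Bytes actually stored in the file
this.sizeUsed = file.Length;

long clusterSize = ClusterSize.GetClusterSize(file.FullName);

// Round up to a whole number of clusters to get the allocated size
this.size = ((sizeUsed + clusterSize - 1) / clusterSize) * clusterSize;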

A bit of explanation is probably in order. To figure out the allocated size of this file, you need to round the size up to the next multiple of the cluster size. The first step is to determine the number of clusters, which you can do by adding one less than the cluster size to the file size and then dividing by the cluster size (an integer division, which discards any remainder). Multiplying that number of clusters by the cluster size gives the allocated size in bytes.

You can check that this works by considering the boundary conditions. Assuming a cluster size of 512, a file that’s 1 byte long will occupy 512 bytes:

((1 + 511) / 512) * 512 = 1 * 512 = 512

Similarly, a file that’s 512 bytes will occupy 512 bytes:

((512 + 511) / 512) * 512 = 1 * 512 = 512

and a file of 513 bytes will occupy 1,024 bytes:

((513 + 511) / 512) * 512 = 2 * 512 = 1024
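The boundary cases above can be checked mechanically by pulling the rounding formula from the FileNode constructor into a small helper. ClusterRounding and AllocatedSize are hypothetical names; the guard against a non-positive cluster size is an addition, not part of the original constructor.

```csharp
using System;

// Hypothetical helper holding the round-up formula from FileNode.
public static class ClusterRounding
{
    public static long AllocatedSize(long sizeUsed, long clusterSize)
    {
        // A cluster size of 0 would otherwise throw DivideByZeroException.
        if (clusterSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(clusterSize));

        // Add one less than a cluster, divide (integer division) to count
        // clusters, then multiply back to bytes.
        return ((sizeUsed + clusterSize - 1) / clusterSize) * clusterSize;
    }
}
```

Note that a zero-length file comes out as 0 bytes under this formula, since (0 + 511) / 512 is 0.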

Now that you’ve updated the FileNode object, you can also update the DirectoryNode class. You can add a SizeUsed property and add an UpdateTreeSizes() member that updates both values when necessary. You can also take this opportunity to remove the code that tried to calculate these values during the file scan; maintaining the code in both places turned out to be more hassle than it was worth.
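Recomputing both totals in one bottom-up pass after the scan might look like the sketch below. The class names FileNode and DirectoryNode and the member names SizeUsed and UpdateTreeSizes() come from the text, but the fields and lists here are assumptions for illustration, not the book’s actual class layout.

```csharp
using System.Collections.Generic;

// Minimal stand-in for FileNode (assumed layout).
public class FileNode
{
    public long Size;       // allocated size, rounded up to whole clusters
    public long SizeUsed;   // bytes actually in the file

    public FileNode(long sizeUsed, long clusterSize)
    {
        SizeUsed = sizeUsed;
        Size = ((sizeUsed + clusterSize - 1) / clusterSize) * clusterSize;
    }
}

// Minimal stand-in for DirectoryNode (assumed layout).
public class DirectoryNode
{
    public List<FileNode> Files = new List<FileNode>();
    public List<DirectoryNode> Directories = new List<DirectoryNode>();
    public long Size { get; private set; }
    public long SizeUsed { get; private set; }

    // Recomputes both totals recursively once the scan has finished,
    // instead of maintaining them while files are being discovered.
    public void UpdateTreeSizes()
    {
        long size = 0, sizeUsed = 0;
        foreach (FileNode file in Files)
        {
            size += file.Size;
            sizeUsed += file.SizeUsed;
        }
        foreach (DirectoryNode dir in Directories)
        {
            dir.UpdateTreeSizes();
            size += dir.Size;
            sizeUsed += dir.SizeUsed;
        }
        Size = size;
        SizeUsed = sizeUsed;
    }
}
```

Calling UpdateTreeSizes() on the root after a scan or a rescan keeps every directory’s totals consistent from a single place.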

Source: Gunnerson Eric, Wienholt Nick (2005), A Programmer’s Introduction to C# 2.0, Apress; 3rd edition.
