How to save time in processing huge disk drives

May 19, 2015

If you process a huge drive with any software like ShowSize, expect a long wait. To calculate folder sizes, the software has to walk the entire folder tree: every folder, every subfolder, and every file inside them, all the way down. That is the only way to calculate the total size taken up by each top-level folder. There is no shortcut for finding folder sizes.
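To make the cost concrete, here is a minimal Python sketch of the kind of recursive walk described above. It is not ShowSize's actual code; the function names `folder_size` and `top_level_sizes` are made up for illustration. The point is that every file under a folder must be visited before that folder's total is known.

```python
import os


def folder_size(path):
    """Return the total size in bytes of all files under path.

    Symbolic links are not followed and unreadable folders are skipped.
    """
    total = 0
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    # A folder's size is only known after walking all of it.
                    total += folder_size(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    except OSError:
        # Permission denied or folder removed mid-scan: count it as empty.
        pass
    return total


def top_level_sizes(root):
    """Map each top-level folder name under root to its total size in bytes."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes[entry.name] = folder_size(entry.path)
    return sizes
```

Calling `top_level_sizes("C:\\")` on a drive with hundreds of thousands of files has to touch every one of them, which is why the scan time grows with the size of the tree.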

Moreover, the scanning time is affected by the Windows cache in memory!

If you have been using ShowSize for some time, you might have noticed that a disk is sometimes processed very fast and at other times very slowly. This can give a wrong impression of the software’s performance, when in fact the difference has nothing to do with the software.

When you scan a disk for the first time, it takes much longer to finish because Windows has not yet cached any of the disk’s directory data. A second scan is several times faster because Windows cached some directory data during the first scan and can reuse it. This cache is built up not just by ShowSize but by any software that reads the directory, for example a file manager like Windows Explorer. A later scan therefore often finds a lot of directory data already in the cache and doesn’t have to fetch it from the disk each time.

But if some time passes and the disk is used by other kinds of software, the directory data in the cache may be replaced by other data. If you then scan the disk again with ShowSize, the scan will be slow again. So the scanning time really depends on how recently the directory data was read and whether it is still in the Windows cache.

To give you an idea, the laptop on which I’m writing this article has about 300,000 files and 40,000 folders on its main drive. A scan of that drive sometimes takes just 23 seconds and sometimes over 4 minutes. Such wide variations are the result of Windows caching the scan data.

Is the cache used for network shared drives too?

In my tests, I could not find any improvement in scanning time on subsequent scans of a network shared drive. This leads me to believe that the Windows cache is not used in the same manner for network shared drives. If someone knows the answer, please provide feedback here. Moreover, when I used ShowSize inside a VM to scan a VMware folder shared with the host, the performance was terrible. That might also be some kind of bug in the VM code.

What do we learn from the above discussion?

When benchmarking (comparing) programs like ShowSize, do not compare the times of just a single scan. Do several scans at different times to find out which software is faster. Otherwise, you may be misled by the skew in results due to Windows cache.
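A repeated-measurement benchmark can be sketched in a few lines of Python. This is only an illustration: `time_scans` and `summarize` are hypothetical helper names, and `scan` stands for whatever callable runs one complete scan with the tool being measured. Note that runs made back to back mostly measure the warm-cache case, so the first timing is reported separately and the whole exercise is worth repeating at different times of day.

```python
import statistics
import time


def time_scans(scan, runs=5):
    """Call scan() `runs` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        scan()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Summarize timings; the first run is kept apart because it is the
    one most likely to have hit a cold cache."""
    return {
        "first": timings[0],
        "median": statistics.median(timings),
        "fastest": min(timings),
        "slowest": max(timings),
    }
```

Comparing two programs on their medians over several runs is less likely to be skewed by the cache than comparing one scan of each.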

Also, instead of scanning shared drives over the network, it is better to install ShowSize on the computer that actually hosts the drive and scan it as a local drive for best performance.

Here is the general strategy I recommend to save ShowSize scanning time for huge drives

Avoid scanning as a network shared drive

See the above discussion. Network drives do not seem to benefit from the Windows cache. It is better to install ShowSize on the computer that hosts the drive so that the drive is not scanned over the network. Otherwise, the scan takes longer because all the directory data has to be fetched across the network. There is one exception: if you only want to scan a few folders, the Exclude Folders feature described later can probably save you a lot of time on network drive scans too.

Don’t switch ON the collection of compressed sizes

Please make sure that you are not collecting the “compressed sizes” in the Options–Scan page. If this option is switched on, the scan is slower, and even the Windows cache that saves time on a second scan of the drive does not help. By default, this option is OFF in ShowSize.

Switch on the ShowSize scan option to “Skip symbolic links”

If you have symbolic links that point to other drives, you will get inflated size reports because the size data of those other drives is added unnecessarily to the current drive’s reports. Moreover, the symbolic links increase the total scanning time, depending on how many files and folders are reached through them.
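The effect of following symbolic links can be illustrated with Python's standard `os.walk`. This is a sketch under my own assumptions, not ShowSize's implementation; `scan_size` and its `skip_symlinks` parameter are hypothetical names.

```python
import os


def scan_size(root, skip_symlinks=True):
    """Return the total bytes of the files under root.

    With skip_symlinks=False, linked folders are walked and linked files
    are counted, so data living elsewhere inflates the total.
    Caution: following links can loop forever if a link points back to
    one of its own parent folders.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=not skip_symlinks):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if skip_symlinks and os.path.islink(full):
                continue  # do not count files that are only links
            try:
                total += os.stat(full).st_size
            except OSError:
                pass  # broken link or unreadable file
    return total
```

If a folder on the scanned drive links to a folder on another drive, `scan_size(root, skip_symlinks=False)` includes that other folder's files, while the default call leaves them out and has less to walk.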

New! Use the Exclude Folders feature in ShowSize Pro

With this feature, you can limit the scan to only the folders that you want. You can even keep using the options discussed above (including network drive scans) and still save on scanning time. Let me show you an example.

For the same laptop drive used in the examples above, I want to take advantage of the Exclude Folders feature in ShowSize Pro. I click on the “Exclude Folders” button below the left pane. Here is what I see:

[Screenshot: Excluding folders from scan in ShowSize disk space analyzer]

I checked some folders above to “Exclude” them from the scan so that they are skipped. If you look carefully, the folders that I selected are Windows built-in folders like Program Files and Windows itself. I gain nothing by measuring their sizes because I can’t do anything about them. I cannot just go about deleting files from the Windows folder. It would be disastrous. What I’m really interested in are the folders other than the ones I checked for exclusion. Once I have made my selection, ShowSize remembers it for this drive. Note that I also selected an additional folder, “mm”, to exclude, as I know it is huge and contains my VMware virtual machines.

After setting up the excluded folders, I scan again. This time the scan finishes much faster, in about 11 seconds, because excluding the above folders removed about 30% of the files and folders from the scan. Depending on how often you want to scan a drive, the Exclude Folders feature can save you a lot of time.
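The reason exclusion saves time is that an excluded folder is never entered, so nothing beneath it is read from disk. A rough Python sketch of the idea follows; it is an assumption about how such a feature could work in general, not a description of ShowSize Pro's internals, and `scan_with_excludes` is a hypothetical name.

```python
import os


def _normalize(path):
    """Normalize a path so comparisons ignore case on Windows."""
    return os.path.normcase(os.path.abspath(path))


def scan_with_excludes(root, excluded):
    """Walk root, skipping every folder listed in `excluded`.

    Returns (total_bytes, file_count) for the files that were visited.
    """
    excluded_set = {_normalize(p) for p in excluded}
    total_bytes = 0
    file_count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into
        # the excluded folders at all.
        dirnames[:] = [
            d for d in dirnames
            if _normalize(os.path.join(dirpath, d)) not in excluded_set
        ]
        for name in filenames:
            try:
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
                file_count += 1
            except OSError:
                pass  # file vanished or is unreadable
    return total_bytes, file_count
```

For example, `scan_with_excludes("C:\\", ["C:\\Windows", "C:\\Program Files"])` would never list the contents of those two folders, which is where the saving comes from.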

