October 11, 2017

Why are file sizes only getting bigger, not smaller?

'Infrequently Asked Questions' seeks the answer

By Brandon Baker
PhillyVoice Contributor

It's the perpetual head-scratcher: Though our storage devices grow exponentially bigger with each new iteration of the iPhone or PlayStation, they never seem to net us much in the long run.

It raises the question: Rather than storage getting bigger, why can't files get smaller?

Eager to know more, we reached out to Brian Stuart, an associate teaching professor at Drexel University's College of Computing and Informatics and former senior engineer who specialized in data and cloud storage solutions.

The meat of the question: Why do file sizes for download seem to be getting bigger, not smaller? For example, 64 gigabytes of storage on my phone used to get me a whole lot more just a few years ago, but now it's hard to fit everything I might need into that tiny amount of storage. Game systems are even working on terabytes these days. What's the deal?

The world is full of questions we all want answers to, but are either too embarrassed, time-crunched or intimidated to actually ask. With Infrequently Asked Questions, we set out to answer those shared curiosities. Have a question you want answered? Send an email to entertainment@phillyvoice.com, and we’ll find an expert who can give you the answer you’re craving.

In the computing world, there's an old saying that "software will expand to fill all available space." In some cases, this results from expanding features, higher resolution images, longer videos, etc. Another phenomenon is similar to something found in nature. If you were to remove all the birds from an environment, then the population of the worms on which the birds fed would explode. For developers, resource constraints are the primary motivation for efficiency. As those constraints are relaxed, developers are less motivated to be economical with storage space.

I don't know if you watch "Silicon Valley" on HBO, but the startup in that show operates on the premise they can shrink big file sizes down to basically nothing. Is that realistic at all? Will that ever happen?

The short answer is no. There are basically two types of compression: lossless and lossy. Lossless compression allows us to perfectly reconstruct the original data. However, each file contains a fixed amount of information and cannot be compressed any smaller than that amount. The theoretical foundations for measuring information were established in 1948 by Claude Shannon. For lossy compression, we trade off quality for space-saving. In effect, we can make the file smaller by throwing away some of the information. If we cleverly pick which information to throw away, then our human senses will perceive the reconstructed sounds, images, etc., as being very close to the original. We see this happening in practice every time we listen to an MP3 recording or watch HD television. However, because our senses require a substantial amount of information to perceive the material as natural, compressing the data beyond that point results in images and sounds that people identify as low quality.
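The limit on lossless compression can be seen in a small experiment. The Python sketch below is an illustration added for this write-up, not something from the interview: it compresses two inputs of the same length with the standard zlib library, one highly repetitive and one made of random bytes, and checks that each round-trips exactly. The helper name entropy_bits_per_byte is made up for the example, and it only measures how evenly the individual byte values are used, which is a rough stand-in for Shannon's measure rather than the true information content of a file.

```python
import math
import os
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value frequencies, in bits per byte.

    This ignores patterns that span several bytes, so it is only a
    rough indicator of how much information the data carries.
    """
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Two inputs of identical length with very different information content.
repetitive = b"abcd" * 25_000        # 100,000 bytes, highly predictable
random_bytes = os.urandom(100_000)   # 100,000 bytes, essentially unpredictable

for label, data in (("repetitive", repetitive), ("random", random_bytes)):
    packed = zlib.compress(data, 9)
    # Lossless means the original comes back bit for bit.
    assert zlib.decompress(packed) == data
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes, "
          f"{entropy_bits_per_byte(data):.2f} bits/byte")
```

The repetitive input should shrink to a tiny fraction of its size, while the random input should come out no smaller than it went in. No lossless scheme can do better on data that is already all information, which is the wall the fictional startup would run into.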

Can you briefly explain what the inherent difference is between a document, photo and video file, and why the size difference is so significant between them?

It boils down to the amount of information they contain. The old adage that "a picture is worth a thousand words" turns out to be more true than we might expect. Without compression, a single byte can describe a single letter, digit, punctuation mark, or spacing in a text file. That same byte can describe the brightness of a single pixel in an image with no color. So a thousand words might take 6,000 bytes to represent, but 6,000 bytes would only be enough for an image of 100 pixels by 60 pixels without any color.

Another way to look at it is that a 12-megapixel color image without any compression will typically take about 36 megabytes. By contrast, just the text of the King James version of the Bible can be stored in fewer than 5 megabytes. Video is, in essence, a set of many, many images--typically 30 images per second of video. So, without any form of compression, one minute of video where each frame is a 12-megapixel image would take over 64 gigabytes of storage space. In practice, the difference isn't as dramatic as it might seem, because we can use lossy forms of compression on images and videos, and because videos contain a great deal of redundancy that can be removed. However, for text, we must use lossless compression, if we compress it at all.
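The figures in the last two answers can be reproduced with a few lines of arithmetic. The Python sketch below is an added illustration that rests on assumptions the interview does not spell out: three bytes per color pixel (one each for red, green and blue), about six bytes per word (five letters plus a space), and decimal units where a megabyte is 1,000,000 bytes.

```python
# Assumptions (not stated in the interview): 3 bytes per color pixel,
# roughly 6 bytes per word of text, and decimal (SI) units.
BYTES_PER_COLOR_PIXEL = 3
BYTES_PER_WORD = 6
MEGAPIXELS = 12
FRAMES_PER_SECOND = 30

# "A thousand words" versus a small grayscale picture.
text_bytes = 1_000 * BYTES_PER_WORD   # 6,000 bytes of text
gray_image_bytes = 100 * 60           # 100 x 60 pixels, 1 byte per pixel

# One uncompressed 12-megapixel color image.
image_bytes = MEGAPIXELS * 1_000_000 * BYTES_PER_COLOR_PIXEL

# One minute of uncompressed video at 30 such frames per second.
minute_bytes = image_bytes * FRAMES_PER_SECOND * 60

print(f"1,000 words:     {text_bytes:,} bytes")
print(f"100x60 gray:     {gray_image_bytes:,} bytes")
print(f"12 MP image:     {image_bytes / 1e6:.0f} MB")
print(f"1 minute video:  {minute_bytes / 1e9:.1f} GB")
```

Under those assumptions the image comes to 36 megabytes and the minute of video to 64.8 gigabytes, matching the "about 36 megabytes" and "over 64 gigabytes" quoted above.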

MicroSD cards fit an awful lot of storage for their size. How is it that those cards can be so tiny but an external hard drive is still so clunky with a similar amount of storage?

There are several factors at play here. The first is that the capacities aren't as similar as they might appear. A terabyte is 1,000 gigabytes. So, a two-terabyte disk drive holds the same amount of data as 125 16-gigabyte microSD cards. It is still true, however, that a typical disk drive is significantly larger than 125 microSD cards. In a classic disk drive (that in the industry we sometimes call spinning rust), there are various motors, arms, filters and other hardware that take up space. Additionally, there's a sturdy metal frame that provides for a more robust device than an SD card. Solid-state drives are built with frames that match the dimensions of traditional drives so that they can be drop-in replacements. Among storage devices based on flash memory, like SD cards, there is another factor that results in different sizes. There are basically two different ways of approaching flash technology. The approach that provides greater reliability and longer retention results in fewer bits in the same amount of space.

If you were to recommend someone buy additional storage for their phone or computer today, what's the current futureproof recommendation? Should we be looking at terabytes now and not gigabytes? I know this varies by person and their needs, but like I mentioned up top, 32 gigabytes doesn't get you much with a phone anymore and 64 gigabytes seems to have become the new baseline.

You hit it right on the money with the observation that it really does depend on the person and their needs. In a lot of cases, the consumer no longer has the option to increase storage without cracking the product open. For those who are always pushing the limits (e.g., those who create, edit and archive HD movies), about all you can do is buy the biggest devices you can and live with the constraints. For the rest of us, there tends to be a sweet spot in terms of bytes-per-dollar. By getting devices that are about one generation behind the newest--often about half to one-quarter of the capacity of the largest available--you can get a lot of storage and not break the bank.

As far as futureproof goes, from the consumer's point of view, that's a wonderful dream. From the manufacturer's point of view, it's a given that you will run out of storage and need to buy more. For data preservationists, the wisdom is to keep copies of the data on every form of storage media. That way it won't matter which ones continue to be supported, and which ones fall out of use.
