07 September 2011

Your data and what really happens when you 'Delete'

This morning I ran across a re-tweet by someone here at work linking to an article about how Windows systems do not actually completely erase files when you tell them to delete a file or folder - even when "permanently" removing it from the Recycle Bin. Upon reading the article, I felt a huge pang of Nerd Rage and felt compelled to set the record straight to the best of my knowledge. After all, I did work for over 5 years getting my degree in Computer Science; the least I could do is actually use it. The article is half-hearted in its explanation and, as a means to sell a product, avoids some major points about file deletion that apply regardless of the operating system. Let me explain a few concepts before I dive into clarifying some things that are misleading in the article:

What Filesystems Are & How They Work
A filesystem is a logical method for storing information on electronic devices in an organized and efficient manner so that it can later be retrieved and updated. This is akin to an office's filing cabinet system, where certain papers are filed away in drawers according to some rules. Whether the information is a single bit, a term paper, pictures, videos, or huge volumes of scientific information, practically every electronic device that can store information, no matter how rudimentary or complex, has a filesystem of some sort that it uses to store and retrieve data. In fact, computers (desktops, handhelds, servers and some other devices) may interact with more than one filesystem in order to accomplish whatever task they need to. However, the purpose of all filesystems is the same: to give the computer some organized way of storing and retrieving data on a particular medium.
Commonly, a filesystem is organized much like the way a library or bookstore organizes its books: there is some central location that holds information about where each book is in the library or store. This is usually a list, and it contains some method of locating the book. Whenever you want to get a book, you would go to this list, and it would direct you to the physical location where you can take the book off the shelf. When a new book is added, a physical location is chosen for the book, the book is placed on the shelf and a new entry is added to the list describing the book and its location. Computer filesystems are very similar - they divide up the physical space available on a storage medium into logical sections, giving each logical section an address. Then, when a file is stored, the file is divided into pieces and written into those sections. The address of each section where a piece of the file was written is recorded in a table (called a file allocation table, or file pointer/reference table) so that when the computer needs to access that file again, it knows where to look for the pieces and can reconstruct the file. In fact, the pieces of a single file could be stored physically out of order, with data from other files in between. However, thanks to the file allocation table, a computer knows where all those pieces are and what order they are supposed to be in. This process of splitting files up and reconstructing them is automatic and performed very quickly, and the user usually wouldn't even notice. In fact, the programs working with these files don't notice either, since the process is handled largely by core components of the operating system.
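To make the library analogy a bit more concrete, here is a minimal sketch in Python of a made-up "toy" filesystem. It is not how FAT, NTFS or any real filesystem is implemented; the class name, block size and file names are all assumptions chosen for illustration. It only shows the idea of splitting a file into pieces, scattering them over whatever sections are free, and using a table to put them back together.

```python
class ToyFilesystem:
    """Toy model: fixed-size blocks plus a table of file name -> block addresses."""

    def __init__(self, num_blocks=8, block_size=4):
        self.block_size = block_size
        self.blocks = [None] * num_blocks  # the "physical" storage medium
        self.table = {}                    # file name -> ordered list of addresses

    def write(self, name, data):
        # Split the file into block-sized pieces.
        pieces = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        # Find every block that currently holds nothing.
        free = [addr for addr, content in enumerate(self.blocks) if content is None]
        if len(pieces) > len(free):
            raise OSError("not enough free blocks")
        # Store each piece and remember, in order, where it went.
        addresses = free[:len(pieces)]
        for addr, piece in zip(addresses, pieces):
            self.blocks[addr] = piece
        self.table[name] = addresses

    def read(self, name):
        # Follow the table to reassemble the pieces in the right order.
        return b"".join(self.blocks[addr] for addr in self.table[name])


fs = ToyFilesystem()
# Pretend blocks 1 and 3 already hold pieces of other files.
fs.blocks[1] = b"AAAA"
fs.blocks[3] = b"BBBB"

fs.write("notes.txt", b"hello world!")
print(fs.table["notes.txt"])  # [0, 2, 4]
print(fs.read("notes.txt"))   # b'hello world!'
```

The pieces of notes.txt end up in blocks 0, 2 and 4, with other data in between, yet reading the file back gives the original contents because the table records both the locations and the order.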
I should mention that there is no single type of filesystem. There are many different filesystems that serve many different purposes. Storage media, whether they are hard drives, CDs, flash drives or tape, must be initialized and assigned a filesystem to use - a process called formatting. Formatting lets whatever operating system accesses the media know how to work with the information stored on it. Ideally, any filesystem should work on any medium; however, this is most often not the case. Some operating systems, and even some storage media, require certain filesystems. For example, over the years, Windows has run on variations of the FAT filesystem (such as FAT16 and later FAT32), but modern versions of Windows must be installed on drives formatted with the NTFS filesystem. MacOS systems have used a number of filesystems including UFS and HFS, but the most recent incarnations of the platform use the HFS+ filesystem. Unix and Linux computers are by far the most flexible and have the capability to access Windows and MacOS filesystems as well as those they use almost exclusively, including EXT2, EXT3, EXT4, ReiserFS, and many more. Most modern operating systems do allow accessing other types of filesystems as well, but support is limited and sometimes only covers certain filesystems and/or functions.
How Computers Delete Files
A common misconception is that when a file is deleted, it is completely removed from the system (even if it is removed from a "trash"-type item, like a Recycle Bin or Trash Bin). This is not entirely true. In order to make the process of removing a file as efficient and speedy as possible, many filesystems and operating systems will merely flag a file as removed and drop it from listings. This is akin to hiding a file, or "sweeping it under the rug." Some systems do this by renaming the file, others by marking the file or its pieces as removed in the file allocation table. To the operating system, the file doesn't exist any longer. Then, when the operating system needs space to store a file, it may use the physical space once used by pieces of a deleted file and overwrite the information on disk with the new information.
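Here is a rough sketch in Python of that "flag it and move on" behavior. Again, this is a made-up toy model and not the code of any real operating system; the function names, block size and file names are assumptions for illustration only.

```python
BLOCK_SIZE = 4
blocks = [None] * 6  # the "physical" storage medium
table = {}           # file name -> {"addresses": [...], "deleted": bool}


def write(name, data):
    pieces = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    # Blocks owned by files flagged as deleted count as free space.
    in_use = {addr
              for entry in table.values() if not entry["deleted"]
              for addr in entry["addresses"]}
    free = [addr for addr in range(len(blocks)) if addr not in in_use]
    if len(pieces) > len(free):
        raise OSError("not enough free blocks")
    addresses = free[:len(pieces)]
    for addr, piece in zip(addresses, pieces):
        blocks[addr] = piece  # old contents are only now overwritten
    table[name] = {"addresses": addresses, "deleted": False}


def delete(name):
    # The fast path: flip a flag and leave the data blocks untouched.
    table[name]["deleted"] = True


def listing():
    # Deleted files simply stop showing up.
    return sorted(name for name, entry in table.items() if not entry["deleted"])


write("secret.txt", b"password")
delete("secret.txt")
listing_after_delete = listing()
leftover = b"".join(blocks[:2])
print(listing_after_delete)  # []
print(leftover)              # b'password'

write("new.txt", b"cats")
print(blocks[:2])            # [b'cats', b'word']
```

After the delete, the file no longer appears in the listing, but its contents are still sitting in the blocks. Only when a new file happens to reuse that space is part of the old data overwritten, and in this run the second half of it is still there afterwards.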
As mentioned before, the motivation behind "marking" a file as removed rather than actually "wiping" it out and completely destroying the data the file once contained is mainly performance. For very small files, "wiping" the file (overwriting the file with zeros or random data) is relatively fast since there are fewer places on the media that must be visited in order to write data. For larger files, however, the process could be time consuming, and users rarely want to wait for that 1GB file to be deleted. Moreover, because writing is considered a "critical" task (meaning that the information you want to write is of the utmost importance and therefore should not fail or be corrupted), writing data typically takes longer - sometimes it's the media's fault for simply being slow on writes, and sometimes it's the operating system's fault because other operations occur at the same time as a write (such as verifying the data was written properly onto the media). Thus, if a "delete" operation overwrote a file, writing garbage data to "erase" the real contents would take significantly longer than just renaming the file or flagging it as removed and waiting until another file takes its physical place and overwrites it. This practice is very common regardless of operating system. Only a small number of filesystems and operating systems will completely erase a file, and even where destroying the data on disk is available, it is often an operation that must be explicitly enabled, for the performance reasons mentioned before.
The Cloud & Your Data
"The cloud" has become the buzzword of the past couple of years, and it represents a great shift in the way users use all manner of Internet-connected devices and store their information. The "cloud" is basically all manner of web applications and programs that run on servers on the Internet and provide services to users, often storing their information for them so that it is accessible anywhere they can access the Internet. The advantage of this is, obviously, the ability to access your files and information conveniently on (almost) any device at any time, wherever you can access the Internet. Moreover, "cloud" services usually also tout that your information is continuously backed up, so that even if there is a catastrophic failure in the service, your information would not be lost forever. However, there has recently been a lot of debate about cloud computing's disadvantages.
The list of disputes is too long to go over here, but one that is hotly debated is the security of your information online. Some services have addressed this by encrypting data on their servers, but the way this is done is not standardized, nor is there any law mandating minimum security and encryption practices. However, one issue that is not nearly as widely discussed, but is still a hot one, is the question of what happens to your data when it is deleted on a cloud service.
Some cloud service users may believe that when they click "Delete" on a piece of information, the information is gone forever, but they may be wrong to assume so. Cloud services run on servers, which run server operating systems like Windows Server, Linux, Unix, MacOSX Server, and many others. Like their desktop counterparts, these servers rely on high-performance utilities and filesystems, and therefore are subject to the same removal issues I addressed earlier. So what really happens when you click "Delete?" Unfortunately, the method used for removal varies from cloud service to cloud service, but it probably does the same thing that happens on your desktop - the file is made invisible, but actually still exists on the storage media the cloud service relies on.
Fortunately, some big name cloud computing companies understand this issue and typically address it by encrypting the data stored in their services before it ever touches their storage media, so unencrypted information is never stored. However, as before, this encrypted information is not immediately and completely removed when it is deleted, and may still be subject to recovery. Worse still, the encrypted information can be decrypted (obviously, since otherwise what's the use of storing information that can never be decrypted?), so the encryption only serves to make peering into sensitive information far more difficult, not impossible (improbable, but not impossible).
Now, I am a big proponent of cloud-based computing, but it is my belief that highly-sensitive information should not be stored on such services. Though I trust that someone obtaining the hard, physical copy of the information I store on such services is next to impossible (how would they break into the cloud service's data center, and how would they know which hard drive to take out of the millions there probably are?), I do not trust the authentication mechanisms these services employ, and this is widely accepted to be the glass jaw of the entire secure cloud computing idea. If someone manages to take your username and password, the encryption was all for naught. Therefore, I propose that highly sensitive information, if it needs to be stored digitally at all, be stored on highly-trusted systems with strong security mechanisms and practices, and that access to that information occur on highly-trusted systems over strongly-encrypted communication channels.
How To Securely Delete Files
Commonly available secure file removal utilities will typically overwrite the file to be removed with garbage data in one or multiple passes, depending on the security level provided (if the option to change the level exists). This is like taking a line of a confidential document and marking over areas of text with a black magic marker multiple times until the text can no longer be read (note: this analogy is oversimplified and is quite flawed, but I think it gets the point across). The traditional reasoning for overwriting files with random garbage data multiple times is that recovery of data using forensic tools may be possible if the file is only overwritten once or twice. This concern applies mainly to hard drives, where a magnetic "residue" can be left behind on the hard drive's platters from earlier writes to a physical location. The idea with secure removal utilities is to "wipe clean" the "residue" as much as possible so that distinguishing legitimate data from garbage data is nearly impossible.
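The basic overwrite-then-remove idea can be sketched in a few lines of Python. This is only an illustration, not a replacement for a real secure removal utility, and the function names and default pass count are my own assumptions. It also assumes the storage actually writes the new data over the old physical location, which is not guaranteed on SSDs, journaling or copy-on-write filesystems, or anything with snapshots or backups.

```python
import os
import tempfile


def overwrite_file(path, passes=3, chunk_size=1024 * 1024):
    """Overwrite a file's contents in place with random bytes, several times."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(remaining, chunk_size)
                f.write(os.urandom(n))
                remaining -= n
            # Ask the OS to push this pass out to the device before the next one.
            f.flush()
            os.fsync(f.fileno())


def secure_remove(path, passes=3):
    """Overwrite the file, then delete it the ordinary way."""
    overwrite_file(path, passes)
    os.remove(path)


# Demo on a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"my secret notes")
os.close(fd)
secure_remove(demo_path, passes=3)
print(os.path.exists(demo_path))  # False
```

Note how much more work this is than an ordinary delete: every byte of the file is rewritten once per pass, which is exactly the performance cost described earlier.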
As mentioned before, the article I came across in the tweet was quite misleading, as it implies that Windows is the only system that is vulnerable to snooping and that securely removing files requires the purchase of a piece of software. However, as I mentioned throughout, this is not a security issue of Windows alone, but one inherent in all electronic devices that store information, even temporarily. Thankfully, for most desktop computers and servers, there are many freely available programs that can assist in the destruction of sensitive files.
Firstly, for Windows computers, Sysinternals (a division of Microsoft that creates advanced utilities for Windows administrators and users) has SDelete, a utility designed to adhere to the Department of Defense digital file sanitizing standard DOD 5220.22-M. This utility allows users to securely wipe files, folders and even entire drives, and allows the user to specify a number of passes other than the DOD standard. For MacOS X users, a post I found says that there is a secure file removal utility included with the operating system (originally found in the Unix operating system) which can be manually invoked from a terminal to perform a similar, and just as secure, type of file removal. Linux and Unix users have the most options, with a number of free, open source utilities, including "srm" (the same one used in MacOS X), "shred" and "wipe", to securely wipe and remove files, directories and media. Really, the only reason to use the product advertised is if it had a more convenient, easier to use interface - although I wouldn't advise using secure wipe utilities unless you have a good grasp of command line utilities anyway.
BE FOREWARNED: Using these utilities will make it impossible to recover data once used, and in some cases, carelessness can wipe files and entire media not intended for wiping, resulting in unintended data loss. Use at your own risk.
Finally, to keep others from peering into undeleted files (files that you have not marked for deletion at all), use file, folder and/or disk encryption utilities such as GnuPG so that information never touches storage media unencrypted in the first place.
Being misinformed can lead to misconceptions, and those misunderstandings can lead to costly mistakes. I wrote this not to bash the people who are trying to make money selling a product - I know this better than most as I'm a computer programmer myself - but to give people a better understanding of what actually happens behind the scenes and where the pitfalls may be. Many people assume a lot about their computers and sometimes are misinformed, and as a computer programmer, I see this all too often and feel I would be doing a disservice not to try to set the record straight. I merely felt that the article preyed on people's general fear of personal or professional security breaches on their computers and electronic devices, especially since the land of the Internet has become a more dangerous place with black-hat hackers and viruses becoming more and more sophisticated. I have never been a fan of using fear, especially fear that exploits incomplete knowledge, as a means of marketing products, and I believe that being informed is important, especially with today's ever-changing technological landscape.


Infinidium said...

Make a goddamn TL;DR bro. :P

Cynthia L said...

Hmm....very useful information. Thank you for the sharing. If we understand what really happened when data deleted, it would be easier to find them back again. If we cannot get it back, can data recovery software like Recuva or MiniTool Power Data Recovery really help with the data recovery?

Taylor Nash said...

Dear, I like all your post very much, it tells me so many thins that I do not touch before, hope you can do this job all the time. By the way, I want to introduce a useful Free Photo Recovery Software, which is a fantasy.