April 2010 Archives

Preserving digital files


During the cleanup project I mentioned in my last blog post, I learned a few things about preserving old personal (not business) digital files.

First, what to save. The short answer: everything and then some.

Why everything? Because it's cheaper than picking and choosing. My entire collection of Apple ][ disks, converted to 143KB disk images and compressed, came to 1.7MB. Even at the time I archived those in the mid-1990s, that would have fit on two floppy disks for a total cost of less than $10.00. Today the economics are a little different, but still, a one-terabyte external hard drive costs $90.
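The arithmetic behind "cheaper than picking and choosing" is easy to check. Here is a rough sketch; the 1,440KB floppy capacity and the decimal definition of a terabyte are assumptions on my part, and the function names are made up for illustration.

```python
import math

FLOPPY_CAPACITY_KB = 1440      # assumption: 3.5-inch high-density floppy
ARCHIVE_SIZE_KB = 1.7 * 1024   # the 1.7MB compressed Apple ][ collection


def floppies_needed(size_kb, capacity_kb=FLOPPY_CAPACITY_KB):
    """Number of whole floppy disks required to hold size_kb kilobytes."""
    return math.ceil(size_kb / capacity_kb)


def cost_on_drive(size_kb, drive_price=90.0, drive_capacity_kb=1e9):
    """Share of a drive's price used by size_kb (1TB taken as 10**9 KB)."""
    return drive_price * size_kb / drive_capacity_kb


print(floppies_needed(ARCHIVE_SIZE_KB))              # 2
print(f"${cost_on_drive(ARCHIVE_SIZE_KB):.6f}")      # $0.000157
```

On the $90 drive, the whole Apple ][ collection uses up roughly two hundredths of a cent of storage. Deciding what to delete would cost more in time than keeping all of it.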

What does "and then some" mean? It means don't just save the TurboTax file; also save a PDF of it. Think about why you might want to look at a 2010 tax return in 2025. Is it because you have a computer with TurboTax 2010 installed and feel like editing your old tax return for old times' sake? No, it's because you're running for Congress and have decided to publish your old returns during your campaign. You're much likelier to need to read an old document than to edit it. Save a format that makes it easy to read.
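One way to keep yourself honest about this is a small script that lists editable files with no readable copy next to them. This is only a sketch: the function name and the default suffixes are illustrative assumptions, so substitute whatever extensions your own applications use.

```python
from pathlib import Path


def missing_pdf_companions(directory, editable_suffixes=(".doc", ".xls")):
    """Return editable files in directory that have no same-named .pdf beside them.

    The default suffixes are placeholders, not a complete list.
    """
    folder = Path(directory)
    missing = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in editable_suffixes:
            # "return.doc" is covered only if "return.pdf" sits in the same folder.
            if not path.with_suffix(".pdf").exists():
                missing.append(path.name)
    return missing
```

Calling `missing_pdf_companions("taxes")` on a hypothetical `taxes` folder returns the names that still need a PDF rendered before the folder goes into the archive.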

"And then some" also means converting certain physically represented information to digital. Example: take a picture of the CD-ROM case with the serial number, and put the JPEG in the same directory as your files. The picture's usually enough to prove ownership of software, and if you ever have to reinstall TurboTax 2002 to get the IRS off your back, it'll be nice to know what to do when the license-key installer dialog pops up.

Next, which archival format? I like containers rather than individual files. On Windows, that's .zip. On Linux, it's .tar. On a modern Mac, .dmg. These containers are designed to survive transmission over networks, to move easily from an obsolete medium to a modern one, and to preserve metadata. I've been through the grief of helping a friend recover an old Mac file, only to find that the resource fork was gone. I've personally needed to know the last-modified date of an old file because that date was more significant than the file itself, and discovered to my horror that the date was today. Stick your old files in the proper container and take reasonably good care of the container, and you'll get them back out again exactly the way they were put in.
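To make the metadata point concrete, here is a minimal sketch using the `tarfile` module from Python's standard library. The two function names are hypothetical. It packs a directory into a .tar file and then reads back the last-modified times recorded inside the archive, which stay put no matter how many times the container itself is copied. It does not attempt to handle Mac resource forks.

```python
import os
import tarfile


def archive_directory(src_dir, tar_path):
    """Pack src_dir into an uncompressed tar archive, timestamps included."""
    top_level = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(tar_path, "w") as tar:
        # tar.add records each file's mode and last-modified time in its header.
        tar.add(src_dir, arcname=top_level)


def stored_mtimes(tar_path):
    """Return {member name: last-modified time in whole seconds since 1970}."""
    with tarfile.open(tar_path, "r") as tar:
        return {m.name: int(m.mtime) for m in tar.getmembers() if m.isfile()}
```

After `archive_directory("old_files", "old_files.tar")`, calling `stored_mtimes("old_files.tar")` shows the dates the files carried when they were packed, not the date of the most recent copy.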

Which file format? For files that already exist, don't change them. But going forward, pick formats that are (a) open or ubiquitous, and (b) understood by non-DRM applications. Examples:

  • Plain old text files. As is customary, Apple will someday invent another new character for line endings, but otherwise text files are universal.
  • High-resolution, lossless TIFF for scans of old photos. High-quality JPEG is probably OK for photo scans, too, depending on how important they are.
  • PNG for static graphics. The only example I can think of is Eagle PCB files.
  • WAV for important audio like the cassette tape of your dad interviewing his mom when she was 90 years old. But also make an MP3 so you can easily email the interview to your kids.
  • PDF for final versions of electronic documents (see above), and EPS for vector graphics. Both originated as proprietary Adobe formats (PDF has since become an ISO standard), but enough open-source viewers exist that they're unlikely to be unreadable in the future.

Obviously, it's only an educated guess which file formats will be readable decades from now. But ASCII, PNG, TIFF, JPEG, WAV, MP3, and PDF/EPS are good bets for today's documents.
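If you want to see how much of a folder already falls into the "good bets" list, a few lines of Python will sort filenames into two piles. The mapping from formats to file extensions below is my assumption (for example, treating `.txt` as ASCII text), and the function name is made up.

```python
from pathlib import Path

# Assumption: these extensions stand in for the formats named above.
GOOD_BET_SUFFIXES = {
    ".txt", ".png", ".tif", ".tiff", ".jpg", ".jpeg",
    ".wav", ".mp3", ".pdf", ".eps",
}


def split_by_format(filenames):
    """Split filenames into (good bets, files worth a second look)."""
    good, review = [], []
    for name in filenames:
        if Path(name).suffix.lower() in GOOD_BET_SUFFIXES:
            good.append(name)
        else:
            review.append(name)
    return good, review
```

Files in the second pile aren't necessarily a problem. Existing files should stay as they are; the pile just tells you where a readable companion copy might be worth making.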

So we have our entire life's Word documents rendered to PDF and stored in a zipfile. Which medium should we use? Two answers, depending on size.

For small file collections (10GB or less) I'd love to recommend CD-R or DVD-R if my personal experience with their longevity weren't so poor. I'd estimate a 50% failure rate for well-stored CD-Rs over 10 years old. Moreover, computers with no moving parts are becoming common; in ten years, perhaps a CD-ROM reader will be as rare as a floppy drive is today. (Update: some have pointed out that the demise of any computer medium won't happen suddenly, and that there will be time to migrate data stored on old media, which is why the CD-ROM obsolescence argument is weak. I agree in principle. In reality, stuff gets put on shelves and discovered years later. Moreover, we're talking about personal files, where the hassle of tracking down a friend with an obsolete reader might be a high enough barrier to recovery.)

That leaves USB drives and SD cards, and I'll pick SD cards, even though they're a little more expensive. Two reasons. First is reliability. I've found SD cards to be more reliable than USB drives, which makes sense because SD cards usually store the only copy of pictures taken on digital cameras, meaning failure is potentially devastating, and USB drive files need to last only long enough for a sneakernet file transfer, meaning failure isn't a big deal. The second reason is form factor. SD cards are uniform and stackable. (Update: there are questions about flash memory longevity. The point is that if the medium is rewritable, then for archival purposes it has failure built into the design. Might make more sense to make multiple DVD-R copies and store them in different places.)

For big file collections, I'd buy an external 1TB hard drive, fill it up, and put it away. I know that hard drives have lots of moving parts, but they're well-sealed inside their cases, and I've personally had great success getting data off hard drives last used nearly 20 years ago.
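Whatever the medium, it helps to be able to tell, years later, whether what comes off the shelf is what went onto it. A common technique for that (not something I described doing above) is to store a checksum manifest alongside the files. Here is a sketch using SHA-256 from Python's standard library; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List manifest entries that are now missing or no longer match."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

Build the manifest when you fill the drive, save it somewhere else as well, and run `changed_files` against each copy whenever you pull one out of storage. An empty list means every file still matches.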

What about online storage? Nope. I haven't yet found a consumer storage service that has a WORM (write once, read many) philosophy about storage; it's too easy for me or a mischievous web weenie to issue a command that erases either my files or my entire account. And any software-as-a-service relationship (including any DRM purchase) is effectively a lopsided, eternal contract with a company. They can change the terms of that contract any time, and if the company goes away, it's likely your files will, too. I love Gmail for the service they provide, but I don't expect them to be my email archiving solution.

Should you encrypt your local backups? Several reasons why I say no. First is that I don't want it; if I die, I do want my wife and kids to be able to intelligently dispose of this stuff. Second, I don't need it; nobody cares about my personal data except me. Third, I can't follow through: either I'll forget the passphrase (or forget to divulge it on my deathbed), or else I'll keep it written down next to the SD cards, in which case it's no more effective than physical security of the SD cards. Your opinions may differ on this one, but let me ask: if your files are so important and secret, how come you don't back up or encrypt the files on your computer today?

And finally, in spite of all this carefully reasoned advice, consider throwing it away instead of saving it. If you don't, your future heirs will have to when you're gone. I admit that it's cool to know that I could call up my high school freshman year book reports on a moment's notice, but I doubt I ever actually will. I don't want to be featured on the digital version of Hoarders in 40 years, after all.

Time out of mind


While throwing out old storage boxes this weekend, I discovered my old Power Macintosh 6100, packed up in 1997. To use eBay terminology, it was in vintage condition. It had the mouse, keyboard, VGA monitor adapter, power cord, and a few floppy disks. There was no excuse not to plug it in and see what happened.

So that's what I did. The chimes played and the smiling Mac appeared on the screen. The clock battery had died, but otherwise it was working like the day I'd last shut it off. My work was all there in Nisus Writer and ClarisWorks formats, patiently waiting 13 years for me to resume. On one of the unlabeled 3.5-inch floppy disks was a series of .DSK files. Those files were images of 5.25-inch Apple ][ diskettes. I downloaded an Apple ][ emulator and was soon running the Applesoft BASIC and 6502 assembly programs I'd written when I got my first computer at age 10.

I have a recurring dream where I'm in a house where I lived long ago. It's just as if it had remained abandoned since the day I left; it's dark and filled with cobwebs, but otherwise the furniture is still there. These dreams always have the effect of compressing time. I remember old situations so vividly and freshly that my mind thinks hardly any time has passed. Exploring this old Macintosh and Apple ][ was the same experience, but without the cobwebs, because digital files don't age. My programs from decades ago ran just as well as they did back then.

The time-compression effect was as strong as the files were perfect. It transported me to the room where my family kept the Apple. I felt the pattern of the carpet and the texture of the walls. I smelled the slightly musty air. I felt the resistance of the door and the momentary change in air pressure as I opened it. I was ten years old again. Woz hadn't yet crashed his plane, Steve Jobs hadn't yet met John Sculley, and Microsoft wasn't yet the enemy because they didn't sell operating systems.

It's tempting to dive deeper. There are "Classic" Mac websites. Apple ][ fan clubs are still going strong. eBay stands ready to help me complete my retro hardware collection. But I wouldn't really be reliving old memories; I'd be replacing them with new ones. Today, if I close my eyes and think hard, I can still evoke the sensation of pure wonder I felt when, as a child, I first ran Bob Bishop's magical "APPLE VISION" program. But I'm sure I could replace that memory with a jaded "my, how far we've come" chuckle if I loaded up the dancing man in the TV set today.

I chose to keep my memories, not make new ones. I copied my old personal files to a fileserver, then wiped the Macintosh's hard drive and packaged it up to sell on eBay. The hardware is gone, and only the software remains.

About this Archive

This page is an archive of entries from April 2010 listed from newest to oldest.
