October 2006 Archives

"Sneak peek"


So many people misspell "sneak peek" these days that it's almost become proper English. Reminds me of "chaise lounge," a misspelling of the French term "chaise longue" meaning "long chair."*

Nonetheless, here I am on a rampage about it. For the last time: a "peak" is the top of a mountain, and a "peek" is a look. You take a peek at a peak.

Setting aside the issue of spelling, the term is so overused that it's meaningless. For example, if you are publishing a "sneak peek" on a blog, there's nothing sneaky about it anymore; everyone can see it because it's now available on the web.

* More on this phenomenon on Wikipedia. My favorite example is that "sweetheart" is a corruption of the original word "sweetard." Tonight I'll test-drive that term of endearment on my lovely sweetard wife Mary and see how my etymological defense holds up as she beats me to a pulp.

Wizardry in Applesoft BASIC


Long, long ago, probably 1982, a friend gave me a copy of a program that was supposedly a prototype of a role-playing game called Wizardry. I had just finished the real version of the game (which was written in Apple Pascal) and was astonished to see this crude, low-res graphics version written in Applesoft BASIC. It was done in way too much detail to be a forgery, and there were enough differences in the storyline that it really did seem to be something that could have evolved into the final product.

Not recognizing the program's possible long-term geek appeal, I deleted it a few weeks later.

It's possible it was not what it appeared to be; maybe it wasn't a prototype but rather a knockoff programmed by an idle Wizardry fanboy who had neither the money to buy a real copy nor the moral makeup to pirate it. But either way, it would be cool to see it today. I wonder whether a copy of this BASIC program still exists on a dusty 5.25" floppy somewhere in the world.

After Wednesday's unfortunate plane crash, people are expressing surprise that small airplanes are allowed to fly near large cities. Some are saying that the "terrorist threat" calls for new flight restrictions on small planes.

A couple thoughts:

  • Suppose you're a terrorist. Which of the following sounds easier? Plan A: spend months in flight training, rent a Cessna with ten pounds of Semtex plastic explosives, then fly it into the 30th floor of a building and blow yourself up. Plan B: put the plastic explosives in a backpack, take an elevator to the 30th floor, and blow yourself up. No rational terrorist (yes, terrorists are entirely rational) would use a light aircraft for any malicious purpose, because a car or a backpack would serve the same purpose at lower cost and with less training.
  • Clueless quote of the week: Governor George Pataki says "It's just unfathomable that five years after September 11th, an inexperienced pilot can be circling the city and not under the control of any of the radar towers of the airports around the city." Is he trying to say that radar tower control would keep a terrorist from crashing a plane into Manhattan? How would the exchange go? "Tower to terrorist, please divert from building." "Terrorist to tower, OK, sorry about that!"

This wasn't terrorism. A guy in a plane made a wrong turn and didn't see an obstacle. Just because a terrorist could do on purpose what someone did by accident doesn't mean we should ban that activity. It makes no sense to invoke terrorism in response to this accident.

Notebook drive recovery


After helping poker buddy JJOK get the data off his dead laptop drive, I found to my delight that karma really works. Here's how I recovered all the data off my dead notebook computer (as well as how I killed it in the first place).

Sunday afternoon I decided to install Ubuntu 6 on my Fujitsu Lifebook p1510d. I deleted unnecessary data off the WinXP partition, defragmented it, and then did the installation. Ubuntu resized the NTFS partition, installed GRUB, and all was well. Unfortunately the Atheros wireless chipset on the notebook didn't play nicely with the Ubuntu drivers, and the only known fix was to recompile the drivers myself with a "+4" inserted in an obscure .c file to get around some seemingly superfluous extra bytes in the network stream. The author of the fix also blithely noted occasional kernel panics. So I applied my usual fix to Linux desktop issues like these, which is to uninstall and wait another three months.

But this time I did the uninstallation in a boneheaded fashion: I booted back into XP, deleted the Linux partition, and reformatted it as an NTFS volume. Those of you paying attention will note that the MBR was now pointing to a nonexistent bootloader. Everything worked great until a couple hours later when I tried to awaken the laptop from hibernation. Fortunately, I knew exactly what the problem was and exactly how to fix it.

Enter a deadly combination of laziness and hastiness. My notebook is an ultraportable and doesn't have a floppy drive, so I couldn't create a DOS floppy disk and type "fdisk /mbr," which would have fixed the problem. I didn't feel like burning a CD-R to do the same thing, so instead I found a utility on the web to turn a USB key into a bootable DOS volume.

This worked great except that it mounted itself as the C: drive, and my dead drive as a "second fixed disk." Apparently fdisk won't do much of anything with a fixed disk other than the first one. Remember, at this point I could have easily solved the problem by burning a CD-R. That was the laziness I spoke of earlier.

Now, here's the hastiness. In a moment of excessive cleverness, I booted my Ubuntu CD-R and copied the MBR from the USB key to my dead drive with the following command:

dd if=/dev/sda of=/dev/hda bs=512

Even if this had done what I'd intended, it would have erased the partition table from my drive. Instead, what it actually did before I frantically pounded control-c-control-c-control-c-control-c-control-c on the keyboard was copy 840KB of my USB key onto my dead drive, which not surprisingly now claimed to be not my trusty 30GB Windows XP installation, but a 32MB USB key circa 2002.
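For the record, here is a minimal sketch of what a safer copy would have looked like, assuming the classic MBR layout: 446 bytes of boot code, then a 64-byte partition table, then a 2-byte signature. Without a count, dd keeps copying until the source runs out; a bounded command along the lines of dd if=/dev/sda of=/dev/hda bs=446 count=1 would have touched only the boot code. The Python below does the same splice on made-up in-memory sectors rather than real devices; the function name and sample bytes are hypothetical.

```python
BOOT_CODE_SIZE = 446   # bootstrap code at the start of a classic MBR
SECTOR_SIZE = 512      # 446 boot code + 64 partition table + 2 signature


def splice_boot_code(source_sector: bytes, target_sector: bytes) -> bytes:
    """Return target_sector with only its boot code replaced by source_sector's."""
    if len(source_sector) != SECTOR_SIZE or len(target_sector) != SECTOR_SIZE:
        raise ValueError("both sectors must be exactly 512 bytes")
    # Keep the target's own partition table and signature (bytes 446..511).
    return source_sector[:BOOT_CODE_SIZE] + target_sector[BOOT_CODE_SIZE:]


# Illustration on invented sectors, not real disks.
usb_sector = b"\xAA" * 446 + b"\x01" * 64 + b"\x55\xAA"
disk_sector = b"\x00" * 446 + b"\x02" * 64 + b"\x55\xAA"
patched = splice_boot_code(usb_sector, disk_sector)
```

The patched sector carries the USB key's boot code but still has the hard drive's own partition table, which is the part the unbounded dd destroyed.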

I came this close (holding index finger/thumb really close together) to shamefully hauling out the Fujitsu PC Recovery CD and just starting over, but I did the math and guessed that the cost even of contacting software companies for stuff I'd bought and asking for permission to reinstall would likely exceed the cost even of a couple hours of investigation, and the investigation route might even recover my personal data. So I pressed on.

Step one: find a Linux rescue CD. Boot into it. Run gpart -W /dev/hda /dev/hda and thus write an inaccurate but workable partition table to the drive.

Step two: fdisk /dev/hda and change the first partition from a 32-meg FAT partition to a properly sized NTFS partition.

Step three: do some research on the web and figure out that NTFS, bless its soul, writes a second copy of 16 critical files to the middle of each NTFS partition. Normally it keeps these 16 files at the front of the partition; my dd command overwrote one or more of those. So I was now confident that I'd lost no irreplaceable data.

Step four: find a Windows XP installation CD. Boot into recovery mode. Run fixboot, which writes a new boot sector to the partition (the piece that locates NTLDR at startup). At this point the shell seemed to agree with me that there was a C drive, but that it had some serious issues.

Step five: CHKDSK /R. I think what this tool did was notice that the critical files were missing or invalid, and restored them from the mid-disk versions. I ran it once more for good measure.

Step six: reboot. Not only was my laptop back, but it even resumed successfully from hibernation!

Thank you, NTFS designers, for creating a filesystem that can withstand this kind of abuse. Thank you, NTFS hackers who authored miscellaneous web pages that gave me hope that my drive was recoverable. Thank you, worldwide army of Linux hackers, for building the tools that destroyed my drive, and for building some of the tools that fixed it.

Google Code Search


Pretty slick! It even searches tarballs! (This search was for a term in a .c file I posted long ago to my 3Com Audrey hacking page.)

Amazon S3 as personal backup


Jeremy recently wrote about using Amazon's S3 service as a backup server for his personal data. I ran the numbers and figured the cost was acceptable for my use case (about 15GB of digital photos with a couple hundred MB added each month), so I looked into existing frontend technology for S3 backups. There's JungleDisk and s3sync (sorry, getting too lazy to convert to links; use your favorite search engine), but neither was quite right for my requirements.

The principal problem is that my wife uses the file system (including filenames) as a filing system. This is probably what 98% of computer users on the planet do, too, but as I've written before, adequate search and automated organization technology (such as you'd find in Google's Picasa) make this work superfluous, so I don't do it -- I tend to dump poorly-named files into folders and let indexing software find them when I need them.

But my wife does organize files, and that means that she moves files around on the filesystem and occasionally renames them. This wreaks havoc with programs like s3sync that identify an object by its path. If the path changes, the object at the old path ceases to exist, and the one at the new path must be uploaded all over again. If you're paying for bandwidth, as you do with S3, this means a single top-level folder rename could be quite expensive.

I think the solution is a Venti-style layer over S3. It would work something like this:

- For every file in the directory to be backed up, compute a strong hash of the file contents.

- For each unique hash generated, upload the file corresponding to that hash, keyed by a hex representation of the hash. Rely on S3's built-in capability to avoid re-uploading objects whose contents haven't changed. Update 10/6/2006: As Antony pointed out, this feature doesn't exist; it was just wishful thinking. I will have to first list the bucket contents, and use the result to skip the files already uploaded.

- Upload a representation of the directory structure mapping paths to hashes, as well as whatever other metadata is needed to reconstitute the file at recovery time.

- Maintain a log of objects and refcounts to them in the directory structure. As objects are orphaned (meaning a file was deleted or revised), add them to a queue with a timestamp. Once a certain amount of time has elapsed since addition to the queue, such as two weeks, remove them from the queue. If they're still orphans, delete them.
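The first three steps can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain Python dict stands in for the S3 bucket (no real S3 calls are made), SHA-256 is one arbitrary choice of "strong hash," and the function names are hypothetical. Listing the dict's keys plays the role of listing the bucket contents before uploading.

```python
import hashlib
import os
import tempfile


def file_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, as hex (the object's key in the store)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each path (relative to root) to the hash of its contents."""
    manifest = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_hash(full)
    return manifest


def backup(root, store):
    """Upload objects missing from store; return (manifest, hashes uploaded)."""
    manifest = build_manifest(root)
    existing = set(store)  # stand-in for listing the bucket's keys
    uploaded = []
    for path, digest in sorted(manifest.items()):
        if digest not in existing:
            with open(os.path.join(root, path), "rb") as f:
                store[digest] = f.read()  # stand-in for an upload
            existing.add(digest)
            uploaded.append(digest)
    return manifest, uploaded


# Demo: a rename changes the manifest but uploads nothing new.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "IMG_0001.jpg"), "wb") as f:
        f.write(b"pretend photo bytes")
    store = {}
    manifest1, first_upload = backup(root, store)
    os.rename(os.path.join(root, "IMG_0001.jpg"), os.path.join(root, "beach.jpg"))
    manifest2, second_upload = backup(root, store)
```

After the rename, only the small manifest differs; the object itself is already in the store under its hash, so nothing large goes over the wire.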

If I've thought this through correctly, then this backup system lets you rename (and move) files all you want, and doing so won't cause them to be sent over the wire again. The recovery process isn't too onerous in terms of backup file format; just reassociate paths and metadata with each object, and you're done. You get versioning of individual files for free via the delayed garbage-collection mechanism. And in fact you could set up the system to back up several home PCs and not worry about double-backups of identical files on each PC, assuming the hash store were a big shared soup.
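The delayed garbage collection might look something like the sketch below. It simplifies the idea: instead of keeping refcounts, it recomputes orphans on each pass as "hashes in the store that no path refers to," and it drops an object from the queue as soon as something references it again. The dict standing in for the bucket, the function names, and the sample hashes are all hypothetical.

```python
import time

GRACE_SECONDS = 14 * 24 * 3600  # the two-week example from the post


def find_orphans(store_keys, manifest):
    """Hashes present in the store that no path in the manifest refers to."""
    return set(store_keys) - set(manifest.values())


def collect_garbage(store, manifest, orphan_queue, now=None):
    """Queue new orphans, un-queue revived ones, delete expired ones.

    orphan_queue maps hash -> time it was first seen orphaned.
    Returns the list of hashes deleted on this pass.
    """
    now = time.time() if now is None else now
    orphans = find_orphans(store, manifest)
    # An object that is referenced again is no longer an orphan.
    for digest in list(orphan_queue):
        if digest not in orphans:
            del orphan_queue[digest]
    deleted = []
    for digest in sorted(orphans):
        first_seen = orphan_queue.setdefault(digest, now)
        if now - first_seen >= GRACE_SECONDS:
            del store[digest]
            del orphan_queue[digest]
            deleted.append(digest)
    return deleted


# Demo with invented hashes: "aaa" is an old version nothing points to.
store = {"aaa": b"old version", "bbb": b"current version"}
manifest = {"notes.txt": "bbb"}
queue = {}
day = 24 * 3600
first_pass = collect_garbage(store, manifest, queue, now=0)           # queues "aaa"
second_pass = collect_garbage(store, manifest, queue, now=15 * day)   # deletes "aaa"
```

Until the grace period runs out, the old object is still in the store, which is where the free versioning comes from.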

Disadvantages:

- Compression window is limited to a single file. Probably not a horrible loss for the average home dataset, where files don't have much relationship to each other.

- Granularity of versioning is per file. This would be expensive, for example, if you were making small daily edits to a giant Quark file that actually changed only a few well-localized bytes in the file. Perhaps a BitTorrent-style piece mechanism, or whatever rsync does, would address this issue.
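A piece mechanism could be sketched like this: split each file into fixed-size pieces, hash every piece, and upload only the pieces whose hashes changed. This is the fixed-offset flavor, not rsync's rolling-checksum approach, so an edit that inserts bytes and shifts everything after it would still look like a change to every later piece. The piece size and names are hypothetical, and the pieces are absurdly small so the demo stays readable.

```python
import hashlib

PIECE_SIZE = 4  # tiny on purpose; real pieces would be far larger


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE):
    """SHA-256 hex digest of each fixed-size piece of data, in order."""
    return [hashlib.sha256(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]


def changed_pieces(old: bytes, new: bytes, piece_size: int = PIECE_SIZE):
    """Indexes of pieces in new that differ from, or don't exist in, old."""
    old_hashes = piece_hashes(old, piece_size)
    new_hashes = piece_hashes(new, piece_size)
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBxBCCCCDDDD"  # one byte edited in place, inside the second piece
to_upload = changed_pieces(old, new)
```

Here only the second piece (index 1) would need to be sent again, rather than the whole file.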

- Not entirely convenient backup format. For example, s3sync ends up mirroring your directory structure on S3. So with appropriate security measures you could use your web browser as a convenient filesystem browser. This proposal would give you the browser structure, but the moment you wanted to actually get a file, you'd have to copy and paste the hash key to generate the path to another part of the S3 bucket.

I think this is about 50 lines of Python (famous last words). Maybe I'll try to write it this weekend, unless someone out there beats me to it.

Update 10/6/2006: Looks like the Perl version of s3sync first lists the bucket and collects all the etags (MD5 hashes), and then it skips files already uploaded. If the Ruby port preserves this behavior of the Perl version, then it might handle the move-triggers-reupload issue. So it's possible that s3sync already does enough of what I want.

About this Archive

This page is an archive of entries from October 2006 listed from newest to oldest.
