database cleaning

September 27, 2005

Spent a big part of today messing with Buzztracker's quarterly database cleaning.

The databases usually fill up within about three months. I always put it off until the last minute because it's such a slow, tedious process: cleaning up old data, archiving it, and making sure everything comes out squeaky clean.

It never does come out clean. It comes out a mess. The archiving itself is simple enough in principle, but archiving 4 gigs of data isn't trivial. You have to make new tables, copy the old data into them, export the new tables, download the export, check its integrity, remove the archived rows from the main table, optimize the table to reclaim the space, and then check that the main table is still functioning properly. For some reason, probably because I'm not a database specialist by any stretch of the imagination, the main table is never functioning properly when all is said and done.
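For illustration, here is a rough sketch of that archive-then-delete sequence. The post doesn't say which commands Buzztracker actually used, so this uses Python's built-in `sqlite3` as a stand-in; the `articles` table, its `published` column, the cutoff date, and the CSV export are all hypothetical. The point is the ordering: nothing is deleted from the main table until the archive's row counts have been checked.

```python
import csv
import sqlite3


def archive_old_articles(conn: sqlite3.Connection, cutoff: str, csv_path: str) -> int:
    """Move articles published before `cutoff` out of the main table.

    Hypothetical schema: articles(id INTEGER PRIMARY KEY, published TEXT, title TEXT),
    with ISO-format dates so string comparison orders them correctly.
    Returns the number of rows archived.
    """
    with conn:  # commits on success, rolls back the copy/delete on any exception
        # 1. Make a new, empty table with the same columns as the main table.
        conn.execute("DROP TABLE IF EXISTS articles_archive")
        conn.execute("CREATE TABLE articles_archive AS SELECT * FROM articles WHERE 0")
        # 2. Copy the old rows into the new table.
        conn.execute(
            "INSERT INTO articles_archive SELECT * FROM articles WHERE published < ?",
            (cutoff,),
        )
        archived = conn.execute("SELECT COUNT(*) FROM articles_archive").fetchone()[0]
        expected = conn.execute(
            "SELECT COUNT(*) FROM articles WHERE published < ?", (cutoff,)
        ).fetchone()[0]

        # 3. Export the new table for download, then re-read the file as a basic integrity check.
        rows = conn.execute("SELECT * FROM articles_archive ORDER BY id").fetchall()
        with open(csv_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        with open(csv_path, newline="") as f:
            exported = sum(1 for _ in csv.reader(f))
        if not (archived == expected == exported):
            raise RuntimeError("archive row counts do not match; aborting before delete")

        # 4. Only now remove the archived rows from the main table.
        conn.execute("DELETE FROM articles WHERE published < ?", (cutoff,))

    # 5. Reclaim the freed space (VACUUM cannot run inside a transaction),
    #    then confirm the database still passes SQLite's own consistency check.
    conn.execute("VACUUM")
    status = conn.execute("PRAGMA integrity_check").fetchone()[0]
    if status != "ok":
        raise RuntimeError(f"integrity check failed: {status}")
    return archived
```

Because the count check raises before the `DELETE`, a bad export leaves the main table untouched.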

And so the process of repairing and re-optimizing begins. It usually ends with a support ticket to the lovely people at Rackspace, who respond both quickly and knowledgeably. After some back and forth on the ticket, we usually figure out the source of the problem and get things up and running again.

The key to maintaining any semblance of sanity throughout the process is to have backups. Always make a backup of the main table before touching it, so if you do screw up, you're just a 10-minute copy away from starting over.
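A minimal sketch of the "back up before touching it" rule, again using Python's `sqlite3` as a stand-in for whatever database actually backs Buzztracker. It copies the whole database file rather than a single table (an assumption made for simplicity), using the standard `Connection.backup` method; the file paths are hypothetical.

```python
import sqlite3


def backup_database(src_path: str, dest_path: str) -> None:
    """Copy the database at src_path to dest_path.

    Run it before a cleanup to take a backup; run it with the arguments
    swapped to restore the backup over a database the cleanup broke.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest_path)
    try:
        with dst:
            src.backup(dst)  # page-by-page online copy (Python 3.7+)
    finally:
        dst.close()
        src.close()
```

Usage would be `backup_database("buzz.db", "buzz-backup.db")` before the cleanup, and `backup_database("buzz-backup.db", "buzz.db")` to start back at the beginning if it goes wrong.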

But in the end everything seems to be working properly again, and I have another eighty thousand new articles sitting on my backup drives, waiting to be picked apart by the next Buzztracker phase.
