Tuesday, October 31, 2006

The Persistence of (Bad) Online Data

A particularly trying case of persistent bad data was brought to my attention a few months ago by Lou Rosenfeld, co-author of Information Architecture for the World Wide Web. We'd been in discussions with Lou to publish a new book on search analytics. The talks had progressed as far as an editor issuing a contract for discussion before Lou decided to launch his own publishing company.

And that's where the problem started. Issuing a contract at O'Reilly results in various data being added to our own internal tracking databases. When Lou decided to publish his own book, we flagged the book as "cancelled" in our own database. But unfortunately, our system sends out automated notices of cancelled titles via an ONIX feed -- even for cancelled titles that have never been announced. That would have been all well and good, except that someone else's automated system noticed a new book in the cancelled book feed, and added the book to their database -- without the cancelled flag!

(Aside: That's a classic bug cascade. As my friend Andrew Singer once noted, debugging is discovering what you actually told your computer to do, rather than what you thought you told it. Unfortunately, you sometimes don't understand what you told your computer to do until some particular corner case emerges.)
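
To make the cascade concrete, here is a minimal Python sketch of how a downstream system might pick up a cancelled title as if it were a live one. Everything in it is an assumption for illustration: the record fields (`id`, `title`, `status`), the function names, and the flat dictionary records are hypothetical stand-ins. Real ONIX messages are XML and far richer, and I don't know how the other party's system was actually written.

```python
# Hypothetical, simplified feed records (not the real ONIX schema).
cancelled_feed = [
    {"id": "book-001", "title": "Hypothetical Title", "status": "cancelled"},
]


def naive_ingest(feed, catalog):
    """Buggy consumer: adds any unseen title and copies only the fields it expects."""
    for record in feed:
        if record["id"] not in catalog:
            # The status field is silently dropped here, so a cancelled
            # title ends up looking like an ordinary new book.
            catalog[record["id"]] = {"title": record["title"]}
    return catalog


def careful_ingest(feed, catalog):
    """Safer consumer: never creates an entry from a cancellation notice."""
    for record in feed:
        status = record.get("status", "unknown")
        if status == "cancelled":
            # Only mark titles we already know about; ignore the rest.
            if record["id"] in catalog:
                catalog[record["id"]]["status"] = "cancelled"
            continue
        entry = catalog.setdefault(record["id"], {})
        entry.update(title=record["title"], status=status)
    return catalog


if __name__ == "__main__":
    print(naive_ingest(cancelled_feed, {}))
    print(careful_ingest(cancelled_feed, {}))
```

The naive version does exactly what it was told: "add anything I haven't seen before." Nobody told it that a never-announced book could show up for the first time in a cancellation feed, which is the corner case.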

From there, it went from bad to worse.
