Tales of Migration, Part 9
In Which We Plan Data Cleanup Steps

This is the ninth in a series covering the library system migration of the Bartlett Library at the Museum of International Folk Art (and our partner library at the Laboratory of Anthropology).

One of the key decisions to make in any migration is how much data cleanup to do, and when to do it.

Most of us don’t have perfect data. Some patron names are entered in one format and some in another, and some are simply wrong. Some catalog records may be incomplete. There is a multitude of variations. Here at the Bartlett Library we currently have a system that doesn’t use MARC format bibliographic records, so every single record for every item we own will have to go through a major conversion to put it in MARC. That has to happen, no matter what. But what about the areas within each record that also have inconsistent formats? Do we clean them up before we migrate, as part of the migration, or after?

These decisions have to be made case-by-case, and it’s well worth thinking about them before your migration process begins.

If you have old, outdated records, consider eliminating them early in the process. These might be patron records for people who haven’t used the library in the last three years, or item records for books marked missing years ago and never found. If you clean them out up front, you will have fewer records to convert, and that may save you money.
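
For anyone who can export patron records to a spreadsheet or script, here is a rough sketch of how that first pass might look in Python. Everything in it is an assumption for illustration: the field names (`name`, `last_activity`) and the function names are made up, not taken from our system or from Koha, and the three-year cutoff is just the example above. It only sorts records into "keep" and "review" piles; a person should still look at the review pile before anything is deleted.

```python
from datetime import date


def years_before(day: date, years: int) -> date:
    """Return the date `years` years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Feb 29 has no match in a non-leap year, so fall back to Feb 28.
        return day.replace(year=day.year - years, day=28)


def split_patrons(patrons, today, years=3):
    """Split patron records into (keep, review) by last activity date.

    Assumption: each record is a dict with a hypothetical "last_activity"
    key holding a date, or None when no activity was ever recorded.
    """
    cutoff = years_before(today, years)
    keep, review = [], []
    for patron in patrons:
        last_used = patron["last_activity"]
        if last_used is not None and last_used < cutoff:
            review.append(patron)  # candidate for removal, pending a human check
        else:
            keep.append(patron)  # recent activity, or no date to judge by
    return keep, review
```

Records with no activity date stay in the keep pile here, on the theory that a missing date is a data problem rather than proof that the patron is gone.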

If you know there is a specific area that’s inconsistent or just plain wrong – for us, it’s the configuration of accession numbers we use as key identifiers for each item in the collection – and if you can state a simple rule for how to fix it, consider including the fix in the data conversion process. If you can give the vendor a simple rule such as “If the data in this field looks like xx-yyyyy then remove the xx- and make it 00yyyyy” then the vendor may well clean that field up for you at no extra cost. Think about areas of your records that you know people sometimes fill in incorrectly. Can you create a simple if-then rule to fix the most common errors? If you can, then those are good candidates for fixing automatically during the conversion.
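
To show how small such a rule can be, here is a sketch of the “xx-yyyyy becomes 00yyyyy” example in Python. It assumes that xx means exactly two digits and yyyyy exactly five digits, which the rule as stated does not actually say; your vendor would need the real definition. The function name is made up for the example, and this is not how any vendor's conversion tool necessarily works.

```python
import re

# Assumption: "xx" is two digits and "yyyyy" is five digits.
OLD_FORMAT = re.compile(r"(\d{2})-(\d{5})")


def normalize_accession(value: str) -> str:
    """Apply the rule: if the field looks like xx-yyyyy, make it 00yyyyy.

    Anything that does not match the old format is returned unchanged,
    so values the rule does not cover are left for manual review.
    """
    match = OLD_FORMAT.fullmatch(value.strip())
    if match is None:
        return value
    return "00" + match.group(2)


print(normalize_accession("12-34567"))  # matches the old format
print(normalize_accession("0034567"))   # already in the new format
```

The important design choice is the last one: a value that doesn't fit the pattern is passed through untouched rather than guessed at. That keeps the automatic fix limited to the common, well-understood error.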

I, sadly, know that the process of converting my non-MARC records to MARC will create even more inconsistency and error, so I know I will have to do a lot of my data cleanup manually, record by record, after the migration is complete. The Laboratory of Anthropology Librarian, however, has been working toward a migration longer than I have, and has done a lot of her manual cleanup ahead of time. Each situation is different.

Before we finalize data plans we are going to ask for help from an expert who has worked with cataloguing, databases, and other Koha migrations at the State Library. There is nothing like an extra set of eyes, backed by a sharp brain, looking at your data. You look at your records all the time. Someone from outside your organization who takes a fresh look may see a solution you’ve missed. If you can get another professional opinion for free, grab the chance. There’s nothing to lose.

Some good guidelines for data cleanup:

  • If you have records you can eliminate, consider doing it early on to save money
  • If you can make simple rules to fix known data errors, provide the rules to the vendor and ask if they can be included in your data conversion process
  • If the problems with your data require manual work on each individual record, look at how much time you have before the conversion. Is there time to do the work beforehand? If not, save it for later.

By now we’ve done about as much as can be done ahead of time. We have a good vendor with good software, a partner library and an agreement on how we’ll work together, all our necessary approvals and contracts, a clear timeline, and a plan for how to manage known problems with our data. Time for a deep breath before taking the plunge.