How can the Dolph Briscoe Center for American History--a government-funded university archive--clean, edit, and crosswalk 100,000 records to an entirely new metadata system and digital media repository? What can it do to make this process easier in the future?
Here's a record of the goals met so far and of where future metadata interns can pick up the initiative.
Metadata Crosswalking:
Completed
Researched other people and institutions that have migrated, or are in the process of migrating, special collections digital asset metadata.
Compiled readings on metadata schema and user manuals.
Composed a list, "Institutions Migrating Metadata from Dublin Core to MODS," of institutions that have also adopted Islandora (e.g., University of Central Florida libraries and University of North Texas).
Helped compose "DBCAH MODS Metadata Documentation" style guide.
Consulted and borrowed from UNT Library’s "Input Guidelines for Descriptive Metadata" style guide.
Updated each element to the current version of its metadata schema or vocabulary.
PBCore, MARCrelator, ISO 639-3, LOC linked data, Getty, etc.
Needs further action
Update each section of the DBCAH MODS Metadata Documentation style guide in its appropriate place within the wiki.
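For readers new to crosswalking, here is a minimal sketch of what moving one record from Dublin Core to MODS can look like in Python. It is an illustration, not our production workflow: the `DC_TO_MODS` table is a hypothetical, partial mapping that only loosely follows the general shape of common Dublin Core to MODS mappings, the `crosswalk` function name is made up for this example, and the sample record is invented. The real DBCAH mapping lives in the style guide.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical, partial mapping for illustration only (NOT the DBCAH mapping).
# Each Dublin Core field maps to a (wrapper element, leaf element) pair in MODS.
DC_TO_MODS = {
    "title": ("titleInfo", "title"),
    "publisher": ("originInfo", "publisher"),
    "subject": ("subject", "topic"),
    "language": ("language", "languageTerm"),
}


def crosswalk(dc_record):
    """Build a MODS element from a dict of DC field -> list of values.

    Returns (mods_root, unmapped), where unmapped holds any fields that
    have no entry in DC_TO_MODS so they can be reviewed by hand.
    """
    root = ET.Element(f"{{{MODS_NS}}}mods")
    unmapped = {}
    for field, values in dc_record.items():
        if field not in DC_TO_MODS:
            unmapped[field] = values
            continue
        wrapper, leaf = DC_TO_MODS[field]
        for value in values:
            # One wrapper element per value, e.g. one <subject> per topic.
            parent = ET.SubElement(root, f"{{{MODS_NS}}}{wrapper}")
            ET.SubElement(parent, f"{{{MODS_NS}}}{leaf}").text = value.strip()
    return root, unmapped


if __name__ == "__main__":
    # Invented sample record, not taken from the DMR.
    record = {
        "title": ["Sample photograph"],
        "subject": ["Ranching", "Cattle"],
        "format": ["image/tiff"],
    }
    mods, leftovers = crosswalk(record)
    print(ET.tostring(mods, encoding="unicode"))
    print("Unmapped fields:", leftovers)
```

Returning the unmapped fields separately, rather than dropping them, makes it easier to spot elements the mapping does not yet cover.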
Metadata Normalization:
Completed
Using OpenRefine, created controlled vocabularies from existing values in the DMR.
Cleaned datasets and arranged isolated unique values in Google Docs.
ext_authority
publisher, language, roles, avroles, genre, names, city, county, state, country, etc.
local
publisher, contributor, subjects, projects, historical placenames, local placenames, creator, etc.
Other documentation is in the "Mike - Clements Metadata" folder.
Reconciled cleaned/isolated data sets against authority lists.
LCSH, LCNA, VIAF, Geonames
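We did the cleaning and isolating in OpenRefine, but the idea can be sketched in a few lines of Python for anyone who wants to see it outside the tool. This is only an analogue of splitting multi-valued cells, trimming whitespace, and faceting. The function names, the `;` separator, and the sample records are assumptions made for this example and do not describe the actual DMR export.

```python
from collections import Counter


def isolate_unique_values(records, field, separator=";"):
    """Count each distinct value of `field` across a list of record dicts.

    Multi-valued cells are split on `separator` (an assumed delimiter),
    and runs of whitespace are collapsed before counting.
    """
    counts = Counter()
    for record in records:
        raw = record.get(field) or ""
        for part in raw.split(separator):
            value = " ".join(part.split())
            if value:
                counts[value] += 1
    return counts


def case_variants(counts):
    """Group values that differ only by letter case, for manual review."""
    groups = {}
    for value in counts:
        groups.setdefault(value.casefold(), []).append(value)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


if __name__ == "__main__":
    # Invented sample rows, not taken from the DMR.
    rows = [
        {"publisher": "Texas Press;  texas press "},
        {"publisher": "Texas Press"},
        {"publisher": None},
    ]
    unique = isolate_unique_values(rows, "publisher")
    print(unique)
    print(case_variants(unique))
```

The case-variant groups are only candidates for merging. Deciding which form becomes the controlled value is still a judgment call, made against the authority list.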
Needs further action
Complete the reconciliation of geographic placenames against Geonames using OpenRefine. Determine whether a Python installation issue is the cause of the failure, as suggested in Christina Harlow’s post on GitHub (which resulted from our consulting her about our reconciliation problem).
Dolph Briscoe Center for American History Digital Archives Wiki (DAWG):
Completed
Created information architecture for the "Metadata Style Guide."
Migrated over the DC-MODS style guide based on new controlled vocabularies and values.
Authorities, Elements, Qualifiers, Notes, Tutorials, Previous Versions, etc.
Created "Tutorials" on how to use OpenRefine.
"Cleaning Messy Data Sets Using OpenRefine."
"Downloading and Using OpenRefine."
"Eliminating Duplicate Entries Using Geographic Placenames."
"Reconciling Against LCSH Database Using SPARQL Endpoint."
"Reconciling Against VIAF Using OpenRefine."
Needs further action
Tutorials
"Reconciling Against Geonames Using OpenRefine"
This will be more a record of our attempts to reconcile than a tutorial, since we never got the reconciliation to work.
Info on Our Digitization Specs
"Resource Packet Directory" (see “Notes for Mike” outline in Google Docs)
"Introduction," "What is Metadata?," "Things you should ask yourself before you attempt such a project," "Understanding our MODS Metadata," "Mapping from DC to MODS," "What was difficult for us?," "What are the big changes?"