Once the project had developed an overall understanding of the scope of the Philippine collections at the University of Michigan, we quickly realized that the scale of the work required to address problems across multiple descriptive metadata systems called for bulk editing and analysis.
By analyzing the collections as a whole, we hoped to deepen our understanding of the extent and distribution of the issues with harmful terminology in the metadata that the recommendations from the harmful terminology glossary had identified. The aim was to build tools that could analyze, in aggregate, the hundreds of descriptive records identified during the RC/RC collections survey. We also hoped these tools would help collections managers examine and understand their collection metadata, with the ultimate goal of producing digital tools that could help identify and update collection descriptions and be reused by other projects.
RC/RC researchers identified more than two hundred collection descriptions related to the Philippines. From these, we assembled a dataset of 247 finding aids for analysis.
These finding aids came from three repositories on campus: the Bentley Historical Library, the Special Collections Research Center, and the William L. Clements Library. All three provided their data as Encoded Archival Description (EAD), a text-based markup format expressed in Extensible Markup Language (XML), a standard data format frequently used to share metadata. Graduate student Ella Li (School of Information) and faculty member Jesse Johnston (School of Information) worked together to develop tools to analyze the metadata. They created a series of interactive Jupyter notebooks that use Python and several additional analysis libraries to parse and analyze the metadata. The goals were to produce useful data visualizations and to begin building tools that could be used or repurposed to make changes to the descriptions. The resulting code is available on GitHub as a series of interactive code examples that can be reused or repurposed.
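The project's notebooks are not reproduced here. As a rough illustration of the kind of aggregate analysis described above, the following minimal Python sketch parses an EAD record with the standard library and counts where flagged terms occur. Everything named in it is an assumption made for illustration: `FLAGGED_TERMS` holds placeholder strings rather than entries from the project's glossary, `count_flagged_terms` and `SAMPLE_EAD` are hypothetical names, and the sample record is invented and assumes the EAD 2002 schema namespace.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical placeholder terms; a real run would load a project glossary.
FLAGGED_TERMS = ["example term", "placeholder"]

# Invented sample record, assuming the EAD 2002 schema namespace.
SAMPLE_EAD = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unittitle>Placeholder family papers</unittitle></did>
    <scopecontent>
      <p>Letters with an example term and another Example Term.</p>
    </scopecontent>
  </archdesc>
</ead>"""


def local_name(tag):
    """Strip an XML namespace, so '{urn:...}unittitle' becomes 'unittitle'."""
    return tag.rsplit("}", 1)[-1]


def count_flagged_terms(ead_xml, terms=FLAGGED_TERMS):
    """Count case-insensitive matches of each term, keyed by (term, element name)."""
    root = ET.fromstring(ead_xml)
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    for element in root.iter():
        # Use only the text that belongs directly to this element (its leading
        # text plus the tails of its children) so nested text is counted once.
        # A phrase split across an inline child element would be missed.
        pieces = [element.text or ""] + [child.tail or "" for child in element]
        own_text = " ".join(pieces)
        for term, pattern in patterns.items():
            hits = len(pattern.findall(own_text))
            if hits:
                counts[(term, local_name(element.tag))] += hits
    return counts


if __name__ == "__main__":
    for (term, element_name), n in sorted(count_flagged_terms(SAMPLE_EAD).items()):
        print(f"{term!r} in <{element_name}>: {n}")
```

To work across a whole set of finding aids, the same function could be applied to each file's contents and the resulting counters summed. Files that begin with an XML encoding declaration should be read as bytes, or parsed with `ET.parse`, rather than passed to `ET.fromstring` as text.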