Arthur Koehl

I am a data scientist and historian working at the UC Davis DataLab. At the lab I collaborate on interdisciplinary research projects, teach workshops, and mentor interns. This page presents my main digital projects.

email: avkoehl at ucdavis dot edu

Getty Foundation Award - Develop Software Platform for Museums and Libraries to Aid in Cataloging Historical Works

With Carl Stahmer
not yet started

A grant from the Getty Foundation to build a tool that assists museums and libraries in cataloging massive collections of early modern prints. The goal is to create a network of member organizations that share their collections in a single searchable pool. The shared collection will then be examined with Archive-Vision and other digital art history tools, and the combined image data and metadata will be analyzed to explore questions of similarity and dissimilarity from humanist and image processing perspectives.

Quintessence - Dynamic Corpus Exploration of EEBO-TCP Using Cutting Edge NLP

With Samuel Pizelo

Seamlessly integrating state-of-the-art data analytics and Natural Language Processing tools with dynamic corpus exploration, Quintessence seeks to add dimension to the study of English texts from the Early Modern period. Based on the English texts in the EEBO-TCP archive and adorned with Northwestern University’s MorphAdorner, our corpus of approximately sixty thousand texts gives scholars in-depth computational analysis of Early Modern print at varying scales.


Bright and Blind Spots of Water Research in Latin America and the Caribbean

With Alyssa J. DeVincentis, Hervé Guillon, Romina Díaz Gómez, Noelle K. Patterson, Francine van den Brandeler, J. Pablo Ortiz-Partida, Laura E. Garza-Díaz, Jennifer Gamez-Rodríguez, Erfan Goharian and Samuel Sandoval Solis

Water resources management is threatened by climatic, economic, and political pressures, and these challenges are on particular display in Latin America and the Caribbean. To assess the region’s ability to manage water resources, we conducted an unprecedented literature review of over 20,000 multilingual research articles using machine learning and an understanding of the socio-hydrologic landscape.

github: hrvg/wateReview

Malgre-Nous - A Digital Initiative to Represent Their Story


Between 1942 and 1945, approximately 130,000 young men from Moselle and Alsace were conscripted into the German Army. They are called the "Malgre-Nous" (despite-us in English) because they were fighting against their will for a country that was not their own. Most representations of the Malgre-Nous focus on their experiences on the eastern front, relying on personal memoirs and interviews. This web project takes a different approach to presenting their story. It seeks to lay the foundations for a more complete view of their experience by using data, maps, and visualizations to show their presence on the multiple fronts in Europe during the Second World War.

dataset: avkoehl/malgre-locations

Archive-Vision - CBIR for Early Modern Image Sets

With Carl Stahmer

The digitization of historical image sets has vastly outpaced our ability to meaningfully search those sets. Archive-Vision (Arch-V) allows users to query a library of images using an existing image as a seed. Arch-V uses SURF feature detection and description to find scale- and rotation-invariant keypoints on images. The keypoints of the seed image are then compared with those of each image in the set. The matching keypoints are then filtered using their geometric descriptors and statistical methods. The robust matches that survive this filtering are used to find the images that best match the seed image.
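The compare-filter-rank idea can be illustrated with a minimal Python sketch. It is not Arch-V's code: it assumes descriptors have already been extracted (for example by SURF), it uses a nearest-neighbour ratio test as a stand-in for the statistical filtering, and it leaves out the geometric filtering entirely. The function names, the `ratio` threshold, and the toy two-dimensional descriptors are all hypothetical.

```python
import math

def ratio_test_matches(seed_desc, cand_desc, ratio=0.75):
    """Return (seed_index, candidate_index) pairs that pass the ratio test."""
    matches = []
    if len(cand_desc) < 2:
        return matches  # the test needs a best and a second-best neighbour
    for i, d in enumerate(seed_desc):
        # Distance from this seed descriptor to every candidate descriptor.
        dists = sorted((math.dist(d, c), j) for j, c in enumerate(cand_desc))
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches

def rank_images(seed_desc, library, ratio=0.75):
    """Rank library images by number of surviving matches, best first."""
    scores = {
        name: len(ratio_test_matches(seed_desc, desc, ratio))
        for name, desc in library.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy descriptors (real SURF descriptors have 64 or 128 dimensions).
seed = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
library = {
    "near_copy": [(0.1, 0.0), (1.0, 1.1), (5.0, 4.9), (9.0, 9.0)],
    "unrelated": [(3.0, 3.0), (3.1, 3.1), (2.9, 3.0)],
}
ranking = rank_images(seed, library)
print(ranking)
```

In this toy library the near copy keeps all three of its matches while the unrelated image keeps none, because its descriptors are all roughly equidistant from each seed descriptor and so fail the ratio test.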

See it in action on the English Broadside Ballad Archive at UCSB!

github: avkoehl/archive-vision