Patrick J. Burns
Associate Research Scholar, Digital Projects @ Institute for the Study of the Ancient World / NYU | Formerly Culture, Cognition, and Coevolution Lab (Harvard) & Quantitative Criticism Lab (UT-Austin) | Fordham PhD, Classics | LatinCy developer
The Future of Ancient Literacy: Classical Language Toolkit and Google Summer of Code
Article in submission for Classics@
forthcoming
Abstract
The Classical Language Toolkit (CLTK) is a collection of software and texts for researchers bringing natural language processing to the languages of ancient, classical, and medieval Eurasia and North Africa. This essay chronicles the CLTK’s participation in the 2016 Google Summer of Code (GSoC), a program run by Google to encourage the growth of open source software. Google pays a stipend to student programmers, who in turn contribute code to an approved project between May and August. GSoC accepted the CLTK and allotted it funding slots for two students. The CLTK, having received over 100 student applications, chose Patrick Burns (then a doctoral student in Classics at Fordham University) and Suhaib Khan (an undergraduate at the Netaji Subhas Institute of Technology). Patrick proposed to write a multiple-pass, rules-based lemmatizer for the Latin language. Suhaib proposed to rework the codebase of the Classical Language Archive, a front-end JavaScript application that serves as a reading environment for non-programmers. Kyle Johnson supervised the former project and Luke Hollis the latter. We first offer a brief introduction to the CLTK and then turn to the two summer projects, explaining what each of the three is and the motivations behind its creation. We conclude with a statement of how we envision the entire CLTK ecosystem working together to offer readers of Latin and Greek a presentation of texts and supporting materials not available in print editions.