Patrick J. Burns
Associate Research Scholar, Digital Projects @ Institute for the Study of the Ancient World / NYU | Formerly Culture, Cognition, and Coevolution Lab (Harvard) & Quantitative Criticism Lab (UT-Austin) | Fordham PhD, Classics | LatinCy developer
The Role of “Small” Models for Ancient NLP in a World of Large Language Models
Abstract for invited talk at ALP2025
Abstract
In the field of Latin natural language processing, there are tasks for which large language models like GPT-4o or Claude exhibit competitive, if not state-of-the-art, performance. Yet unlike modern English (for which that statement is arguably also true), there are some Latin NLP tasks, such as coreference resolution or automatic question generation, for which work on smaller, task-specific models is either just underway or does not yet exist. In these cases, LLMs have “skipped” steps on a path of continuous development and improvement. I argue in this talk that, while we should take advantage of such LLM advancements in Latin NLP, a significant part of our attention should also be directed backward, to filling in these skipped steps. By returning to and refocusing on “small” language models—everything from rigorously evaluated and field-tested static embedding models to the last generation of smaller transformer models like BERT, especially those with custom task-specific heads—we can promote a culture of interpretable and explainable philology: interpretable, following Russell and Norvig (Artificial Intelligence, 4th ed. [2021], pp. 711–12), because these smaller models—from their training data to their configuration and parameterization—can be directly inspected, and explainable because such models allow us to maintain an understanding of how specific outputs result from specific inputs. In sum, I argue that, although LLMs will serve (and serve well) short-term interests in ancient-language NLP, we should redouble our efforts—through data curation, through attention to model parameterization, and through competitive evaluation (such as shared tasks)—to develop smaller models equally up to our biggest language challenges. While my talk will use Latin as its ancient-language focus, the conclusion will discuss ways to adapt its lessons to other ancient languages, such as Ancient Greek, Akkadian, and Middle Egyptian.