3 LatinCy components and their annotations

3.1 LatinCy components and their annotations

# Imports & setup

import spacy
import pandas as pd
from pprint import pprint
nlp = spacy.load('la_core_web_lg')
text = "avus eius Acrisius appellabatur."
doc = nlp(text)
print(doc)

/Users/pjb311/.venvs/latincy/lib/python3.11/site-packages/spacy/util.py:910: UserWarning: [W095] Model 'la_core_web_lg' (3.8.0) was trained with spaCy v3.8.3 and may not be 100% compatible with the current version (3.7.5). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)

avus eius Acrisius appellabatur.

The default LatinCy models provide the following pipeline components, listed in the order in which they run:

pprint(nlp.pipe_names)

['senter',
 'normer',
 'tok2vec',
 'tagger',
 'morphologizer',
 'trainable_lemmatizer',
 'parser',
 'lookup_lemmatizer',
 'ner',
 'remorpher']

The dataframe below summarizes the key annotations that these components attach to each token:

data = []

# One row of annotations per token
for token in doc:
    data.append([
        token.text, token.norm_, token.lemma_, token.pos_, token.tag_,
        token.morph.to_json(), token.dep_, token.ent_type_, token.has_vector,
    ])

df = pd.DataFrame(data, columns=['text', 'norm', 'lemma', 'pos', 'tag', 'morph', 'dep', 'ent_type', 'has_vector'])

df

	text	norm	lemma	pos	tag	morph	dep	ent_type	has_vector
0	avus	auus	auus	NOUN	noun	Case=Nom|Gender=Masc|Number=Sing	nsubj:pass		True
1	eius	eius	is	PRON	pronoun	Case=Gen|Gender=Masc|Number=Sing|Person=3	nmod		True
2	Acrisius	acrisius	Acrisius	PROPN	proper_noun	Case=Nom|Gender=Masc|Number=Sing	xcomp	PERSON	True
3	appellabatur	appellabatur	appello	VERB	verb	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbF...	ROOT		True
4	.	.	.	PUNCT	punc		punct		True
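
The morph column stores each token's features as a single string of Key=Value pairs joined by "|", following the Universal Dependencies FEATS convention. As a minimal sketch of how such a string can be unpacked for filtering or comparison, the snippet below uses only the standard library. The helper name parse_feats is hypothetical and is not part of spaCy or LatinCy; for live tokens, token.morph.to_dict() should give the same kind of mapping. The two input strings are copied from the rows for "avus" and "eius" above.

```python
# Hypothetical helper (not part of spaCy or LatinCy): split a UD FEATS
# string, like the values in the morph column, into a plain dict.
def parse_feats(feats: str) -> dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    # Punctuation has an empty morph string; "_" is the CoNLL-U placeholder.
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Feature strings taken from the table rows for "avus" and "eius"
avus = parse_feats("Case=Nom|Gender=Masc|Number=Sing")
eius = parse_feats("Case=Gen|Gender=Masc|Number=Sing|Person=3")

print(avus["Case"])  # Nom
print(eius["Case"])  # Gen

# Features on which the two tokens carry the same value
shared = {key: value for key, value in avus.items() if eius.get(key) == value}
print(shared)  # {'Gender': 'Masc', 'Number': 'Sing'}
```

Working with a dict rather than the raw string makes it straightforward to test a single feature, for example selecting every token whose Case is Gen, without relying on substring matching.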