3  LatinCy components and their annotations

3.1 LatinCy components and their annotations

# Imports & setup

import spacy
import pandas as pd
from pprint import pprint
nlp = spacy.load('la_core_web_lg')
text = "avus eius Acrisius appellabatur."
doc = nlp(text)
print(doc)
avus eius Acrisius appellabatur.

The default LatinCy models provide the following pipeline components, listed in the order in which they run…

pprint(nlp.pipe_names)
['senter',
 'normer',
 'tok2vec',
 'tagger',
 'morphologizer',
 'trainable_lemmatizer',
 'parser',
 'lookup_lemmatizer',
 'ner',
 'remorpher']
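
To give a feel for what a normalization step such as `normer` contributes, here is a rough sketch inferred only from the norm values reported for the sample sentence in this section (avus becomes auus, Acrisius becomes acrisius). The helper `approx_norm` is hypothetical and is not LatinCy's actual implementation; it assumes that lowercasing plus folding v to u is enough to reproduce those five values.

```python
def approx_norm(token_text: str) -> str:
    """Hypothetical approximation of the norm values seen in this section.

    Assumption: lowercase the token, then fold 'v' to 'u'.
    The real LatinCy `normer` component may apply other rules.
    """
    return token_text.lower().replace("v", "u")


# Tokens of the sample sentence "avus eius Acrisius appellabatur."
sample = ["avus", "eius", "Acrisius", "appellabatur", "."]

for tok in sample:
    print(f"{tok} -> {approx_norm(tok)}")
```

For real work, read `token.norm_` from the pipeline rather than relying on a stand-in like this one.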

The dataframe below summarizes the key token-level annotations that these components produce for the sample sentence…

data = []

# Collect one row of annotations per token
for token in doc:
    data.append([
        token.text, token.norm_, token.lemma_, token.pos_, token.tag_,
        token.morph.to_json(), token.dep_, token.ent_type_, token.has_vector,
    ])

df = pd.DataFrame(data, columns=['text', 'norm', 'lemma', 'pos', 'tag', 'morph', 'dep', 'ent_type', 'has_vector'])

df
   text          norm          lemma     pos    tag          morph                                              dep         ent_type  has_vector
0  avus          auus          auus      NOUN   noun         Case=Nom|Gender=Masc|Number=Sing                   nsubj:pass            True
1  eius          eius          is        PRON   pronoun      Case=Gen|Gender=Masc|Number=Sing|Person=3          nmod                  True
2  Acrisius      acrisius      Acrisius  PROPN  proper_noun  Case=Nom|Gender=Masc|Number=Sing                   xcomp       PERSON    True
3  appellabatur  appellabatur  appello   VERB   verb         Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbF...  ROOT                  True
4  .             .             .         PUNCT  punc                                                            punct                 True
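
The morph column holds pipe-separated `Feature=Value` pairs. As a minimal sketch, a string of that shape can be split into a dictionary with plain Python; `parse_feats` is a hypothetical helper introduced here for illustration, and it assumes a complete string (not the display-truncated value in row 3).

```python
def parse_feats(feats: str) -> dict[str, str]:
    """Split a 'Feature=Value|Feature=Value' string into a dict.

    Hypothetical helper; an empty string (as for punctuation) gives {}.
    """
    if not feats:
        return {}
    # Split on '|' to get the pairs, then on the first '=' in each pair
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Morphology string of "avus", copied from the table above
avus = parse_feats("Case=Nom|Gender=Masc|Number=Sing")
print(avus["Case"], avus["Gender"], avus["Number"])

# Punctuation carries no features
print(parse_feats(""))
```

Inside a spaCy pipeline the same information is available without string handling, through `token.morph.to_dict()` or `token.morph.get("Case")`.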