3  LatinCy components and their annotations

3.1 LatinCy components and their annotations

# Imports & setup

import spacy
import pandas as pd
from pprint import pprint
nlp = spacy.load('la_core_web_lg')
text = "avus eius Acrisius appellabatur."
doc = nlp(text)
print(doc)
avus eius Acrisius appellabatur.

The default LatinCy models provide the following pipeline components, listed in the order in which they run:

pprint(nlp.pipe_names)
['senter',
 'normer',
 'tok2vec',
 'tagger',
 'morphologizer',
 'trainable_lemmatizer',
 'parser',
 'lookup_lemmatizer',
 'ner']
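As a rough orientation, the mapping below sketches which annotation each component is expected to write. It is an illustrative assumption, not part of the LatinCy API: the entries for the standard spaCy components follow spaCy's documented behavior, while those for the custom `normer` and `lookup_lemmatizer` are inferred from their names and from the `norm` and `lemma` values shown in this section. The names `COMPONENT_ANNOTATIONS` and `components_for` are hypothetical helpers introduced only for this example.

```python
# Assumed mapping from pipeline component to the annotations it writes.
# Standard spaCy components follow spaCy's documentation; the two custom
# LatinCy components are marked as assumptions.
COMPONENT_ANNOTATIONS = {
    "senter": ["token.is_sent_start"],
    "normer": ["token.norm_"],  # assumption: custom LatinCy component
    "tok2vec": ["doc.tensor"],
    "tagger": ["token.tag_"],
    "morphologizer": ["token.pos_", "token.morph"],
    "trainable_lemmatizer": ["token.lemma_"],
    "parser": ["token.dep_", "token.head"],
    "lookup_lemmatizer": ["token.lemma_"],  # assumption: custom LatinCy component
    "ner": ["doc.ents", "token.ent_type_"],
}


def components_for(attribute):
    """Return the components (in pipeline order) assumed to write `attribute`."""
    return [
        name
        for name, attributes in COMPONENT_ANNOTATIONS.items()
        if attribute in attributes
    ]


print(components_for("token.lemma_"))
# ['trainable_lemmatizer', 'lookup_lemmatizer']
```

Under this reading, two components touch the lemma, and because `lookup_lemmatizer` runs later in the pipeline, its value would be the one that survives wherever both assign one.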

The dataframe below summarizes the key annotations that these components assign to each token:

# Collect one row of annotations per token
data = []

for token in doc:
    data.append([
        token.text, token.norm_, token.lemma_, token.pos_, token.tag_,
        token.morph.to_json(), token.dep_, token.ent_type_, token.has_vector,
    ])

df = pd.DataFrame(data, columns=['text', 'norm', 'lemma', 'pos', 'tag', 'morph', 'dep', 'ent_type', 'has_vector'])

df
   text          norm          lemma     pos    tag          morph                                              dep         ent_type  has_vector
0  avus          auus          auus      NOUN   noun         Case=Nom|Gender=Masc|Number=Sing                   nsubj:pass            True
1  eius          eius          is        PRON   pronoun      Case=Gen|Gender=Masc|Number=Sing|Person=3          nmod                  True
2  Acrisius      acrisius      Acrisius  PROPN  proper_noun  Case=Nom|Gender=Masc|Number=Sing                   xcomp       PERSON    True
3  appellabatur  appellabatur  appello   VERB   verb         Mood=Ind|Number=Sing|Person=3|Tense=Imp|VerbFo...  ROOT                  True
4  .             .             .         PUNCT  punc                                                            punct                 True
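The `morph` column holds feature strings in the Universal Dependencies FEATS format (`Key=Value` pairs joined by `|`). The sketch below shows one way to split such a string into a dictionary in plain Python, so that individual features can be queried or filtered. `parse_feats` is a hypothetical helper written for this example; in a live pipeline, spaCy's own `token.morph.to_dict()` and `token.morph.get("Case")` give the same information directly.

```python
def parse_feats(feats):
    """Split a UD FEATS string such as 'Case=Nom|Number=Sing' into a dict."""
    # Tokens without morphology (e.g. punctuation) have an empty string;
    # CoNLL-U files use "_" for the same purpose.
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Feature strings copied from the dataframe above
avus = parse_feats("Case=Nom|Gender=Masc|Number=Sing")
eius = parse_feats("Case=Gen|Gender=Masc|Number=Sing|Person=3")

print(avus["Case"])    # Nom
print(eius["Person"])  # 3
print(parse_feats(""))  # {}
```

Applied to the dataframe, `df["morph"].apply(parse_feats)` would turn the column into dictionaries, which makes it straightforward to select, for example, all nominative tokens.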