Noun chunking is the task of identifying a noun together with the words directly related to it. More specifically, in the LatinCy pipeline, it uses the annotations of the POS tagger (tagger) and the dependency parser (parser) to identify all children of a token tagged as NOUN. Unlike the NLP tasks in the preceding sections, this task is not associated with a pipeline component of its own; rather, it is a special Span case defined in the spaCy language model itself under syntax iterators.
# Imports & setupimport spacyfrom pprint import pprintnlp = spacy.load('la_core_web_lg')text ="Haec narrantur a poetis de Perseo. Perseus filius erat Iovis, maximi deorum. Avus eius Acrisius appellabatur."doc = nlp(text)print(doc)
Haec narrantur a poetis de Perseo. Perseus filius erat Iovis, maximi deorum. Avus eius Acrisius appellabatur.
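Because noun chunking builds directly on the tagger and parser annotations, it can help to look at those annotations first. As a minimal sketch (using the doc created above), the loop below prints each token's POS tag, dependency label, and head, which are the attributes the syntax iterator consults.

# Inspect the POS and dependency annotations that noun chunking relies on
for token in doc:
    print(f'{token.text:<12} {token.pos_:<6} {token.dep_:<10} {token.head.text}')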
As defined in the syntax_iterators, noun chunks are exposed as an attribute of the spaCy Doc object, noun_chunks. This attribute yields every token span whose dependency head is tagged NOUN. A one-word noun chunk is just the noun itself and so not of particular interest; what we are really looking for are longer spans of noun-related text.
# Print the noun chunks in the text
for noun_chunk in doc.noun_chunks:
    if len(noun_chunk) > 1:
        print(noun_chunk)
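Each noun chunk is an ordinary spaCy Span, so the standard Span attributes are available. As a small illustrative sketch (the formatting here is my own, not part of the original output), the loop below prints each multi-word chunk together with its root token, i.e. the NOUN that heads the span.

# Print each multi-word noun chunk alongside its root noun
for noun_chunk in doc.noun_chunks:
    if len(noun_chunk) > 1:
        print(f'{noun_chunk.text:<25} root: {noun_chunk.root.text} ({noun_chunk.root.pos_})')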