Throughout this book, we have annotated many texts and often reported these annotations in simple lists and tables. For some tasks, however, it is useful to see the annotations in place and how they relate to our texts. spaCy includes some basic visualization tools through a module called displaCy that can help us accomplish this goal.
There are two primary types of annotations that displaCy can help us visualize: dependency parses and named entities. This notebook demonstrates both types of annotations.
# Imports & setup
import spacy
from tabulate import tabulate
from pprint import pprint

nlp = spacy.load('la_core_web_lg')
text = "Haec narrantur a poetis de Perseo. Perseus filius erat Iovis, maximi deorum. Avus eius Acrisius appellabatur."
doc = nlp(text)
print(doc)
Haec narrantur a poetis de Perseo. Perseus filius erat Iovis, maximi deorum. Avus eius Acrisius appellabatur.
13.1.1 Visualizing dependency parses
# import
from spacy import displacy
Let’s take the first sentence from our sample text and list the relevant dependency annotations…
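A table like the one that follows could be produced along these lines. This is a minimal sketch, not necessarily the code used for this chapter: the helper names `dependency_rows` and `first_sentence_table` are hypothetical, and it assumes that `doc` is the parsed `Doc` from the setup cell and that the index column comes from tabulate's `showindex` option.

```python
def dependency_rows(sent):
    """Return [token text, dependency label, head text] for each token in sent."""
    return [[tok.text, tok.dep_, tok.head.text] for tok in sent]


def first_sentence_table(doc):
    """Format the dependency annotations of the first sentence of a spaCy Doc."""
    # Imported lazily so the helper above works without tabulate installed
    from tabulate import tabulate

    first_sent = next(doc.sents)  # doc.sents yields one Span per sentence
    return tabulate(
        dependency_rows(first_sent),
        headers=["Token", "Dep", "Head"],
        showindex=True,  # adds the running token index as the first column
    )
```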
    Token      Dep         Head
--  ---------  ----------  ---------
 0  Haec       nsubj:pass  narrantur
 1  narrantur  ROOT        narrantur
 2  a          case        poetis
 3  poetis     obl:agent   narrantur
 4  de         case        Perseo
 5  Perseo     obl         narrantur
 6  .          punct       narrantur
The table above lists all of the relevant dependencies, but in a list format it can be hard to trace how they relate to one another. This is where a visualization can be useful…
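As one way to draw the parse, displaCy can render dependencies supplied as plain data (its "manual" mode), so the rows of the table above can be turned into an arc diagram without reloading the model. The sketch below makes that assumption explicit: the rows are copied by hand from the table, and the names `ROWS`, `to_manual_parse` and `render_parse` are hypothetical helpers, not part of spaCy.

```python
# Rows copied from the table above: (token, dependency label, index of head)
ROWS = [
    ("Haec", "nsubj:pass", 1),
    ("narrantur", "ROOT", 1),
    ("a", "case", 3),
    ("poetis", "obl:agent", 1),
    ("de", "case", 5),
    ("Perseo", "obl", 1),
    (".", "punct", 1),
]


def to_manual_parse(rows):
    """Convert (text, dep, head_index) rows into displaCy's manual 'dep' input."""
    words = [{"text": text, "tag": ""} for text, _, _ in rows]  # no tags in the table
    arcs = []
    for i, (_, dep, head) in enumerate(rows):
        if head == i:  # the root points to itself and gets no arc
            continue
        arcs.append({
            "start": min(i, head),
            "end": max(i, head),
            "label": dep,
            # "left" when the dependent precedes its head, "right" otherwise
            "dir": "left" if i < head else "right",
        })
    return {"words": words, "arcs": arcs}


def render_parse(rows):
    """Return the diagram as SVG markup; requires spaCy to be installed."""
    from spacy import displacy

    return displacy.render(
        to_manual_parse(rows), style="dep", manual=True, jupyter=False
    )
```

With a parsed `Doc` at hand, the shorter route is to pass a sentence directly, for example `displacy.render(next(doc.sents), style="dep", jupyter=True)`.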
We can also use displaCy to visualize named entity annotations. As we read in the previous chapter on the NER component, the LatinCy models currently support named entity annotations for the following entity types: PERSON, LOC, and NORP (i.e. groups of people). As with the dependency parses, let’s first present entity annotations in a table format…
# Extract entities
text = """Iason et Medea e Thessalia expulsi ad urbem Corinthum venerunt, cuius urbis Creon quidam regnum tum obtinebat."""
# Normalize v to u before tagging
text = text.replace("v", "u").replace("V", "U")
doc = nlp(text)

data = []
tokens = [item for item in doc]
for token in tokens:
    data.append([token.text, token.ent_type_])

print(tabulate(data, headers=['Text', "Entity Type"]))
Text       Entity Type
---------  -------------
Iason      PERSON
et
Medea      PERSON
e
Thessalia  LOC
expulsi
ad
urbem
Corinthum  LOC
uenerunt
,
cuius
urbis
Creon      PERSON
quidam
regnum
tum
obtinebat
.
The disadvantages of this type of presentation are not hard to notice. It is not compact and, perhaps more importantly, it does not clearly distinguish the different entity types visually. By contrast, displaCy can show us the entities in running text, with each entity type color-coded.
# Visualizing named entities
displacy.render(doc, style="ent", jupyter=True)
Iason [PERSON] et Medea [PERSON] e Thessalia [LOC] expulsi ad urbem Corinthum [LOC] uenerunt, cuius urbis Creon [PERSON] quidam regnum tum obtinebat.
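The colors that displaCy assigns can also be adjusted through the `options` argument of `displacy.render`, which for the `ent` style accepts an `ents` list (the labels to highlight) and a `colors` mapping. The following is a minimal sketch under stated assumptions: the palette values and the helper names `build_ent_options` and `render_entities` are hypothetical and chosen only for illustration.

```python
# Hypothetical palette: one CSS color per LatinCy entity label
PALETTE = {"PERSON": "#ffd166", "LOC": "#8ecae6", "NORP": "#b7e4c7"}


def build_ent_options(colors):
    """Build a displaCy 'ent' options dict from a {label: CSS color} mapping."""
    return {
        "ents": sorted(colors),   # only these labels are highlighted
        "colors": dict(colors),   # label -> background color
    }


def render_entities(doc, colors=PALETTE):
    """Return HTML markup for the entities in doc; requires spaCy."""
    from spacy import displacy

    return displacy.render(
        doc, style="ent", options=build_ent_options(colors), jupyter=False
    )
```

With `jupyter=False`, `displacy.render` returns the markup as a string, so the result can be written to an HTML file when working outside a notebook.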
References
spaCy Visualizers
MS Chapter 1 “Getting started with SpaCy”, section “Visualization with displaCy”