15 Visualizing Annotations

15.1 Visualizing Annotations with displaCy

Throughout this book, we have annotated many texts and often reported these annotations in simple lists and tables. For some tasks, though, it is useful to see the annotations in place and how they relate to our texts. spaCy includes some basic visualization tools in a module called displaCy that can help us accomplish this goal.

There are two primary types of annotations that displaCy can help us visualize: dependency parses and named entities. This notebook demonstrates both types of annotations.

# Imports & setup

import spacy
from tabulate import tabulate

from pprint import pprint
# Load the large LatinCy pipeline
nlp = spacy.load('la_core_web_lg')
text = "Haec narrantur a poetis de Perseo. Perseus filius erat Iovis, maximi deorum. Avus eius Acrisius appellabatur."
doc = nlp(text)
print(doc)

Haec narrantur a poetis de Perseo. Perseus filius erat Iovis, maximi deorum. Avus eius Acrisius appellabatur.

15.1.1 Visualizing dependency parses

# import

from spacy import displacy

Let’s take the first sentence from our sample text and list the relevant dependency annotations…

# Split the Doc into sentences
sents = list(doc.sents)

# One row per token in the first sentence: index, text, dependency label, head
data = [[token.i, token.text, token.dep_, token.head.text] for token in sents[0]]

print(tabulate(data, headers=['', 'Token', 'Dep', 'Head']))

    Token      Dep         Head
--  ---------  ----------  ---------
 0  Haec       nsubj:pass  narrantur
 1  narrantur  ROOT        narrantur
 2  a          case        poetis
 3  poetis     obl:agent   narrantur
 4  de         case        Perseo
 5  Perseo     obl         narrantur
 6  .          punct       narrantur

The table above records every dependency in the sentence, but in list form these relationships are hard to trace. This is where a visualization can be useful…

# visualize dependency parse

displacy.render(sents[0], style='dep', jupyter=True)
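
Outside a notebook, the same visualization can be written to an SVG file. The following is a minimal sketch rather than part of the original notebook: it re-enters the parse from the table above by hand in displaCy's manual input format, so that it runs without the LatinCy model. The helper names to_displacy_manual and render_dep_svg, the head indices (read off the Head column of the table), the option values, and the output filename are all assumptions made for illustration.

```python
from pathlib import Path

# Rows copied from the table above: (index, token, dependency label, head index).
# Head indices are inferred from the Head column; the ROOT token points to itself.
ROWS = [
    (0, "Haec", "nsubj:pass", 1),
    (1, "narrantur", "ROOT", 1),
    (2, "a", "case", 3),
    (3, "poetis", "obl:agent", 1),
    (4, "de", "case", 5),
    (5, "Perseo", "obl", 1),
    (6, ".", "punct", 1),
]


def to_displacy_manual(rows):
    """Convert (index, text, dep, head) rows to displaCy's manual 'dep' format."""
    words = [{"text": text, "tag": ""} for _, text, _, _ in rows]
    arcs = []
    for i, _, dep, head in rows:
        if i == head:
            continue  # the root has no incoming arc
        # Arcs always run left to right; "dir" marks the end that carries the arrowhead
        arcs.append({
            "start": min(i, head),
            "end": max(i, head),
            "label": dep,
            "dir": "left" if i < head else "right",
        })
    return {"words": words, "arcs": arcs}


def render_dep_svg(parsed, path="perseus_dep.svg"):
    """Render the parse to an SVG file and return its path (requires spaCy)."""
    # Imported here so that the conversion above can be used without spaCy installed
    from spacy import displacy

    svg = displacy.render(
        parsed,
        style="dep",
        manual=True,      # input is a plain dict, not a Doc or Span
        jupyter=False,    # return the markup as a string instead of displaying it
        options={"compact": True, "distance": 120},
    )
    out = Path(path)
    out.write_text(svg, encoding="utf-8")
    return out


parsed = to_displacy_manual(ROWS)
print(len(parsed["words"]), "words,", len(parsed["arcs"]), "arcs")
```

Calling render_dep_svg(parsed) writes the file. With a loaded pipeline, passing sents[0] and dropping manual=True should produce the same picture directly from the parsed sentence.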

15.1.2 Visualizing named entities

We can also use displaCy to visualize named entity annotations. As we saw in the previous chapter on the NER component, the LatinCy models currently support named entity annotations for the following entity types: PERSON, LOC, and NORP (i.e., groups of people). As with the dependency parses, let’s first present entity annotations in a table format…

# Extract entities

text = """Iason et Medea e Thessalia expulsi ad urbem Corinthum venerunt, cuius urbis Creon quidam regnum tum obtinebat."""
# Normalize v/V to u/U before passing the text to the pipeline
text = text.replace("v", "u").replace("V", "U")

doc = nlp(text)

# One row per token; ent_type_ is an empty string for tokens outside any entity
data = [[token.text, token.ent_type_] for token in doc]

print(tabulate(data, headers=['Text', 'Entity Type']))

Text       Entity Type
---------  -------------
Iason      PERSON
et
Medea      PERSON
e
Thessalia  LOC
expulsi
ad
urbem
Corinthum  LOC
uenerunt
,
cuius
urbis
Creon      PERSON
quidam
regnum
tum
obtinebat
.

The disadvantages of this type of presentation are easy to see: it is not compact and, more importantly, the entity types are not visually distinct from one another. By contrast, displaCy shows the entities in running text, with each entity type color-coded.

# Visualizing named entities

displacy.render(doc, style="ent", jupyter=True)

Iason PERSON et Medea PERSON e Thessalia LOC expulsi ad urbem Corinthum LOC uenerunt, cuius urbis Creon PERSON quidam regnum tum obtinebat.
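
The colors and the set of displayed entity types can be adjusted through the options argument of displacy.render. The sketch below is an illustration rather than part of the original notebook: it rebuilds the entity annotations from the table above in displaCy's manual input format, so that it runs without the LatinCy model. The helper names to_displacy_ents and render_ents_html, the color values, and the choice to locate each entity by string search are assumptions.

```python
# The normalized sample sentence from above
TEXT = ("Iason et Medea e Thessalia expulsi ad urbem Corinthum uenerunt, "
        "cuius urbis Creon quidam regnum tum obtinebat.")

# Entities in order of appearance, copied from the table above
ENTITIES = [
    ("Iason", "PERSON"),
    ("Medea", "PERSON"),
    ("Thessalia", "LOC"),
    ("Corinthum", "LOC"),
    ("Creon", "PERSON"),
]


def to_displacy_ents(text, entities):
    """Convert (surface form, label) pairs to displaCy's manual 'ent' format."""
    ents = []
    cursor = 0
    for surface, label in entities:
        # Search from the end of the previous entity so repeated names resolve in order
        start = text.index(surface, cursor)
        end = start + len(surface)
        ents.append({"start": start, "end": end, "label": label})
        cursor = end
    return {"text": text, "ents": ents, "title": None}


def render_ents_html(parsed):
    """Return HTML markup with custom colors (requires spaCy)."""
    # Imported here so that the conversion above can be used without spaCy installed
    from spacy import displacy

    options = {
        "ents": ["PERSON", "LOC"],  # highlight only these entity types
        "colors": {"PERSON": "#ffd966", "LOC": "#9fc5e8"},  # illustrative values
    }
    return displacy.render(parsed, style="ent", manual=True,
                           jupyter=False, options=options)


parsed_ents = to_displacy_ents(TEXT, ENTITIES)
for ent in parsed_ents["ents"]:
    print(ent["start"], ent["end"], ent["label"], TEXT[ent["start"]:ent["end"]])
```

With a loaded pipeline, the same options dictionary can be passed alongside the doc itself, without manual=True.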

References

spaCy Visualizers
MS Chapter 1 “Getting started with SpaCy”, section “Visualization with displaCy”