Perseus Annis Adventures

The Perseus Annis environment enables syntactic searches in annotated Greek and Latin texts. However, the query syntax is neither simple nor self-evident.

For the patient (or impatient?), the Annis query language syntax — applied to another set of corpora — is here: ANNIS2 --- Search and Visualization in Multilevel Linguistic Corpora.

Others can read on here to try out my recipes for finding things in Perseus Annis' Greek and Latin corpora.

Syntactic annotation documentation for Latin is here: Guidelines for the Syntactic Annotation of Latin Treebanks

Nouns in nominative

Here is how I searched for nouns in nominative.

case="nominative" & POS="noun" & #1 _=_ #2

There turn out to be 212 nominative nouns in the annotated Cicero corpus. It seems that the clause & #1 _=_ #2 is obligatory (I tried first without it, to no avail); it seems to mean that the first and the second condition both apply to the same word.
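The numbering convention can be made explicit with a small helper. This is only a sketch in Python, not part of Annis; the function name `same_token_query` is hypothetical. It numbers the search terms in the order they are written and ties every later term to the first one with _=_.

```python
def same_token_query(**annotations):
    """Build an AQL query whose terms all describe one and the same word.

    Hypothetical helper, not part of Annis: terms are numbered #1, #2, ...
    in the order given, and each later term is tied to #1 with _=_.
    """
    terms = [f'{name}="{value}"' for name, value in annotations.items()]
    bindings = [f"#1 _=_ #{i}" for i in range(2, len(terms) + 1)]
    return " & ".join(terms + bindings)


nominative_nouns = same_token_query(case="nominative", POS="noun")
print(nominative_nouns)
# case="nominative" & POS="noun" & #1 _=_ #2
```

The printed string is the same query as above and can be pasted into the Annis search box.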

There are 105 participles in accusative:

case="accusative" & POS="participle" & #1 _=_ #2

  1. How would you search for verbs?
  2. How would you search for participles in genitive?

Adverbs modifying verbs

Find all verbs modified by adverbs.

POS="verb" & POS="adverb" & #1 ->parent #2

In a syntactic tree, the verb (element #1) is “parent” of the adverb (element #2).

A nice variation:

POS="adjective" & POS="adverb" & #1 ->parent #2

Is there a noun governing an adverb?

Find subject and predicate (verb)

Find any word (form) which, in the sentence's tree, is subject of the verb:

form & POS="verb" & #2 ->parent[relation="SBJ"] #1

Discussion: the first two terms find any form and any verb; & connects the conditions. The last clause selects element number 2 if it is the parent of (i.e. directly governs) element number 1, on condition that the relation between them is labelled “subject” (SBJ). The expression finds the two elements regardless of how many other words stand between them, as long as they are in the same sentence.

Caesar has 99 such cases, Plato 252. On corpora larger than 10,000 tokens I get a timeout.

Find any nominative which is subject of the verb (not a rocket-science syntactic experiment, I know):

case="nominative" & POS="verb" & #2 ->parent[relation="SBJ"] #1

Plato has 178 results (on 6097 tokens in the corpus).

Find any participle which is subject of the verb — you get the idea:

POS="participle" & POS="verb" & #2 ->parent[relation="SBJ"] #1

Well, the Plato corpus contains 13 such cases. And quite thorny, at that — have to figure out how to deal with predicative expressions.

Variations

1. Find subjects in nominative

POS="noun" & 
case="nominative" & 
POS="verb" & 
#1 _=_ #2 & 
#3 ->parent[relation="SBJ"] #2

1.a Find subjects in nominative, predicates in indicative

POS="noun" & 
case="nominative" & 
POS="verb" & 
mood="indicative" & 
#1 _=_ #2 & 
#3 _=_ #4 & 
#3 ->parent[relation="SBJ"] #2

2. Find SPO structures with direct object in accusative, predicate in indicative, subject in nominative

POS="noun" & case="nominative" & 
POS="verb" & mood="indicative" & 
POS="noun" & case="accusative" & 
#1 _=_ #2 & 
#3 _=_ #4 & 
#5 _=_ #6 & 
#3 ->parent[relation="SBJ"] #2 & 
#3 ->parent[relation="OBJ"] #5
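The numbering in these variations is easy to get wrong by hand. As a sketch only, the following Python class (hypothetical, not part of Annis; the names `AqlBuilder`, `token`, `governs` and `build` are invented here) assigns term numbers in writing order, ties the annotations of one word together with _=_, and adds the ->parent clauses. It points the SBJ edge at #1 rather than #2, which should be equivalent because #1 and #2 are bound to the same word.

```python
class AqlBuilder:
    """Hypothetical helper that assembles an AQL query string."""

    def __init__(self):
        self.terms = []        # search terms, numbered from #1 in this order
        self.constraints = []  # clauses that refer to the term numbers

    def token(self, **annotations):
        """Add one word described by several annotations; return its first number."""
        first = len(self.terms) + 1
        for offset, (name, value) in enumerate(annotations.items()):
            self.terms.append(f'{name}="{value}"')
            if offset > 0:
                # later annotations must sit on the same word as the first one
                self.constraints.append(f"#{first} _=_ #{first + offset}")
        return first

    def governs(self, parent, child, relation):
        """State that `parent` directly governs `child` with the given label."""
        self.constraints.append(
            f'#{parent} ->parent[relation="{relation}"] #{child}'
        )

    def build(self):
        return " & ".join(self.terms + self.constraints)


q = AqlBuilder()
subject = q.token(POS="noun", case="nominative")   # terms #1 and #2
verb = q.token(POS="verb", mood="indicative")      # terms #3 and #4
obj = q.token(POS="noun", case="accusative")       # terms #5 and #6
q.governs(verb, subject, "SBJ")
q.governs(verb, obj, "OBJ")
query = q.build()
print(query)
```

The printed query lists the six terms first and the five constraints after them, in the same layout as variation 2.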

Tip of the day: check out the “Arch Dependency” tab beneath each result; it is great and useful.

Find predicate nominals (subject complements)

The following Annis query:

form & 
LEMMA="sum" & 
#2 ->parent[relation="PNOM"] #1

finds sentences of type “Sapientes beati sunt”.

Relative clauses as subjects

The query:

form & 
form & 
form & 
#1 ->parent[relation="SBJ"] #2 & 
#2 ->parent[relation="SBJ"] #3

finds sentences such as fuere qui crederent. We can make the confusing point (verb as SBJ) even more prominent:

POS="verb" & 
POS="verb" & 
form  & 
#1 ->parent[relation="SBJ"] #2 & 
#2 ->parent[relation="SBJ"] #3

Searching for Greek words

Greek has to be entered in Unicode, with accents. This query for the form δικάζου won't produce any results on the Plato corpus:

form="δικαζου"

Betacode doesn't work either:

form="dika/zou"

This search, however, finds one occurrence:

form="δικάζου"
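Why the unaccented form fails can be illustrated outside Annis. The sketch below is Python, not Annis, and it assumes (this is my guess, not something confirmed by the Annis documentation) that form values are compared as exact Unicode strings. The helper name `strip_accents` is made up for the illustration.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents, breathings) after decomposing the text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


accented = "δικάζου"
bare = strip_accents(accented)

print(bare)              # the unaccented spelling
print(bare == accented)  # False: the two spellings are different strings
```

Under that assumption, the accented and the unaccented spelling are simply two different values, so only the exact accented form can match.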

Search for all forms of δικάζω (two in the Plato corpus):

LEMMA="δικάζω"

Find only the participles of δικάζω (there is exactly one — I proudly use what I already learned):

LEMMA="δικάζω" & POS="participle" & #1 _=_ #2

This one (with the operator = instead of _=_) produces the same result in this context. Should read up on Annis operators.

LEMMA="δικάζω" & POS="participle" & #1 = #2

Finding specific formation

We want to find phrases of type φίλοι γάρ εἰσιν.

case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #2 . #3

This search finds 8 results in the Aeschylus corpus.

It seems that an Annis query must be written in pairs (#1 . #2 & #2 . #3) – the version #1 . #2 . #3 is not valid.
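The pairwise pattern generalizes to longer sequences. A minimal sketch in Python (the function name `adjacent_sequence` is hypothetical and not part of Annis) writes out one "." clause for each pair of neighbouring terms:

```python
def adjacent_sequence(terms):
    """Join search terms and require each one to directly precede the next.

    Hypothetical helper: for n terms it emits the n - 1 pairwise clauses
    #1 . #2, #2 . #3, ... instead of the invalid chain #1 . #2 . #3.
    """
    pairs = [f"#{i} . #{i + 1}" for i in range(1, len(terms))]
    return " & ".join(list(terms) + pairs)


phrase = adjacent_sequence(['case="nominative"', 'LEMMA="γάρ"', 'LEMMA="εἰμί"'])
print(phrase)
# case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #2 . #3
```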

Find phrases like the one above, but with nominative as the subject:

case="nominative" & LEMMA="γάρ" & LEMMA="εἰμί" & #1 . #2 & #3 ->parent[relation="SBJ"] #1

Plato corpus – 1 result, in others I get a timeout.

Find attributes and governing nouns (or whatever):

form & form & #2 ->parent[relation="ATR"] #1

786 results in Plato corpus, including phrases such as ἐν Λυκείῳ.

The other way around:

form & form & #1 ->parent[relation="ATR"] #2

Produces 786 results as well, but in different order (ἐμὸς πατήρ comes first now).

Find all attributive phrases with πατήρ:

form & LEMMA="πατήρ" & #2 ->parent[relation="ATR"] #1

20 results in Plato. ἐμὸς πατήρ is one, ὁ (ἐμὸς) πατήρ another. (Should be able to get multiple attributes?)

arity!

While most of the categories offered by Perseus Annis seem familiar from the classroom, the strangely named “arity” operator is something else. It is a “meta-operator” which, given the “arity number”, selects only those search terms that govern exactly that many other words and sentence elements.

E. g. to find all verbs governing four other elements:

POS="verb" & #1:arity=4

In the Cicero corpus there are ninety such situations. By studying the Arch Dependency diagrams, you'll discover that a comma can also be governed by the given element.

Now, one of the verbs governing four elements is “interficio”. If we want to concentrate on forms of interficio governing four elements, we do it like this:

LEMMA="interficio" & #1:arity=4

Can you decode what is found by this search? (If not, try pasting it into the Annis search interface!)

POS="noun" & #1:arity=4

Magnis in periculis

Our colleague Šime Demo (Croatian Studies, University of Zagreb) thought of a beautiful search:

POS="adjective" & POS="preposition" & POS="noun" 
& #3 ->parent #1 & #2 ->parent #3 
& #1 .* #2 & #2 .* #3

This searches for prepositional phrases of the type “magnis in periculis”, i. e. with preposition interposed (adjective – preposition – noun). On the currently available Latin corpus, the phrase sharply distinguishes prose (Caesar, Cicero, Sallust, Petronius) from poetry (Propertius, Vergil). Lots of it in poetry, rarely in prose. Šime, thanks!

Study coordination

A seemingly simple and self-explanatory syntactical relationship is coordination. However, for treebank notation it has to be learned a little differently.

Take a simple Latin sentence made up of three clauses, connected asyndetically (just with commas):

Ego scribo, tu legis, ille pingit.

In treebank notation, the root of this sentence is the (first) comma; the three predicates and the other comma depend on it (the full stop is on the same level as the root).
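The structure just described can be written down as a toy table of heads. This is a sketch in Python, not Annis output: the token ids are arbitrary, and attaching each subject pronoun to its own verb is my assumption (the description above only covers the commas, the predicates and the full stop).

```python
from collections import defaultdict

# (id, form, head id); head 0 means "attached at the top level"
tokens = [
    (1, "Ego", 2), (2, "scribo", 3), (3, ",", 0),
    (4, "tu", 5), (5, "legis", 3), (6, ",", 3),
    (7, "ille", 8), (8, "pingit", 3), (9, ".", 0),
]

forms = {token_id: form for token_id, form, _ in tokens}

# collect the direct dependents of every head
children = defaultdict(list)
for token_id, _, head in tokens:
    children[head].append(token_id)

root_children = [forms[token_id] for token_id in children[3]]
print(root_children)       # ['scribo', 'legis', ',', 'pingit']
print(len(root_children))  # 4
```

The first comma governs four elements here, which is the number an arity search on that token would be counting.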

Here is an Annis QL query that finds all kinds of coordination, finding root and its child connected through “COORD” relationship, or arc:

form & 
form & 
#1  ->parent[relation="COORD"] #2

We can modify the query to find (“filter”) just commas as roots:

form="," & 
form & 
#1  ->parent[relation="COORD"] #2

A similar, but more complex case from the Caesar corpus annotated in Perseus is part of Caes. Gal. 2.34:

ad Venetos, Venellos, Osismos, Coriosolitas, Esuvios, Aulercos, Redones, quae sunt maritimae civitates Oceanumque attingunt

In absolute terms, Cicero seems to have more cases of the coordinating comma than Caesar. But in the Perseus annotated corpus Cicero's 6229 tokens yield 12 cases, while Caesar's 1488 yield 5, so the relative frequency is actually about 0.19 percent for Cicero against about 0.34 percent for Caesar. Sallust, who has 27 cases in 12311 tokens, falls between Cicero and Caesar, with about 0.22 percent of his corpus. Jerome, with 8382 tokens, seems to have zero coordinating commas, which is slightly strange.
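The relative frequencies are simple to recompute. A short Python sketch, using only the hit and token counts quoted above (the names `counts` and `percent` are just for this example), with results rounded to two decimals:

```python
# (coordinating commas found, tokens in the annotated corpus)
counts = {
    "Cicero": (12, 6229),
    "Caesar": (5, 1488),
    "Sallust": (27, 12311),
}


def percent(hits, tokens):
    """Hits as a percentage of all tokens."""
    return 100 * hits / tokens


rates = {author: round(percent(*pair), 2) for author, pair in counts.items()}
for author, rate in rates.items():
    print(f"{author}: {rate:.2f}%")
# Cicero: 0.19%
# Caesar: 0.34%
# Sallust: 0.22%
```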

Using Annis corpus to test annotations

Problem: a difficult sentence has to be syntactically annotated.

Quicquid oritur, causam habeat a natura necesse est

(C. div. 2, 60)

A proposed annotation is here. But is it correct?

To test it, we write an Annis query and see if there is anything similar in the annotated corpora:

POS="verb" & 
POS="verb" & 
#1 ->parent[relation="SBJ"] #2

(“Find two verbs of which one governs the other, and whose relationship is labeled “SBJ”.”)

This search finds, among other results, the well-known passage from Cicero's In Catilinam:

quod eam [sicam] necesse putas esse in consulis corpore defigere

(Cic. Catil. 1, 16)

I guess this confirms my annotation.

Simple sentences for annotating exercises

1. Sentences which already have been annotated

Annotations can be found in Perseus Annis.

  1. rumores adferebantur
  2. crebri ad eum rumores adferebantur
  3. Remi primos civitatis miserunt
  4. … si suas copias Haedui in fines Bellovacorum introduxerint…
  5. hi omnes nuntiaverunt…
  6. qui moleste ferebant
  7. qui finitimi Belgis erant
  8. Germanos in Gallia versari noluerant
  9. Caesar ad se adduci iussit
  10. qui novis imperiis studebant
  11. Q. Pedium legatum misit ipse
  12. cum primum pabuli copia esse inciperet…
  13. coniurandi has esse causas…
  14. … uti ea quae apud eos gerantur cognoscant
  15. … reliquos omnes Belgas in armis esse
  16. … omnem senatum ad se convenire
  17. … exercitum in unum locum conduci

2. Sentences without previous annotation

Some exercises done by NJ.

  1. mane surgo (cf. annotation) — Annis query:
    POS="verb" & POS="adverb" & #1 ->parent[relation="ADV"] #2

    Or, even more precisely:

    POS="verb" & POS="adverb" 
    & #1 ->parent[relation="ADV"] #2 & #2 . #1
  2. sol ortus est (cf. annotation)
  3. surrexit de lecto (cf. annotation)
  4. vigilavit heri diu (cf. annotation)
  5. vesti me (cf. annotation)
  6. da mihi calciamenta et udones et bracas (cf. annotation)
  7. iam calciatus sum
  8. adfer aquam manibus
  9. manus sordidae sunt
  10. iam lavi meas manus et faciem
  11. adhuc non tersi
  12. procedo foris de cubiculo
  13. vado in scholam

Additional exercise: using Annis notation, try to find similar sentences in the Perseus annotated corpus.


  1. satis declarauit Dionysius
  2. Despotus VI. mille equites gratia praesidii Smederovo reliquerat
  3. rex qui maximas copias duxit ad Troiam
  4. magister cum omnibus classiariis ad oppidum tendit
  5. Nec uos quidem, iudices timueritis
  6. animus grauioribus curis sedulo coquitur
  7. Stephanus Malipetrus et Victor Soprantius ad imperatorem se mature conferunt
  8. Copias uero quas adduxi tuum imperium sequentur (!)
  9. Milites ac turba omnis uenationi incumbit
  10. adolescens quidam, Dalmata natione et lingua, urso mirae magnitudinis occurrit
  11. Cadit itaque ubique magnus numerus ferarum
  12. imperator in Clazomeniorum agro copias exposuit
  13. Myra fuit ciuitas Lyciae

Sentences from the “new Menge” (taken from classical authors):

  1. Argumenta plus quam testes valent .
  2. Effluit voluptas corporis et prima quaeque avolat .
  3. Consul ego nuper defendi C. Pisonem ; qui , quia consul fortis fuerat , incolumis est rei publicae conservatus .
  4. Furti damnatus est .
  5. Huic legioni maxime confidebat .
  6. Illi pictores non sunt usi plus quam quattuor coloribus .
  7. Si Fabio laudi datum esset , quod pingeret , etiam apud Romanos multi Polycleti et Parrhasii fuissent .
  8. Illi dimicare non ausi turpiter se in castra receperunt .
  9. Cum quaepiam cohors impetum fecerat , hostes velocissime refugiebant .
  10. Alacris exsultat improbitas in victoria .
  11. Ordiamur ab eo , quod primum posui .
  12. Invidetur commodis hominum .
  13. Cn. Pompeius est omnium gentium , omnium saeculorum facile princeps .
  14. Est apud Platonem Socrates , cum esset in custodia publica , dicens Critoni sibi post tertium diem esse moriendum .
  15. De insidiis celare te noluit .
  16. Omnes immemorem beneficii oderunt .
  17. Multi alacres exspectant .
  18. Milites amplius horis quattuor fortissime pugnabant .

Sentences from Pinkster

Source: Pinkster, Harm (1942-) [1990], Latin Syntax and Semantics, xii, 320 p.

  1. pater filium laudat .
  2. ovum ovo simile est .
  3. Alexander erat rex Macedonum .
  4. pater ambulat .
  5. pater hostibus timorem iniecit .
  6. interea ea legione quam secum habebat murum fossamque perducit .
  7. num stulte anteposuit exilii libertatem domesticae servituti ?
  8. Narbonensis provincia amplitudine opum nulli provinciarum postferenda breviter que Italia verius quam provincia .

Sentences from Caesar

Found with Annis QL: subject is a noun in nominative, predicate is verb in indicative, has direct object in accusative. Examples are shortened here (most of the sentence is omitted).

  1. Equites proelium commiserunt .
  2. Sectionem eius oppidi universam Caesar vendidit .
  3. Caesar VI legiones ducebat .
  4. Eorum fines Nervii attingebant .
  5. Copias Haedui introduxerint .

And with any word in nominative:

  1. qui facultates habebant .
  2. neutri initium faciunt .
  3. locum nostri castris delegerant .
  4. Illi eruptionem fecerunt .
 
z/perseus-annis.txt · Last modified: 30. 08. 2013. 22:20 by njovanov
 