Arabic Syntactic/Predicate-Argument Annotation

Our syntactic/predicate-argument annotation of newswire Arabic follows the bracketing guidelines for the Penn Treebank (English) wherever possible.

Some points where the Arabic Treebank differs from the Penn English Treebank:

Some examples of annotated trees and how they look in the annotation tool can be seen here.

Basic sentence structure

The sentence (S) is the top level of structure (each "paragraph" also has a Paragraph label above any other brackets). The subject (labeled NP-SBJ) is inside the VP, after the verb. If the subject precedes the verb, it is labeled NP-TPC and linked to a trace (NP-SBJ *T*) following the verb. All sentences have a subject (-SBJ) and a predicate (VP or -PRD). (NB: The VP is often coextensive with the S, when nothing precedes the verb.)
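The subject placement described above can be checked mechanically on bracketed trees. The following is a minimal sketch, not part of the treebank tools: `parse` and `parent_of` are hypothetical helper names, the leaf tokens VERB and NOUN are placeholders rather than treebank data, and the part-of-speech layer is omitted for brevity.

```python
def parse(text):
    """Parse a bracketed string into (label, children) tuples; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def parent_of(tree, target):
    """Return the label of the first node that has a child labeled `target`."""
    label, children = tree
    for child in children:
        if isinstance(child, tuple):
            if child[0] == target:
                return label
            found = parent_of(child, target)
            if found is not None:
                return found
    return None


# Placeholder trees following the description above (not treebank data).
post_verbal = parse("(S (VP VERB (NP-SBJ NOUN)))")
topicalized = parse("(S (NP-TPC NOUN) (VP VERB (NP-SBJ *T*)))")

print(parent_of(post_verbal, "NP-SBJ"))   # the subject sits inside VP
print(parent_of(topicalized, "NP-TPC"))   # the preverbal NP is a child of S
print(parent_of(topicalized, "NP-SBJ"))   # its trace is still inside VP
```

In both trees the NP-SBJ node is a child of VP; only the topicalized tree has an additional NP-TPC sister to the VP.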

1. A simple sentence with NP subject following the verb:

2. A simple sentence with pro-drop:

3. An "equational" sentence with an adjectival predicate:

Node labels and functional "dashtags":

Coordination is annotated as adjunction, (Z (Z ) and (Z )), with the same structure at all phrase levels.

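As a schematic illustration of the adjunction pattern, the sketch below builds the bracketing for two conjuncts of the same category. The function name `coordinate` is hypothetical, the conjunct contents are placeholders, and the bare word "and" stands in for the conjunction exactly as in the schema above.

```python
def coordinate(label, left, right, conjunction="and"):
    """Build (Z (Z left) and (Z right)) for any phrase label Z."""
    return f"({label} ({label} {left}) {conjunction} ({label} {right}))"


# The same function covers every phrase level.
print(coordinate("NP", "A", "B"))   # (NP (NP A) and (NP B))
print(coordinate("VP", "A", "B"))   # (VP (VP A) and (VP B))
```

Because the label is a parameter, the mother node always carries the same category as its conjuncts, which is what "adjunction" means here.
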
4. This is an example of NP coordination:

VP arguments and adjuncts

As in the Penn English Treebank, the distinction between arguments and adjuncts of the verb or verb phrase is made through the use of functional dashtags rather than with a structural difference. Both arguments and adjuncts are children of the VP node. No distinction is made between VP-level modification and S-level modification. All constituents that appear before the verb are children of S and sisters of VP; all constituents that appear after the verb are children of VP.

ARGUMENTS of the verb are: NP-SBJ, NP-OBJ, SBAR (no dashtag or -NOM-SBJ/OBJ), S (no dashtag or -NOM-SBJ/OBJ), PP-DTV, PP-CLR (closely/clearly related -- a PP the annotator's intuition says is an argument, though it doesn't fall into one of the official argument categories).

ADJUNCTS are: any XP with any other adverbial dashtag, PP (no dashtag), ADVP (no dashtag).
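The two lists above amount to a lookup on the node label. The sketch below is one possible reading, not an official classifier: it assumes "-NOM-SBJ/OBJ" expands to the labels SBAR-NOM-SBJ, SBAR-NOM-OBJ, S-NOM-SBJ and S-NOM-OBJ, and it knows only the adverbial dashtags that appear in this document (-TMP and -LOC), so other labels are reported as "unknown". The names `ARGUMENT_LABELS`, `ADVERBIAL_TAGS` and `classify` are hypothetical.

```python
# Labels listed above as arguments of the verb (expansion of -NOM-SBJ/OBJ is an assumption).
ARGUMENT_LABELS = {
    "NP-SBJ", "NP-OBJ",
    "SBAR", "SBAR-NOM-SBJ", "SBAR-NOM-OBJ",
    "S", "S-NOM-SBJ", "S-NOM-OBJ",
    "PP-DTV", "PP-CLR",
}

# Categories that count as adjuncts when they carry no dashtag.
BARE_ADJUNCTS = {"PP", "ADVP"}

# Adverbial dashtags mentioned in this document; the full inventory is larger.
ADVERBIAL_TAGS = {"TMP", "LOC"}


def classify(label):
    """Classify a child of VP as 'argument', 'adjunct', or 'unknown'."""
    if label in ARGUMENT_LABELS:
        return "argument"
    if label in BARE_ADJUNCTS:
        return "adjunct"
    dashtags = label.split("-")[1:]
    if any(tag in ADVERBIAL_TAGS for tag in dashtags):
        return "adjunct"
    return "unknown"


for label in ["NP-SBJ", "NP-OBJ", "NP-TMP", "PP", "PP-CLR", "PP-LOC"]:
    print(label, classify(label))
```

Note that the classification depends only on the label, never on where the node attaches, which reflects the statement that arguments and adjuncts are both children of VP.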

5. In this example, the NP-SBJ is the subject, NP-OBJ is the object of the verb, and NP-TMP is an adverbial (temporal) NP:

NP arguments and adjuncts

The argument/adjunct distinction is shown structurally inside NPs. Argument constituents are children of NP, sister to the head noun: (NP head (NP argument)). Adjunct constituents are sister to the NP that contains the head noun, child of the NP that contains both: (NP (NP head) (NP adjunct)).

Arguments are genitive, possessive, or (for deverbal head nouns) constituents that would be arguments of the verb from which the noun is derived.
Adjuncts are all other modifiers of the NP.
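Since the distinction inside NPs is structural, it can be read off the shape of the tree. The sketch below assumes a simplified representation in which a node is a (label, children) tuple and the head noun is a bare string with no part-of-speech node; `modifier_role` is a hypothetical helper. The tokens come from the two NP examples in this section, and the "..." leaf marks material that is elided in the source.

```python
def modifier_role(np):
    """Return 'argument', 'adjunct', or None for an NP given as (label, children)."""
    label, children = np
    has_head_word = any(isinstance(c, str) for c in children)
    has_phrase = any(isinstance(c, tuple) for c in children)
    if not has_phrase:
        return None            # a bare NP with no modifier
    # Head word and phrase as sisters: (NP head (NP argument)).
    # Only phrasal children: (NP (NP head) (XP adjunct)).
    return "argument" if has_head_word else "adjunct"


argument_np = ("NP", ["SAHib", ("NP", ["maHal~"])])
adjunct_np = ("NP", [("NP", ["Al+mu$ar~adi+iyona"]),
                     ("PP-LOC", ["fiy", "..."])])   # remainder elided in the source

print(modifier_role(argument_np))   # argument
print(modifier_role(adjunct_np))    # adjunct
```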

6. NP with NP argument -- the NP argument (NP maHal~) "(of) place" is a sister of the head noun SAHib "owner" itself:

7. NP with PP adjunct -- the NP containing the head noun (NP Al+mu$ar~adi+iyona) "the homeless" and the PP adjunct (PP-LOC fiy...) "in..." are sisters, both children of a containing NP:

Empty categories

The empty categories are essentially the same as in the Penn English Treebank. The most common being

As in the Penn Treebank, we are not showing any pronominal coreference. Coreference will be indicated only for empty categories and exceptional cases such as VP gapping structures.

8. A simple sentence with pro-drop:

9. A topicalized NP subject trace:

Clitics

Clitics that play a role in the syntactic structure are split off into separate tokens (e.g., object pronouns cliticized to verbs, subject pronouns cliticized to complementizers, cliticized prepositions, etc.). Clitics that do not affect the structure are not separated (e.g., determiners).
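The splitting rule can be sketched as a pass over a word that has already been segmented into morphemes. Everything about the representation here is an assumption made for illustration: the (form, kind) pairs, the category names in `STRUCTURAL_CLITICS`, and the function `split_clitics` are hypothetical, and PREP, PRON, VERB and NOUN are placeholder forms. Morphemes that stay together are joined with "+", following the transliteration used in this document.

```python
# Hypothetical category names for clitics that get their own token.
STRUCTURAL_CLITICS = {"preposition", "pronoun", "complementizer"}


def split_clitics(segments):
    """Turn a list of (form, kind) morphemes into a list of tokens."""
    tokens = []
    current = []                     # morphemes that remain in one token
    for form, kind in segments:
        if kind in STRUCTURAL_CLITICS:
            if current:
                tokens.append("+".join(current))
                current = []
            tokens.append(form)      # structural clitic becomes its own token
        else:
            current.append(form)     # determiners, stems, suffixes stay attached
    if current:
        tokens.append("+".join(current))
    return tokens


# A determiner does not affect the structure, so the word stays whole.
print(split_clitics([("Al", "determiner"), ("mu$ar~adi", "stem"), ("iyona", "suffix")]))

# An object pronoun cliticized to a verb is split off.
print(split_clitics([("VERB", "stem"), ("PRON", "pronoun")]))

# A cliticized preposition is split from its noun, which keeps its determiner.
print(split_clitics([("PREP", "preposition"), ("Al", "determiner"), ("NOUN", "stem")]))
```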

10. PP with a cliticized object pronoun, split apart so that the NP can be shown:

11. Subject pronoun cliticized to a complementizer, split so that the structure can be shown:


Please send e-mail to Ann Bies if you have any questions, comments, additions, etc.