Penn Arabic Treebank
Arabic Part-of-Speech/Morphological annotation
Arabic Syntactic/Predicate-Argument annotation
- TreeBank, pass 1: initial, partial annotation, used primarily to train annotators and to speed up pass 2
- TreeBank, pass 2: fully annotated trees.
- TreeBank, pass 3: quality control
Guidelines
We use Tim Buckwalter's lexicon and morphological analyzer, which is available from LDC, catalog number LDC2002L49.
Tim Buckwalter's transliteration system, which we use, is explained here.
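As a rough illustration of how such a transliteration works, the sketch below maps ASCII Buckwalter symbols one-to-one onto Arabic Unicode code points. It is not part of the Treebank tools: the names BW_TO_ARABIC, buckwalter_to_arabic and arabic_to_buckwalter are hypothetical, and the table reflects the commonly published Buckwalter scheme, which may differ in detail from the variant used in a particular release.

```python
# Illustrative Buckwalter <-> Arabic Unicode table (assumed standard scheme;
# the names here are hypothetical, not part of any Treebank distribution).
BW_TO_ARABIC = {
    "'": "\u0621",  # hamza
    "|": "\u0622",  # alif with madda
    ">": "\u0623",  # alif with hamza above
    "&": "\u0624",  # waw with hamza
    "<": "\u0625",  # alif with hamza below
    "}": "\u0626",  # ya with hamza
    "A": "\u0627",  # alif
    "b": "\u0628",  # ba
    "p": "\u0629",  # ta marbuta
    "t": "\u062a",  # ta
    "v": "\u062b",  # tha
    "j": "\u062c",  # jim
    "H": "\u062d",  # Ha
    "x": "\u062e",  # kha
    "d": "\u062f",  # dal
    "*": "\u0630",  # dhal
    "r": "\u0631",  # ra
    "z": "\u0632",  # zay
    "s": "\u0633",  # sin
    "$": "\u0634",  # shin
    "S": "\u0635",  # Sad
    "D": "\u0636",  # Dad
    "T": "\u0637",  # Ta
    "Z": "\u0638",  # Za
    "E": "\u0639",  # ayn
    "g": "\u063a",  # ghayn
    "_": "\u0640",  # tatweel
    "f": "\u0641",  # fa
    "q": "\u0642",  # qaf
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "m": "\u0645",  # mim
    "n": "\u0646",  # nun
    "h": "\u0647",  # ha
    "w": "\u0648",  # waw
    "Y": "\u0649",  # alif maqsura
    "y": "\u064a",  # ya
    "F": "\u064b",  # fathatan
    "N": "\u064c",  # dammatan
    "K": "\u064d",  # kasratan
    "a": "\u064e",  # fatha
    "u": "\u064f",  # damma
    "i": "\u0650",  # kasra
    "~": "\u0651",  # shadda
    "o": "\u0652",  # sukun
}

# The mapping is one-to-one, so it can simply be inverted.
ARABIC_TO_BW = {arabic: bw for bw, arabic in BW_TO_ARABIC.items()}


def buckwalter_to_arabic(text: str) -> str:
    """Convert Buckwalter ASCII to Arabic script; unmapped characters pass through."""
    return "".join(BW_TO_ARABIC.get(ch, ch) for ch in text)


def arabic_to_buckwalter(text: str) -> str:
    """Convert Arabic script to Buckwalter ASCII; unmapped characters pass through."""
    return "".join(ARABIC_TO_BW.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = buckwalter_to_arabic("ktAb")  # the word for "book"
    print(word, arabic_to_buckwalter(word))
```

Characters outside the table (spaces, digits, punctuation) are copied unchanged, so the sketch does not distinguish transliterated text from ordinary Latin text mixed into the same string.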
Click here for instructions on how to use TreeEditor, our AG-based constituent structure annotation tool by Hubert Jin. We also use an AG-based tool for the morphological tagging, SelectPOS.
January 28, 2003 -- release of the Arabic Treebank: Part 1 v 2.0 through the Linguistic Data Consortium.
- Treebank Guidelines for the release. Or click here for a .pdf version of these guidelines (to search them, use the "Find" button in Acrobat, not your web browser's "Find").
- POS Guidelines for the release.
- General information about the corpus and the release.
- Technical information about the corpus.
- Information about the POS annotation process.
- Information about the treebanking annotation process.
- A guide to collapsing the morphological/POS tags in our Arabic Treebank into the Penn English Treebank POS tagset.
- Please contact the Linguistic Data Consortium for information on acquiring and using the Arabic Treebank corpus, LDC catalog number LDC2003T06.
Other Treebanks:
Penn Treebank (English)
You can search these guidelines using the "Find" button in Acrobat (use the Acrobat "Find", not your web browser's "Find"):
Penn Parsed Corpus of Middle English (PPCME)
Penn Chinese Treebank
Penn Korean Treebank
Prague Dependency Treebank
An excellent and extensive introduction to syntax by Beatrice Santorini: An introduction to syntactic theory
Please send e-mail to Ann Bies if you have any questions, comments, additions, etc.