Arabic Part-of-Speech Tagging

A few things to keep in mind when part-of-speech tagging Arabic text:

How to tag an Arabic word:

Dealing with problematic cases:

POS Questions & Answers

Divided/compound proper names in Arabic (e.g., Abdul Ahmed): Label every part of the name with the "Is a name" button.

Idioms (for example, 'in what in them' = 'included'): Label each word independently for its own part of speech, ignoring the idiomatic meaning.

Don't focus too much on the translation/gloss. The gloss is a useful indicator of the correct tag when other structural cues don't settle it, but it matters little in itself. If the gloss is really bad, say so in the comment line; if it is understandable as is, just let it go.

Wrong vowel: use the "Should be u", "Should be a", or "Should be i" button. At this stage, don't worry about (or comment on) which syllable has the wrong vowel; it will be obvious to the corrector.

Missing hamza: use the "Hamza problem" button.

Typos: use the "Typo" button.

Noun vs. Adjective: If the word is really an adjective but no ADJ solution is given, use the "NOUN -> ADJ" button. Similarly, if the word is really a noun but no NOUN solution is given, use the "ADJ -> NOUN" button.


Please send e-mail to Ann Bies if you have any questions, comments, additions, etc.