Using Machine Learning to Improve Rule based Arabic Named Entity Recognition

Date
2011-01
Publisher
The British University in Dubai (BUiD)
Abstract
Arabic is a widely spoken language with considerable political and geographical influence, so it is important to perform Information Extraction on diverse Arabic texts. In the past decade, many researchers have targeted Information Extraction in general, and Named Entity Recognition in particular, for Arabic. Most have applied Machine Learning to Arabic Named Entity Recognition, while a few have used hand-crafted rules. Machine Learning and rule-based techniques for Named Entity Recognition are mostly viewed as rival approaches. The work presented in this thesis is an effort to combine the rule-based and Machine Learning approaches into a Hybrid System for Named Entity Recognition. The Person, Organization and Location entities identified by the rule-based system are used as features, combined with several other features, for the Machine Learning system. The final outcome provides enhanced Named Entity annotations. Evaluation of the experiments shows that the Hybrid approach described in this thesis significantly improves the quality of Named Entity Recognition over both the independent rule-based system and the independent Machine Learning system. Moreover, statistical significance tests confirm that the results obtained are valid and did not occur by chance.
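The idea of feeding rule-based decisions into a learner as features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the gazetteer lookup stands in for a real rule-based system, the names `GAZETTEER`, `rule_based_tags`, `token_features` and `build_feature_rows` are hypothetical, and the tokens are Latin-script placeholders rather than Arabic text. The resulting feature rows could then be passed to any classifier.

```python
from typing import Dict, List

# Hypothetical gazetteer standing in for the output of a rule-based system.
GAZETTEER: Dict[str, str] = {
    "Ahmed": "PERSON",
    "Dubai": "LOCATION",
    "BUiD": "ORGANIZATION",
}


def rule_based_tags(tokens: List[str]) -> List[str]:
    """Toy rule-based tagger: look each token up in the gazetteer."""
    return [GAZETTEER.get(tok, "O") for tok in tokens]


def token_features(tokens: List[str], rule_tags: List[str], i: int) -> Dict[str, str]:
    """Combine the rule-based tag of token i (and its neighbours)
    with a few simple surface features."""
    return {
        "word": tokens[i],
        "prefix2": tokens[i][:2],
        "suffix2": tokens[i][-2:],
        "rule_tag": rule_tags[i],
        "prev_rule_tag": rule_tags[i - 1] if i > 0 else "BOS",
        "next_rule_tag": rule_tags[i + 1] if i < len(tokens) - 1 else "EOS",
    }


def build_feature_rows(tokens: List[str]) -> List[Dict[str, str]]:
    """Produce one feature dictionary per token for a downstream learner."""
    tags = rule_based_tags(tokens)
    return [token_features(tokens, tags, i) for i in range(len(tokens))]


if __name__ == "__main__":
    for row in build_feature_rows(["Ahmed", "visited", "Dubai"]):
        print(row)
```

In this sketch the learner sees the rule-based decision for each token and its neighbours alongside the other features, so it can learn when to trust or override the rules.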
Description
DISSERTATION WITH DISTINCTION
Keywords
information extraction, named entity recognition, machine learning, named entity annotation