Using Machine Learning to Improve Rule based Arabic Named Entity Recognition

The British University in Dubai (BUiD)
Arabic is a widely spoken language with considerable political and geographic influence, so it is important to perform Information Extraction on diverse Arabic texts. Over the past decade, many researchers have targeted Information Extraction in general, and Named Entity Recognition in particular, for Arabic. Most have applied Machine Learning to Arabic Named Entity Recognition, while a few have used handcrafted rules. Machine Learning and rule-based techniques for Named Entity Recognition are mostly viewed as rival approaches. The work presented in this thesis combines the rule-based and Machine Learning approaches into a Hybrid System for Named Entity Recognition. The Person, Organization and Location entities identified by the rule-based system are used as features, together with several other features, for the Machine Learning system. The final outcome provides enhanced Named Entity annotations. The evaluation of the experiments shows that the Hybrid approach significantly improves the quality of Named Entity Recognition over both the standalone rule-based system and the standalone Machine Learning system. Moreover, statistical significance tests confirm that the results are valid and did not occur by chance.
information extraction, named entity recognition, machine learning, named entity annotation
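
The hybrid design described in the abstract, where rule-based entity decisions become input features for a Machine Learning system, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lookup table `TOY_RULES` stands in for the thesis's rule-based system, and the feature names (`word`, `suffix2`, `rule_tag`, and so on) are hypothetical, since the abstract does not list the actual features or the classifier used.

```python
from typing import Dict, List

# Hypothetical stand-in for the rule-based system: a tiny lookup table.
# A real rule-based recogniser would use gazetteers and grammar rules.
TOY_RULES: Dict[str, str] = {
    "محمد": "PER",  # "Mohammed"
    "دبي": "LOC",   # "Dubai"
}


def rule_based_tag(token: str) -> str:
    """Return the rule-based decision for a token, or "O" for no entity."""
    return TOY_RULES.get(token, "O")


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for token i, including the rule-based tag."""
    token = tokens[i]
    return {
        # Ordinary lexical and contextual features (illustrative choices).
        "word": token,
        "prefix2": token[:2],
        "suffix2": token[-2:],
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
        # The hybrid step: rule-based output is exposed as a feature.
        "rule_tag": rule_based_tag(token),
        "prev_rule_tag": rule_based_tag(tokens[i - 1]) if i > 0 else "O",
    }


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # "Mohammed works in Dubai"
    sentence = ["محمد", "يعمل", "في", "دبي"]
    for features in sentence_features(sentence):
        print(features)
```

Feature dictionaries of this kind could then be passed to any token-level or sequence classifier, which learns when to trust the rule-based tag and when to override it from the surrounding context.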