Please use this identifier to cite or link to this item: https://bspace.buid.ac.ae/handle/1234/58
Title: Using Machine Learning to Improve Rule based Arabic Named Entity Recognition
Authors: Shoaib, Muhammad
Keywords: information extraction
named entity recognition
machine learning
named entity annotation
Issue Date: Jan-2011
Publisher: The British University in Dubai (BUiD)
Abstract: Arabic is a widely spoken language that is highly influential both politically and geographically, so it is crucial to perform Information Extraction on diverse Arabic texts. In the past decade, many researchers have targeted Information Extraction in general, and Named Entity Recognition in particular, for the Arabic language. Most researchers have applied Machine Learning to Arabic Named Entity Recognition, while a few have used handcrafted rules for the task. Machine Learning and rule-based techniques for Named Entity Recognition are mostly viewed as rival approaches. The work presented in this thesis is an effort to combine the rule-based and Machine Learning approaches into a Hybrid System for Named Entity Recognition. The Person, Organization and Location entities identified by the rule-based system are used as features, combined with several other features, for the Machine Learning system. The final outcome provides enhanced Named Entity annotations. The evaluation of the experiments conducted shows that the Hybrid approach presented in the thesis significantly improves the quality of Named Entity Recognition over both the independent rule-based system and the independent Machine Learning system. Moreover, statistical significance tests confirm that the results obtained are valid and did not occur by chance.
Description: DISSERTATION WITH DISTINCTION
URI: http://bspace.buid.ac.ae/handle/1234/58
Appears in Collections: Dissertations for Informatics (Knowledge and Data Management)

Files in This Item:
File          Description   Size      Format
thesis.pdf    Full Text     1.38 MB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.