Arabic Question Answering from diverse data sources

dc.Location: 2018 T 58.6 K43
dc.Supervisor: Professor Khaled Shaalan
dc.contributor.author: KHATER, FERAS
dc.date.accessioned: 2019-02-12T07:00:41Z
dc.date.available: 2019-02-12T07:00:41Z
dc.date.issued: 2018-07
dc.description.abstract: Arabic users must still manually extract accurate answers to their questions, a difficult task given the vast amount of information available on the Internet. Existing Arabic Question Answering (QA) systems do not meet users' needs in terms of performance or of coverage across all question types. The motivation behind this research is the need for new approaches that can handle and answer all types of questions, beyond factoid questions alone. We therefore present in this paper a new design of the linguistic approach for developing a reliable Arabic QA system and data source that addresses the following challenges: (i) handling both factoid and complex questions in the Arabic language, (ii) extracting the precise answer from the available resources, (iii) evaluating the proposed QA system against a gold standard data set, and (iv) providing the Arabic Corpus of Occupations (ACO), which has been made freely and publicly available for research purposes. Our QA system is a web application that retrieves an answer to a posed question from different data sources. We conducted experiments on a set of 230 questions drawn from previously published resources (TREC and CLEF) and from the ACO corpus. The system answered 72 questions, achieving an average precision of 36%, a recall of 78%, and an F-measure of 51%. Our motivation for building the ACO corpus was the lack of free, annotated, large-scale Arabic resources that can be used for training and testing Arabic QA systems. In this paper, we provide the ACO corpus of one million words written in Modern Standard Arabic (MSA). The corpus contains 700 occupations, which were carefully analyzed and manually annotated. We use Cohen's Kappa coefficient to evaluate the reliability of the tagged content. The corpus content was tagged and assessed by two different groups of taggers. The resulting inter-annotator agreement indicates that the reliability of the ACO corpus falls in the "almost perfect agreement" range. The content of the corpus can likewise be regarded as highly reliable, according to the achieved result of 90%. [en_US]
dc.identifier.other: 2015328041
dc.identifier.uri: https://bspace.buid.ac.ae/handle/1234/1326
dc.language.iso: en [en_US]
dc.publisher: The British University in Dubai (BUiD) [en_US]
dc.subject: Arabic Question Answering (QA) systems [en_US]
dc.subject: data sources [en_US]
dc.subject: Arabic users [en_US]
dc.subject: Arabic language [en_US]
dc.title: Arabic Question Answering from diverse data sources [en_US]
dc.type: Dissertation [en_US]
Files
Original bundle
Name: 2015328041.pdf
Size: 2.41 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission