Improving performance of collaborative question answering systems by using semantic resources

Javed, Muhammad Arshad


Files

120166.pdf (1.87 MB)

Date

2015-06

Authors

Javed, Muhammad Arshad

Publisher

The British University in Dubai (BUiD)

Abstract

In the modern age of technology, the World Wide Web (WWW) provides a platform for sharing information. People use different types of web applications, such as online forums and blogs, question answering portals, e-mail, and instant messaging tools, to collect and share information and to build online communities. All of this shared information forms a huge collection of data, and that collection grows every day. Online social networks gather data from individual users and allow them to connect with other users of mutual interest in the same network. In this way, social networks have evolved into platforms for establishing and maintaining social relationships as well as for sharing knowledge and information. Managing such a large volume of information requires efficient Information Retrieval (IR) techniques.

An Information Retrieval (IR) system retrieves, in real time, the text related to a user's query from a massive collection of documents. A document may be any collection of text, such as a web page or an article. An IR system attempts to satisfy the user's information needs effectively. Usually, an IR system takes the user's query in natural language and returns the documents containing information pertinent to the question. A typical example of an IR system is a Question Answering System. A question answering system usually consists of three phases: question analysis, document retrieval, and answer analysis. The question analysis phase takes the user's question and applies processes such as question classification and query expansion to increase the probability of finding relevant documents. The document retrieval phase takes the processed question and retrieves the documents containing possible answers. The answer analysis phase identifies the relevant passages or sentences containing the possible answers and presents them to the user.
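The three-phase pipeline described above can be illustrated with a minimal Python sketch. It is not the system developed in this dissertation: the stop-word list, the term-overlap scoring, and all function names (`analyze_question`, `retrieve_documents`, `analyze_answers`, `answer`) are hypothetical, chosen only to show how the output of each phase feeds the next.

```python
import re
from typing import List, Optional, Set

# Hypothetical stop-word list for illustration; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to",
              "what", "who", "where", "when", "how", "which", "does", "do"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_question(question: str) -> Set[str]:
    """Phase 1 (question analysis): keep only the content words of the question."""
    return {t for t in tokenize(question) if t not in STOP_WORDS}


def retrieve_documents(terms: Set[str], documents: List[str], k: int = 2) -> List[str]:
    """Phase 2 (document retrieval): rank documents by how many query terms they share.

    Assumption: a plain overlap count stands in for a real ranking function.
    """
    scored = [(len(terms & set(tokenize(doc))), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]


def analyze_answers(terms: Set[str], documents: List[str]) -> Optional[str]:
    """Phase 3 (answer analysis): return the sentence with the largest term overlap."""
    sentences = [s.strip() for doc in documents
                 for s in re.split(r"[.!?]", doc) if s.strip()]
    if not sentences:
        return None
    return max(sentences, key=lambda s: len(terms & set(tokenize(s))))


def answer(question: str, documents: List[str]) -> Optional[str]:
    """Run the three phases in order."""
    terms = analyze_question(question)
    return analyze_answers(terms, retrieve_documents(terms, documents))
```

For example, with the three documents `"Dubai is a city in the United Arab Emirates. It is known for tall buildings."`, `"Python is a programming language."` and `"The British University in Dubai is located in Dubai International Academic City."`, the question `"Where is the British University in Dubai located?"` is reduced to the terms `british`, `university`, `dubai` and `located`, and the third document's sentence is returned. A production system would replace the raw overlap count with a weighted ranking function such as TF-IDF or BM25; counting is used here only to keep the sketch short.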
Question Answering Systems are therefore very useful for retrieving relevant information from large collections of documents. To take full advantage of the data generated by users on social networks, a special class of Question Answering Systems was designed. These systems are called Collaborative Question Answering (CQA) Systems or Community Question Answering Systems, and dozens of them are available on the internet. The research in this dissertation focuses on CQA systems and proposes methods to improve their performance. One major problem with existing CQA systems is the mismatch between a user's question and the questions already stored in the system: even when a CQA system contains a question that is semantically similar to the user's question, it may fail to return the corresponding answers. This dissertation proposes methods to address this issue. Because the overall performance of a CQA system depends heavily on the question analysis phase, the scope of this dissertation is limited to that phase.

The proposed question analysis phase attempts to improve question matching in two steps. In the first step, question classification, questions are assigned to coarse-grained and fine-grained classes using a set of rules. The predicted class of a question determines the entity type (person, location, time, etc.) expected to be present in the answers. For question classification, we used Wikipedia and WordNet. In the second step, query expansion, irrelevant words are removed and semantically equivalent words are added. We used a freely available open-source thesaurus, the Collaborative International Dictionary of English (CIDE), to find the semantically equivalent words. The proposed methods are tested on a number of questions collected from existing CQA systems, and the results are presented in the thesis.
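The two question analysis steps can be sketched in Python as follows, under clearly stated assumptions: the regular-expression rules, the class labels, the stop-word list, and the tiny in-memory `THESAURUS` are all hypothetical stand-ins. The dissertation's classifier draws on Wikipedia and WordNet and its expansion step uses CIDE; neither resource is queried here, so this only shows the shape of rule-based classification followed by synonym-based expansion.

```python
import re
from typing import List, NamedTuple


class QuestionClass(NamedTuple):
    coarse: str           # coarse-grained class
    fine: str             # fine-grained class
    expected_entity: str  # entity type the answer should contain


# Hypothetical rules, checked in order: (pattern over the lower-cased question, class).
RULES = [
    (r"^how (many|much)\b", QuestionClass("NUMERIC", "count", "NUMBER")),
    (r"^when\b|^what (year|date|time)\b", QuestionClass("NUMERIC", "date", "TIME")),
    (r"^who\b", QuestionClass("HUMAN", "individual", "PERSON")),
    (r"^where\b", QuestionClass("LOCATION", "other", "LOCATION")),
]
DEFAULT = QuestionClass("DESCRIPTION", "other", "OTHER")

# Hypothetical stop words treated as "irrelevant" during expansion.
STOP_WORDS = {"a", "an", "the", "is", "are", "can", "i", "do", "does", "did",
              "how", "what", "where", "when", "who", "which", "to", "of", "in"}

# Toy thesaurus standing in for CIDE (hypothetical entries).
THESAURUS = {
    "buy": ["purchase", "acquire"],
    "cheap": ["inexpensive", "affordable"],
    "car": ["automobile", "vehicle"],
}


def classify(question: str) -> QuestionClass:
    """Step 1: return the class of the first matching rule, else DEFAULT."""
    q = question.strip().lower()
    for pattern, qclass in RULES:
        if re.search(pattern, q):
            return qclass
    return DEFAULT


def expand_query(question: str) -> List[str]:
    """Step 2: drop stop words, then append thesaurus synonyms without duplicates."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    content = [t for t in tokens if t not in STOP_WORDS]
    expanded = list(content)
    for term in content:
        for synonym in THESAURUS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded
```

With these toy rules, `classify("Who wrote Hamlet?")` yields the `HUMAN`/`individual` class with expected entity `PERSON`, and `expand_query("Where can I buy a cheap car?")` keeps `buy`, `cheap` and `car` and appends `purchase`, `acquire`, `inexpensive`, `affordable`, `automobile` and `vehicle`. The expanded term list is what would let a stored question such as "Where to purchase an inexpensive automobile?" match the user's wording.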

Keywords

semantic resources, information retrieval system, question answering system, Collaborative Question Answering (CQA)

URI

http://bspace.buid.ac.ae/handle/1234/800

Collections

Dissertations for Informatics (Knowledge and Data Management)
