Please use this identifier to cite or link to this item: https://bspace.buid.ac.ae1234/1459
Title: Positive Unlabelled Learning to Recognize Dishes as Named Entity
Authors: TAREK, AIMAN
Keywords: social media
user-generated content
named entity recognition
Issue Date: Apr-2019
Publisher: The British University in Dubai (BUiD)
Abstract: With the growth of social media, there is a need to analyse user-generated content, especially text reviews. Online text reviews offer considerable potential and opportunities for both users and business owners. Much research targets the analysis of text reviews to extract useful information, particularly through Named Entity Recognition. In this research, I focus on extracting food and dish names as named entities. Given the lack of labelled data, I try to overcome the cold start and avoid manual labelling by building a lookup table from a dictionary. I work with the Yelp dataset: I go through each text review, treat each noun as a candidate, label the positive samples using the lookup table, and then use Positive Unlabelled learning techniques to recognise more entities within the unlabelled data by predicting a probability for each candidate. In building the model, I consider the surrounding words (preceding and following) as well as the Part of Speech tag of each. To eliminate duplicates caused by the same candidate appearing in different reviews or sentences, I calculate the average across its occurrences and represent each candidate entity only once. The results show that the entity recognition process can be automated using dictionaries and machine learning techniques, achieving an acceptable accuracy of 67% and boosting newly discovered entities by around 15% with Positive Unlabelled learning over an automatically built lookup table. This research has the potential to be extended to topics other than food and dish names, and it acts as a framework that is algorithm-independent.
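The data-preparation steps described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the dissertation's code: the lookup table contents, the function names, the feature names, and the toy scoring function are all hypothetical. The abstract does not name the Positive Unlabelled algorithm used, so the classifier is left as a caller-supplied `predict_proba` callable. The sketch also assumes that sentences arrive already POS-tagged with Penn Treebank style tags, and that the "average" is taken over the predicted probabilities of each occurrence.

```python
from collections import defaultdict

# Hypothetical lookup table; the dissertation builds its own from a dictionary.
DISH_LOOKUP = {"pizza", "burger"}


def extract_candidates(tagged_sentence):
    """Yield (noun, features) for every noun in one POS-tagged sentence."""
    # Pad both ends so the first and last tokens still have neighbours.
    padded = [("<s>", "PAD")] + list(tagged_sentence) + [("</s>", "PAD")]
    for i in range(1, len(padded) - 1):
        word, tag = padded[i]
        if not tag.startswith("NN"):  # assumption: nouns carry NN* tags
            continue
        prev_word, prev_tag = padded[i - 1]
        next_word, next_tag = padded[i + 1]
        features = {
            "prev_word": prev_word.lower(),
            "prev_tag": prev_tag,
            "tag": tag,
            "next_word": next_word.lower(),
            "next_tag": next_tag,
        }
        yield word.lower(), features


def label_candidates(tagged_sentences, lookup=DISH_LOOKUP):
    """Return (candidate, features, label) rows: 1 = positive, 0 = unlabelled."""
    rows = []
    for sentence in tagged_sentences:
        for candidate, features in extract_candidates(sentence):
            rows.append((candidate, features, 1 if candidate in lookup else 0))
    return rows


def average_by_candidate(rows, predict_proba):
    """Average per-occurrence probabilities so each candidate appears once."""
    totals = defaultdict(lambda: [0.0, 0])  # candidate -> [sum, count]
    for candidate, features, _label in rows:
        totals[candidate][0] += predict_proba(features)
        totals[candidate][1] += 1
    return {c: s / n for c, (s, n) in totals.items()}


# Toy input standing in for tagged Yelp review sentences.
sentences = [
    [("The", "DT"), ("pizza", "NN"), ("was", "VBD"), ("great", "JJ")],
    [("I", "PRP"), ("ordered", "VBD"), ("ramen", "NN")],
    [("We", "PRP"), ("ordered", "VBD"), ("ramen", "NN"), ("again", "RB")],
]


def toy_proba(features):
    """Stand-in for a trained PU classifier; not a real model."""
    return 0.8 if features["prev_word"] == "ordered" else 0.4


rows = label_candidates(sentences)
scores = average_by_candidate(rows, toy_proba)
```

Here "ramen" is not in the lookup table, so both of its occurrences stay unlabelled, and they collapse into a single averaged score. In a fuller version, `toy_proba` would be replaced by a model trained on the positive and unlabelled rows.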
URI: https://bspace.buid.ac.ae1234/1459
Appears in Collections: Dissertations for IT Management (ITM)

Files in This Item:
File: 2016210186.pdf
Size: 1.04 MB
Format: Adobe PDF

