Using Data Mining and Text Mining Techniques in Predicting the Price of Real Estate Properties in Dubai

Date
2014-05
Publisher
The British University in Dubai (BUiD)
Abstract
Data mining is defined as the discovery of previously unknown patterns and relationships in stored data, and the presentation of this interesting information in an understandable format. Text mining also looks for interesting hidden information, but in natural human language data. Data mining techniques are applied in many domains and industries, and one of these is real estate: predicting the price of real estate properties from past sales transactions is one such application. This thesis examines how effective linear regression modeling, as a data mining technique, is at predicting the price of a real estate property from online real estate classifieds. The results of linear regression predictions based on the structured features serve as the baseline of the research. Text mining is then used to convert the unstructured text features into a suitable format, and the accuracy of the linear regression predictions is tested again with the text features included. All experiments are implemented and tested using the RapidMiner 5 software tool. Before the experiments, the dataset was collected from several online real estate classifieds that offer villas and apartments for rent or for sale in Dubai, UAE. The dataset is divided into six subsets based on the type of classified. On each subset, the experiments first predict the price with linear regression using the regular structured numerical features of the real estate property. The linear regression experiments are then repeated after text mining the descriptive text features. The text mining process generates thousands of features, each reflecting the significance of one of thousands of unique words, weighted with the term frequency-inverse document frequency (TF-IDF) scheme.
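The baseline-then-text-features procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the RapidMiner 5 process used in the thesis: the tokenizer, the TF-IDF variant (term count divided by document length, times log(N/df)), the hypothetical listings, and the small ridge penalty added to ordinary least squares (so the system stays solvable when word features outnumber listings) are all our own choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_rows(docs):
    # One common TF-IDF variant (assumed): tf = count / doc length, idf = log(N / df).
    tokenized = [tokenize(d) for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vocab = sorted(df)
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


def fit_linear(X, y, lam=1e-3):
    # Solve (X^T X + lam*I) w = X^T y; column 0 is the intercept and is not penalized.
    # The ridge term lam is an assumption added for numerical stability.
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j and i > 0 else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w


def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


# Hypothetical listings (not thesis data): (area in sq ft, bedrooms, description).
listings = [
    (1200, 2, "Sea view apartment in Dubai Marina"),
    (900, 1, "Cozy apartment near metro"),
    (1500, 3, "Spacious apartment with sea view"),
    (800, 1, "Studio style apartment near metro"),
]
prices = [1.9e6, 0.9e6, 2.4e6, 0.8e6]

structured = [[1.0, a, b] for a, b, _ in listings]  # intercept + structured features
vocab, text = tfidf_rows([d for _, _, d in listings])
combined = [s + t for s, t in zip(structured, text)]  # structured + TF-IDF features

baseline_w = fit_linear(structured, prices)  # baseline model
text_w = fit_linear(combined, prices)        # model with text features
print(predict(baseline_w, structured))
print(predict(text_w, combined))
```

A word that appears in every description (here "apartment") gets an IDF of log(1) = 0, so it contributes nothing; distinctive words such as "sea" or "metro" carry the weight. With real data, the two models would be compared on held-out listings rather than on the training rows.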
It was found that the results of linear regression prediction improved significantly once text mining was added to the experiments. The root mean squared error (RMSE) decreased for every data subset, which means the prediction accuracy improved. For example, the RMSE was reduced by almost 56% for two data subsets: the classifieds of villas offered for rent and the classifieds of apartments offered for sale. The linear correlation between the regular features and the price feature also increased noticeably; for example, the correlation coefficient increased by 206.08% for the subset that holds the records of apartments offered for sale. Moreover, the analysis shows that the key factor determining the rental or sale price of a real estate property in Dubai is its location. To the best of our knowledge, this research is the first scientific analysis of Dubai’s real estate classifieds and the first attempt to use text mining to improve the accuracy of a data mining technique for price prediction. The experimental results are also verified on a real-world dataset that reflects current trends in the real estate market.
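The evaluation figures above (an RMSE reduction of almost 56%, a correlation increase of 206.08%) are relative changes in two metrics. The sketch below shows how such figures can be computed; it assumes the percentage is (after - before) / before × 100 and uses Pearson's r between two sequences, since the abstract does not spell out the exact formulas. All numbers in the usage lines are hypothetical.

```python
import math


def rmse(actual, predicted):
    # Root mean squared error: sqrt(mean((y - y_hat)^2)).
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def pearson(x, y):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def percent_change(before, after):
    # Assumed definition of relative change, in percent; negative means a decrease.
    return (after - before) / abs(before) * 100.0


# Hypothetical actual prices and predictions from two models (not thesis data).
actual = [1.9e6, 0.9e6, 2.4e6, 0.8e6]
baseline_pred = [1.6e6, 1.2e6, 2.0e6, 1.1e6]
text_pred = [1.8e6, 1.0e6, 2.3e6, 0.9e6]

before, after = rmse(actual, baseline_pred), rmse(actual, text_pred)
print(f"RMSE change: {percent_change(before, after):.2f}%")
print(f"r baseline: {pearson(actual, baseline_pred):.3f}, r with text: {pearson(actual, text_pred):.3f}")
```

Under this definition, an RMSE falling from 100 to 44 is a change of -56%, and a correlation coefficient that rises by 206.08% ends up at roughly 3.06 times its original value.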
Keywords
data mining, text mining, real estate properties, Dubai