DISGD: A Distributed Shared-nothing Matrix Factorization for Large Scale Online Recommender Systems
Date
2020
Publisher
OpenProceedings.org
Abstract
With web-scale data volumes and high generation rates, it has become crucial that the training of recommender systems be a continuous process performed on live data, i.e., on data streams. In practice, such systems have to address three main requirements: the ability to adapt the trained model with each incoming data element, the ability to handle concept drift, and the ability to scale with the volume of the data. Matrix factorization is one of the most popular approaches to train a recommender model, and Stochastic Gradient Descent (SGD) has been a successful optimization approach for matrix factorization. Several approaches have been proposed that handle the first and second requirements. For the third requirement, in the realm of data streams, existing distributed approaches depend on a shared-memory architecture, which requires obtaining locks before performing updates.
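As an illustration of the incremental setting described above, the following minimal sketch (not taken from the paper; the dimensions, learning rate, and regularization constant are illustrative assumptions) shows one SGD step of matrix factorization applied to a single (user, item, rating) event from a stream:

import numpy as np

K, LR, REG = 32, 0.05, 0.01   # latent dimension, learning rate, regularizer (assumed values)
P = {}                        # user -> latent factor vector
Q = {}                        # item -> latent factor vector

def incremental_sgd_step(user, item, rating):
    # Lazily initialize latent vectors for unseen users/items.
    p = P.setdefault(user, np.random.normal(0, 0.1, K))
    q = Q.setdefault(item, np.random.normal(0, 0.1, K))
    err = rating - p @ q      # prediction error for this single event
    # Adapt both factor vectors immediately so the model follows the stream.
    P[user] = p + LR * (err * q - REG * p)
    Q[item] = q + LR * (err * p - REG * q)

# Consume a stream of (user, item, rating) events one at a time.
for user, item, rating in [(1, 10, 4.0), (2, 10, 5.0), (1, 11, 3.0)]:
    incremental_sgd_step(user, item, rating)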
In general, the success of mainstream big data processing systems is supported by their shared-nothing architecture. In this paper, we propose DISGD, a distributed shared-nothing variant of incremental SGD. The proposal is motivated by the observation that, with large volumes of data and sparse user-item matrices, overwriting updates (i.e., performing lock-free updates) does not affect the result. Compared to the baseline incremental approach, our evaluation on several datasets shows not only an improvement in processing time but also a recall improvement of 55%.
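To make the lock-free overwrite observation concrete, the sketch below (again an illustration under assumed parameters and table sizes, not the paper's implementation) lets several workers apply incremental SGD steps to shared factor tables without any locking; with sparse user-item data, concurrent workers rarely touch the same row, so occasionally overwritten updates have little effect on the trained model:

from concurrent.futures import ThreadPoolExecutor
import numpy as np

K, LR, REG = 32, 0.05, 0.01
rng = np.random.default_rng(0)
P = rng.normal(0, 0.1, (1000, K))   # user factors, shared without locks (sizes assumed)
Q = rng.normal(0, 0.1, (1000, K))   # item factors, shared without locks

def worker(events):
    for u, i, r in events:
        p, q = P[u].copy(), Q[i].copy()
        err = r - p @ q
        P[u] = p + LR * (err * q - REG * p)   # plain overwrite, no lock taken
        Q[i] = q + LR * (err * p - REG * q)

# Shard a stream of events across workers; each applies its updates independently.
stream = [(int(rng.integers(1000)), int(rng.integers(1000)), float(rng.uniform(1, 5)))
          for _ in range(10000)]
shards = [stream[k::4] for k in range(4)]
with ThreadPoolExecutor(max_workers=4) as ex:
    list(ex.map(worker, shards))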
Citation
Hazem H. et al. (2020) “DiSGD: A distributed shared-nothing matrix factorization for large scale online recommender systems,” Advances in Database Technology - EDBT, 2020-March, pp. 359–362.