DISGD: A Distributed Shared-nothing Matrix Factorization for Large Scale Online Recommender Systems
Date
2020
Publisher
OpenProceedings.org
Abstract
With web-scale data volumes and high generation rates, it has become crucial that the training of recommender systems be a continuous process performed on live data, i.e., on data streams. In practice, such systems have to address three main requirements: the ability to adapt the trained model with each incoming data element, the ability to handle concept drift, and the ability to scale with the volume of the data. Matrix factorization is one of the most popular approaches to train a recommender model, and Stochastic Gradient Descent (SGD) has been a successful optimization approach for matrix factorization. Several approaches have been proposed that handle the first and second requirements. For the third requirement, in the realm of data streams, existing distributed approaches depend on a shared-memory architecture, which requires obtaining locks before performing updates.
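As an illustration of the incremental setting described above, the following minimal sketch (not taken from the paper; the dimensions, learning rate, and regularization constant are illustrative assumptions) shows one SGD step of matrix factorization applied to a single (user, item, rating) event from a stream:

import numpy as np

K, LR, REG = 32, 0.05, 0.01   # latent dimension, learning rate, regularizer (assumed values)
P = {}                        # user -> latent factor vector
Q = {}                        # item -> latent factor vector

def incremental_sgd_step(user, item, rating):
    # Lazily initialize latent vectors for unseen users/items.
    p = P.setdefault(user, np.random.normal(0, 0.1, K))
    q = Q.setdefault(item, np.random.normal(0, 0.1, K))
    err = rating - p @ q      # prediction error for this single event
    # Adapt both factor vectors immediately so the model follows the stream.
    P[user] = p + LR * (err * q - REG * p)
    Q[item] = q + LR * (err * p - REG * q)

# Consume a stream of (user, item, rating) events one at a time.
for user, item, rating in [(1, 10, 4.0), (2, 10, 5.0), (1, 11, 3.0)]:
    incremental_sgd_step(user, item, rating)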
In general, the success of mainstream big data processing systems is supported by their shared-nothing architecture. In this paper, we propose DISGD, a distributed shared-nothing variant of incremental SGD. The proposal is motivated by the observation that, with large volumes of data and sparse user-item matrices, overwriting updates (i.e., performing lock-free updates) does not affect the result. Compared to the baseline incremental approach, our evaluation on several datasets shows not only an improvement in processing time but also a recall improvement of 55%.
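To make the lock-free overwrite observation concrete, the sketch below (again an illustration under assumed parameters and table sizes, not the paper's implementation) lets several workers apply incremental SGD steps to shared factor tables without any locking; with sparse user-item data, concurrent workers rarely touch the same row, so occasionally overwritten updates have little effect on the trained model:

from concurrent.futures import ThreadPoolExecutor
import numpy as np

K, LR, REG = 32, 0.05, 0.01
rng = np.random.default_rng(0)
P = rng.normal(0, 0.1, (1000, K))   # user factors, shared without locks (sizes assumed)
Q = rng.normal(0, 0.1, (1000, K))   # item factors, shared without locks

def worker(events):
    for u, i, r in events:
        p, q = P[u].copy(), Q[i].copy()
        err = r - p @ q
        P[u] = p + LR * (err * q - REG * p)   # plain overwrite, no lock taken
        Q[i] = q + LR * (err * p - REG * q)

# Shard a stream of events across workers; each applies its updates independently.
stream = [(int(rng.integers(1000)), int(rng.integers(1000)), float(rng.uniform(1, 5)))
          for _ in range(10000)]
shards = [stream[k::4] for k in range(4)]
with ThreadPoolExecutor(max_workers=4) as ex:
    list(ex.map(worker, shards))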
Citation
Hazem H. et al. (2020) “DiSGD: A distributed shared-nothing matrix factorization for large scale online recommender systems,” Advances in Database Technology - EDBT, 2020-March, pp. 359–362.