Back-to-the-Future Weekend Reading - Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud

Intense travel around the world in the spring kept me from keeping up with the historical reading that I would like to do, so there have not been many suggestions for the back-to-basics reading list. The fall is not going to be much different, but I will make an effort to get back into a reading habit.

I want to kick off the fall readings not with a historical paper but with two that detail GraphLab, an excellent framework for high-performance machine learning that was originally built by Carlos Guestrin (Carlos is now the Amazon Professor of Machine Learning at UW). GraphLab has been used to build several different data mining and graph processing toolkits and applications. The research in the papers was performed on Amazon EC2. Instructions for running your own GraphLab cluster on EC2 can be found here.

Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin and Joseph M. Hellerstein, Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 8, pp. 716-727 (2012)

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, Joseph Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson and Carlos Guestrin, Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, 2012