Efficient Graph Clustering for Big Data

Introduction

Graph-based clustering or community detection is an important focus area of data mining, social network inference, and machine learning. This is the process of partitioning data points (known as nodes) into different clusters in such a way that the nodes within a cluster possess high intra-cluster similarity while they have low inter-cluster similarity with those in a different cluster. The result of clustering process should produce high dense within-graph and low dense between-graph output for both non-overlapping clustering and overlapping clustering models. The goal of our work is to build non-convex optimization frameworks that can quickly approximate the best clustering, with further guarantees not only for convergence of the algorithms but also for their computational efficiency in the era of big data.