AutoSCAN: Automatic Detection of DBSCAN Parameters and Efficient Clustering of Data in Overlapping Density Regions

Density-based clustering is considered a robust approach to unsupervised clustering because it can identify outliers, form clusters of irregular shape, and determine the number of clusters automatically. These properties made its pioneering algorithm, DBSCAN, applicable to datasets in which varying numbers of clusters of different shapes and sizes can be detected with little intervention from the user. Several variants were subsequently proposed to address some of DBSCAN’s limitations. Most notably, the original DBSCAN depends on two input parameters (the neighborhood radius, Eps, and the minimum number of points, MinPts), which requires its users to have some insight into the dataset being clustered.

This thesis uses statistics derived from a given dataset’s k-nearest neighbor results to accurately determine suitable values for these parameters. It thereby removes that burden from the user and detects the clusters of a given dataset automatically. The approach also proposes an efficient re-implementation of the original algorithm for clustering datasets in which members of different clusters adjoin one another. Finally, we show that our method runs faster than earlier approaches.
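
As a rough illustration of how k-nearest neighbor statistics can inform the neighborhood radius, the minimal sketch below computes each point's distance to its k-th nearest neighbor and takes the median of those distances as a candidate radius. The function names, the choice of the median, the value of k, and the sample points are all assumptions made for illustration; this is not necessarily the statistic or procedure used in the thesis.

```python
import math
import statistics


def kth_neighbor_distances(points, k):
    """Return, for each point, the distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        # Brute-force O(n^2) search, kept simple for illustration;
        # a spatial index would replace this on large datasets.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


def suggest_eps(points, k):
    """Hypothetical heuristic (an assumption, not the thesis's rule):
    use the median k-th neighbor distance as the candidate radius."""
    return statistics.median(kth_neighbor_distances(points, k))


# Illustrative data only: two tight groups and one isolated point.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (50.0, 50.0),
]

k_distances = kth_neighbor_distances(points, k=3)
eps = suggest_eps(points, k=3)
```

In this toy dataset the isolated point has a much larger k-th neighbor distance than the points in the two groups, so a statistic that is insensitive to extreme values, such as the median, yields a radius that reflects the dense regions rather than the outlier.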