Implementation of projected clustering based on SQL queries and UDFs in relational databases

Harikumar Sandhya, Haripriya Harikumar, MR Kaimal

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Projected clustering is a clustering approach that finds clusters in subspaces of high-dimensional data. Although a very large data set can be clustered efficiently outside a relational database, the time and effort needed to export and import it can be significant. Commercial RDBMSs offer no SQL query for any type of subspace clustering, even though subspace clustering is better suited to large databases with many dimensions and many records. Integrating clustering with a relational DBMS using SQL is an important and challenging problem in today's world of Big Data. Projected clustering can identify closely correlated dimensions and find clusters in the corresponding subspaces. We have designed an SQL version of projected clustering that returns the clusters of the records in a database using a single SQL statement, which in turn calls other SQL functions that we define. We validated our implementation on PostgreSQL and experimented with both synthetic and real data.
Original language: English
Title of host publication: 2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS)
Publisher: IEEE
Pages: 7-12
Number of pages: 6
ISBN (Electronic): 9781479921782
DOIs
Publication status: Published - 20 Feb 2014
