Preprocessing and Clustering Techniques for Uncovering Demographic Insights: A Study of Book Preferences Using PCA and K-Means
Keywords:
K-Means Clustering, Dimensionality Reduction, Principal Component Analysis, Demographic Data, Book
Abstract
Researchers and practitioners are interested in using demographic data to understand user behavior because of its importance in helping industries differentiate themselves. This study applies Principal Component Analysis (PCA) and K-means clustering to demographic data on book preferences. First, PCA was used to reduce the dimensionality of the dataset, with the goal of retaining as much of the variance as possible (a 95% threshold) while reducing the complexity of the raw data. Next, K-means clustering was applied to group the data into demographic segments by age and geographical region. The methods were applied to a book dataset containing user and book data, and the results showed that each demographic group had distinct book preferences. The results also indicate that PCA and K-means clustering are effective at distilling information from large, highly detailed demographic data. Beyond these findings, the analysis offers further insight into readers' content choices and points to directions for future marketing and recommendation strategies. For example, by clustering customers, businesses can target marketing campaigns and recommend appropriate books to specific age groups and geographical regions.
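The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and reads the 95% figure as the share of variance kept by PCA. The data here is synthetic and purely hypothetical (300 users, 8 numeric features standing in for age, encoded region, and ratings); the feature scaling step and the choice of three clusters are likewise assumptions for illustration, not the study's reported settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the user/book table: 300 users, 8 numeric
# features (e.g. age, encoded region, ratings). This is NOT the study's data.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 8))
X = np.vstack([c + rng.normal(size=(100, 8)) for c in centers])

# Standardize so that no single feature dominates the variance (assumed step).
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components whose cumulative
# explained variance reaches that fraction (here 95%).
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

# Cluster users in the reduced space; k=3 is an illustrative choice.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print("components kept:", pca.n_components_)
print("variance retained:", pca.explained_variance_ratio_.sum())
print("cluster sizes:", np.bincount(labels))
```

In practice the number of clusters would be chosen from the data, for example with an elbow plot or silhouette scores, and each cluster would then be profiled by age and region to interpret it.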