Modeling Digital News Suitability Scores with a Combined Unsupervised and Supervised Learning Approach
Abstract
The rapid evolution of digital technology has transformed the media landscape, making news more accessible while introducing challenges in content quality and accuracy. The rise of misinformation and fake news has diminished public trust in traditional media. This study aims to develop a systematic method for evaluating the quality and potential impact of news articles before publication. By adapting credit risk scoring principles, a model was built to predict the suitability of news content based on factors such as title length, number of images, news category, and publication timing. A target variable was first formed using three clustering methods: K-Means, K-Modes, and K-Medoids. K-Means outperformed the other methods, so its cluster assignments were used to define publication suitability. Subsequently, stepwise logistic regression was applied to implement the credit risk scoring approach, allowing for variable selection and assessment of variable importance. Ultimately, ten variables were identified to generate a newsworthiness score, with minimum and maximum scores of 997 and 1407, respectively. The average scores for articles deemed publishable and not publishable were 1137 and 1110, respectively. A cutoff score of 1123 was established based on these averages, categorizing 6708 articles (57.9%) as suitable for publication. These findings are intended to help media organizations refine their content curation processes, thereby improving the overall quality of the news that readers consume.
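The scoring and cutoff steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names and the scaling parameters (`base_score`, `base_odds`, `pdo`) are hypothetical, chosen here to show the conventional credit scorecard transformation of logistic regression log-odds. The abstract says only that the cutoff was "based on these averages", so taking the midpoint of the two group means is an assumption; that midpoint is 1123.5, close to the reported cutoff of 1123. Treating a score equal to the cutoff as publishable is likewise an assumption.

```python
import math


def scale_score(log_odds, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Conventional scorecard scaling: score = offset + factor * log_odds.

    base_score, base_odds and pdo (points to double the odds) are
    hypothetical values; the abstract does not report the ones used.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * log_odds


def midpoint_cutoff(mean_publishable, mean_not_publishable):
    """Assumed cutoff rule: midpoint between the two group mean scores."""
    return (mean_publishable + mean_not_publishable) / 2


def is_publishable(score, cutoff):
    """Assumed decision rule: scores at or above the cutoff are publishable."""
    return score >= cutoff


# Group means reported in the abstract.
cutoff = midpoint_cutoff(1137, 1110)
print(cutoff)                        # 1123.5
print(is_publishable(1140, cutoff))  # True
print(is_publishable(1100, cutoff))  # False
```

Under this scaling, doubling the odds of an article being publishable adds `pdo` points to its score, which is what makes the resulting scale easy to interpret.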