Plagiarism Detection in C Programs Using Bisecting K-means
Abstract
Plagiarism in source code assignments is a widespread problem in academic institutions. Detecting it manually takes considerable time and effort, so a system that helps detect plagiarism is needed. Detection can be done by grouping similar source code files according to their structure. A previous study used this approach with an automatic K-means iterations algorithm, which produced decent clusters but had a long execution time. This research aims to improve both time efficiency and cluster quality by using the bisecting K-means algorithm. The results showed a significant reduction in execution time, from 11.68 seconds to 6.64 seconds. Bisecting K-means also produced fewer clusters, with a slightly better Rand Index than K-means iterations. Furthermore, experiments with n-grams ranging from 2-grams to 6-grams showed that 4-grams gave the best performance.
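To make the idea of grouping programs by structure with bisecting K-means more concrete, here is a minimal Python sketch. It is not the thesis's implementation, and several details are assumptions because the abstract does not specify them. Each program is represented as counts of its token 4-grams, after identifiers and literals are normalized to the hypothetical placeholders `ID` and `NUM`. Distances are Euclidean. Each split is seeded with the first member and the member farthest from it. The cluster with the largest sum of squared errors (SSE) is bisected until a fixed, hypothetical target `k` is reached. The thesis determines the number of clusters automatically, and its stopping rule is not described here. All function names are illustrative.

```python
import math
from collections import Counter


def ngram_profile(tokens, n=4):
    """Count every run of n consecutive tokens (an n-gram) in a token stream."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def distance(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in set(u) | set(v)))


def centroid(vecs):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vecs:
        total.update(v)
    return {k: c / len(vecs) for k, c in total.items()}


def sse(vecs):
    """Sum of squared distances to the centroid: how spread out a cluster is."""
    c = centroid(vecs)
    return sum(distance(v, c) ** 2 for v in vecs)


def two_means(indices, vectors, iters=20):
    """Split one cluster into two with plain 2-means (Lloyd iterations).

    Seeding is an assumption of this sketch: the first member plus the member
    farthest from it. Returns two empty lists if all members are identical.
    """
    a = indices[0]
    b = max(indices, key=lambda i: distance(vectors[i], vectors[a]))
    if distance(vectors[a], vectors[b]) == 0:
        return [], []
    centers = [vectors[a], vectors[b]]
    halves = ([], [])
    for _ in range(iters):
        new = ([], [])
        for i in indices:
            d0 = distance(vectors[i], centers[0])
            d1 = distance(vectors[i], centers[1])
            new[0 if d0 <= d1 else 1].append(i)
        if not new[0] or not new[1] or new == halves:
            break  # converged, or this update would empty one side
        halves = new
        centers = [centroid([vectors[i] for i in h]) for h in halves]
    return halves


def bisecting_kmeans(vectors, k):
    """Start with one cluster; repeatedly bisect the cluster with the largest SSE."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        target = max(clusters, key=lambda c: sse([vectors[i] for i in c]))
        left, right = two_means(target, vectors)
        if not left or not right:
            break  # nothing left that can be split
        clusters.remove(target)
        clusters += [left, right]
    return clusters


# Hypothetical normalized token streams (identifiers -> ID, literals -> NUM).
submissions = {
    "a.c": "for ( ID = NUM ; ID < ID ; ID ++ ) ID += ID ;".split(),
    "b.c": "for ( ID = NUM ; ID < ID ; ID ++ ) ID = ID + ID ;".split(),
    "c.c": "while ( ID > NUM ) { ID = ID / NUM ; }".split(),
    "d.c": "while ( ID > NUM ) { ID = ID / NUM ; ID ++ ; }".split(),
}
names = list(submissions)
vectors = [ngram_profile(toks, n=4) for toks in submissions.values()]
groups = [[names[i] for i in c] for c in bisecting_kmeans(vectors, k=2)]
print(groups)  # [['a.c', 'b.c'], ['c.c', 'd.c']]
```

Because identifiers are normalized before the n-grams are counted, renaming variables does not change a program's profile. The lightly edited pairs (`a.c`/`b.c` and `c.c`/`d.c`) therefore still share most of their 4-grams and end up in the same cluster. Each bisection runs 2-means only on the members of one cluster, which is the usual reason bisecting K-means does less work than rerunning full K-means for many candidate values of k.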
Collections
- UT - Computer Science [2330]