Speaker Identification System Modeling Using MFCC as Feature Extraction and SVM as Pattern Recognition

Almanfaluthi, Luthfan

Date

2014

Author

Almanfaluthi, Luthfan

Buono, Agus

Nurhadryani, Yani

Abstract

Every day, people exchange information by voice, and may also do so through text media and other tools. Every human voice signal has its own distinct character and quality. Indonesia has diverse ethnic groups, communities, and cultures, so the same word is often pronounced with different patterns. This can be a problem for a speaker identification system, so it is necessary to develop a system that is relatively more robust to intra-speaker variability and noise. The speaker identification system studied here focuses on the analysis of two subsystems: the feature extractor and the pattern recogniser. Mel-Frequency Cepstrum Coefficients (MFCC) is a feature extraction method often used for processing the human voice; it calculates cepstral coefficients while taking human hearing into account. Support Vector Machine (SVM) is a supervised-learning classification technique that can handle multi-class problems, so it is suitable for classifying more than two classes. Data were collected by recording sound with a microphone. The sound source was 10 adult speakers differing in gender, age, and ethnicity, each of whom said the word "COMPUTER" 50 times, yielding 500 recordings. Each recording lasts 2 seconds at a frequency of 16 kHz. Before the data are processed, a preprocessing stage is applied, consisting of silence elimination, normalization, and noise addition. Gaussian noise is added at levels from 80 dB down to 0 dB. After MFCC feature extraction, the next stage is SVM pattern recognition using the quadratic programming (QP) and sequential minimal optimization (SMO) algorithms. The RBF, linear, and quadratic kernel functions were tested for each algorithm.
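As an illustration of the three kernel functions compared in the study, the following is a minimal sketch in plain Python. The abstract does not give the exact kernel definitions or parameters, so the forms below are assumptions: the quadratic kernel is taken to be the degree-2 polynomial `(x·y + c)^2`, and the names `linear_kernel`, `quadratic_kernel`, `rbf_kernel`, `c`, and `gamma` are hypothetical.

```python
import math

def linear_kernel(x, y):
    """Plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))

def quadratic_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel (assumed form; c is a hypothetical offset)."""
    return (linear_kernel(x, y) + c) ** 2

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian radial basis function; gamma is a hypothetical width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two toy vectors standing in for MFCC feature vectors (made-up values).
u = [1.0, 2.0]
v = [3.0, 0.5]

print(linear_kernel(u, v))     # 4.0
print(quadratic_kernel(u, v))  # 25.0
print(rbf_kernel(u, v))        # exp(-0.5 * 6.25)
```

An SVM trained with either QP or SMO only needs such a function to measure similarity between pairs of feature vectors, which is why the same kernels can be tested with both algorithms.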
With the quadratic kernel function and a 90:10 split between training and test data, using the original sound without noise as test data, the SMO algorithm achieves an accuracy of 97.0%, and the system maintains an accuracy above 70% with noise added up to the 40 dB level. Across all 10 speakers, with the original noise-free sound as test data, the most errors occur for speaker number 9 (male, 41 years old, Javanese). The SMO algorithm has a better processing time than the QP algorithm. Future studies could add noise cancelling to improve accuracy on voice data contaminated by noise.
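The noise-addition step can be sketched as follows. This is only an illustration under stated assumptions: the dB levels are interpreted as signal-to-noise ratios (the abstract does not say so explicitly), a 440 Hz sine tone stands in for a recorded utterance, 16 kHz is read as the sampling rate, and the function names `add_gaussian_noise` and `measured_snr_db` are hypothetical.

```python
import math
import random

def add_gaussian_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian noise so the result has roughly the given SNR in dB."""
    rng = rng if rng is not None else random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]

def measured_snr_db(clean, noisy):
    """Estimate the SNR actually obtained, from the clean and noisy signals."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

# Stand-in for one recording: 2 seconds sampled at 16 kHz (assumed reading).
SAMPLE_RATE = 16000
clean = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
         for n in range(2 * SAMPLE_RATE)]

noisy = add_gaussian_noise(clean, snr_db=40, rng=random.Random(0))
print(round(measured_snr_db(clean, noisy), 1))
```

Under this reading, 80 dB corresponds to almost inaudible noise and 0 dB to noise with the same power as the speech, which is consistent with accuracy falling as the level decreases.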

URI

http://repository.ipb.ac.id/handle/123456789/68740

Collections

MT - Mathematics and Natural Science [3984]