Methods for detecting fake voice signals
Abstract
Incoming article date: 04.05.2025
The article analyzes approaches to the generation and detection of audio deepfakes, with particular attention to acoustic signal preprocessing, extraction of voice signal parameters, and data classification. Three groups of classifiers are examined: support vector machines (SVM), k-nearest neighbors (KNN), and neural networks. For each group, the most effective methods are identified, and the most successful approaches overall are determined through a comprehensive analysis. Two approaches demonstrate high accuracy and reliability: a detector based on temporal convolutional networks that analyzes MFCC cepstrograms achieves an equal error rate (EER) of 0.07%, while a support vector machine with a radial basis function kernel reaches an EER of 0.5%. On the ASVspoof 2021 dataset, the latter method also achieves Accuracy = 99.6%, F1-score = 0.997, Precision = 0.998, and Recall = 0.994.
Keywords: audio deepfakes, preprocessing of acoustic signals, support vector machine, k-nearest neighbors, neural networks, temporal convolutional networks, deepfake detection
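
The abstract compares detectors by their equal error rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how such a value can be estimated from detector scores. It is not the evaluation code used in the article: the function name `compute_eer`, the label convention (1 = bona fide, 0 = spoof), and the assumption that higher scores mean "more likely bona fide" are all illustrative choices.

```python
import numpy as np


def compute_eer(scores, labels):
    """Estimate the equal error rate from detector scores.

    Assumptions (illustrative, not taken from the article):
    labels use 1 for bona fide and 0 for spoofed audio, and a
    higher score means the sample looks more like bona fide speech.
    Returns (eer, threshold) as floats.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    bona_fide = scores[labels == 1]
    spoofed = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one value above
    # the maximum so that "reject everything" is also considered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A sample is accepted as bona fide when score >= threshold.
    far = np.array([(spoofed >= t).mean() for t in thresholds])   # false acceptance
    frr = np.array([(bona_fide < t).mean() for t in thresholds])  # false rejection

    # The EER is taken where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores, made up purely for illustration.
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    eer, thr = compute_eer(toy_scores, toy_labels)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

On a real evaluation set with many trials, the two error curves cross much more finely than in this toy example, and interpolating between neighboring thresholds gives a smoother estimate.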