1. An audio-based anger detection algorithm using a hybrid artificial neural network and fuzzy logic model.
- Author
Surana, Arihant; Rathod, Manish; Gite, Shilpa; Patil, Shruti; Kotecha, Ketan; Selvachandran, Ganeshsree; Quek, Shio Gai; Abraham, Ajith
- Abstract
Audio Emotion Recognition (AER) is an important component of human emotion analysis, with or without accompanying visual cues. Audio carries several modular parameters, such as rhythm, tone, and pitch. Emotions, however, are highly complex, yet humans instantly understand the emotions conveyed to them through sound, an ability refined over thousands of years of human evolution. Artificial intelligence (AI) enabled AER has attracted worldwide attention in the last few years and is of growing interest to AI researchers in various fields. Its importance increased further after the start of the Covid-19 pandemic, when large-scale lockdowns and movement control orders around the world led to working from home and to online schooling and learning on a mass scale. The audio quality on online platforms varies from device to device and depends on the quality and bandwidth of the Internet connection. As the world recovers from the Covid-19 pandemic, an algorithm for anger detection is necessary for maintaining public security and general safety, and it can also help in the early detection of mental health or anger management issues. This is because an angry person in public can pose a threat to the people nearby and a risk of damage to public property. Detecting anger in voices in public places can therefore serve as the first line of defense against outbreaks of public nuisance or even violent crime. Moreover, the more pronounced a person's anger, the more attention public security forces should give to that person.
This study used 364 audio files from the CREMA-D dataset as input: recordings from 91 actors, each speaking the script "It's eleven o'clock" at three degrees of anger and once with a neutral emotion. A hybrid algorithm combining an artificial neural network (ANN) and fuzzy logic was introduced, together with a dedicated preprocessing technique specifically for handling audio files. A comprehensive discussion and analysis of the results was presented, in which the proposed algorithm was compared with all other audio classification algorithms in the literature, many of which merely deploy a ready-made, general-purpose neural network. This brute-force reliance on an overly complicated computational structure is inefficient, as the number of nodes involved in the computation far exceeds the number of preprocessed inputs. In addition, the descriptions of preprocessing procedures for audio classification in recent works were found to be unclear. Finally, the limitations of the experimental setup, suggestions for improvement, and potential applications of the findings are discussed and analyzed in the conclusion of this study. [ABSTRACT FROM AUTHOR]
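The 364-file subset can, in principle, be selected from file names alone. The sketch below is a minimal illustration, not the authors' code, assuming the CREMA-D naming convention `<ActorID>_<Sentence>_<Emotion>_<Level>.wav` (for example `1001_IEO_ANG_HI.wav`), where `IEO` denotes "It's eleven o'clock", `ANG` and `NEU` denote anger and neutral, and `LO`, `MD`, `HI` are the three intensity levels. The helper `select_anger_subset` is hypothetical. If every actor contributes exactly one clip per condition, the selection yields 91 × (3 + 1) = 364 files.

```python
import re
from typing import Iterable

# Assumed CREMA-D naming convention: <ActorID>_<Sentence>_<Emotion>_<Level>.wav
PATTERN = re.compile(r"^(\d{4})_([A-Z]{3})_([A-Z]{3})_([A-Z]{2})\.wav$")

# Assumption: the three degrees of anger correspond to these level codes.
ANGER_LEVELS = {"LO", "MD", "HI"}


def select_anger_subset(filenames: Iterable[str]) -> list[str]:
    """Hypothetical helper: keep "It's eleven o'clock" (IEO) clips that are
    anger at LO/MD/HI intensity, or neutral at any level."""
    selected = []
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        _actor, sentence, emotion, level = match.groups()
        if sentence != "IEO":
            continue  # only the "It's eleven o'clock" script is used
        if (emotion == "ANG" and level in ANGER_LEVELS) or emotion == "NEU":
            selected.append(name)
    return sorted(selected)
```

Applied to a complete file listing, a count other than 91 × 4 would point to missing or extra clips for some actor.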
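The abstract does not specify how the ANN and fuzzy-logic stages are combined. Purely to illustrate the general idea, and explicitly not the authors' method, the sketch below assumes a hypothetical ANN that outputs an anger score in [0, 1], fuzzifies that score with triangular membership functions into the four classes used in the study (neutral, low, medium, high anger), and picks the class with maximum membership. The breakpoints (0, 1/3, 2/3, 1) and all function names are assumptions chosen for illustration.

```python
# Hypothetical fuzzy sets over an assumed ANN anger score in [0, 1]:
# (left, peak, right) breakpoints of each triangular membership function.
FUZZY_SETS = {
    "neutral": (-1 / 3, 0.0, 1 / 3),
    "low": (0.0, 1 / 3, 2 / 3),
    "medium": (1 / 3, 2 / 3, 1.0),
    "high": (2 / 3, 1.0, 4 / 3),
}


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 1 at the peak, falling linearly to 0 at the edges."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify(score: float) -> dict[str, float]:
    """Map a crisp score (clamped to [0, 1]) to a membership degree per class."""
    score = min(max(score, 0.0), 1.0)
    return {label: triangular(score, *params) for label, params in FUZZY_SETS.items()}


def anger_degree(score: float) -> str:
    """Maximum-membership decision over the fuzzified score."""
    memberships = fuzzify(score)
    return max(memberships, key=memberships.get)
```

Because neighbouring triangles overlap and their memberships sum to 1, a score such as 0.9 belongs partly to "medium" (0.3) and mostly to "high" (0.7), so the membership values also give a graded sense of how pronounced the anger is rather than a single hard label.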
- Published
- 2024