Advancing voice health

  • Unique Paper ID: 164723
  • Volume: 10
  • Issue: 12
  • PageNo: 2612-2618
  • Abstract:
  • In our study, we propose a method for early detection and intervention of vocal disorders. Our dataset consists of voice samples from healthy individuals and from individuals with voice pathologies. We extract acoustic features such as fundamental frequency, jitter, shimmer, and Mel-frequency cepstral coefficients, and analyze them with tree-based machine learning algorithms. Additionally, we extract modes from the audio signals through Variational Mode Decomposition (VMD) and convert them into Mel spectrograms, which are then processed by a Vision Transformer architecture. With a focus on multi-class classification, we combine the outputs of the tree-based algorithms and the Vision Transformer into an ensemble model to improve predictive accuracy across all classes. The method achieves an overall accuracy of 93%, along with strong performance on other metrics, demonstrating its potential for improving early detection of voice disorders.
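
The abstract does not say how the outputs of the tree-based models and the Vision Transformer are combined. As a minimal sketch, assuming a weighted soft-voting scheme (one common option, not necessarily the authors' method), the step could look like the following. The function name `soft_vote`, the `tree_weight` parameter, and the example probabilities are all hypothetical.

```python
from typing import List, Sequence, Tuple


def soft_vote(
    tree_probs: Sequence[float],
    vit_probs: Sequence[float],
    tree_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Blend two per-class probability vectors by weighted averaging.

    Assumption: both models emit probabilities over the same ordered
    set of classes. Returns the index of the winning class and the
    blended probability vector.
    """
    if len(tree_probs) != len(vit_probs):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= tree_weight <= 1.0:
        raise ValueError("tree_weight must lie in [0, 1]")

    # Weighted average of the two probability vectors, class by class.
    blended = [
        tree_weight * t + (1.0 - tree_weight) * v
        for t, v in zip(tree_probs, vit_probs)
    ]
    # The predicted class is the one with the highest blended probability.
    best = max(range(len(blended)), key=blended.__getitem__)
    return best, blended


if __name__ == "__main__":
    # Hypothetical three-class example: the two models disagree.
    label, probs = soft_vote([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
    print(label, probs)
```

With equal weights, the blended vector is simply the mean of the two models' probabilities, so a confident prediction from one model can outvote a weaker prediction from the other.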

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{164723,
        author = {Kabir Srinidh and Arravapula Siddartha Reddy and Malgari Supriya},
        title = {Advancing voice health},
        journal = {International Journal of Innovative Research in Technology},
        year = {},
        volume = {10},
        number = {12},
        pages = {2612--2618},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=164723},
        abstract = {In our study, we propose a method for early detection and intervention of vocal disorders. Our dataset consists of voice samples from healthy individuals and from individuals with voice pathologies. We extract acoustic features such as fundamental frequency, jitter, shimmer, and Mel-frequency cepstral coefficients, and analyze them with tree-based machine learning algorithms. Additionally, we extract modes from the audio signals through Variational Mode Decomposition (VMD) and convert them into Mel spectrograms, which are then processed by a Vision Transformer architecture. With a focus on multi-class classification, we combine the outputs of the tree-based algorithms and the Vision Transformer into an ensemble model to improve predictive accuracy across all classes. The method achieves an overall accuracy of 93\%, along with strong performance on other metrics, demonstrating its potential for improving early detection of voice disorders.},
        keywords = {Voice Disorders, tree-based, machine learning, classification model, acoustic features, Mel-Frequency Cepstral Coefficients, Variational Mode Decomposition, VMD modes},
        month = {},
        }
