AN AI POWERED MULTI MODAL AUDIO RETRIEVAL APP USING HUMMING, LYRICS, AND MELODIC FEATURES

  • Unique Paper ID: 179904
  • PageNo: 9096-9101
  • Abstract: Even today, when content is readily available on demand, algorithmic music search techniques often fail because users do not know, or only partially remember, the details of a song. This paper describes a new multimodal audio retrieval application that uses artificial intelligence to identify songs from humming, snippets of lyrics, or melodies. The system combines audio signal processing, deep learning, and natural language processing: it decomposes audio into components such as vocals, melody, rhythm, and harmony, and matches them against songs in the database. Users can search by providing an audio clip, speaking or typing a lyric fragment, or singing a tune. For melody recognition, we use CNN-based spectrogram analysis; for lyrics, text-based neural retrieval; and for final ranking, a fusion model. The proposed solution achieves high accuracy and real-time performance regardless of input format.
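
The abstract names a fusion model for final ranking but does not specify it. As a rough illustration only, the sketch below shows one common option, weighted late fusion of per-song scores from a melody matcher and a lyrics matcher. The function names, the min-max normalization, and the `melody_weight` parameter are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1] so modalities are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tie, so give each the same score of 1.0.
        return {song: 1.0 for song in scores}
    return {song: (s - lo) / (hi - lo) for song, s in scores.items()}


def fuse_rankings(
    melody_scores: Dict[str, float],
    lyric_scores: Dict[str, float],
    melody_weight: float = 0.5,  # hypothetical tuning parameter
) -> List[Tuple[str, float]]:
    """Weighted late fusion: combine per-song scores from both modalities."""
    melody = min_max_normalize(melody_scores)
    lyrics = min_max_normalize(lyric_scores)
    fused = {}
    for song in set(melody) | set(lyrics):
        # A song missing from one modality contributes 0 for that modality.
        fused[song] = (
            melody_weight * melody.get(song, 0.0)
            + (1.0 - melody_weight) * lyrics.get(song, 0.0)
        )
    # Highest fused score first; ties are broken by song id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up similarity scores for four placeholder songs.
    melody_scores = {"song_a": 0.9, "song_b": 0.4, "song_c": 0.1}
    lyric_scores = {"song_b": 0.8, "song_c": 0.7, "song_d": 0.2}
    for song, score in fuse_rankings(melody_scores, lyric_scores):
        print(f"{song}: {score:.3f}")
```

With equal weights, a song that scores moderately on melody but best on lyrics can outrank a song that matches on melody alone, which is the kind of behavior a multimodal ranker is meant to capture.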

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{179904,
        author = {AJITH KUMAR M and ASIF RAHMAN M and DHEERAJ E and JEROME J and B. Bala Abirami},
        title = {AN AI POWERED MULTI MODAL AUDIO RETRIEVAL APP USING HUMMING, LYRICS, AND MELODIC FEATURES},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {12},
        pages = {9096--9101},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=179904},
        abstract = {Even today, when content is readily available on demand, algorithmic music search techniques often fail because users do not know, or only partially remember, the details of a song. This paper describes a new multimodal audio retrieval application that uses artificial intelligence to identify songs from humming, snippets of lyrics, or melodies. The system combines audio signal processing, deep learning, and natural language processing: it decomposes audio into components such as vocals, melody, rhythm, and harmony, and matches them against songs in the database. Users can search by providing an audio clip, speaking or typing a lyric fragment, or singing a tune. For melody recognition, we use CNN-based spectrogram analysis; for lyrics, text-based neural retrieval; and for final ranking, a fusion model. The proposed solution achieves high accuracy and real-time performance regardless of input format.},
        keywords = {Multimodal Audio Retrieval, Music Information Retrieval, Humming-Based Search, Lyrics Recognition, Melodic Feature Analysis, Spectrogram Classification, Deep Learning, Natural Language Processing, Real-Time Audio Processing, Django Framework, ReactJS Interface},
        month = {May},
}

Cite This Article

M, A. K., M, A. R., E, D., J, J., & Abirami, B. B. (2025). AN AI POWERED MULTI MODAL AUDIO RETRIEVAL APP USING HUMMING, LYRICS, AND MELODIC FEATURES. International Journal of Innovative Research in Technology (IJIRT), 11(12), 9096–9101.
