AI-BASED OCR SYSTEM FOR DIGITIZING HANDWRITTEN HISTORICAL DOCUMENTS IN REGIONAL LANGUAGES

  • Unique Paper ID: 174285
  • PageNo: 3435-3441
  • Abstract: This project addresses the critical challenge of preserving and accessing historical documents written in regional languages, which are often at risk of deterioration and limited accessibility. We propose an AI-driven Optical Character Recognition (OCR) system leveraging Convolutional Neural Networks (CNNs) within the MATLAB environment. The system aims to accurately digitize handwritten texts, overcoming the complexities of varying handwriting styles and language-specific characters. A comprehensive image preprocessing pipeline, including noise removal, binarization, and segmentation, is implemented to enhance document quality and isolate text regions. The recognized characters are then converted into machine-readable text and further translated into modern regional languages, thereby broadening accessibility for researchers and historians. This initiative contributes significantly to the preservation of cultural heritage by providing a robust tool for accessing and studying invaluable historical information that would otherwise be lost.
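
The abstract names three preprocessing stages (noise removal, binarization, segmentation) without saying which algorithms the authors use. As an illustration only, the sketch below shows one common way such a pipeline could be assembled. It is written in plain Python rather than the MATLAB environment the paper describes, and the specific choices (a 3x3 median filter, Otsu's threshold, and a vertical projection profile) are assumptions, not the authors' method. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0..255


def median_filter3(img: Image) -> Image:
    """Noise removal (assumed technique): 3x3 median filter.

    Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of 9 sorted values
    return out


def otsu_threshold(img: Image) -> int:
    """Binarization threshold (assumed technique): Otsu's method.

    Returns the gray level that maximizes the between-class variance.
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is at or below t
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img: Image, threshold: int) -> Image:
    """Mark dark ink as 1 and light paper as 0."""
    return [[1 if v <= threshold else 0 for v in row] for row in img]


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Segmentation (assumed technique): vertical projection profile.

    Returns half-open (start, end) column spans that contain ink.
    """
    w = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(w)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            spans.append((start, x))
            start = None
    if start is not None:
        spans.append((start, w))
    return spans
```

Chaining the calls as `segment_columns(binarize(clean, otsu_threshold(clean)))`, with `clean = median_filter3(page)`, yields column spans that a character classifier such as the CNN mentioned in the abstract could then consume. Real manuscripts would likely need line segmentation and skew correction first, which this sketch omits.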

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{174285,
        author = {S. Shanthi and Kavishri B and Mahalakshmi R and Nivedhitha S},
        title = {AI-BASED OCR SYSTEM FOR DIGITIZING HANDWRITTEN HISTORICAL DOCUMENTS IN REGIONAL LANGUAGES},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {10},
        pages = {3435--3441},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=174285},
        abstract = {This project addresses the critical challenge of preserving and accessing historical documents written in regional languages, which are often at risk of deterioration and limited accessibility. We propose an AI-driven Optical Character Recognition (OCR) system leveraging Convolutional Neural Networks (CNNs) within the MATLAB environment. The system aims to accurately digitize handwritten texts, overcoming the complexities of varying handwriting styles and language-specific characters. A comprehensive image preprocessing pipeline, including noise removal, binarization, and segmentation, is implemented to enhance document quality and isolate text regions. The recognized characters are then converted into machine-readable text and further translated into modern regional languages, thereby broadening accessibility for researchers and historians. This initiative contributes significantly to the preservation of cultural heritage by providing a robust tool for accessing and studying invaluable historical information that would otherwise be lost.},
        keywords = {CNN, MATLAB, Handwritten Text Digitization, Regional languages, Historical Document Accessibility},
        month = {March},
        }

Cite This Article

Shanthi, S., B, K., R, M., & S, N. (2025). AI-BASED OCR SYSTEM FOR DIGITIZING HANDWRITTEN HISTORICAL DOCUMENTS IN REGIONAL LANGUAGES. International Journal of Innovative Research in Technology (IJIRT), 11(10), 3435–3441.
