Evaluating the Efficacy of Text AI Detectors: Implications and Insight

  • Unique Paper ID: 182788
  • Volume: 12
  • Issue: 2
  • PageNo: 3760-3763
  • Abstract:

The rapid advancement of artificial intelligence (AI) in natural language processing (NLP) has led to an influx of AI-generated text. While this development has enabled many applications, it has also raised concerns about authenticity, misinformation, and ethics. This paper explores a machine learning-based approach to detecting AI-generated text using natural language processing techniques. Our dataset consists of 487,235 text samples labeled as either human-written or AI-generated. We apply preprocessing techniques such as tokenization, stopword removal, punctuation removal, and term frequency-inverse document frequency (TF-IDF) transformation. The classification model is a Naïve Bayes classifier, which achieves an accuracy of 95%. This paper presents an in-depth analysis of data preprocessing, model training, performance evaluation, and potential enhancements. We also compare our approach with existing text detection methods and discuss future improvements to enhance robustness and adaptability.

In recent years, the ability of AI models to generate coherent, contextually appropriate text has improved significantly, making it increasingly difficult to distinguish human-written from AI-generated content. This has profound implications across domains including academia, journalism, and social media. In academia, AI-generated content raises plagiarism concerns; in journalism, it can contribute to the spread of misinformation. Businesses that use AI-generated text in customer interactions must ensure authenticity and trustworthiness. Developing an efficient and accurate detection system is therefore crucial.

Our study addresses these challenges by proposing a systematic approach to AI text detection. The dataset used in this study is sourced from Kaggle and contains texts from domains such as news, academic writing, social media, fiction, and technical documentation. The dataset includes 500 human-written blog posts and 500 AI-generated blog posts. Data diversity is crucial for model generalization.

The preprocessing steps are text cleaning, tokenization, stopword removal, lemmatization, and vectorization; together they transform raw text into a structured format suitable for machine learning models. TF-IDF converts the text into numerical features, which are then used to train the Naïve Bayes classifier.
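
The pipeline described in the abstract (punctuation removal, tokenization, stopword removal, TF-IDF features, a Naïve Bayes classifier, and an accuracy score) can be sketched in dependency-free Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny stopword list, the four toy sentences (not taken from the Kaggle dataset), the smoothed IDF formula (the variant scikit-learn uses by default), the choice of the multinomial Naïve Bayes variant with additive smoothing alpha = 1, and the helper names `preprocess`, `Tfidf`, `NaiveBayes`, `fit_pipeline`, `classify`, and `accuracy` are all hypothetical. Lemmatization is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption; a real pipeline
# would use a full list such as NLTK's or scikit-learn's).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "it", "this", "that"}

# Hypothetical toy corpus; NOT drawn from the paper's Kaggle dataset.
TOY_TEXTS = [
    "honestly i just grabbed coffee and rambled about my weekend lol",
    "ugh my cat knocked over the plant again, typical monday",
    "In conclusion, it is important to note that artificial intelligence offers numerous benefits.",
    "Furthermore, it is essential to consider the ethical implications of artificial intelligence.",
]
TOY_LABELS = ["human", "human", "ai", "ai"]


def preprocess(text):
    """Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords.

    Lemmatization is omitted to keep the sketch dependency-free.
    """
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


class Tfidf:
    """Smoothed TF-IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, with L2-normalized rows."""

    def fit(self, docs):
        n = len(docs)
        df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        return self

    def transform(self, doc):
        counts = Counter(tok for tok in doc if tok in self.idf)  # terms unseen in training are dropped
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}


class NaiveBayes:
    """Multinomial Naive Bayes on (fractional) TF-IDF weights with additive smoothing alpha."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        vocab = {t for x in X for t in x}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = Counter()
            for x in rows:
                totals.update(x)  # sums TF-IDF weight per term within class c
            denom = sum(totals.values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, x):
        # log P(c) + sum over terms of weight * log P(t | c); pick the highest-scoring class
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in x.items() if t in self.log_lik[c]
            )
        return max(self.classes, key=score)


def fit_pipeline(texts, labels, alpha=1.0):
    docs = [preprocess(t) for t in texts]
    vectorizer = Tfidf().fit(docs)
    model = NaiveBayes(alpha).fit([vectorizer.transform(d) for d in docs], labels)
    return vectorizer, model


def classify(vectorizer, model, text):
    return model.predict(vectorizer.transform(preprocess(text)))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    vectorizer, model = fit_pipeline(TOY_TEXTS, TOY_LABELS)
    tests = ["it is important to consider the benefits of artificial intelligence",
             "my cat and i rambled about coffee lol"]
    preds = [classify(vectorizer, model, t) for t in tests]
    print(preds, accuracy(["ai", "human"], preds))
```

On a real corpus, accuracy would be measured on a held-out split rather than on a handful of hand-picked sentences; the toy run only shows that the pieces fit together.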

Copyright & License

Copyright © 2025. The authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{182788,
        author = {Naveen Krishna R and S Gokulakrishnan and Rinto A Varghese},
        title = {Evaluating the Efficacy of Text {AI} Detectors: Implications and Insight},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {2},
        pages = {3760--3763},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=182788},
        abstract = {The rapid advancement of artificial intelligence (AI) in natural language processing (NLP) has led to an influx of AI-generated text. While this development has enabled many applications, it has also raised concerns about authenticity, misinformation, and ethics. This paper explores a machine learning-based approach to detecting AI-generated text using natural language processing techniques. Our dataset consists of 487,235 text samples labeled as either human-written or AI-generated. We apply preprocessing techniques such as tokenization, stopword removal, punctuation removal, and term frequency-inverse document frequency (TF-IDF) transformation. The classification model is a Naïve Bayes classifier, which achieves an accuracy of 95\%. This paper presents an in-depth analysis of data preprocessing, model training, performance evaluation, and potential enhancements. We also compare our approach with existing text detection methods and discuss future improvements to enhance robustness and adaptability.
In recent years, the ability of AI models to generate coherent, contextually appropriate text has improved significantly, making it increasingly difficult to distinguish human-written from AI-generated content. This has profound implications across domains including academia, journalism, and social media. In academia, AI-generated content raises plagiarism concerns; in journalism, it can contribute to the spread of misinformation. Businesses that use AI-generated text in customer interactions must ensure authenticity and trustworthiness. Developing an efficient and accurate detection system is therefore crucial.
Our study addresses these challenges by proposing a systematic approach to AI text detection. The dataset used in this study is sourced from Kaggle and contains texts from domains such as news, academic writing, social media, fiction, and technical documentation. The dataset includes 500 human-written blog posts and 500 AI-generated blog posts. Data diversity is crucial for model generalization.
The preprocessing steps are text cleaning, tokenization, stopword removal, lemmatization, and vectorization; together they transform raw text into a structured format suitable for machine learning models. TF-IDF converts the text into numerical features, which are then used to train the Naïve Bayes classifier.},
        keywords = {AI detection, Natural Language Processing, Machine Learning, Naïve Bayes, Text Classification, TF-IDF, Deep Learning},
        month = jul,
        }
