Evaluating the Efficacy of Text AI Detectors: Implications and Insight

Authors: Naveen Krishna R, S Gokulakrishnan, Rinto A Varghese

Unique Paper ID: 182788
Volume: 12
Issue: 2
PageNo: 3760-3763

Keywords: AI detection, Natural Language Processing, Machine Learning, Naïve Bayes, Text Classification, TF-IDF, Deep Learning

Abstract:
The rapid advancement of artificial intelligence (AI) in natural language processing (NLP) has led to an influx of AI-generated text. While this development has enabled various applications, it has also raised concerns regarding authenticity, misinformation, and ethical considerations. This paper explores a machine learning-based approach to detect AI-generated text using natural language processing techniques. Our dataset consists of 487,235 text samples labeled as either human-written or AI-generated. We employ preprocessing techniques such as tokenization, stopword removal, punctuation removal, and term frequency-inverse document frequency (TF-IDF) transformation. The classification model utilizes a Naïve Bayes classifier, achieving an accuracy of 95%. This paper presents an in-depth analysis of data preprocessing, model training, performance evaluation, and potential enhancements. Additionally, we compare our approach with existing text detection methods and discuss future improvements to enhance robustness and adaptability.

In recent years, the ability of AI models to generate coherent and contextually appropriate text has significantly improved, making it increasingly challenging to distinguish between human-written and AI-generated content. This has profound implications across various domains, including academia, journalism, and social media. In academia, AI-generated content can lead to plagiarism concerns, while in journalism, it can contribute to the spread of misinformation. Businesses utilizing AI-generated text for customer interactions must ensure authenticity and trustworthiness. Therefore, developing an efficient and accurate detection system is crucial.

Our study aims to address these challenges by proposing a systematic approach to AI text detection. The dataset used in this study is sourced from Kaggle, ensuring a diverse set of texts from various domains such as news, academic writing, social media, fiction, and technical documentation. The dataset includes 500 human-written blog posts and 500 AI-generated blog posts. Ensuring data diversity is crucial for model generalization.

The preprocessing steps include text cleaning, tokenization, stopword removal, lemmatization, and vectorization. These steps are essential to transform raw text into a structured format suitable for machine learning models. We use TF-IDF to convert text into numerical features, which are then used to train the Naïve Bayes classifier.
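
The pipeline named in the abstract (text cleaning, tokenization, stopword removal, TF-IDF, Naïve Bayes) can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the stopword list, the smoothed IDF formula, the Laplace smoothing constant, the helper names (`preprocess`, `fit_idf`, `tfidf`, `NaiveBayes`, `classify`), and the four toy training sentences are all assumptions made for illustration, and lemmatization is omitted for brevity.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption: the paper does not list its stopwords).
STOPWORDS = {"a", "an", "the", "is", "was", "of", "to", "and", "in",
             "it", "this", "that", "i", "my"}

def preprocess(text):
    """Lowercase, replace punctuation with spaces, tokenize, drop stopwords."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]

def fit_idf(token_docs):
    """Smoothed IDF, ln((1 + N) / (1 + df)) + 1 (one common variant, assumed here)."""
    n_docs = len(token_docs)
    df = Counter(tok for doc in token_docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1.0
            for tok, count in df.items()}

def tfidf(tokens, idf):
    """Raw term count times IDF; tokens unseen during fitting are ignored."""
    return {tok: c * idf[tok] for tok, c in Counter(tokens).items() if tok in idf}

class NaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels, vocab):
        self.vocab_size = len(vocab)
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels))
                          for c in self.classes}
        # Accumulate the total TF-IDF weight of each token per class.
        self.weight = {c: defaultdict(float) for c in self.classes}
        for vec, label in zip(vectors, labels):
            for tok, w in vec.items():
                self.weight[label][tok] += w
        self.total = {c: sum(self.weight[c].values()) for c in self.classes}
        return self

    def predict(self, vec):
        def score(c):
            denom = self.total[c] + self.alpha * self.vocab_size
            return self.log_prior[c] + sum(
                w * math.log((self.weight[c].get(tok, 0.0) + self.alpha) / denom)
                for tok, w in vec.items())
        return max(self.classes, key=score)

# Toy training data, invented for this sketch (not taken from the paper's dataset).
TRAIN = [
    ("lol my cat knocked the coffee over again this morning", "human"),
    ("honestly the bus was late and i got soaked", "human"),
    ("In conclusion, it is important to note that technology offers numerous benefits", "ai"),
    ("Furthermore, it is important to consider the numerous factors involved", "ai"),
]

token_docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
idf = fit_idf(token_docs)
model = NaiveBayes().fit([tfidf(doc, idf) for doc in token_docs], labels, idf.keys())

def classify(text):
    """Run the full pipeline on one string and return the predicted label."""
    return model.predict(tfidf(preprocess(text), idf))

print(classify("it is important to note the numerous benefits"))
print(classify("my cat was late again"))
```

On a corpus of realistic size, a library implementation such as scikit-learn's `TfidfVectorizer` combined with `MultinomialNB` would be the more usual choice; the hand-rolled version above only makes the individual steps visible.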

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{182788,
author = {Naveen Krishna R and S Gokulakrishnan and Rinto A Varghese},
title = {Evaluating the Efficacy of Text AI Detectors: Implications and Insight},
journal = {International Journal of Innovative Research in Technology},
year = {2025},
volume = {12},
number = {2},
pages = {3760--3763},
issn = {2349-6002},
url = {https://ijirt.org/article?manuscript=182788},
abstract = {The rapid advancement of artificial intelligence (AI) in natural language processing (NLP) has led to an influx of AI-generated text. While this development has enabled various applications, it has also raised concerns regarding authenticity, misinformation, and ethical considerations. This paper explores a machine learning-based approach to detect AI-generated text using natural language processing techniques. Our dataset consists of 487,235 text samples labeled as either human-written or AI-generated. We employ preprocessing techniques such as tokenization, stopword removal, punctuation removal, and term frequency-inverse document frequency (TF-IDF) transformation. The classification model utilizes a Naïve Bayes classifier, achieving an accuracy of 95%. This paper presents an in-depth analysis of data preprocessing, model training, performance evaluation, and potential enhancements. Additionally, we compare our approach with existing text detection methods and discuss future improvements to enhance robustness and adaptability.
In recent years, the ability of AI models to generate coherent and contextually appropriate text has significantly improved, making it increasingly challenging to distinguish between human-written and AI-generated content. This has profound implications across various domains, including academia, journalism, and social media. In academia, AI-generated content can lead to plagiarism concerns, while in journalism, it can contribute to the spread of misinformation. Businesses utilizing AI-generated text for customer interactions must ensure authenticity and trustworthiness. Therefore, developing an efficient and accurate detection system is crucial.
Our study aims to address these challenges by proposing a systematic approach to AI text detection. The dataset used in this study is sourced from Kaggle, ensuring a diverse set of texts from various domains such as news, academic writing, social media, fiction, and technical documentation. The dataset includes 500 human-written blog posts and 500 AI-generated blog posts. Ensuring data diversity is crucial for model generalization.
The preprocessing steps include text cleaning, tokenization, stopword removal, lemmatization, and vectorization. These steps are essential to transform raw text into a structured format suitable for machine learning models. We use TF-IDF to convert text into numerical features, which are then used to train the Naïve Bayes classifier.},
keywords = {AI detection, Natural Language Processing, Machine Learning, Naïve Bayes, Text Classification, TF-IDF, Deep Learning},
month = {July},
}
