A LITERATURE REVIEW OF VARIOUS TECHNIQUES FOR PERFORMING DOCUMENT CLUSTERING
Zenab Qureshi, Priyanka Dubey
Keywords: Document Clustering, Term Frequency, Preprocessing, Stemming, Clustering Algorithms.
The amount of recorded data and textual information available on the web is growing rapidly every day. Although this information is available in large volumes, most of it is not in a form suitable for text processing, because it is generally vague, amorphous, or unstructured. Text mining is a subfield of data mining that aims to extract useful information from such recorded resources. Text mining faces three major challenges: high dimensionality, the choice of distance measures, and achieving quality clusters and improved classifier accuracy.
Document clustering is important for efficient document organization, summarization, topic extraction, and information retrieval. Clustering was initially applied to improve information retrieval techniques. More recently, clustering techniques have been applied to browsing collected data and to categorizing the results that search engines return in response to user queries. In this paper, we provide a comprehensive survey of document clustering.