Textual documents created and distributed on the Internet are ever changing in various forms. In this paper, in order to characterize and detect personalized and abnormal behaviors of Internet users, we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. URSTPs are rare on the whole but relatively frequent for specific users, so they can be applied in many real-life scenarios, such as real-time monitoring of abnormal user behaviors. Most existing works are devoted to topic modeling and the evolution of individual topics, while the sequential relations of topics in successive documents published by a specific user are ignored. We present a group of algorithms to solve this innovative mining problem in three phases: preprocessing to extract topics and identify sessions for different users; generating all the STP candidates with (expected) support values for each user by pattern-growth; and selecting URSTPs by performing user-aware rarity analysis on the derived STPs. Experiments on both real (Twitter) and synthetic datasets show that our approach can discover special users and interpretable URSTPs effectively and efficiently, and that the discovered patterns significantly reflect users’ characteristics.
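The second and third phases can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes sessions have already been extracted, treats each document as having a single known topic (so supports are plain frequencies rather than expected values), enumerates candidates by brute force instead of pattern-growth, and measures rarity as a user's support minus the average support over all users. The function names, the thresholds `min_support` and `min_rarity`, and the rarity definition itself are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def candidate_patterns(session, max_len=2):
    """Return every ordered topic subsequence of one session, up to max_len topics.

    Brute-force enumeration, used here as a stand-in for pattern-growth.
    """
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(session)), k):
            found.add(tuple(session[i] for i in idx))
    return found


def user_supports(sessions_by_user, max_len=2):
    """Per-user support: fraction of the user's sessions that contain each pattern."""
    supports = {}
    for user, sessions in sessions_by_user.items():
        if not sessions:
            continue  # a user without sessions contributes no patterns
        counts = defaultdict(int)
        for session in sessions:
            for pattern in candidate_patterns(session, max_len):
                counts[pattern] += 1
        supports[user] = {p: c / len(sessions) for p, c in counts.items()}
    return supports


def select_rare_patterns(sessions_by_user, min_support=0.5, min_rarity=0.3, max_len=2):
    """Keep patterns that are frequent for one user but rare across all users.

    Rarity is assumed to be (user support - mean support over users);
    this is a simplification chosen for the sketch.
    """
    supports = user_supports(sessions_by_user, max_len)
    n_users = len(supports)
    global_support = defaultdict(float)
    for per_user in supports.values():
        for pattern, s in per_user.items():
            global_support[pattern] += s / n_users

    selected = {}
    for user, per_user in supports.items():
        picked = {}
        for pattern, s in per_user.items():
            rarity = s - global_support[pattern]
            if s >= min_support and rarity >= min_rarity:
                picked[pattern] = rarity
        if picked:
            selected[user] = picked
    return selected
```

Calling `select_rare_patterns` with a dictionary that maps each user to a list of sessions, where every session is a list of topic labels in publication order, returns for each flagged user the patterns that pass both thresholds together with their rarity scores.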
Article Details
Unique Paper ID: 145679
Publication Volume & Issue: Volume 4, Issue 11
Page(s): 136 - 140