Topical Anomaly Detection for Twitter Stream

Title: Topical Anomaly Detection for Twitter Stream
Publication Type: Miscellaneous
Year of Publication: 2012
Authors: Pramod Anantharam, Krishnaprasad Thirunarayan, Amit Sheth
Keywords: Anomaly detection, spam and off-topic content detection, binary classification, Twitter stream analysis
Abstract

In this paper, we address the problem of finding topically anomalous tweets in Twitter streams by analyzing the content of the documents pointed to by the URLs in the tweets, in preference to the textual content of the tweets themselves. Existing approaches ignore such URLs, thereby missing additional opportunities to detect off-topic tweets. Specifically, we determine the divergence between the claimed topic of a tweet, as reflected by its hashtags, and the actual topic, as reflected by the document content. Our approach avoids the need for labeled samples by selecting documents from reliable sources gleaned from the URLs present in the tweets. These documents are used for comparison against documents from unknown URLs in incoming tweets, improving both scalability and adaptability to rapidly changing topics. We evaluate our approach on three events and show that it can find topical inconsistencies not detectable by existing approaches.
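
The comparison step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or classifier used, so the sketch assumes a plain bag-of-words representation and a cosine-similarity threshold. The function names (`term_vector`, `cosine`, `is_off_topic`) and the threshold value of 0.2 are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Return lowercased bag-of-words term counts for a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_off_topic(unknown_doc, reference_docs, threshold=0.2):
    """Flag a document as off-topic when it is dissimilar to the topic profile.

    reference_docs stands in for documents fetched from reliable sources;
    the threshold is an assumed value, not one taken from the paper.
    """
    profile = Counter()
    for doc in reference_docs:
        profile.update(term_vector(doc))
    return cosine(term_vector(unknown_doc), profile) < threshold


# Toy documents standing in for pages linked from tweets about one event.
reference_docs = [
    "earthquake relief aid reaches the affected region",
    "rescue teams search for earthquake survivors",
]
print(is_off_topic("earthquake survivors receive relief aid", reference_docs))
print(is_off_topic("cheap watches discount buy now", reference_docs))
```

In this sketch the reference documents play the role of the unlabeled but reliable sources, so no hand-labeled training samples are needed; a document linked from an incoming tweet is compared only against that topic profile.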

Full Text

Pramod Anantharam, Krishnaprasad Thirunarayan, and Amit Sheth, 'Topical Anomaly Detection for Twitter Stream', in Proceedings of ACM Web Science 2012, in conjunction with NetSci 2012, Evanston, Illinois, June 22-24, 2012.
project: Twitris and http://knoesis.org/Project/SSW-_Semantic_Sensor_Web
place: ACM Web Science
year: 2012
hasURL: http://knoesis.wright.edu/library/download/websci12_submission_83.pdf