|Title||Topical Anomaly Detection for Twitter Stream|
|Year of Publication||2012|
|Authors||Pramod Anantharam, Krishnaprasad Thirunarayan, Amit Sheth|
|Keywords||Anomaly detection, spam and off-topic content detection, binary classification, Twitter stream analysis|
In this paper, we address the problem of finding topically anomalous tweets in Twitter streams by analyzing the content of the documents that the URLs in the tweets point to, in preference to the textual content of the tweets themselves. Existing approaches ignore such URLs, thereby missing additional opportunities to detect off-topic tweets. Specifically, we determine the divergence between the claimed topic of a tweet, as reflected by its hashtags, and the actual topic, as reflected by the document content. Our approach avoids the need for labeled samples by selecting documents from reliable sources gleaned from the URLs present in the tweets. These documents are compared against documents from unknown URLs in incoming tweets, improving both scalability and adaptability to rapidly changing topics. We evaluate our approach on three events and show that it can find topical inconsistencies not detectable by existing approaches.
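The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the keywords indicate a binary classifier, whereas the sketch below swaps in a plain bag-of-words cosine similarity to show the idea of scoring a linked document against documents from reliable sources. The function names, the sample texts, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Return lowercase bag-of-words counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_off_topic(linked_doc, reference_docs, threshold=0.1):
    """Flag a linked document whose content diverges from the reference set.

    reference_docs stands in for documents fetched from reliable sources;
    threshold is an arbitrary illustrative cutoff, not a value from the paper.
    """
    profile = Counter()
    for doc in reference_docs:
        profile.update(term_counts(doc))
    return cosine(term_counts(linked_doc), profile) < threshold


# Hypothetical reference documents for one event hashtag.
reference_docs = [
    "Protesters gathered in the square as the election results were announced.",
    "The election commission confirmed the results after protests in the capital.",
]
on_topic = "Election results spark new protests in the capital square."
spam = "Buy cheap watches online today with free shipping."

print(is_off_topic(on_topic, reference_docs))
print(is_off_topic(spam, reference_docs))
```

In this sketch the reference profile is built without any hand-labeled tweets, which mirrors the abstract's point about avoiding labeled samples. A real system would need stop-word handling and a tuned decision rule.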
Pramod Anantharam, Krishnaprasad Thirunarayan, and Amit Sheth, 'Topical Anomaly Detection for Twitter Stream', in Proceedings of ACM Web Science 2012, in conjunction with NetSci 2012, Evanston, Illinois, June 22-24, 2012.