Summarization via Pattern Utility and Ranking: A Novel Framework for Social Media Data Analytics

Title: Summarization via Pattern Utility and Ranking: A Novel Framework for Social Media Data Analytics
Publication Type: Journal Article
Year of Publication: 2013
Authors: Xintian Yang, Yiye Ruan, Srinivasan Parthasarathy, Amol Ghoting
Journal: Bulletin of the Technical Committee on Data Engineering
Volume: 36
Issue: 3
Pagination: 67-76
Abstract

The firehose of data generated by users on social networking and microblogging sites such as Facebook and Twitter is enormous. The data can be classified into two categories: the textual content written by the users and the topological structure of the connections among users. Real-time analytics on such data is challenging, with most current efforts largely focusing on the efficient querying and retrieval of data produced recently. In this article, we present a dynamic pattern-driven approach to summarize social network content and topology. The resulting family of algorithms relies on the common principles of summarization via pattern utilities and ranking (SPUR). SPUR and its dynamic variant (D-SPUR) rely on an in-memory summary while retaining sufficient information to facilitate a range of user-specific and topic-specific temporal analytics. We then follow up by describing variants that take the implicit graph of connections into account to realize the graph-based SPUR variant (G-SPUR). Finally, we describe scalable algorithms for implementing these ideas on commercial GPU-based systems. We examine the effectiveness of the summarization approaches along the axes of storage cost, query accuracy, and efficiency using real data from Twitter.
