- Carlos Castillo, Qatar Computing Research Institute (QCRI), Qatar
- Fernando Diaz, Microsoft Research NYC, USA
- Hemant Purohit, Kno.e.sis, Wright State, USA
Details: SDM-14 Tutorial program
Predecessor with extensive domain knowledge: ICWSM-2013 tutorial
Background: NSF SoCS project
An ever-increasing number of social media users (1B+ Facebook users, 500M+ Twitter users), aided by ubiquitous mobile access (6B+ mobile phone subscribers), share their observations and opinions. In addition, the Web of Data and existing knowledge bases keep growing at a rapid pace. In this scenario, we have unprecedented opportunities to improve crisis response by extracting social signals, creating spatio-temporal mappings, performing analytics on social data and the Web of Data, and supporting a variety of applications. Such applications can help provide situational awareness during an emergency, improve preparedness, and assist during the rebuilding/recovery phase of a disaster. Data mining can provide valuable insights to support emergency responders and other stakeholders during a crisis. However, there are a number of challenges, and existing computing technology may not work in all cases. Our objective here is therefore to characterize such data mining tasks and the challenges that need further research attention. This tutorial weaves together two themes and their corresponding topics:
- Characterization of Citizen Sensing and the Web of Data: the opportunity lies in an explosion of data resulting from the participation of millions of end users (citizen sensors) through a wide variety of means and forms during crises: SMS, tweets, and Facebook posts with time and optional GPS metadata, in addition to news, blogs, Wikipedia, and existing Web knowledge.
- Technical challenges and recent research on leveraging citizen sensing with the Web of Data to improve crisis response coordination: the role of semantics-enhanced techniques (especially those using background knowledge) and the analysis of casual/informal text for crisis response, data integration for enhanced situational awareness for responders, actionable information extraction for decision makers, crowdsourcing for information credibility, and the applicability of existing methods along with the scope for further research in the crisis response space.
Although broad in scope, this tutorial will present key research advances in depth. These include (i) understanding and analyzing informal text, including microblogs and post-disaster ad hoc social media communities (e.g., information credibility in crisis data streams and the role of techniques enhanced with semantic/background knowledge from the Web of Data); (ii) enabling complex social media analytics and summarization platforms; and (iii) performing interdisciplinary research involving computer and social sciences (e.g., linguistic analysis and coordination) so that computing methods can address important practical applications such as better and faster aid targeting and resource coordination and utilization. Technical insights will be coupled with the identification of computational techniques and algorithms, along with real-world examples.
AUDIENCE AND SPEAKER GOALS:
The tutorial will interest both data mining researchers and technologists, since it will distinguish research that is still in its infancy from research that has matured, with many examples and demonstrations. We shall provide an overview of both citizen sensing and crisis mapping for response coordination; attendees are therefore not expected to have a specialized background or knowledge in these areas. Those with a background in text mining, graph mining, data summarization, and geo-spatial data analysis will be better able to understand the novel technical challenges of dealing with short and informal text signals during crisis response, as well as other aspects of the combined analysis of social data and the Web of Data.
The tutorial objective is to provide details about data characteristics and computing challenges in solving problems of crisis response, by showing the problems with existing methods and the potential for new research in this space. The tutorial will give a brief introduction to the users and settings in the crisis response domain, that is, the end users of crisis computing: who the agencies and potential end users are, what their expectations are, what tools they currently have, etc. After understanding ‘what crisis responders want’, we shall present ‘what data can support’ and ‘how computing techniques can help’. As we describe various problems of interest for crisis response, for each problem we shall present existing work (not limited to the authors’ prior work) on characterizing what data is available, along with some notes about what kinds of operations this data may support. This will enable researchers to understand the challenges in applying existing computational techniques. We will further illustrate the new computing methods needed for this space and what is required to leverage state-of-the-art methods, while motivating further research.
We specifically want to address the following problems, together with the characterization of data, existing work, and the need for further research in the crisis computing space:
A.) Problems and methods: [Predecessor for background: ICWSM-2013 tutorial]
- Detection of crisis events
- Extraction of structured data from unstructured citizen reports
- Summarization of key events during the development of a crisis
- Information classification for situational awareness and rapid evaluation of crisis impact
- Coordination of donations and resources
- Human-assisted data mining: hybrid stream processing using crowdsourcing
B.) For each problem, we will present:
- What crisis responders and decision makers may need (problem description)
- What data is available and its challenges (e.g., scale, heterogeneity)
- What research outputs and technologies have been used or may be used
- Potential scientific and technological challenges
Carlos Castillo (PhD) is a Senior Scientist in the Social Computing team at the Qatar Computing Research Institute (QCRI) in Doha. He is a web miner with a background in information retrieval, and has been influential in the areas of adversarial web search and web content quality and credibility. He is an active researcher with more than 40 publications in top-tier international conferences and journals and 4000+ citations. His current research focuses on the application of web mining methods to problems in the domain of online news and humanitarian crises. More about Carlos: http://www.chato.cl/research/
Fernando Diaz (PhD) is a researcher at the Microsoft Research NYC lab. His primary research interest is formal information retrieval models. His research experience includes distributed information retrieval approaches to web search, temporal aspects of information access, mouse-tracking, cross-lingual information retrieval, graph-based retrieval methods, and synthesizing information from multiple corpora. Currently, he is studying these topics in the context of unexpected crisis events. Fernando co-organized the SIGIR 2011 Workshop on Analysis of User Generated Content Under Crisis, the SIGIR 2012 and 2013 Workshops on Time-Sensitive Information Access, and the TREC 2013 Temporal Summarization track. He will be co-chairing the 2014 Web Search and Data Mining (WSDM) conference. More about Fernando: http://ciir.cs.umass.edu/~fdiaz/
Hemant Purohit is an interdisciplinary (Computer and Social Sciences) researcher at Kno.e.sis, where he coordinates crisis informatics research under the NSF SoCS project. He is an ICCM-2013 fellow supported by Google and the ICT4Peace Foundation. He is pursuing a unique approach of people-content-network analysis, guided by psycholinguistic theories of coordination, to analyze big crisis data and answer: whom to coordinate, why to coordinate, and how to coordinate. His work also involves the problem spaces of community engagement & sustainability, expert detection & presentation, etc. He has served as a PC member and external reviewer for conferences, workshops, and journals including HICSS, SocInfo, ICWSM, WWW, Journal of CSCW, Journal of Knowledge Discovery & Data Mining, etc. He has also presented on citizen sensing and a crisis response platform at venues including World Usability Day. More about Hemant: http://knoesis.org/hemant
- H. Purohit, C. Castillo, P. Meier, A. Sheth. Crisis Mapping, Citizen Sensing and Social Media Analytics- Leveraging Citizen Roles for Crisis Response Coordination. The Seventh Int'l AAAI Conference on Weblogs and Social Media, ICWSM-2013 Tutorial. (details and domain references)
- C. Castillo. Finding Relevant and Credible Information on Social Media During Disasters. Conference on Big Data Systems, Applications, and Privacy, March 2013.
- C. Castillo, Wei Chen, Laks V. S. Lakshmanan. Information and Influence Spread in Social Networks, KDD 2012 Tutorial.
- H. Purohit, A. Hampton, V. Shalin, A. Sheth, J. Flach, S. Bhatt. What Kind of #Communication is Twitter? Mining #Psycholinguistic Cues for Emergency Coordination. Computers in Human Behavior (CHB) journal, Vol. 29, Issue 6, Nov. 2013, pp. 2438–2447.
- H. Purohit, A. Hampton, S. Bhatt, V. Shalin, A. Sheth, J. Flach. An Information Filtering and Management Model for Twitter Traffic to Assist Crises Response Coordination. Journal of CSCW, 2014 (To appear)
- H. Purohit, C. Castillo, F. Diaz, A. Sheth, P. Meier. Automatically Matching Needs and Offers during Emergency Response Coordination. First Monday, vol 19 (1), 2014
- A. Sheth, A. Jadhav, P. Kapanipathi, C. Lu, H. Purohit, G. A. Smith, W. Wang. Twitris- a System for Collective Social Intelligence. Encyclopedia of Social Network Analysis and Mining (ESNAM), Springer, 2014. (In printing)
- H. Purohit. Crisis Response Coordination in Online Communities. Doctoral Consortium, NSF SOCS Symposium, 2013
- H. Purohit and A. Sheth. Twitris v3: From Citizen Sensing to Analysis, Coordination and Action. The 7th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2013
- M. Imran, I. Lykourentzou, C. Castillo. Engineering Crowdsourced Stream Processing Systems. Technical Report, QCRI, 2013 (Under review)
- M. Imran, S. Elbassuoni, C. Castillo, F. Diaz, P. Meier. Extracting information nuggets from disaster-related messages in social media. In 10th International Conference on Information Systems for Crisis Response and Management, 2013.
- Y. Chang, A. Dong, P. Kolari, R. Zhang, Y. Inagaki, F. Diaz, H. Zha, and Y. Liu. Improving Recency Ranking Using Twitter Data. ACM Transactions on Intelligent Systems and Technology, 4(1):4:1--4:24, February 2013.
- Q. Guo, F. Diaz, E. Yom-Tov. Updating users about time critical events. In Proceedings of the 35th European conference on Advances in Information Retrieval (ECIR'13), 483--494. [slightly extended]
- F. Diaz, D. Metzler, S. Amer-Yahia. Relevance and Ranking in Online Dating Systems. SIGIR 2010.
- A. Dong, R. Zhang, P. Kolari, J. Bai, F. Diaz, Y. Chang, Z. Zheng, H. Zha. Time is of the Essence: Improving Recency Ranking Using Twitter Data. WWW 2010
- C. Castillo, M. Mendoza, B. Poblete. 2011. Information credibility on twitter. In Proceedings of the 20th international conference on World wide web (WWW '11). ACM, New York, NY, USA, 675-684.
- C. Castillo, M. Mendoza, B. Poblete: Predicting Information Credibility in Time-Sensitive Social Media. Accepted for publication in Internet Research, special issue on The Predictive Power of Social Media. 2013.
- F. Shih, O. Seneviratne, D. Miao, I. Liccardi, L. Kagal, E. Patton, P. Meier, C. Castillo. Democratizing Mobile App Development for Disaster Management. To be presented at the IJCAI Workshop on Semantic Cities. Beijing, China, 2013.
- A. Popoola, D. Krasnoshtan, A. Toth, V. Naroditskiy, C. Castillo, Patrick Meier and Iyad Rahwan. Information Verification during Natural Disasters. In SWDM workshop. Rio de Janeiro, Brazil, 2013.
- P. Meier. Does the Humanitarian Industry Have a Future in the Digital Age? International Journal of Media and Information Policy, (1), 2012.
- P. Meier. Crisis Mapping in Areas of Limited Statehood. In Information and Communication Technologies in Areas of Limited Statehood, ed. Steven Livingston and Gregor Walter-Drop. Oxford University Press, forthcoming, 2013.
- P. Meier. New Information Technologies and their Impact on the Humanitarian Sector. International Review of the Red Cross, 93(883), 2012.
- K. Starbird and J. Stamberger (2010). Tweak the Tweet: Leveraging Microblogging Proliferation with a Prescriptive Grammar to Support Citizen Reporting. Short paper presented at the 7th International Information Systems for Crisis Response and Management Conference (Seattle, Washington, USA, May 2010). ISCRAM 2010
- Vieweg, Sarah, Amanda L. Hughes, Kate Starbird, and Leysia Palen. (2010). A Comparison of Microblogging Behavior in Two Natural Hazards Events: What Twitter May Contribute to Situational Awareness. Proceedings of the ACM 2010 Conference on Human Factors in Computing Systems (CHI 2010), Atlanta, GA, pp. 1079-1088
- L. Palen, K. M. Anderson, G. Mark, J. Martin, D. Sicker, M. Palmer, and D. Grunwald. A vision for technology-mediated support for public participation and assistance in mass emergencies and disasters. In Proceedings of the 2010 ACM-BCS Visions of Computer Science Conference (Edinburgh, United Kingdom, April 14 – 16, 2010). ACM-BCS Visions of Computer Science. British Computer Society, Swinton, UK, 1-12.
- A. Sheth. Citizen Sensing, Social Signals, and Enriching Human Experience. IEEE Internet Computing, pp. 80-85, July/August 2009.
- D. Gruhl, M. Nagarajan, J. Pieper, C. Robson, A. Sheth. Multimodal Social Intelligence in a Real-Time Dashboard System, special issue on 'Data Management and Mining for Social Networks and Social Media', the VLDB Journal, 2010.
- Application of authors’ work: http://irevolution.net/2013/05/29/analyzing-tweets-tornado/, http://techpresident.com/news/wegov/24082/twitris-taking-crisis-mapping-next-level
- Additional Reference publications: (Humanitarian Computing Bibliography)
- Ushahidi Platform: http://ushahidi.com, http://www.nytimes.com/2010/03/14/weekinreview/14giridharadas.html
- Twitris: Web App Analyzes Tweets in Real Time for a Record of Historic Events, Mashable, Feb 2012.
- Election 2012 Prediction: Semantic Web Recap, http://semanticweb.com/election-2012-the-semantic-recap_b33278 (shows Twitris’ significant advances compared to other systems for Social Media Analytics).
- E. Ruiz, V. Hristidis, C. Castillo, A. Gionis and A. Jaimes: Correlating Financial Time Series with Micro-Blogging Data. In WSDM, Seattle, Washington. pp. 513-522, ACM Press. 2012.
- A. Anagnostopoulos, C. Castillo, A. Gionis, L. Becchetti, S. Leonardi: Online Team Formation in Social Networks. In Proc. of WWW, pp. 839-848. Lyon, France, 2012. ACM Press.
- Meier, Patrick. 2012. Crisis Mapping in Action: How Open Source Software and Global Volunteer Networks Are Changing the World, One Map at a Time. Journal of Map and Geography Libraries.
- Bailard, Catie et al. 2012. Mapping the Maps: A Meta-Level Analysis of Ushahidi and Crowdmap. Report of the Internews Center for Innovation & Learning, May 2012, Washington DC.
- Land, Molly et al. 2012. #ICT4HR: ICT for Human Rights. World Bank Report, Washington DC, December 2012.
- Heinzelman, Jessica and Patrick Meier. 2012. Crowdsourcing for Human Rights Monitoring: Challenges and Opportunities for Verification. In Human Rights and Information Communication Technologies: Trends and Consequences of Use, ed. John Lannon. IGI Global.
- Meier, Patrick. 2012. The Role of Ushahidi as a Liberation Technology in Egypt and Beyond. In Liberation Technology: Social Media and the Struggle for Democracy. Johns Hopkins University Press.
- Y. Ruan, H. Purohit, D. Fuhry, S. Parthasarathy, A. Sheth. Prediction of Topic Volume on Twitter. 4th Int'l ACM Conference of Web Science (WebSci), 2012.
- H. Purohit, A. Hampton, V. Shalin, A. Sheth, J. Flach. Framework for the Analysis of Coordination in Crisis Response, Collaboration and Crisis Informatics, CSCW-2012.
- P. Mendes, P. Kapanipathi, and A. Passant. Twarql: Tapping into the Wisdom of the Crowd, Triplification Challenge 2010 at 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 1-3 September 2010. (Winner of Triplification Challenge 2010).
- Mendes PN, Passant A, Kapanipathi P, Sheth AP. Linked Open Social Signals. WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010.
- M. Nagarajan. Understanding User-Generated Content on Social Media, Ph.D. Dissertation, Wright State University, 2010.
- A. Jadhav, H. Purohit, P. Kapanipathi, P. Ananthram, A. Ranabahu, V. Nguyen, P. Mendes, A. G. Smith, M. Cooney, A. Sheth. Twitris 2.0: Semantically Empowered System for Understanding Perceptions From Social Data, ISWC 2010 Semantic Web Application Challenge.
- M. Nagarajan, H. Purohit, A. Sheth. A Qualitative Examination of Topical Tweet and Retweet Practices, 4th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2010.
- Semantics driven Analysis of Social Media, Kno.e.sis Social Computing group report.
- H. Purohit, A. Dow, O. Alonso, L. Duan, K. Haas. User Taglines: Alternative Presentations of Expertise and Interest in Social Media, First ASE International Conference on Social Informatics, 2012.
- H. Purohit, J. Ajmera, S. Joshi, A. Verma, A. Sheth. Finding Influential authors in Brand-page Communities, 6th Int'l AAAI Conference on Weblogs and Social Media, ICWSM 2012.