2014 SIAM International Conference on Data Mining (SDM-2014) Tutorial: Leveraging Social Media and Web of Data for Crisis Response Coordination


Details: SDM-14 Tutorial program

Predecessor with extensive domain knowledge: ICWSM-2013 tutorial

Background: NSF SoCS project




There is an ever-increasing number of users in social media (1B+ Facebook users, 500M+ Twitter users) and ubiquitous mobile access (6B+ mobile phone subscribers) who share their observations and opinions. In addition, the Web of Data and existing knowledge bases keep growing at a rapid pace. This gives us unprecedented opportunities to improve crisis response by extracting social signals, creating spatio-temporal mappings, performing analytics on social data and the Web of Data, and supporting a variety of applications. Such applications can help provide situational awareness during an emergency, improve preparedness, and assist during the rebuilding/recovery phase of a disaster. Data mining can provide valuable insights to support emergency responders and other stakeholders during crises. However, there are a number of challenges, and existing computing techniques may not work in all cases. Our objective is therefore to characterize such data mining tasks and the challenges that need further research attention. This tutorial weaves together two themes and their related topics:

  • Characterization of Citizen Sensing and the Web of Data: the opportunity lies in the explosion of data resulting from the participation of millions of end users (citizen sensors) through a wide variety of means and forms during crises: SMS, tweets, and Facebook posts, with timestamps and optional GPS metadata, in addition to news, blogs, Wikipedia, existing Web knowledge, etc.
  • Technical challenges and recent research on leveraging citizen sensing together with the Web of Data to improve crisis response coordination: the role of semantics-enhanced techniques (especially those using background knowledge) in analyzing casual/informal text for crisis response; data integration for enhanced situational awareness among responders; actionable information extraction for decision makers; crowdsourcing for information credibility; and the applicability of existing methods, along with the scope for new research, in the crisis response space.

Although broad in scope, this tutorial will present key research advances in depth. These include (i) understanding and analyzing informal text, including microblogs and post-disaster ad hoc social media communities (e.g., information credibility in crisis data streams, and the role of semantic/background-knowledge-enhanced techniques that use the Web of Data); (ii) enabling complex social media analytics and summarization platforms; and (iii) interdisciplinary research spanning computer and social sciences (e.g., linguistic analysis and coordination) that uses computing methods to address important practical applications, such as better and faster aid targeting and more effective resource coordination and utilization. Technical insights will be paired with the relevant computational techniques and algorithms, along with real-world examples.



Level: Intermediate.
The tutorial will interest both data mining researchers and technologists, since it will distinguish, with many examples and demonstrations, research that is still in its infancy from research that has matured. We shall provide an overview of both citizen sensing and crisis mapping for response coordination; attendees are therefore not expected to have a specialized background in these areas. Those with a background in text mining, graph mining, data summarization, and geospatial data analysis will be better able to appreciate the novel technical challenges of dealing with short, informal text signals during crisis response, as well as other aspects of the combined analysis of social data and the Web of Data.


The objective of the tutorial is to provide details about data characteristics and computing challenges in solving crisis response problems, by showing the limitations of existing methods and the potential for new research in this space. The tutorial will give a brief introduction to the users and settings in the crisis response domain, i.e., the end users of crisis computing: who the agencies and potential end users are, what their expectations are, what tools they currently have, etc. After understanding ‘what crisis responders want’, we shall present ‘what data can support’ and ‘how computing techniques can help’. For each problem of interest for crisis response, we shall present existing work (not limited to the authors’ own prior work) on characterizing the available data, along with notes on the kinds of operations this data may support. This will help researchers understand the challenges of applying existing computational techniques. We will further illustrate the new computing methods needed in this space and what is required to leverage state-of-the-art methods, while motivating further research.

Specifically, we want to address the following problems, covering the characterization of data, existing work, and the need for further research in the crisis computing space.

A.) Problems and methods: [Predecessor for background: ICWSM-2013 tutorial]

  • Detection of crisis events
  • Extraction of structured data from unstructured citizen reports
  • Summarization of key events during the development of a crisis
  • Information classification for situational awareness and rapid evaluation of crisis impact
  • Coordination of donations and resources
  • Human-assisted data mining: hybrid stream processing using crowdsourcing

B.) For each problem, we will present:

  • What crisis responders and decision makers may need (problem description)
  • What data is available and its challenges (e.g., scale, heterogeneity, etc.)
  • What research outputs and technologies have been used or may be used
  • Potential scientific and technological challenges



Carlos Castillo (PhD) is a Senior Scientist in the Social Computing team at the Qatar Computing Research Institute (QCRI) in Doha. He is a web miner with a background in information retrieval, and has been influential in the areas of adversarial web search and web content quality and credibility. He is an active researcher with more than 40 publications in top-tier international conferences and journals and 4000+ citations. His current research focuses on applying web mining methods to problems in the domains of online news and humanitarian crises. More about Carlos: http://www.chato.cl/research/

Fernando Diaz (PhD) is a researcher at the Microsoft Research NYC lab. His primary research interest is formal information retrieval models. His research experience includes distributed information retrieval approaches to web search, temporal aspects of information access, mouse-tracking, cross-lingual information retrieval, graph-based retrieval methods, and synthesizing information from multiple corpora. Currently, he is studying these topics in the context of unexpected crisis events. Fernando co-organized the SIGIR 2011 Workshop on Analysis of User Generated Content Under Crisis, the SIGIR 2012 and 2013 Workshops on Time-Sensitive Information Access, and the TREC 2013 Temporal Summarization track. He will be co-chairing the 2014 Web Search and Data Mining (WSDM) conference. More about Fernando: http://ciir.cs.umass.edu/~fdiaz/

Hemant Purohit is an interdisciplinary (computer and social sciences) researcher at Kno.e.sis, where he coordinates crisis informatics research under the NSF SoCS project. He is an ICCM-2013 fellow supported by Google and the ICT4Peace Foundation. He is pursuing a unique approach of people-content-network analysis, guided by psycholinguistic theories of coordination, to analyze big crisis data and answer three questions: whom to coordinate with, why to coordinate, and how to coordinate. His work also covers community engagement and sustainability, expert detection and presentation, etc. He has served as a PC member and external reviewer for conferences, workshops, and journals including HICSS, SocInfo, ICWSM, WWW, Journal of CSCW, Journal of Knowledge Discovery & Data Mining, etc. He has also presented on citizen sensing and crisis response platforms at several venues, including World Usability Day. More about Hemant: http://knoesis.org/hemant


Advisory Members: Dr. Patrick Meier (QCRI), Prof. Amit Sheth (Kno.e.sis)