About Me

I am a final-year PhD student at the Kno.e.sis Center, Wright State University. My adviser is Prof. Amit Sheth. I have defended my proposal and am preparing to graduate in early 2016.

I work in the Semantic Web Lab. I am broadly interested in User Modeling; Personalization; Recommendation; Knowledge Graphs; Linked Data; Semantic Web; Text Mining; Information Retrieval; Applied Machine Learning; Social Data Analysis. My dissertation topic is "Personalized and Adaptive Semantic Information Filtering for Social Media".

A summary of my professional activities over the years includes: internships and collaborations with DERI (Research Intern, 2011), Google, IBM TJ Watson Research Center (Research Intern, 2013), and Samsung Research America (Research Intern, 2014), resulting in publications at top-tier venues; participation in 3 award-winning grants from NSF and NIH; a winning entry in the open track of the Triplification Challenge at I-Semantics 2010; Program Committee membership for conferences such as IJCAI, ESWC, HT, and WI; and co-development of the open source projects Twarql and SMOB.


-- October-2015: Presented on "Semantic Filtering as an example of Semantic technologies for real time analysis" at the Big Data Tutorial at Wright State University. [ppt]
-- October-2015: Invited as Program Committee Member for ESWC 2016, IJCAI 2016.
-- August-2015: Two NSF proposals that I contributed to were funded: (1) Hazards SEES: Social and Physical Sensing Enabled Decision Support ($1.97M, in collaboration with Ohio State University); (2) Market Driven Innovations and Scaling up of Twitris ($200K).
-- August-2015: Successfully defended my proposal. Title: Personalized and Adaptive Semantic Information Filtering for Social Media.
-- May-2015: Invited to present my research at EMC CTO Office, Bangalore. (as a part of Invited Speaker Series Lecture). [ppt]
-- Mar-2015: Our paper, "Knowledge Enabled Approach to Predict the Location of Twitter Users", is accepted at ESWC 2015.
-- Mar-2015: Invited as Program Committee Member for HT 2015, ICWS 2015, ESWC 2015.
-- December-2014: Invited to present my research for Wright State University, Graduate School External Advisory Board.[ppt]
-- October-2014: Invited talk at Frontiers of Cloud Computing and Big Data Workshop 2014, organized and held at IBM TJ Watson Research Center.[poster][ppt]
-- September-2014: The NIH-R01 grant proposal that I significantly contributed to, "Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use", was funded and awarded to Kno.e.sis and CITAR (Wright State University). [wiki]


  • Kno.e.sis Center, Wright State University
    Doctor of Philosophy in Computer Science
  • Dayton, OH
  • Kno.e.sis Center, Wright State University
    Master of Science in Computer Science
  • Dayton, OH
  • Visvesvaraya Technological University
    Bachelor of Engineering in Computer Science
  • Bangalore, India


  • Dept of Computer Science and Engg, Wright State University
    Graduate Research/Teaching Assistant
  • Dayton, OH
    Jan 2010-Present
    • Led and contributed to multiple research projects: (1) Hazards SEES, (2) Twarql, and (3) Twitris.
    • Contributed to three award-winning proposals funded by NSF and NIH.
    • Collaborating with fellow PhD and Master's students on research on knowledge graphs and social data. Research problems include: (1) domain-specific knowledge base generation; (2) entity recommendation using hierarchical knowledge bases; and (3) predicting the location of a Twitter user.
    • Research projects and proposals involved interdisciplinary, local, and international collaborations with institutions such as Ohio State University, CITAR (Wright State University), Digital Enterprise Research Institute (now Insight, Galway), and Fraunhofer IAIS (University of Bonn, Germany).
    • Teaching assistant for “Introduction to Java” for three semesters.
  • Samsung Research America
    Research Intern
  • San Jose, CA
    May 2014-Dec 2014
    • Developed a semantic enrichment engine for trajectory data. The semantic enrichment engine harnessed relevant, open knowledge bases to enhance user experience (Patent pending).
    • Predicted location specific activities of interest from social data. The approach builds a probabilistic model leveraging Wikipedia as a source of background knowledge.
  • IBM TJ Watson Research Center
    Research Intern
  • Yorktown Heights, NY
    May 2013-Aug 2013
    • Modelled Twitter users’ preferences as a hierarchy inferred from a knowledge graph (Hierarchical Interest Graph). The methodology uses an adaptation of the spreading activation algorithm to score the concepts in the hierarchical interest graph that capture users’ interests.
    • Prototyped a recommendation system that harnesses the hierarchical interest graph for recommending tweets to users. In our evaluation, the approach outperformed tweet recommendation techniques based on Support Vector Machines and Latent Dirichlet Allocation.
    • Internship resulted in a poster and research paper at top conferences.
  • Digital Enterprise Research Institute
    Research Intern
  • Galway, Ireland
    Apr 2011-Aug 2011
    • Collaborated with Google on extending their PubSubHubbub protocol into a privacy-aware Semantic Hub.
    • Semantic Hub disseminates information based on the preferences of publishers and users. Semantic Web technologies such as RDF, OWL, and SPARQL were used for determining and representing user preferences. The system was developed in Python on Google App Engine.
    • Semantic Hub is used by (1) SMOB (an open source, distributed, semantic microblogging framework) and (2) Personalized Filtering of the social stream.
    • Internship resulted in 4 publications at workshops and top conferences.
  • Accenture
  • Bangalore, India
    Jul 2007-Mar 2009
    • Primarily worked with two clients: (1) Drugstore.com and (2) SFR.
    • Drugstore.com: The objective of the project was to improve the performance of the drugstore.com web application. Working in an eleven-member team, we migrated the embedded SQL queries in the VC++ code to corresponding stored procedure calls.
    • SFR: Lead role in the development of new features and modules for CRM applications. Applications were developed in Java and required knowledge of IBM WebSphere (application server) and Epiphany (CRM tool).
  • Robert Bosch
    Project Trainee
  • Bangalore, India
    Feb 2007-Jun 2007
    • Developed a test-program compiler in C#. The test program is used in the calibration process of fuel injection pumps.


  • User Modeling and Recommendation for Social Media
  • Collaboration - IBM Research, Samsung Research

    Understanding users on the web and modeling their interests is important for two tasks: (1) customizing users' experience on the web; and (2) reducing information overload. Various kinds of user-generated content have been analyzed to determine user interests. For example, Google harnesses users' search history and click behavior for ranking search results, and Amazon leverages users' browsing and shopping history for recommending new items. While diverse information about users is available, the introduction and popularity of social media platforms, where users create significant content, has provided new opportunities to analyze data for understanding users. However, processing social media data introduces its own challenges, such as noisy, informal text and a lack of semantic context due to its short-text nature. Addressing such challenges to model users online is the goal of this project, which also forms the core of my dissertation.

    • Hierarchical Interest Graphs: This work introduces a novel representation of user interests as a hierarchy, termed a hierarchical interest graph. Interests are determined from users' social media posts, and the hierarchical representation is inferred from the Wikipedia hierarchy. The Wikipedia hierarchy created for this work has been used by more than 5 research labs worldwide. While a user study demonstrates the quality of hierarchical interest graphs, we also prototyped a recommendation system that harnesses the hierarchical interest graph for recommending tweets to users. Our approach outperforms existing content-based recommendation techniques such as SVM and LDA.
    • Home Location Prediction of Social Media Users: The location of a social media user adds value to various applications such as demographic-based recommendation and disaster management. However, only 4% of users on Twitter share their geographic information. Existing techniques are supervised and require a time-intensive process of creating training data for each location to be predicted, which also makes it harder for them to adapt to new locations. To address these challenges, this work introduces a novel, knowledge-base-driven, unsupervised methodology for predicting a user's home location.
    • Determining User Activities of Interest: Mundane activities (hobbies) such as running, watching football, and hiking that users share on social media can enhance existing recommendation systems. Existing techniques for extracting activities from social media data focus on major events and ignore users' mundane activities. In this work, we built a probabilistic model for determining user activities of interest from social data. The approach also maps the determined activities to Wikipedia concepts to enable the use of background knowledge associated with the concepts.
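
    The scoring idea behind hierarchical interest graphs can be illustrated with a small spreading-activation sketch. This is only an illustration under stated assumptions, not the published method: the toy hierarchy, the function name `spread_activation`, the decay factor, and the hop limit are all hypothetical, and the real work uses a hierarchy derived from Wikipedia.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child concept -> parent categories).
# The actual work derives its hierarchy from Wikipedia.
PARENTS = {
    "Lionel Messi": ["Footballers"],
    "FC Barcelona": ["Football clubs"],
    "Footballers": ["Football"],
    "Football clubs": ["Football"],
    "Football": ["Sports"],
}


def spread_activation(mentions, parents, decay=0.5, max_hops=3):
    """Propagate mention weights up the hierarchy.

    Each hop multiplies the activation by `decay` (an assumed value),
    so broader categories receive smaller scores than the concepts
    that were actually mentioned.
    """
    scores = defaultdict(float)
    frontier = dict(mentions)
    for _ in range(max_hops + 1):
        next_frontier = defaultdict(float)
        for concept, activation in frontier.items():
            scores[concept] += activation
            for parent in parents.get(concept, []):
                next_frontier[parent] += activation * decay
        frontier = next_frontier
        if not frontier:
            break
    return dict(scores)


# Weights stand in for how often a user mentioned each entity.
interests = spread_activation({"Lionel Messi": 2.0, "FC Barcelona": 1.0}, PARENTS)
ranked = sorted(interests.items(), key=lambda item: -item[1])
```

    Here `ranked` lists the mentioned entities first, followed by progressively broader categories; the choice of decay controls how strongly broad categories such as "Sports" are credited.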

  • Hazards SEES
  • Collaboration - Ohio State University

    Infrastructure systems are a cornerstone of civilization. Damage to infrastructure from natural disasters such as an earthquake (e.g. Haiti, Japan), a hurricane (e.g. Katrina, Sandy) or a flood (e.g. Kashmir floods) can lead to significant economic loss and societal suffering. Human coordination and information exchange are at the center of damage control. This project seeks to radically reform decision support systems for managing rapidly changing disaster situations by the integrated exploitation of social, physical and hazard modeling capabilities.

    I am the lead coordinator for the project. I am contributing to the design of novel, multi-dimensional cross-modal aggregation and inference methods to compensate for the uneven coverage of sensing modalities across an affected region. I am also working on combining machine learning with knowledge-base-driven techniques to continuously adapt the filters for crawling social data relevant to dynamically evolving events. Preliminary evaluation shows that our technique can improve the recall of the tweets being streamed while maintaining good precision with respect to the event of interest.

  • Twitris+: 360 degree Social Media Analytics platform
  • Social media has seen unprecedented growth in recent times. This gives decision makers, from corporate analysts to coordinators during emergencies, an opportunity to answer questions or take actions related to a broad variety of activities and situations: who they should really engage with, how to prioritize posts for action in a voluminous data stream, what the needs are and who the resource providers are in an emergency event, how a corporate brand is performing, and whether customer support adequately serves needs while managing corporate reputation. We demonstrate these capabilities using Twitris+ through multi-faceted analysis along the dimensions of Spatio-Temporal-Thematic (STT), People-Content-Network (PCN), and Subjectivity: Emotion-Sentiment-Intent (ESI). Twitris’ diversity and depth of analysis is unprecedented.

    My work in this project involved porting the Twarql infrastructure to Twitris to support complex querying over Twitter data. The complex query answering enables SPARQL queries over event-relevant tweets, with DBpedia as the background knowledge.

  • Twarql: Twitter feeds through SPARQL
  • Twitter has become a prominent medium to share opinions, observations, and suggestions in real time. Insights from these microposts ("Wisdom of the Crowd") have proved to be invaluable for businesses and researchers around the world. However, the volume of published microblog data keeps increasing with the popularity and growth of Twitter. This makes it challenging to filter microblog data for aggregation and collective analysis for sense-making. Twarql addresses these challenges by leveraging Semantic Web technologies to enable a flexible query language for filtering microblog posts.

    I was involved in designing and developing the end-to-end Twarql system. The system includes (1) a pipeline of information extraction techniques on tweets to extract metadata, (2) an annotation module to transform the raw metadata to RDF, and (3) a flexible filtering and broadcasting component for users. Technologies used include Java, Storm, RDF, SPARQL, Hadoop, and Pig. We published two papers, one of which won the Triplification Challenge in 2010.
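
    The three stages described above can be sketched in a few lines. This is a simplified illustration rather than the Twarql implementation (which is written in Java and uses RDF and SPARQL): the regular expressions, the `ex:` predicate names, and the tuple-based triple pattern matching are assumptions made to keep the sketch self-contained.

```python
import re


def extract_metadata(text):
    """Stage 1: pull simple metadata out of a tweet (illustrative regexes)."""
    return {
        "hashtag": re.findall(r"#(\w+)", text),
        "mention": re.findall(r"@(\w+)", text),
        "url": re.findall(r"https?://\S+", text),
    }


def annotate(tweet_id, text):
    """Stage 2: turn the metadata into (subject, predicate, object) triples.

    The `ex:` predicates are hypothetical placeholders, not the vocabulary
    used by Twarql.
    """
    subject = f"tweet:{tweet_id}"
    triples = {(subject, "ex:content", text)}
    for kind, values in extract_metadata(text).items():
        for value in values:
            triples.add((subject, f"ex:{kind}", value))
    return triples


def matches(triples, pattern):
    """Check one triple pattern; None acts as a wildcard, like a SPARQL variable."""
    s, p, o = pattern
    return any(
        (s is None or s == ts) and (p is None or p == tp) and (o is None or o == to)
        for ts, tp, to in triples
    )


def filter_stream(tweets, pattern):
    """Stage 3: yield the ids of tweets whose annotations match the pattern."""
    for tweet_id, text in tweets:
        if matches(annotate(tweet_id, text), pattern):
            yield tweet_id


stream = [
    (1, "Flooding reported downtown #flood http://example.org/report"),
    (2, "Great game tonight @alice #football"),
]
flood_ids = list(filter_stream(stream, (None, "ex:hashtag", "flood")))
```

    A subscriber interested in flood reports would register the pattern `(None, "ex:hashtag", "flood")` and receive only the first tweet; a full SPARQL engine additionally supports joins across several patterns and background knowledge.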

  • SemPUSH: Controlled Content Dissemination in Social Networks
  • Collaboration - DERI, Google

    Users of traditional microblogging platforms such as Twitter face two drawbacks: (1) as a followee, privacy of status updates, which may reach undesired people; and (2) as a follower, information overload from receiving uninteresting microposts from followees. In this project we implemented a privacy-aware version of Google's PuSH protocol (Semantic Hub) for distributed and user-controlled dissemination of microposts using SMOB (a semantic microblogging framework). The approach leverages users' social graph to dynamically create groups of followers who are eligible to receive a micropost. The restrictions used to create the groups are provided by the followee based on the hashtags in the micropost. Both SMOB and Semantic Hub are available as open source.
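
    The hashtag-based group creation can be illustrated with a short sketch. It is a simplification under stated assumptions: the dictionaries `FOLLOWERS` and `RESTRICTIONS`, the relationship labels, and the rule that a follower must satisfy every triggered restriction are hypothetical, and the sketch does not reproduce how Semantic Hub represents or evaluates preferences.

```python
import re

# Hypothetical social graph: follower -> relationships to the followee.
FOLLOWERS = {
    "alice": {"colleague"},
    "bob": {"family"},
    "carol": {"colleague", "friend"},
}

# Hypothetical followee-defined restrictions: hashtag -> required relationship.
RESTRICTIONS = {
    "work": "colleague",
    "home": "family",
}


def eligible_followers(micropost, followers, restrictions):
    """Build the group of followers allowed to receive a micropost.

    Assumption: a follower must hold every relationship required by the
    restricted hashtags in the post; posts without restricted hashtags
    go to all followers.
    """
    hashtags = {tag.lower() for tag in re.findall(r"#(\w+)", micropost)}
    required = {restrictions[tag] for tag in hashtags if tag in restrictions}
    if not required:
        return set(followers)
    return {name for name, rels in followers.items() if required <= rels}


work_group = eligible_followers("Deadline moved to Friday #work", FOLLOWERS, RESTRICTIONS)
open_group = eligible_followers("Nice weather today", FOLLOWERS, RESTRICTIONS)
```

    With these toy inputs, the `#work` post is delivered only to followers marked as colleagues, while the untagged post reaches everyone.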

Publications and Patents (Google Scholar)

    Work in Progress

  • Pavan Kapanipathi, Krishnaprasad Thirunarayan, Fabrizio Orlandi, Amit Sheth, Pascal Hitzler. A Real-Time #approach for Continuous Crawling of Events on Twitter by Leveraging Wikipedia. (In Progress).[wiki]
  • Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth, Derek Doran. Hierarchical Knowledge Bases to Identify User Interests on Social Media. (Journal in Progress).[wiki]
  • Siva Kumar, Pavan Kapanipathi, Derek Doran, Prateek Jain, Amit Sheth. Exploring Taxonomical Interests for Entity Recommendations. Technical report, 2015.
  • Sarasi Sarangi, Pavan Kapanipathi, Amit Sheth. Domain-specific Sub graph Generation. Technical report, 2015.

    Published Work

  • Raghava Mutharaju, and Pavan Kapanipathi. Are We Really Standing on the Shoulders of Giants? 1st Workshop on Negative or Inconclusive Results in Semantic Web 2015, ESWC.
  • Siva Kumar Chekula, Pavan Kapanipathi, Derek Doran, Amit Sheth. Entity Recommendations Using Hierarchical Knowledge Bases. 4th International Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, 2015.
  • Pavan Kapanipathi, Revathy Krishnamurthy (Joint first author), Amit Sheth, Krishnaprasad Thirunarayan. Knowledge Enabled Approach to Predict the Location of Twitter Users. Extended Semantic Web Conference, 2015. (acceptance rate 23%)[wiki]
  • Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth. User Interests Identification on Twitter Using a Hierarchical Knowledge Base. In Extended Semantic Web Conference 2014, Crete Greece (23% acceptance)[wiki][ppt][video]
  • Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth. Hierarchical Interest Graph from Twitter. In 23rd International conference on World Wide Web companion 2014 (WWW 2014), Seoul, South Korea.[poster]
  • Fabrizio Orlandi, Pavan Kapanipathi, Alexandre Passant, Amit Sheth. Characterising concepts of interest leveraging Linked Data and the Social Web. The 2013 IEEE/WIC/ACM International Conference on Web Intelligence, Atlanta, USA, United States, 2013.[ppt]
  • Pavan Kapanipathi, Fabrizio Orlandi, Amit Sheth, Alexandre Passant. Personalized Filtering of the Twitter Stream. 2nd workshop on Semantic Personalized Information Management at ISWC 2011, September 2011.[wiki] [ppt]
  • Pavan Kapanipathi, Julia Anaya, Amit Sheth, Brett Slatkin, Alexandre Passant. Privacy-Aware and Scalable Content Dissemination in Distributed Social Network. 10th International Semantic Web Conference 2011, Bonn, Germany, September 2011. (acceptance rate 22%)[wiki] [ppt][video]
  • Pavan Kapanipathi, Julia Anaya, Alexandre Passant. SemPuSH: Privacy-Aware and Scalable Broadcasting for Semantic Microblogging. 10th International Semantic Web Conference 2011, Bonn, Germany, September 2011.[poster]
  • Alexandre Passant, Julia Anaya, Owen Sacco, Pavan Kapanipathi. SMOB: The Best of Both Worlds. Federated Social Web Europe Conference, Berlin, June 3rd -5th 2011.
  • Alexandre Passant, Owen Sacco, Julia Anaya, Pavan Kapanipathi. Privacy-By-Design in Federated Social Web Applications. Websci 2011, Koblenz, Germany June 14-17, 2011.
  • Pablo Mendes, Pavan Kapanipathi, Alexandre Passant. Twarql: Tapping into the Wisdom of the Crowd. Triplification Challenge 2010 at 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 1-3 September 2010. (Winner of Triplification Challenge 2010)[wiki] [ppt][video demo]
  • Pablo Mendes, Alexandre Passant, Pavan Kapanipathi, Amit Sheth. Linked Open Social Signals. WI2010 IEEE/WIC/ACM International Conference on Web Intelligence (WI-10), Toronto, Canada, Aug. 31 to Sep. 3, 2010. (16.6% acceptance)[wiki] [ppt]
  • Pablo Mendes, Pavan Kapanipathi, Delroy Cameron, Amit Sheth. Dynamic Associative Relationships on the Linked Open Data Web. In: Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, April 26-27th, 2010, Raleigh, NC: US.

    Patents

  • Edwin Heredia, Joakhim Soderberg, Pavan Kapanipathi, Glenn Algie, Rodrigo Laiola Guimaraes, Alan Messer. A Semantic Enrichment System for Multimedia Compositions of Life-Logging Data (pending)

Proposals and Grants

  • NSF - (3 years, $1.97M) [wiki] -- limited contribution
    • Title: Social and Physical Sensing Enabled Decision Support.
    • Awardees: Wright State University (Kno.e.sis), Ohio State University
    • Project: Hazards SEES
  • NSF - (3 years, $200,000) [wiki] -- limited contribution
    • Title: Market Driven Innovations and Scaling up of Twitris.
    • Awardees: Wright State University (Kno.e.sis)
    • Project: Twitris Commercialization
  • NIH-R01 - (3 years, $1.6M) [wiki] -- significant contribution
    • Title: Trending: Social media analysis to monitor cannabis and synthetic cannabinoid use
    • Awardees: Wright State University (Kno.e.sis, CITAR), University of Massachusetts - Amherst
    • Project: EdrugTrends

Awards and Service

  • Invited to present my research on Semantic Filtering at the Big Data tutorial at Wright State University.
  • Invited to present my research at EMC CTO Office, Bangalore. (as a part of Invited Speaker Series Lecture). [ppt]
  • Invited to present my research for Wright State University, Graduate School External Advisory Board.[ppt]
  • Invited talk at Frontiers of Cloud Computing and Big Data Workshop, 2014, organized by and held at IBM TJ Watson Research Center.[poster][ppt]
  • Awarded scholarship of $1500 to attend the 10th International Semantic Web Conference at Bonn, Germany.
  • Winner of the Triplification Challenge: “Twarql: Tapping into the Wisdom of the Crowd” won the open track of the Triplification Challenge at the 6th International Conference on Semantic Systems in September 2010.
  • PC Member: IJCAI2016, ESWC2016, ICWS2015, HT2015, ESWC2015, ESWC2014, WI2014, WoLE2013(WWW Workshop), WI2013, DERIVE2012 (ISWC Workshop).
  • External Reviewer: AAAI2015, ICWSM2015, WWW2015, AAAI2013, ESWC2013, WWW2013, ISWC2011, IJSWIS, etc.


CV, Resume