
Thesis


Semantics Enriched Service Environments

Abstract

During the past seven years, services-centric computing has emerged as the preferred approach to architecting complex software. Software is increasingly developed by integrating remotely hosted components, popularly called services. This architectural paradigm, also called Service Oriented Architecture (SOA), brings the benefits of interoperability, agility, and flexibility to software design and development. One can easily add new features to existing systems, either by adding new services or by replacing existing ones. Two popular approaches have emerged for realizing SOA. The first is based on the SOAP protocol for communication and the Web Service Description Language (WSDL) for service interface description. SOAP and WSDL are built over XML, thus guaranteeing minimal structural and syntactic interoperability. In addition to SOAP and WSDL, the WS-* (WS-Star) stack, or SOAP stack, comprises other standards and specifications that enable features such as security and services integration. More recently, the RESTful approach has emerged as an alternative to the SOAP stack. This approach advocates the use of the HTTP operations GET/PUT/POST/DELETE as standard service operations and the REpresentational State Transfer (REST) paradigm for maintaining service states. The RESTful approach leverages the HTTP protocol and has gained a lot of traction, especially in the context of consumer Web applications such as Maps.
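
As a concrete illustration of the two styles, the following minimal Python sketch invokes a hypothetical order service both RESTfully and through a SOAP envelope; the URLs, payloads, and operation names are invented for illustration and do not come from any service discussed in this work.

    import requests

    # RESTful style: standard HTTP verbs applied to resource URLs.
    order = requests.get("https://api.example.com/orders/42").json()
    requests.post("https://api.example.com/orders",
                  json={"item": "widget", "quantity": 3})

    # SOAP style: every call is an HTTP POST carrying an XML envelope that
    # names the operation; the operation is described in the service's WSDL.
    envelope = """<?xml version="1.0"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <GetOrder xmlns="http://example.com/orders"><OrderId>42</OrderId></GetOrder>
      </soap:Body>
    </soap:Envelope>"""
    requests.post("https://api.example.com/soap", data=envelope,
                  headers={"Content-Type": "text/xml"})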

Despite their growing adoption, the stated objectives of interoperability, agility, and flexibility have been hard to achieve using either of the two approaches. This is largely because of the various heterogeneities that exist between different service providers. These heterogeneities are present at both the data and the interaction levels. Fundamental to addressing these heterogeneities are the problems of service description, discovery, data mediation, and dynamic configuration. Currently, service descriptions capture the various operations, the structure of the data, and the invocation protocol. They do not, however, capture the semantics of either the data or the interactions. This minimal description impedes the ability to find the right set of services for a given task, thus affecting the important task of service discovery. Data mediation is by far the most arduous task in service integration. It has been a well-studied problem in the areas of workflow management, multi-database systems, and services computing. Data models that describe real-world data, such as enterprise data, often involve hundreds of attributes. Approaches for automatic mediation have not been very successful, while the complexity of the models requires considerable human effort. The above-mentioned problems in description, discovery, and data mediation pose a considerable challenge to creating software that can be dynamically configured.
This dissertation is one of the first attempts to address the problems of description, discovery, data mediation, and dynamic configuration in the context of both SOAP and RESTful services. This work builds on past research in the areas of the Semantic Web, Semantic Web services, and Service Oriented Architectures. In addition to addressing these problems, this dissertation also extends the principles of services computing to the emerging area of social and human computation. The core contributions of this work include a mechanism to add semantic metadata to RESTful services and resources on the Web, an algorithm for service discovery and ranking, and techniques for aiding data mediation and dynamic configuration. This work also addresses the problem of identifying events during service execution, and data integration in the context of socially powered services.


Committee Members


Amit P. Sheth, Ph.D.
(Advisor)

Guozhu Dong, Ph.D.

Michael L. Raymer, Ph.D.

Lakshmish Ramaswamy, Ph.D.

Krishnaprasad Thirunarayan, Ph.D.

Shu Schiller, Ph.D.

Semantic Provenance: Modeling, Querying, and Application in Scientific Discovery

Abstract

Provenance metadata, describing the history or lineage of an entity, is essential for ensuring data quality, correctness of process execution, and computing trust values. Traditionally, provenance management issues have been dealt with in the context of workflow or relational database systems. However, existing provenance systems are inadequate to address the requirements of an emerging set of applications in the new eScience or Cyberinfrastructure paradigm and the Semantic Web. Provenance in these applications incorporates complex domain semantics on a large scale with a variety of uses, including accurate interpretation by software agents, trustworthy data integration, reproducibility, attribution for commercial or legal applications, and trust computation. In this dissertation, we introduce the notion of 'semantic provenance' to address these requirements for eScience and Semantic Web applications. In addition, we describe a framework for management of semantic provenance that addresses the three issues of (a) provenance representation, (b) query and analysis, and (c) scalable implementation. First, we introduce a foundational model of provenance called Provenir to serve as an upper-level reference ontology that facilitates provenance interoperability. Second, we define a classification scheme for provenance queries based on query characteristics and use this scheme to define a set of specialized provenance query operators. Third, we describe the implementation of a highly scalable query engine that supports the provenance query operators and uses a new class of materialized views based on the Provenir ontology, called Materialized Provenance Views (MPV), for query optimization. We also define a novel provenance tracking approach called Provenance Context Entity (PaCE) for the Resource Description Framework (RDF) model used in Semantic Web applications. PaCE, defined in terms of the Provenir ontology, is an effective and scalable approach for RDF provenance tracking in comparison to the currently used RDF reification vocabulary. Finally, we describe the application of the semantic provenance framework in biomedical and oceanography research projects.
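
The difference PaCE targets can be sketched in a few lines of Python with rdflib: classic reification spends four bookkeeping triples just to name a statement before provenance can be attached, whereas a PaCE-style context-specific URI lets an ordinary triple carry its provenance. The namespaces and triples below are invented, and the real approach grounds the provenance properties in the Provenir ontology.

    from rdflib import Graph, Namespace, RDF

    EX = Namespace("http://example.org/")             # hypothetical data namespace
    PROV = Namespace("http://example.org/provenir#")  # stand-in for Provenir

    g = Graph()

    # Classic RDF reification: four bookkeeping triples plus the provenance.
    stmt = EX["stmt1"]
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, EX["geneX"]))
    g.add((stmt, RDF.predicate, EX["associatedWith"]))
    g.add((stmt, RDF.object, EX["diseaseY"]))
    g.add((stmt, PROV["derivedFrom"], EX["pubmed123"]))

    # PaCE-style alternative: mint a source-specific URI for the subject so
    # the plain triple itself is tied to its provenance context.
    ctx = EX["geneX_pubmed123"]
    g.add((ctx, EX["associatedWith"], EX["diseaseY"]))
    g.add((ctx, PROV["derivedFrom"], EX["pubmed123"]))

    print(g.serialize(format="turtle"))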

Committee Members


Amit P. Sheth, Ph.D.
(Advisor)

Olivier Bodenreider, Ph.D.

Michael L. Raymer, Ph.D.

Nicholas V. Reo, Ph.D.

Krishnaprasad Thirunarayan, Ph.D.

William S. York, Ph.D.

A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data

Abstract

Spatial and temporal data are critical components in many applications. This is especially true in analytical applications ranging from scientific discovery to national security and criminal investigation. The analytical process often requires uncovering and analyzing complex thematic relationships between disparate people, places, and events. Fundamentally new query operators based on the graph structure of Semantic Web data models, such as semantic associations, are proving useful for this purpose. However, these analysis mechanisms are primarily intended for thematic relationships. This dissertation proposes a framework built around the RDF data model for analysis of thematic, spatial, and temporal relationships between named entities. We present a spatiotemporal modeling approach that uses an upper-level ontology in combination with temporal RDF graphs. A set of query operators that use graph patterns to specify a form of context is formally defined, and an extension of the W3C-recommended SPARQL query language to support these query operators is presented. We also describe an efficient implementation of the framework that extends a state-of-the-art commercial database system. We demonstrate the scalability of our approach with a performance study using both synthetic and real-world RDF datasets of over 25 million triples.
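
As a rough illustration of the thematic-plus-temporal patterns such operators support, here is an rdflib sketch over an invented toy graph. Note that the framework itself attaches intervals to triples (temporal RDF graphs) and extends SPARQL natively; encoding time as an ordinary property, as done here, is only a simplification.

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import XSD

    EX = Namespace("http://example.org/")  # hypothetical entities and relations

    g = Graph()
    g.add((EX["alice"], EX["employedBy"], EX["acme"]))
    g.add((EX["alice"], EX["employmentStart"],
           Literal("2006-01-01", datatype=XSD.date)))

    # A thematic pattern (who works where) restricted by a temporal condition.
    q = """
    PREFIX ex:  <http://example.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?person WHERE {
      ?person ex:employedBy ex:acme ;
              ex:employmentStart ?start .
      FILTER (?start > "2005-01-01"^^xsd:date)
    }
    """
    for row in g.query(q):
        print(row.person)   # http://example.org/alice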

Committee Members


Amit P. Sheth, Ph.D.
(Advisor)

Krishnaprasad Thirunarayan, Ph.D.

Soon Chung, Ph.D.

Christopher Barton, Ph.D.

Kate Beard, Ph.D.

Understanding User-generated Content on Social Media

Abstract

Over the last few years, there has been a growing public and enterprise fascination with 'social media' and its role in modern society. At the heart of this fascination is the ability for users to participate, collaborate, consume, create and share content via a variety of platforms such as blogs, micro-blogs, email, instant messaging services, social network services, collaborative wikis, social bookmarking sites, and multimedia sharing sites. Today, in addition to any factual information, we are also able to access conversations, opinions and emotions that these facts evoke among other users. We are able to ask questions such as: what are people saying about any newsworthy event or entity? Can we use this information to assess a population's preference? Can we study how these preferences propagate in a network of friends? Are such crowd-sourced preferences a good substitute for traditional polling methods?

This dissertation is devoted to understanding informal user-generated textual content on social media platforms and using the results of the analysis to build Social Intelligence Applications. The body of research presented in this thesis focuses on understanding what a piece of user-generated content is 'about' via two sub-goals: Named Entity Recognition and Key Phrase Extraction on informal text. In light of the poor context and informal nature of content on social media platforms, we investigate the role of contextual information from documents, domain models and the social medium to supplement and improve the reliability and performance of existing text mining algorithms for Named Entity Recognition and Key Phrase Extraction. In all cases we find that using multiple contextual cues together leads to reliable, inter-dependent decisions, better than using the cues in isolation, and that such improvements are robust across domains and content of varying characteristics, from micro-blogs like Twitter, to social networking forums such as those on MySpace and Facebook, to blogs on the Web.

Finally, we showcase two deployed Social Intelligence applications that build over the results of Named Entity Recognition and Key Phrase Extraction algorithms to provide near real-time information about the pulse of an online populace. Specifically, we describe what it takes to build applications that wish to exploit the 'wisdom of the crowds', highlighting challenges in data collection, processing informal English text, metadata extraction and presentation of the resulting information.
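
The cue-combination idea can be sketched as a weighted vote over evidence sources; the cue names, scores, and weights below are invented for illustration and are not the features or weights used in this work.

    # Decide whether the mention 'lily' in a music forum post refers to the
    # artist Lily Allen rather than the flower, by combining contextual cues.
    def entity_score(cues, weights):
        return sum(weights[name] * cues[name] for name in weights)

    cues = {
        "document": 0.9,       # the post also mentions songs and albums
        "domain_model": 0.8,   # a music domain model lists 'Lily Allen'
        "social_medium": 0.7,  # the forum itself is a music fan page
    }
    weights = {"document": 0.4, "domain_model": 0.4, "social_medium": 0.2}

    print(entity_score(cues, weights))  # ~0.82 -> tag as an artist mention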


Committee Members


Amit P. Sheth, Ph.D.
(Advisor)

John M. Flach, Ph.D.

Daniel Gruhl, Ph.D.

Kevin Haas, M.S.

Michael L. Raymer, Ph.D.

Shaojun Wang, Ph.D.

Extracting, Representing and Mining Semantic Metadata from Text: Facilitating Knowledge Discovery in Biomedicine

Abstract

The information access paradigm offered by most contemporary text information systems is a search-and-sift paradigm, where users have to manually glean and aggregate relevant information from the large number of documents that are typically returned in response to keyword queries. Expecting the users to glean and aggregate information has led to several inadequacies in these information systems. Owing to the size of many text databases, search-and-sift is a very tedious process, often requiring repeated keyword searches that refine or generalize query terms. A more serious limitation arises from the lack of automated mechanisms to aggregate content across different documents to discover new knowledge. This dissertation focuses on processing text to assign semantic interpretations to its content (extracting semantic metadata) and on the design of algorithms and heuristics that utilize the extracted semantic metadata to support knowledge discovery operations over text content. Contributions in extracting semantic metadata in this dissertation cover the extraction of compound entities and complex relationships connecting entities. Extraction results are represented using a standard Semantic Web representation language (RDF) and are manually evaluated for accuracy. The knowledge discovery algorithms presented herein operate on RDF data. To further improve access mechanisms to text content, applications supporting semantic browsing and semantic search of text are presented.
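
For illustration, the following rdflib sketch shows how one extracted complex relationship might be laid down as RDF triples; the vocabulary, entities, and document pointer are all invented.

    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/biomed#")  # hypothetical vocabulary

    g = Graph()
    # A relationship extracted from a sentence such as
    # "ketamine induces hyperactivity in mice" becomes a set of triples.
    g.add((EX["ketamine"], EX["induces"], EX["hyperactivity"]))
    g.add((EX["hyperactivity"], EX["observedIn"], EX["mice"]))
    # Keep a pointer back to the source sentence for manual evaluation.
    g.add((EX["ketamine"], EX["extractedFrom"], Literal("doc42, sentence 3")))

    print(g.serialize(format="turtle"))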


Committee Members


Amit P. Sheth, Ph.D.
(Advisor)

Vasant Honavar, Ph.D.

Michael L. Raymer, Ph.D.

Thaddeus Tarpey, Ph.D.

Shaojun Wang, Ph.D.

Linked Open Data Alignment and Querying

Abstract

The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing the original vision of a web that can “understand and satisfy the requests of people and machines to use the web content”, i.e., the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 295 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud, as we will illustrate, are too shallow to realize much of the benefit promised. If this limitation is left unaddressed, the LOD Cloud will merely be more data that suffers from the same kinds of problems that plague the Web of Documents, and the vision of the Semantic Web will fall short.

This thesis presents a comprehensive solution to the issues of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between the classes and properties of ontologies. We identify subsumption, equivalence, and part-of relationships between classes, and part-of relationships between instances; between properties, we establish subsumption and equivalence relationships. By bootstrapping we mean the process of utilizing the information contained within the datasets to improve the data within them. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, have provided evidence of the feasibility and applicability of the solution.
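
A toy sketch of tree-overlap alignment in the spirit of BLOOMS is given below; the "category trees" are hard-coded stand-ins for the Wikipedia category hierarchy the real system consults, and the decision rule is a deliberate simplification of the published procedure.

    # Hypothetical category trees for two classes from different LOD ontologies.
    TOY_TREES = {
        "Stadium":       {"Sports venues", "Venues", "Buildings"},
        "SoccerStadium": {"Sports venues", "Venues", "Buildings", "Soccer"},
    }

    def align(class_a, class_b):
        ta, tb = TOY_TREES[class_a], TOY_TREES[class_b]
        overlap = len(ta & tb)
        score_a, score_b = overlap / len(ta), overlap / len(tb)
        if score_a == score_b:
            return "equivalent" if score_a == 1.0 else "related (needs review)"
        # The class whose tree is fully shared is treated as the more general.
        parent, child = (class_a, class_b) if score_a > score_b else (class_b, class_a)
        return f"{child} subClassOf {parent}"

    print(align("Stadium", "SoccerStadium"))  # SoccerStadium subClassOf Stadium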



Video: Prateek Jain's dissertation defense

Committee Members


Amit P. Sheth, Ph.D. (Advisor)

Pascal Hitzler, Ph.D.

Krishnaprasad Thirunarayan, Ph.D.

Kunal Verma, Ph.D.

Peter Z. Yeh, Ph.D.

Abstraction Driven Application and Data Portability in Cloud Computing

Abstract

Cloud computing has changed the way organizations create, manage, and evolve their applications. While many organizations are eager to use the cloud, tempted by substantial cost savings and convenience, the implications of using clouds are not yet well understood. One of the major concerns in cloud adoption is the vendor lock-in of applications, caused by the heterogeneity of the numerous cloud service offerings. Vendor-locked applications are difficult, if not impossible, to port from one cloud system to another. This forces cloud service consumers to use undesired or suboptimal solutions and makes it difficult to incorporate the redundancy needed by some organizations for high availability.

Given the current state of the art, supporting multiple cloud systems requires multiple development efforts; thus, avoiding vendor lock-in is an expensive proposition. In the long run, this problem negatively affects the adoption of cloud technologies.

This dissertation investigates a comprehensive solution to the issue of application lock-in in cloud computing. Our primary principle is the use of carefully designed abstractions in a manner that makes the heterogeneity of the clouds invisible. The first part of this dissertation investigates the development of cloud applications using abstract specifications. Given the domain-specific nature of many cloud workloads, we focused on using Domain Specific Languages (DSLs). We applied DSL-based development techniques to two domains with different characteristics and learned that our solution indeed results in significant savings in cost and effort when building portable cloud applications. The second part of this dissertation presents the use of process abstractions for application deployment and management in clouds. Many cloud service consumers are focused on specific application-oriented tasks, so we provided abstractions for the most useful cloud interactions via a middleware layer. Our middleware system, Altocumulus, not only provided independence from the various process differences but also provided the means to reuse known best practices. The success of Altocumulus also influenced a commercial product, the IBM Workload Deployer (http://www-01.ibm.com/software/webservers/workload-deployer/).

Finally, we showcase two publicly hosted Web tools, MobiCloud (http://mobicloud.knoesis.org/) and SCALE (http://metabolink.knoesis.org/SCALE), that encapsulate the abstractions in every step of the application life-cycle. These tools allow domain experts to quickly create applications and deploy them to clouds, irrespective of the target cloud system, highlighting the applicability of our solutions in practice.
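
The abstraction principle can be sketched as a single cloud-neutral specification translated to multiple back ends; both target "clouds" below are fictional, and the real DSLs and the Altocumulus middleware are far more capable than this toy translator.

    # One abstract deployment specification, written once by the developer.
    SPEC = {"name": "todo-app", "runtime": "python", "instances": 2}

    def to_cloud_a(spec):
        # Fictional cloud A expects a JSON-style deployment descriptor.
        return {"app_id": spec["name"], "lang": spec["runtime"],
                "scale": spec["instances"]}

    def to_cloud_b(spec):
        # Fictional cloud B is driven by CLI commands.
        return [f"create-app {spec['name']} --runtime {spec['runtime']}",
                f"scale {spec['name']} --count {spec['instances']}"]

    print(to_cloud_a(SPEC))
    print("\n".join(to_cloud_b(SPEC)))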




Committee Members


Amit P. Sheth, Ph.D. (Advisor)

Keke Chen, Ph.D.

E. Michael Maximilien, Ph.D. (IBM Research)

Krishnaprasad Thirunarayan, Ph.D.

Knowledge Acquisition in a System

Abstract

I present a method for growing the amount of knowledge available on the Web using a hermeneutic method that involves background knowledge, Information Extraction techniques, and validation through discourse and use of the extracted information. I present the metaphor of the “Circle of Knowledge on the Web”. In this context, knowledge acquisition on the Web is seen as analogous to the way scientific disciplines gradually increase the knowledge available in their field. Formal models of domains of interest are created automatically or manually and then validated by implicit and explicit validation methods before the statements in the created models can be added to larger knowledge repositories, such as the Linked Open Data cloud. This knowledge is then available for the next iteration of the knowledge acquisition cycle. I give both a theoretical underpinning and practical methods for the acquisition of knowledge in collaborative systems, covering both the Knowledge Engineering angle and the Information Extraction angle of this problem. Unlike traditional approaches, however, this dissertation shows how Information Extraction can be incorporated into a mostly Knowledge Engineering-based approach, as well as how an Information Extraction-based approach can make use of engineered concept repositories. Validation is seen as an integral part of this systemic approach to knowledge acquisition.

The centerpiece of the dissertation is a domain model extraction framework that implements the idea of the “Circle of Knowledge” to automatically create semantic models for domains of interest. It splits the involved Information Extraction tasks into Domain Definition, in which pertinent concepts are identified and categorized, and Domain Description, in which facts describing the extracted concepts are extracted from free text. I then outline a social computing strategy for information validation in order to create knowledge from the extracted models.
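
The Domain Definition / Domain Description split can be sketched with toy pattern matching; the sentences and regular expressions below are invented, and the actual framework uses much richer extraction, background knowledge, and validation.

    import re

    corpus = ["Ketamine is an anesthetic.", "Ketamine induces hyperactivity."]

    # Domain Definition: identify and categorize pertinent concepts.
    concepts = {}
    for sentence in corpus:
        m = re.match(r"(\w+) is an? (\w+)\.", sentence)
        if m:
            concepts[m.group(1)] = m.group(2)   # Ketamine -> anesthetic

    # Domain Description: extract facts about the defined concepts.
    facts = []
    for sentence in corpus:
        m = re.match(r"(\w+) (induces|causes) (\w+)\.", sentence)
        if m and m.group(1) in concepts:
            facts.append((m.group(1), m.group(2), m.group(3)))

    print(concepts)  # {'Ketamine': 'anesthetic'}
    print(facts)     # [('Ketamine', 'induces', 'hyperactivity')]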



Committee Members


Amit P. Sheth, Ph.D.
(Advisor)

Pascal Hitzler, Ph.D.

Pankaj Mehra, Ph.D.

Shaojun Wang, Ph.D.

Gerhard Weikum, Ph.D.

A Semantics-based Approach to Machine Perception


Abstract

Machine perception can be formalized using Semantic Web technologies in order to derive abstractions from sensor data using background knowledge on the Web, and can be executed efficiently on resource-constrained devices.

Advances in sensing technology hold the promise to revolutionize our ability to observe and understand the world around us. Yet the gap between observation and understanding is vast. As sensors become more advanced and cost-effective, the result is an avalanche of data of high volume and velocity, and of varied type, leading to the problem of too much data and not enough knowledge (i.e., insights leading to actions). Current estimates predict over 50 billion sensors connected to the Web by 2020. While the challenge of data deluge is formidable, a resolution has profound implications. The ability to translate low-level data into high-level abstractions closer to human understanding and decision-making has the potential to disrupt data-driven interdisciplinary sciences, such as environmental science, healthcare, and bioinformatics, as well as enable other emerging technologies, such as the Internet of Things.

The ability to make sense of sensory input is called perception; and while people are able to perceive their environment almost instantaneously, and seemingly without effort, machines continue to struggle with the task. Machine perception is a hard problem in computer science, with many fundamental issues that are yet to be adequately addressed, including: (a) annotation of sensor data, (b) interpretation of sensor data, and (c) efficient implementation and execution. This dissertation presents a semantics-based machine perception framework to address these issues.
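
One way to sketch the interpretation step is explanation as parsimonious covering: choose the fewest background-knowledge entities that account for all observed properties. The knowledge base below is a toy stand-in for background knowledge on the Web, and the dissertation's actual Semantic Web formalization is richer.

    from itertools import combinations

    KB = {  # abstraction -> observable properties it explains (toy data)
        "flu":        {"fever", "cough", "fatigue"},
        "cold":       {"cough", "sneezing"},
        "heatstroke": {"fever", "fatigue"},
    }

    def explain(observed):
        entities = list(KB)
        for size in range(1, len(entities) + 1):   # prefer fewer causes
            for combo in combinations(entities, size):
                covered = set().union(*(KB[e] for e in combo))
                if observed <= covered:
                    return combo
        return None

    print(explain({"fever", "cough"}))  # ('flu',): one abstraction covers both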




Committee Members


Amit P. Sheth, Ph.D. (Advisor)

Krishnaprasad Thirunarayan, Ph.D.

Satya Sahoo, Ph.D.

Payam Barnaghi, Ph.D.

John Gallagher, Ph.D.

Adaptive Semantic Annotation of Entity and Concept Mentions in Text


Abstract

Recent years have seen an increase in interest in knowledge repositories that are useful across applications, in contrast to the creation of ad hoc or application-specific databases. These knowledge repositories act as a central provider of unambiguous identifiers and semantic relationships between entities. As such, these shared entity descriptions serve as a common vocabulary for exchanging and organizing information in different formats and for different purposes. Therefore, there has been remarkable interest in systems that are able to automatically tag textual documents with identifiers from shared knowledge repositories, so that the content in those documents is described in a vocabulary that is unambiguously understood across applications.

Tagging textual documents according to these knowledge bases is a challenging task. It involves recognizing the entities and concepts mentioned in a particular passage and attempting to resolve the eventual ambiguity of language in order to choose one of many possible meanings for a phrase. There has been substantial work on recognizing and disambiguating entities for specialized applications, or constrained to limited entity types and particular types of text. In the context of shared knowledge bases, since each application has potentially very different needs, systems must have unprecedented breadth and flexibility to ensure their usefulness across applications. Documents may exhibit different language and discourse characteristics, discuss very diverse topics, or require a focus on parts of the knowledge repository that are inherently harder to disambiguate. In practice, for developers looking for a system to support their use case, it is often unclear whether an existing solution is applicable, leading those developers to trial-and-error and ad hoc usage of multiple systems in an attempt to achieve their objective.

In this dissertation, I propose a conceptual model that unifies related techniques in this space under a common multi-dimensional framework that enables the elucidation of the strengths and limitations of each technique, supporting developers in their search for a suitable tool for their needs. Moreover, the model serves as the basis for the development of flexible systems that are able to support document tagging for different use cases. I describe such an implementation, DBpedia Spotlight, along with extensions that we performed to the knowledge base DBpedia to support this implementation. I report evaluations of this tool on several well-known datasets, and demonstrate applications to diverse use cases for further validation.
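
The spot-then-disambiguate pipeline can be sketched as follows; the surface form, candidate resources, and context words are invented, and DBpedia Spotlight's actual scoring (for example, TF*ICF-weighted vector similarity over Wikipedia text) is considerably richer.

    # Candidate senses for a spotted surface form, keyed by the form itself.
    CANDIDATES = {
        "Washington": ["dbpedia:George_Washington", "dbpedia:Washington_(state)"],
    }
    CONTEXT_WORDS = {  # toy context vectors for each candidate resource
        "dbpedia:George_Washington":  {"president", "revolution", "army"},
        "dbpedia:Washington_(state)": {"seattle", "pacific", "rainier"},
    }

    def disambiguate(surface_form, text_words):
        # Pick the candidate whose context overlaps the input text the most.
        return max(CANDIDATES[surface_form],
                   key=lambda c: len(CONTEXT_WORDS[c] & text_words))

    text = {"the", "president", "crossed", "delaware"}
    print(disambiguate("Washington", text))  # dbpedia:George_Washington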


Publications

  • 2013, Pablo N. Mendes, Dirk Weissenborn, Chris Hokamp: DBpedia Spotlight at the MSM2013 Challenge. #MSM 2013 : 57-61, at WWW 2013.
  • 2012, Pablo N. Mendes, Peter Mika, Hugo Zaragoza, and Roi Blanco. Measuring website similarity using an entity-aware click graph. In 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pages 1697–1701, 2012.
  • 2012, Mihály Héder, and Pablo N. Mendes. Round-trip semantics with sztakipedia and dbpedia spotlight. In Proceedings of the 21st World Wide Web Conference, WWW 2012 (Companion Volume), pages 357–360, 2012.
  • 2012, Pablo N. Mendes, Joachim Daiber, Rohana Rajapakse, Felix Sasaki, and Christian Bizer. Evaluating the impact of phrase recognition on concept tagging. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 2012.
  • 2011, Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia Spotlight: shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems, I-SEMANTICS 2011, pages 1–8, 2011.
  • 2011, Pablo N. Mendes, Joachim Daiber, Max Jakob, and Christian Bizer. Evaluating DBpedia Spotlight for the TAC-KBP entity linking task. In Proceedings of the TAC-KBP 2011 Workshop, 2011.
  • 2010, Pablo N. Mendes, Alexandre Passant, Pavan Kapanipathi: Twarql: tapping into the wisdom of the crowd. I-SEMANTICS 2010
  • 2010, Pablo N. Mendes, Alexandre Passant, Pavan Kapanipathi, Amit P. Sheth: Linked Open Social Signals. Web Intelligence 2010: 224-231

Committee Members


Amit P. Sheth, Ph.D. (Advisor)

Krishnaprasad Thirunarayan, Ph.D.

Shaojun Wang, Ph.D.

Sören Auer, Ph.D. (University of Leipzig)
