Wednesday, October 8, 2008
Relationship Web
to relate multimodal content across the Web. Following the first generation of Web content access, characterized by keyword-driven document retrieval, and the more recent progress in entity awareness, we believe this third generation, a relationship-centric framework, will support insight elicitation, semantic analytics and knowledge discovery over Web resources that have not been possible so far. The Relationship Web incorporates the vision of trail blazing outlined by Dr. Vannevar Bush in 1945!
More in our Internet Computing article: Relationship Web: Blazing Semantic Trails between Web Resources (also available here).
Wednesday, October 24, 2007
What is Semantic Computing?
Phil Sheu in ICSC2007 cfp described it as:
"The field Semantic Computing applies technologies in natural language processing, data and knowledge engineering, software engineering, computer systems and networks, signal processing and pattern recognition, and any combination of the above to extract, access, transform and synthesize the semantics (contents) of multimedia, texts, services and structured data."
Here is my take:
Semantic computing is a vision of computing based on semantics shared between machines and people. It supports and exploits intrinsic, intended, and emergent meanings (content) in all aspects of computing, encompassing programming, algorithms, information management, and human interactions within devices, as part of communications, and across the Web. Semantics involves the use of formal descriptions, languages, and models, often encoded in metadata, knowledge, and representation of agreements (as in ontologies) to capture the content of multimedia, texts, services, and structured data so that it may be extracted, shared, synthesized and transformed. Semantic techniques foster the development of emerging forms of computing, such as the Semantic Web, and entirely new forms, such as bio-inspired computing. They also enhance traditional techniques of information retrieval, management of data (including multimedia and multimodal) and artificial intelligence (e.g., natural language processing, machine learning, and computational intelligence), leading to more efficient and scalable information processing and higher-quality computer-human interaction.
[1] mid 80s to early 90s: So far yet so Near, Schematic and Semantic Similarities between Database Objects
[2] early 90s to about 1998: Semantic Information Brokering, InfoHarness, InfoQuilt, OBSERVER
Tuesday, September 4, 2007
SAWSDL becomes a W3C recommendation
Interested readers can find more details in the Internet Computing column I wrote with Kunal Verma, my former advisee and a key early technical contributor to our work in this area (Semantically Annotating a Web Service), in this tutorial at the Semantic Technology conference (Using SAWSDL for Semantic Service Interoperability), or by playing with implementations and test suites.
So what's next? My own group has defined SA-REST (more on this soon), and Charles Petrie and I have started (with a healthy dose of encouragement from Dieter Fensel) a W3C SWS Testbed Incubator Group to develop awareness of, and further agreements on, the post-SAWSDL issues that we have to address before more developers will find Semantic Web Services ready for prime time.
Thursday, May 17, 2007
Marrying Social Media with Semantic Media
A lot has been written about social networking, and we can see a number of success stories around this phenomenon. Flickr exemplified taking social networking to media (photos in this case), hence the emerging focus on social media (see The Flickrization of Yahoo). The power of users playing the roles of authors and editors is undeniable, but the lack of organization it creates is the antithesis of how Yahoo! organized the early web around a directory (which we can now replace with ontologies) and human cataloging (which we can to a good extent replace with automatic and ontology-assisted semi-automatic semantic annotation). Is it possible to get such organization back into a naturally unconstrained and entropic collection of stuff (in this case media and associated human-given tags without the supporting nomenclature)? I think it is quite possible, and the current approaches for and technologies behind the Semantic Web provide the most promising paths.
Indeed, there is already an independent set of activities in developing semantic multimedia and semantic media. What can we attain by marrying semantic media with social media? Quite a lot. What we get is a multiplicative outcome that combines social media organized to make it much easier to search, integrate and exploit, with semantic media enhanced by the power of people. This will also make it much easier to integrate all shared information, whether in text or in digital media of any format. It will be easier to serve up multimedia and multimodal applications. And it will unlock a lot of untapped potential through the targeted advertisements and refined personalization that the underlying semantic infrastructure could provide, significantly enhancing the business potential associated with shared media.
Here is just one enticing example. You are looking at a photo of a church in Innsbruck, and someone wants to sell an advertisement that takes you to a landing page on Priceline showing flights from US gateways to Munich. (I had a chance to check out what types of Google ads are served today in this context of Innsbruck, and it did not seem to me that anyone would be likely to close a sale, especially a sale with a high value.) What will tell the advertiser that the nearest airport with transatlantic flights is Munich, that it should not bother trying to sell a ticket to Innsbruck, especially from a non-European origin, and how to get to the specific page on Priceline for North America to Munich flights? Context, semantic metadata, ontologies that carry a load of real-world and factual knowledge, and a set of rules.
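To make those four ingredients concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the fact table stands in for an ontology, the single rule stands in for a rule set, and the travel.example URL is a placeholder rather than a real Priceline address.

```python
from typing import Optional

# Hypothetical facts standing in for an ontology of real-world knowledge.
FACTS = {
    ("Innsbruck", "located_in"): "Austria",
    ("Innsbruck", "nearest_transatlantic_airport"): "Munich",
    ("Munich", "located_in"): "Germany",
}

# Hypothetical country-to-continent lookup (part of the same "ontology").
CONTINENT = {"Austria": "Europe", "Germany": "Europe", "US": "North America"}


def pick_flight_ad(photo_place: str, viewer_country: str) -> Optional[str]:
    """Return a landing-page URL for a flight ad, or None if nothing is known."""
    place_country = FACTS.get((photo_place, "located_in"))
    if place_country is None:
        return None  # no semantic metadata for this place, so no targeted ad

    # Rule: a viewer on another continent is offered a flight to the nearest
    # airport with transatlantic service, not to the place in the photo.
    if CONTINENT.get(viewer_country) != CONTINENT.get(place_country):
        destination = FACTS.get(
            (photo_place, "nearest_transatlantic_airport"), photo_place
        )
    else:
        destination = photo_place

    # Placeholder URL pattern; a real deployment would map to the vendor's pages.
    return f"https://travel.example/flights?from={viewer_country}&to={destination}"


print(pick_flight_ad("Innsbruck", "US"))
```

The context (who is viewing, from where) and the metadata (the photo is of Innsbruck) are the function arguments; the knowledge and the rule do the rest.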
Some of the lovers of information retrieval and dumb keyword search would tell you all this is too complex, not scalable or not maintainable. It is not. Contact me if you want to see a demonstration of how we are working to realize this promise.
Can semantics make Web Services more useful for businesses?
There is a lot of discussion on what Web services can or cannot help businesses achieve [0]. Earlier simplistic views of Web services as a silver bullet are being replaced by more temperate views, as exemplified by Andrew McAfee's recent article [1]. This posting is primarily focused on sharing some thoughts related to [1].
Prof. McAfee offers two insights: "Web services allow construction of modular and interchangeable building blocks software" is happening, but "how companies collaborate and compete" is not happening. He also observed that "the application-integration challenges that remain unaddressed by Web services are the really difficult ones and can only be overcome by the work of managers and leaders, not technologies and consortia".
Prof. McAfee is right on the mark when he says "Web services, however, will not create this world," but I disagree when he says "nor will any technology on the horizon". My main argument is that if we add semantics to the mix, we will see more progress than Prof. McAfee has seen so far. I will give a technical (computer scientist's) view on these points, one that is validated by real-world deployments by the company I co-founded (Semagix), by some collaborations in bioinformatics between the LSDIS lab and the biologists at UGA, and by some interactions with industry collaborators, mainly at IBM.
A different perspective on interoperability
Interoperability is the key to collaboration and integration (between application and application, as well as between human and application/system). It comes at four levels: infrastructure/system, syntax, structure/representation and semantics [2] [3]. I feel that this breakdown is more meaningful when looking at technical issues than the three levels of transport, payload, and process used in [1].
XML and Web services deal with infrastructure, syntax and some representational issues. Concurrently, in several scientific and business domains (e.g., biology [OBO], health care [OpenClinical], risk & compliance [4], digital content applications on mobile networks and so on; also see Why are we still pushing Semantic Web?), we are finding increasing success in developing and exploiting ontologies. These ontologies are the key enabler of semantic technologies. They embody human agreement in a form that machines can interpret (mimic, reuse or, in a rather limited form, "understand") in order to replace some of the human interactions (when such agreements exist). While I will spare the details, ontology-related semantic techniques and technologies (e.g., disambiguation with approximate/fuzzy matching, probabilistic relationships, etc.) enable semantic integration that can help deal with inconsistencies and other problems pointed out by Prof. McAfee. And while the difficulty of a technological solution in Prof. McAfee's IBM case study was attributed to "Midrange systems were simply too complex", human bodies or biological descriptions (e.g., pathways) are at least as complex, and yet it has been possible to develop very useful ontologies in these areas. Interestingly, parts of the RosettaNet PIPs (the subject of the case study in [1]) have been represented as an ontology by us and others [RosettaNet ontology].
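As a small illustration of disambiguation with approximate matching, the sketch below maps a free-text mention to an ontology concept using Python's standard difflib module. The concept identifiers, the labels and the 0.8 cutoff are all assumptions chosen for the example; production systems combine string similarity with context and relationship evidence.

```python
import difflib
from typing import Optional

# Hypothetical ontology fragment: concept id -> labels known for that concept.
ONTOLOGY_LABELS = {
    "org:IBM": ["IBM", "International Business Machines"],
    "org:RosettaNet": ["RosettaNet"],
}

# Reverse index from lower-cased label to concept id.
LABEL_INDEX = {
    label.lower(): concept
    for concept, labels in ONTOLOGY_LABELS.items()
    for label in labels
}


def resolve(mention: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the concept whose label best matches the mention, or None."""
    matches = difflib.get_close_matches(
        mention.lower(), list(LABEL_INDEX), n=1, cutoff=cutoff
    )
    return LABEL_INDEX[matches[0]] if matches else None


print(resolve("Rosetta Net"))                     # a spacing variant
print(resolve("Internatonal Business Machines"))  # a misspelling
print(resolve("Oracle"))                          # not in this tiny ontology
```

Because the ontology bounds the set of candidate concepts, even a crude similarity measure can reconcile inconsistent spellings across partner systems.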
Adding semantics to the mix
The need for painstaking work does not disappear when developing ontologies, but many tools and techniques are available to build and maintain populated ontologies. Furthermore, ontologies defined in formal representation languages such as OWL are a highly sharable and reusable form of knowledge representation. Ontologies, policies and rules provide a medium for capturing and reusing the knowledge and experience gained from prior integration efforts and negotiated agreements, thereby leading to a greater level of automation at the semantic level. This can enable companies such as IBM to reuse their experience for more efficient interactions and integration in the future.
To capture the breadth of issues necessary in modeling complex systems, our approach to semantics for Web services/processes consists of four components: functional (the capabilities of a business), data (how to talk to it), non-functional/QoS (policies, rules, ontologies capturing domain knowledge from previous experiences and business goals), and execution (issues of run-time behavior, e.g., errors/exceptions) [5]. This gives a framework for dealing with the complexity, capturing some of the semantic-level inconsistencies, and handling run-time issues in application-application interactions (implemented as part of Web processes).
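One way to picture the four components is as fields on a service description, with discovery expressed as a filter over them. The sketch below is only an illustration under stated assumptions: the class, the concept identifiers (such as "rosetta:RequestPurchaseOrder"), the sample services and the QoS key are hypothetical, and it is not the WSDL-S or SAWSDL annotation format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticServiceDescription:
    name: str
    functional: str                    # ontology concept naming the capability
    data_inputs: Dict[str, str]        # parameter name -> ontology concept
    data_outputs: Dict[str, str]       # parameter name -> ontology concept
    non_functional: Dict[str, float] = field(default_factory=dict)  # QoS values
    execution: List[str] = field(default_factory=list)  # declared exceptions


# Hypothetical catalog of two annotated services.
SERVICES = [
    SemanticServiceDescription(
        name="FastOrders",
        functional="rosetta:RequestPurchaseOrder",
        data_inputs={"po": "rosetta:PurchaseOrder"},
        data_outputs={"ack": "rosetta:PurchaseOrderConfirmation"},
        non_functional={"response_seconds": 2.0},
        execution=["OutOfStock"],
    ),
    SemanticServiceDescription(
        name="SlowOrders",
        functional="rosetta:RequestPurchaseOrder",
        data_inputs={"order": "rosetta:PurchaseOrder"},
        data_outputs={"ack": "rosetta:PurchaseOrderConfirmation"},
        non_functional={"response_seconds": 30.0},
    ),
]


def discover(services, capability, input_concept, max_response_seconds):
    """Names of services matching on functional, data and QoS semantics."""
    return [
        s.name
        for s in services
        if s.functional == capability
        and input_concept in s.data_inputs.values()
        and s.non_functional.get("response_seconds", float("inf"))
        <= max_response_seconds
    ]


print(discover(SERVICES, "rosetta:RequestPurchaseOrder",
               "rosetta:PurchaseOrder", 5.0))
```

Note that the two services name their input parameter differently ("po" and "order"); matching on the shared concept rather than the parameter name is what the data semantics buys.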
Commercial successes of Semantic Web Services/Processes are not yet evident because we are at a rather early stage of technical and technological development for this emerging technology. Nevertheless, evolutionary approaches to extend WSDL (the current standard for Web service description) with semantics (e.g., WSDL-S) have been conceived, and I predict that we will be able to address the limitations identified by Prof. McAfee in the next 3-5 years. I also expect good case studies from early adoption to be available within the next 12 months.
[0] S. Staab, R. Benjamins, C. Bussler, D. Gannon, A. Sheth, W. van der Aalst. Web services: Been there, Done that? IEEE Intelligent Systems, Trends & Controversies, 18(1), Jan/Feb 2003.
[1] A. McAfee: Will Web Services Really Transform Collaboration http://sloanreview.mit.edu/smr/issue/2005/winter/16/
[2] A. Sheth: Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics, 1998
[3] A. Sheth: Semantic Meta Data For Enterprise Information Integration
[4] A. Sheth, Enterprise Applications of Semantic Web: The Sweet Spot of Risk and Compliance
[5] A. Sheth, Semantic Web Process Lifecycle: Role of Semantics in Annotation, Discovery, Composition and Orchestration, 2003 Abstract, Presentation
Semantic Web: A different perspective on what works and what doesn't
Peter Norvig's view published on AlwaysOn seems to be colored by a decidedly web search engine perspective. If we start looking at Enterprise Semantic Applications (semantic applications developed for a targeted enterprise/corporate/scientific/engineering user base, whether the data comes from the enterprises, is syndicated/licensed content, or is open Web content), we can start to see some exciting alternative perspectives and realities. Let us review Peter's point that there is not enough RDF and SW content. This content is growing at an extremely rapid pace (I am sure others will put out numbers, and there are specialized search engines such as Swoogle that focus on Semantic Web content, with a rapidly growing number of indexed documents). More importantly, the promise of the Semantic Web is closely tied to having tools for semantic annotation of heterogeneous content, i.e., tools that create semantic metadata automatically. This is much easier to do when you have high-quality domain ontologies that bound the scope of automatic extraction. And I really do not see content suppliers putting the metadata in (just as we do not see Web page authors using metatags), or at least this will be optional and just one form of input. Instead, metadata will be created with respect to potential use (e.g., there are some definite concepts when we deal with WorldNews, USNews, TechnologyNews, and so on). Commercial technologies (example) can process millions of pages per day and extract semantic metadata, and all of this can be represented as RDF (which is a good idea because of the benefits, especially for high-end semantic applications such as analytics). Granted, this is hard to do on a pan-Web scale, where you have no single domain or even a limited set of domains, and a huge diversity of users who may need to see the content from different perspectives. Even here, I believe much can be done, but it will take a little more time, maybe 3 years.
Let me next respond to the comment about ontologies. There are many cases of ontology use within enterprises and even for consumer applications (e.g., see examples; also expect to see use of ontologies soon by Amazon and these types of companies). Rather than focusing on common-sense or general-purpose ontologies, the immediate future is with domain and task/application-specific ontologies (the latter are appropriate for enterprises, e.g., SOX or Anti-Money Laundering (AML) applications). This is being done successfully now in line-of-business applications (e.g., an AML application deployed at one of the largest banks). These types of ontologies routinely have millions of instances. Look at SWETO, the NCI ontology and GlycO as very different types of examples: think of SWETO, with about a million instances, as a poor sample of what is being done for real-world enterprise-class Semantic Applications, representing only a tenth of the population and with a more diffuse focus than a typical enterprise deployment; think of GlycO, with 767 classes, as a well-focused scientific ontology developed by domain experts; the NCI ontology involves more community involvement. Additional thoughts on SW adoption are in my earlier item on this blog.
Postscript: Just before posting this, I stumbled across Danny Ayers's views in response to Peter Norvig's item; I agree with those views almost completely. After my initial posting, I came across another set of comments by Tim Finin.
Why are we still pushing Semantic Web?
This was the question a panelist asked at the W3C Advisory Committee meeting that I attended at the beginning of December 2004. In other words, the panelist and others discussing this question were wondering why it is taking so long for the industry to get it (its importance). Put differently, by now we would have expected the Semantic Web to have seen much wider adoption, a clear indication that it is here for good, transforming the Web into its next logical incarnation.
The essence of my comment at that time was that the rate of progress is quite robust and pervasive, and that there are prominent signs that the Semantic Web is not just a fad: this time, semantics as applied to information (which predates the Semantic Web as defined today) is indeed likely to affect many businesses in the not-too-distant future, and even the common Web user in the intermediate future. Here is an extended perspective on the adoption of the Semantic Web, which also incorporates a nice dinner discussion that some of the Semantic Web technology/product vendors (who are members of the W3C) had with the W3C Semantic Web team members (Eric Miller and others).
Research
Although funding from NSF, DARPA and the other premier funding agencies has now waned, the DAML program gave an excellent and timely start to Semantic Web research in the US. The funding initiative moved to Europe with Framework V, and is firmly entrenched with Framework VI. The number of new conferences, conference attendance, sessions related to the Semantic Web in older and more established conferences, the number of published papers, and new scientific journals devoted to the Semantic Web (such as Web Semantics, Semantic Web & Information Systems, and Applied Ontology) all point to broad and increasingly entrenched interest in this new area.
Standards
One of the nicest things to have happened in our area is timely standards activity. Note the emphasis on "timely": it is helpful to have basic standards before the area matures and before industry interest peaks, since that reduces the chances of clashes between entrenched interests. Not having activities taken to competing standards bodies, as is the case in the Web Services area, helps too.
Technology and Products
One of the most exciting things to have happened in our area is the number of technologies commercialized from academic research (Taalee's MediaAnywhere A/V Semantic Search and Semagix's Freedom from University of Georgia's SCORE technology, Network Inference's relationship with University of Manchester, Ontoprise's relationship with Karlsruhe, to name a few). Now, at least twenty vendors claim to use or support Semantic Web technologies, and the list is growing quite rapidly. And perhaps most importantly, scientific and business communities are building targeted (i.e., with clear purpose) and large ontologies at an impressive pace.
Industry Recognition
The informative panel at the W3C 10th anniversary celebrations (http://www.w3.org/2004/09/W3C10-Program.html) on the "Web of Meaning" illustrated how the thought leaders and industry executives buy into the vision of the Semantic Web. Panelists Tim O'Reilly (O'Reilly Media, talk) and Bill Ruh (Cisco Systems, talk) presented a fairly encouraging perspective on how Semantic (Web) technologies are needed for key applications, such as Regulatory Compliance, B2B Exchange, Workflow and BPM, and Business Intelligence. What is interesting is that some of these are "selling aspirin" rather than "selling vitamins", something that does better in low to moderate economic growth environments.
I would add several other fields of rapid adoption, including life sciences (see the W3C workshop on Semantic Web for Life Sciences), bioinformatics, healthcare, content management, national intelligence and homeland security. Just look at the number of large ontologies, covering a broad range of schema sizes, description bases (instances) and expressiveness of representation, and developed by a community or a small number of domain experts, that are now being put to practical use. Some illustrious examples are the NCI Cancer Ontology with over 17,000 concepts, the GlycO ontology for complex carbohydrates with 767 classes, which is up to 11 levels deep and utilizes all the expressive power of OWL, and ontologies with over 10 million instances developed for enterprise semantic applications using Semagix Freedom. Researchers interested in finding ontologies to play with can consider TAP or SWETO, which are based on real-world facts, or get their hands on software to generate synthetic ontologies.
At industry events, such as those organized by TopQuadrant and MITRE, and at user-group-initiated events, such as those for the US Department of Defense or the Life Science Community, 100 to 300+ people have shown up, which indicates a fairly high level of industry and user group interest.
Industry Deployment and Early Successes
Since some very early deployment examples were discussed at the WWW2004 Developer's Day, there has been an increasing number of examples of deployments, both in enterprises (e.g., see my KMWorld talk) and for more 'common' web users. It is this topic that garnered the main attention during our dinner discussions (mentioned above). One exciting observation that came up is the stealth inclusion of Semantic Web technologies in applications. Eric Miller gave the example of Creative Commons' use of RDF (also see Shelly Parker's earlier article). This is an example of a simpler SW application that involves embedding license metadata and validating it, so millions of content items would in essence be using at least limited Semantic Web technology for enforcing licenses! Another example is the semantic annotation of syndicated content and Web Services (e.g., the WSDL-S semantic proposal, an early draft currently being revised in an academic-industry partnership, and corresponding tools such as MWSAF and ASSAM for annotation of Web Services). Such applications can quickly lead to widespread and pervasive use of RDF in a fairly short time. What is interesting is that some of the applications are not being deployed by early adopters; instead, SW technologies have become part of painkiller-type mainstream IT applications and solutions (such as Anti-Money Laundering, compliance and risk management)! Anecdotal successes are starting to come in. For example, a compliance-related semantic application (implemented with a semantic technology platform from Semagix) is live in a line of business at one of the largest banks in the world. And I have heard of companies such as Amazon inserting ontologies into their mainstream applications, so we can expect to see large-scale consumer-centric applications exploiting essential components of the Semantic Web in the near future.
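The license-metadata case can be sketched in a few lines. In the Python below, each content item carries a triple whose predicate is the Creative Commons license property, and validation is a check of the object against a list of recognized licenses. The content URLs, the accepted-license list and the validation rule are assumptions made for illustration, and the triples are plain tuples rather than a real RDF store.

```python
# Predicate from the Creative Commons vocabulary (assumed namespace for this sketch).
CC_LICENSE = "http://creativecommons.org/ns#license"

# Illustrative whitelist; a real validator would use the full license registry.
KNOWN_LICENSES = {
    "http://creativecommons.org/licenses/by/2.0/",
    "http://creativecommons.org/licenses/by-nc/2.0/",
}

# Hypothetical embedded metadata as (subject, predicate, object) triples.
TRIPLES = [
    ("http://example.org/photo/1", CC_LICENSE,
     "http://creativecommons.org/licenses/by/2.0/"),
    ("http://example.org/photo/2", CC_LICENSE,
     "http://example.org/made-up-license"),
]


def invalid_items(triples):
    """Return the subjects whose declared license is not recognized."""
    return sorted(
        {s for s, p, o in triples if p == CC_LICENSE and o not in KNOWN_LICENSES}
    )


print(invalid_items(TRIPLES))
```

The point of the example is how little machinery is needed: content authors never have to know they are producing RDF for the check to work.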
Final thoughts
One perspective that some in the community, particularly Tim Berners-Lee (TBL), seem to promote is that the Semantic Web is "not interesting in the smaller scale". As more and more things are connected in a "semantic way", it becomes more and more important. This makes sense from the perspective of the global-scale Web and non-enterprise applications. But from an industry perspective, I believe the Semantic Web is equally interesting at the intra- and inter-enterprise scales, and for enterprise applications. This mirrors the adoption and importance of Web technologies in intranets. If anything, given the ability to constrain or limit the domain, deeper domain semantics can be put to use, agreements to build ontologies can be reached faster, industry-specific metadata standards can be readily used, and facts and knowledge to populate ontologies can be obtained more easily. Today's enterprises have millions of documents, and access to massive amounts of high-quality or targeted syndicated content and data (e.g., through Lexis-Nexis, ChoicePoint, NewsML and RSS news feeds, and so on). Targeted enterprise-scale Semantic Applications are currently exploiting ontologies with millions to tens of millions of entity and relationship instances. And yes, the promise of scaling these enterprise and industry-scale islands by interconnecting them (and achieving what TBL called the network effect) exists anyway.