TaxaMiner: An Experimental Framework for Automated Taxonomy Bootstrapping

Title: TaxaMiner: An Experimental Framework for Automated Taxonomy Bootstrapping
Publication Type: Journal Article
Year of Publication: 2005
Authors: Vipul Kashyap, Amit Sheth, Cartic Ramakrishnan, Christopher Thomas

Ontologies are a central component of the Semantic Web (SW) infrastructure. The design and construction of domain ontologies and taxonomies is a human-intensive process that requires a large commitment of cost and time. For the SW to scale and become feasible, approaches that reduce human effort and resource commitments need to be investigated urgently. Towards this end, we present a framework for automated taxonomy construction based on a large corpus of documents, a first step towards large-scale, automated ontology construction. Our approach involves: (a) generation of a document cluster hierarchy; (b) extraction of a topic hierarchy from this cluster hierarchy; and (c) assignment of labels to nodes in the topic hierarchy. We draw upon a suite of clustering and NLP techniques and identify parameters that form the basis of an experimentation framework. We also propose metrics to measure the quality of the resulting topic hierarchy and evaluate the impact of various parameters on these quality metrics. The MEDLINE® database is used as the document corpus and the MeSH thesaurus as the gold standard. Insights from these experiments are presented and discussed.

Full Text

Vipul Kashyap, Cartic Ramakrishnan, Christopher Thomas, and Amit Sheth, 'TaxaMiner: An Experimental Framework for Automated Taxonomy Bootstrapping,' International Journal of Web and Grid Services, 'Semantic Web and Mining Reasoning' special issue, 1 (no. 2), 2005, pp. 240-266.
pages: pp. 240-266
publisher: International Journal of Web and Grid Services
year: 2005
hasEditor: Xiaohua Hu (Ed.)
hasBookTitle: Special Issue: Semantic Web and Mining Reasoning
