NCRR
Bioinformatics for Glycan Expression
Integrated Technology Resource for Biomedical Glycomics: Technological Research and Development Project IV, A Project funded by NIH
Introduction
The goal of Bioinformatics Project 4 of this glycomics resource is to
develop a suite of databases along with computational tools that facilitate
efficient acquisition, description, analysis, sharing and dissemination of
the data contained therein. This represents a major challenge, as the
potential of this data to explain important biological phenomena will only
be fully realized if it is examined in the context of the vast amounts of other
data that are becoming available. Therefore, a major emphasis will be placed
on data structures and tools that have a high degree of interoperability with
the computational infrastructure now being developed for the storage and analysis
of genomics and proteomics data. The specific aims of this research are
as follows.
1. Develop and implement efficient workflow tools for tracking physical
   samples and for automating data collection, data verification, compression,
   and storage. These will include tools for automatic identification of
   glycan structures and/or glycan structural families from mass spectral data.
2. Build an integrated database termed GlycomBin that describes the
   populations of specific glycan structures and structural families of
   glycopeptides and glycolipids in different cell lines. For example, the
   database will provide the foundation for a detailed, quantitative
   understanding of the (N- and O-linked) glycosylation patterns of specific
   glycoproteins. In this context, it will include both structural and
   quantitative information (i.e., the identities and relative populations of
   the various glycan structures and/or glycan structural families that are
   attached to each glycopeptide). Similar information regarding the identities
   and populations of glycosphingolipids will also be included, as will
   biochemical information obtained by diverse techniques, such as the
   distribution of glycan epitopes in different cell lines.
3. Develop tools that facilitate interoperability of the databases with
   existing proteomics tools that can be used, for example, to identify and
   quantitate the expression level of each glycopeptide's parent protein and
   the expression levels of the proteins involved in glycan biosynthesis. Also
   support open, standards-based access to GlycomBin and its interoperability
   with external databases. This will include Web Services-enabled access to
   data and computational resources.
4. Develop tools that facilitate the description, classification, and
   clustering of glycopeptides and glycolipids, including ontology-based
   semantic descriptions of glycan structure, biosynthesis, and biological
   context.
5. Develop tools for semantic data analysis and discovery, including
   tools for finding correlations between glycosylation patterns and patterns
   of gene expression within a cell line or between different cell lines. These
   will include a blended ontology-supported browsing and querying interface.
This approach will provide a highly flexible environment for the development
of distributed and semantic bioinformatics approaches for analysis of glycosylation
patterns and their biological relevance.
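
As an illustration only, the kind of analysis described in the aims (relative
populations of glycan structures, and their correlation with gene expression
across cell lines) might be sketched in Python as follows. This is a minimal
sketch under stated assumptions, not the project's tooling or the GlycomBin
schema: the function names, the cell line, glycan, and gene labels, and all
numeric values are hypothetical placeholders, and Pearson correlation is just
one possible choice of measure.

```python
from math import sqrt


def relative_populations(abundances):
    """Convert raw glycan signal intensities into fractions that sum to 1."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {glycan: value / total for glycan, value in abundances.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder data: raw intensities of two glycans on one
# glycopeptide, measured in three made-up cell lines.
glycan_intensities = {
    "line_1": {"glycan_A": 10.0, "glycan_B": 30.0},
    "line_2": {"glycan_A": 20.0, "glycan_B": 20.0},
    "line_3": {"glycan_A": 30.0, "glycan_B": 10.0},
}
# Hypothetical expression levels of one biosynthetic enzyme in the same lines.
enzyme_expression = {"line_1": 1.2, "line_2": 2.1, "line_3": 2.9}

cell_lines = sorted(glycan_intensities)
fraction_a = [
    relative_populations(glycan_intensities[line])["glycan_A"]
    for line in cell_lines
]
expression = [enzyme_expression[line] for line in cell_lines]

print("relative population of glycan_A per cell line:", fraction_a)
print("correlation with enzyme expression:", pearson(fraction_a, expression))
```

Normalizing to relative populations before correlating keeps the comparison
independent of the overall signal intensity in each cell line, which is one
reason a database might store relative rather than raw values.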