
Data Intensive Analysis and Computing (DIAC) Lab


Recent advances in computing, communication, and digital storage technologies have made incredible volumes of data accessible remotely across geographical and administrative boundaries. There is an increasing demand for summarizing, understanding, monitoring, learning from, and collaboratively mining large, evolving, and possibly private data stores. In the DIAC lab, we study research problems and applications related to such large datasets.

Faculty:

Keke Chen, Amit Sheth

Students:

Fengguang Tian, Shumin Guo, Jim Powers, Huiqi Xu, Zhen Li, Gaurish Anand

Projects:

  • Data Analytics with the Cloud

    Data clouds, consisting of hundreds or thousands of cheap multi-core PCs and disks, are available for rent at low cost (e.g., Amazon EC2 and S3 services). Many cloud-based applications generate large amounts of data in the cloud, which need to be processed with cloud-based data analytics tools. Powered by a distributed file system, e.g., the Hadoop Distributed File System (HDFS), and the MapReduce programming model, the cloud becomes an economical and scalable platform for large-scale data analytics. We study a visual cluster exploration framework (CloudVista) for analyzing large data hosted in the cloud, and cost models for resource-aware cloud computing.

  • Clustering Large/Streaming Numerical/Categorical Data

    Large datasets are also characterized by high complexity and uncertainty. Clustering is an effective tool for understanding this complexity and uncertainty. In the DIAC lab, we investigate novel techniques that combine visual analytics and statistical analysis to help better understand the clustering patterns in large datasets. In particular, we are interested in visually exploring and validating clustering patterns in large multi-dimensional datasets (VISTA, iVIBRATE), finding the optimal number of clusters in categorical (ACE and BestK) and transactional datasets (Weighted Coverage Density and DMDI), and monitoring changes of clustering patterns in categorical data streams (CatStream).

  • Privacy Preserving Computing, Trustworthy Computing

    When large datasets are shared across boundaries, privacy and trust become major concerns. In the DIAC lab, we study privacy issues in distributed data-intensive computing, in particular privacy-preserving OLAP and mining on outsourced data, and privacy-preserving multiparty collaborative data mining. We have proposed geometric data perturbation (GDP), which can fully preserve data utility for classification and clustering modeling while providing a satisfactory privacy guarantee. The GDP method can also be applied to privacy-preserving multiparty collaborative mining (Multiparty GDP). Recent developments focus on the theoretical study of the family of geometric perturbation methods and its application to privacy-preserving OLAP on outsourced data, as well as privacy and trust in social networks.

  • Web Science: Ranking and Adaptation

    For large-scale, complicated learning problems, it is very expensive to collect a sufficient amount of labeled training data. Learning to rank in web search is one such problem. There are multiple ways to extend the training dataset, such as leveraging a large amount of unlabeled data (i.e., semi-supervised learning), or searching over the unlabeled data to find the most effective candidate examples for labeling (i.e., active learning). In learning to rank, we study novel strategies to enhance the training data. Concretely, we develop new algorithms to utilize pairwise preference training data mined from implicit user feedback (GBRank), to adapt a model trained with a small amount of labeled data to the pairwise preference data (ClickAdapt), and to adapt a ranking function trained on one search domain to another (Tree Adaptation, or Trada). Recent developments include understanding the effectiveness of Tree Adaptation for ranking and tree adaptation methods for pairwise data.
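To make the MapReduce model behind the cloud analytics project concrete, here is a minimal in-memory sketch of the map-group-reduce pattern. This is a toy illustration of the programming model only, not Hadoop itself; the log lines and the per-status-code count are hypothetical examples.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map each record to (key, value) pairs,
    shuffle (group by key), then reduce each group to a single result."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):   # map phase
            groups[key].append(value)       # shuffle/group phase
    return {key: reducer(key, values)       # reduce phase
            for key, values in groups.items()}

# Hypothetical example: count web-log events per HTTP status code.
logs = ["200 /index", "404 /missing", "200 /about", "500 /api"]
counts = map_reduce(
    logs,
    mapper=lambda line: [(line.split()[0], 1)],
    reducer=lambda key, values: sum(values),
)
print(counts)  # {'200': 2, '404': 1, '500': 1}
```

In a real cluster the groups would be partitioned across machines and the records read from a distributed file system such as HDFS, but the mapper/reducer contract is the same.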
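The clustering project asks how to validate a clustering and pick the number of clusters. BestK's own criterion is not reproduced here; as a generic stand-in, the sketch below computes the mean silhouette width, a standard cluster-validity measure, and shows that a partition matching the natural groups scores higher than one that mixes them. The points and labels are made-up illustration data.

```python
def silhouette(points, labels):
    """Mean silhouette width: for each point, a = mean distance to its
    own cluster, b = mean distance to the nearest other cluster;
    the point's score is (b - a) / max(a, b)."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own) if own else 0.0
        b = min(  # mean distance to the closest other cluster
            sum(dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(points, [0, 0, 1, 1])  # matches the two natural groups
bad = silhouette(points, [0, 1, 0, 1])   # mixes the groups
print(good > bad)  # True
```

Sweeping candidate values of k and keeping the one with the best validity score is a common recipe; methods like BestK replace this generic criterion with one tailored to categorical data.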
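The intuition behind geometric data perturbation can be sketched with a rotation plus translation: because both are isometries, pairwise distances survive the transformation, so distance-based classification and clustering models built on the perturbed data behave as on the original. This toy sketch omits the noise component of full GDP for clarity, and the data is randomly generated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))       # toy data: 5 records, 3 attributes

# Random orthogonal matrix via QR decomposition (a rotation/reflection),
# plus a random translation; both are kept secret by the data owner.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

Y = X @ Q.T + t                   # perturbed data released for mining

def pairwise(A):
    """All pairwise Euclidean distances between the rows of A."""
    return np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)

print(np.allclose(pairwise(X), pairwise(Y)))  # True: distances survive
```

Adding the noise term trades a little of this exact utility preservation for a stronger privacy guarantee; the theoretical study mentioned above concerns exactly that trade-off.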
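The ranking project's pairwise preference data can be illustrated with a hinge-style pairwise loss: each (better, worse) pair mined from clicks is penalized when the model fails to score the preferred document at least a margin above the other. This is a generic sketch in the spirit of pairwise ranking, not the GBRank algorithm itself; the document scores and preference pairs are hypothetical.

```python
def pairwise_loss(scores, preferences, margin=1.0):
    """Hinge-style pairwise ranking loss: penalize each (better, worse)
    pair whose score gap falls short of the margin."""
    return sum(max(0.0, margin - (scores[better] - scores[worse]))
               for better, worse in preferences)

# Hypothetical scores and preference pairs mined from implicit feedback.
scores = {"d1": 2.0, "d2": 1.5, "d3": 0.2}
prefs = [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]
print(pairwise_loss(scores, prefs))  # 0.5 (only the d1/d2 gap is short)
```

Training then amounts to adjusting the scoring function to drive this loss down; GBRank does so with gradient boosting, and the adaptation methods (ClickAdapt, Trada) reuse a model trained on one source of data or one domain as the starting point.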

Publications

© 2012 Kno.e.sis | 377 Joshi Research Center, 3640 Colonel Glenn Highway, Dayton, OH 45435 | (937 - 775 - 5217)