Cloud Computing Collaboratory

We are entering a world where data is pouring into our homes and businesses at an astonishing pace and computing is undergoing a sequence of revolutions. Cloud computing is emerging as a powerful new computing revolution, which many predict will not only reshape business, society and culture in a profound way, but also have an electrifying impact on the way we do research and education in science and engineering. It is the most recent evolution of distributed and scalable computing, delivered over the Internet (the "cloud"). As described in Wikipedia, "It is a style of computing in which IT-related capabilities are provided 'as a service,' allowing users to access technology-enabled services from the Internet ('in the cloud') without knowledge of, expertise with, or control over the technology infrastructure that supports them." It incorporates a number of recent Web-based computing trends such as software as a service and Web 2.0, and is quickly replacing cluster and grid computing, which have been the preferred methods of meeting needs for high-end computing, such as those of many scientific computing applications.

A team of faculty members in the Computer Science & Engineering (CSE) department has set up a shared infrastructure for use by our students.*

The current 18-node installation uses open-source software such as Hadoop. One very powerful server with 128 GB of main memory and 5 TB of disk is also available for comparing cloud computing with traditional computing options. Students will use the infrastructure to study novel techniques for solving extremely large-scale computing problems, as well as systems issues in enabling cloud computing, such as dynamic network-layer protocols for advanced cloud computing, text mining of large corpora such as Wikipedia, sensor data fusion and situational awareness applications, resource management, trust, security, and privacy.

The following courses plan to use this shared infrastructure.

CS 271 - Introduction to Bioinformatics Prof. Michael Raymer
This course introduces students to the application of computational techniques to problems in the life sciences. Utilizing the cloud infrastructure, we will implement a local service for searching the human genome for specific sequence patterns. This will allow laboratory assignments in which students from both computer science and the life sciences explore the human genome in its entirety. This learning experience would not be possible without a cloud infrastructure due to the memory limitations of most modern PCs and workstations.
CEG498 Team Projects (Senior Design Experience) Prof. John Gallagher
This is a summative computer engineering practicum that requires students to characterize problems, then specify, implement, and test engineered computational solutions. Many student projects include significant components of verification in simulation. Project scope is often limited by a lack of sufficient computational resources to complete simulation runs within the 20-week (2-quarter) duration of the class. The availability of cloud computing would increase the size of problems students could attempt and better prepare them to address real-world problems immediately upon graduation.
CS490/690: Cloud Computing Prof. Keke Chen
The course will introduce the basic concepts of cloud computing, teach students how to use the available cloud computing techniques, and examine advanced research topics in cloud computing. Specifically, it will cover the following topics: map-reduce programming, distributed file systems, hardware virtualization, reliability issues, security and privacy issues, and AJAX, the set of techniques for building the interactive front end of clouds.
CEG436/636 Mobile Computing Prof. Yong Pei
Increasingly, people, computers and microelectronic devices are being linked together to bring to life the communications mantra: anybody, anything, anytime, anywhere. This junior/senior/graduate course helps engineering and computer science students establish a solid foundation in the concepts, architecture, design, and performance evaluation of mobile computing principles, protocols and applications.
CS 471 - Algorithms for Bioinformatics Prof. Michael Raymer
This capstone course pairs senior undergraduate students from the life science disciplines with computer science and engineering students. These interdisciplinary student groups conduct cutting-edge research projects under the guidance of the course instructors. Utilizing the proposed cloud infrastructure will enable this research to be conducted on a scale that is currently impossible. For example, entire genomes can be compared between multiple species to look for patterns and relationships. Currently, these projects are limited to a single genome or even a single chromosome. Broadening the scale of the research will enable more ambitious projects.
CS 475/675 Web-based Information Systems (was CS 499/699) Prof. Amit Sheth
This course covers advanced topics in managing Web-based resources, with a focus on building applications involving heterogeneous data. It exposes students to topics, techniques, and technologies of data, metadata, information, knowledge, and ontologies including the semantic aspects of data, data architectures, Web search and information integration, Web standards, Web 2.0, Semantic Web, and Web 3.0. Cloud computing is one of the preferred implementation environments for many of these services and capabilities and will be invaluable for class projects.
CS 499 Independent Studies Prof. T. K. Prasad
This undergraduate course covers foundations of information retrieval, to enable design, analysis and implementation of IR systems. It also includes topics of contemporary interest related to search engine design and architectures for large scale computations, such as map-reduce architecture, cloud computing, Lucene, etc.
CEG 702 Advanced Computer Networks Prof. Bin Wang
This is a graduate-level course on advanced computer communication and networking technologies. The course involves both a reading/lecture/discussion component and a project component. We will read papers on various aspects of advanced computer networking: LAN/WAN technologies, trust, security and privacy, congestion/flow control, self-similar traffic analysis, queuing theory, link scheduling, routing, internetworking, multicast, wireless technologies, quality of service, and peer-to-peer networks. The technical and research issues involved will be studied in depth.
CS766 Evolutionary Computation Prof. Mateen Rizki
The focus of this course is the application of soft computing techniques, including genetic algorithms, genetic programming, particle swarm algorithms and ant colony algorithms, to large-scale, high-dimensional search and optimization problems. The course involves computationally complex, realistic assignments that clearly demonstrate the advantage of soft computing techniques over traditional search techniques. Problems typically have to be scaled down to match the available computational resources. Cloud computing would allow students to solve, in a more scalable environment, complex multi-modal problems of the type found in real-world applications.
CS 771 Natural Language Processing Techniques Prof. Shaojun Wang
This course introduces state-of-the-art statistical techniques for automatic analysis of natural (human) language data, to give students an in-depth understanding of both the algorithms available for processing linguistic information and the underlying computational properties of natural languages. It gives students a first-hand opportunity to use this shared infrastructure to handle a web-scale corpus (a trillion tokens) under the map-reduce framework.
CS 790 Data-Intensive Scalable Computing Prof. Shaojun Wang
This course teaches students how to use this shared infrastructure to run Hadoop, an open-source Java implementation of Google's map-reduce framework. It will become a nexus for cloud computing at WSU and connect a variety of academic departments with large-data problems, such as computer science, mechanical and material engineering, biomedical engineering, biochemistry and molecular biology, and environmental sciences, and will bring students with strong competencies in Java and C++ programming in these departments together to learn how to explore large-data issues.
CEG 802 Emerging Networks Prof. Bin Wang
This is a graduate level course on emerging networking technologies. The course involves a reading/lecture/presentation/discussion component, paper review component, and a project component. It will provide an in-depth study on a number of focused areas: dense wavelength division multiplexing optical networks, optical burst switching networks, peer-to-peer networks, cloud computing, and wireless mobile networks (including Ad-hoc wireless networks, sensor networks). Various technical and research issues involved will be studied. These areas of emerging networking technologies will play central roles in future communication networks.
CS707 Information Retrieval Prof. T. K. Prasad
This graduate-level course covers models for information retrieval, techniques for indexing and searching, algorithms for classification and clustering, latent semantic indexing, link analysis and ranking. The programming assignments in this course require students to build and evaluate text search engines from scratch, to better appreciate low-level indexing issues, and to use mature search engine APIs such as Lucene to develop higher-level applications. The availability of cloud computing and software infrastructure such as Hadoop will significantly improve this course by enabling students to work with realistic datasets and practical large-scale problems.
CS 765 Foundations of Neurocomputing Prof. John Gallagher
This course covers a broad range of neural-network systems and their application to practical problems. Because of the limited computational capacity available on desktop computers, only the smallest of toy problems can actually be attempted by students in the class. The availability of cloud computing will dramatically increase the size and scope of student projects and better prepare students to attack real-world problems immediately upon graduation.
CEG 730: Distributed Computing Principles Prof. Prabhaker Mateti
This course explores principles of concurrency and, in particular, send/receive-based distributed computing. It includes two lectures on cluster computing. The practical part of the course has three projects that relate to cloud computing: one each on RPC, RMI, and Hadoop.
CS 790: Optimizing Compiler for Modern Architecture Prof. Meilin Liu
This course studies compiler optimization for modern architectures, including multi-core computer architectures, parallel computers, and clusters. Between parsing the input program and generating the target machine code, optimizing compilers perform a wide range of program analyses and transformations to optimize and parallelize the program. The course will cover cluster computing fundamentals and how to transform programs to run in parallel on a cluster. It will also cover some key concepts behind MapReduce, which is closely related to cloud computing, including the scheduling of data-local jobs and the use of distributed storage. Students will use the infrastructure to implement program parallelization and to do research in the area of high-performance computing through a MapReduce framework.
CS 790: Services Computing Prof. Amit Sheth
Services Science encompasses numerous areas relating to the increasing role of services in the world economy. In this course we will focus primarily on services computing, or the technical aspects of services science, and secondarily on the allied economic, business, and organizational aspects. Cloud computing is a powerful new paradigm that both uses and supports services computing principles, and it will be both studied and used in this course.
CEG 830: Distributed Computing Systems Prof. Prabhaker Mateti
This course focuses on various distributed systems and languages. Specific details change as we discuss the latest developments, which include cloud computing.

* PI: Prof. Amit Sheth; Coordinator: Prof. Keke Chen; Project Faculty Team: All faculty members listed above.