Mahdieh Zabihimayvan

Dept. of Computer Science & Engineering, Wright State University


I am a PhD student in Computer Science at Wright State University, working under the supervision of Dr. Derek Doran as a Graduate Research Assistant in the Web and Complex System Lab (WacS), part of the Kno.e.sis Research Center. My research interests are Soft Computing, Data Mining, Machine Learning, Web Mining, and Data Characterization. In particular, I am interested in using Data Mining and Soft Computing techniques to analyze the traffic of Web servers and improve the quality of the services they provide to Web visitors.
After working on several Web projects during my undergraduate internship and thesis, I became interested in Web networks and systems. While continuing my education as a master's student in Software Engineering, I was a teaching assistant for the Data Mining, Advanced Engineering Mathematics in Software Engineering, and Computer Performance Evaluation and Modeling courses for M.S. students and for the Algorithm Design course for B.S. students, which made me more passionate about a new aspect of Web systems. My master's thesis, A Proposed Algorithm Based on Markov Clustering for Web Robot Detection, introduced a new Soft Computing method that uses Data Mining techniques to improve Web server performance through better detection of Web robots.

Selected Projects

Web Traffic characterization (Current project)

In this project, we characterize the traffic generated by visitors to real web servers. First, we apply Data Mining techniques as pre-processing steps to prepare the data. Then, we perform session identification and feature extraction, which are the main foundations of this project. Since the real traffic of a web server can be very large, we parallelize this step across a computing cluster. Finally, we statistically analyze the features to discover hidden patterns and characteristics in the data.
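As a rough illustration of session identification and feature extraction, the sketch below groups requests by visitor and starts a new session whenever the gap between consecutive requests exceeds a timeout. The 30-minute timeout, the visitor key, the tuple layout, and the three example features are all assumptions made for illustration; they are not taken from the project itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; the project's actual value is not stated.
SESSION_TIMEOUT = timedelta(minutes=30)


def identify_sessions(requests, timeout=SESSION_TIMEOUT):
    """Split (visitor, timestamp, url) tuples into per-visitor sessions.

    `visitor` is a hypothetical key, e.g. an (IP, user agent) pair.
    Returns a list of (visitor, [(timestamp, url), ...]) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, url in requests:
        by_visitor[visitor].append((timestamp, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            # A long silence closes the current session and opens a new one.
            if hit[0] - previous[0] > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(hit)
        sessions.append((visitor, current))
    return sessions


def extract_features(session_hits):
    """Compute a few illustrative features for one session."""
    timestamps = [timestamp for timestamp, _ in session_hits]
    urls = [url for _, url in session_hits]
    return {
        "num_requests": len(session_hits),
        "duration_seconds": (timestamps[-1] - timestamps[0]).total_seconds(),
        "distinct_urls": len(set(urls)),
    }
```

Because each visitor's requests are processed independently, the per-visitor loop is the natural unit to distribute across worker processes or cluster nodes.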

Feature Selection based on Fuzzy Rough Set theory

Despite the emergence of Web 2.0 applications and the growing need for well-behaved Web robots, malicious ones can pose serious risks to Web sites. Regardless of their behavior, Web robots may occupy bandwidth and reduce the performance of Web servers. Although many studies have tried to characterize and classify Web visitors, little attention has been paid to feature selection that dynamically chooses the attributes used to describe Web sessions. Therefore, in this research, a new algorithm based on Fuzzy Rough Set theory is proposed to better characterize and cluster the Web visitors of three real Web sites. Rough Set Theory (RST) describes how a collection of data may be separated based on a decision boundary and an indiscernibility relation (Pawlak, 1982). The results of this research are reported in “A soft computing approach for benign and malicious web robot detection” (Source code), published in the journal Expert Systems with Applications (Impact factor: 3.928).
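To make the indiscernibility relation concrete, here is a minimal sketch of classical (crisp) rough set approximations in the sense of Pawlak. It is not the fuzzy rough set algorithm proposed in this research; the session identifiers and attribute names are hypothetical.

```python
def indiscernibility_classes(objects, attributes):
    """Partition object ids into classes that agree on every chosen attribute."""
    classes = {}
    for name, row in objects.items():
        key = tuple(row[attribute] for attribute in attributes)
        classes.setdefault(key, set()).add(name)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximation of `target` under `attributes`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(objects, attributes):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper
```

Objects in the upper approximation but not in the lower one form the boundary region: the chosen attributes cannot decide whether they belong to the target set. A feature subset that keeps this region small is a good candidate for describing sessions.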

Comparing Neural Network and DBSCAN in Web users clustering

Today's dependence on the Internet and the emergence of Web 2.0 applications have significantly increased the need for web robots that crawl sites to support services and technologies. Regardless of their advantages, robots may occupy bandwidth and reduce the performance of web servers. Despite a variety of studies, there is no accurate method for classifying huge data sets of web visitors in a reasonable amount of time. Moreover, such a technique should be insensitive to the ordering of instances and produce deterministic, accurate results. Therefore, in this research we present a density-based clustering approach using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to classify the web visitors in two large real data sets, and compare its performance with that of the Self-Organizing Map, a Neural Network technique. For more information about the results of this research, please refer to “Detection of Web site visitors based on fuzzy rough sets”, published in Soft Computing (Impact factor: 2.472).

Using Markov Clustering for Web robot detection

The MCL algorithm (Van Dongen, 2001) is a powerful method for clustering data points by simulating stochastic flows over an input graph. MCL has seen success in a variety of domains, such as social network analysis, knowledge base enrichment, community detection, and bioinformatics. The algorithm operates on a k×k column-stochastic matrix M, which represents transition probabilities within a complete graph on k nodes. Each node of this graph corresponds to a data point (i.e., a web session), while the transition probabilities (the matrix elements m_ij) specify the strength of the relationship, or degree of similarity, between nodes. MCL finds a clustering of the nodes by transforming M through iterative applications of three operations, namely expand, inflate, and prune, until a convergence criterion is reached.
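The expand, inflate, and prune loop can be sketched in a few lines. The version below is a dependency-free illustration using plain Python lists, not the implementation used in this research; the inflation exponent, pruning threshold, and convergence tolerance are assumed defaults, and a practical implementation would use a sparse matrix library.

```python
def normalize_columns(m):
    """Rescale each column to sum to 1 (column-stochastic form)."""
    k = len(m)
    sums = [sum(m[i][j] for i in range(k)) for j in range(k)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(k)]
            for i in range(k)]


def expand(m):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    k = len(m)
    return [[sum(m[i][t] * m[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]


def inflate(m, r):
    """Inflation: raise entries to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def prune(m, threshold):
    """Pruning: zero out negligible entries, then renormalize columns."""
    return normalize_columns([[x if x >= threshold else 0.0 for x in row]
                              for row in m])


def mcl(similarity, inflation=2.0, threshold=1e-5, max_iter=100, tol=1e-8):
    """Iterate expand/inflate/prune until the matrix stops changing."""
    m = normalize_columns(similarity)
    for _ in range(max_iter):
        nxt = prune(inflate(expand(m), inflation), threshold)
        delta = max(abs(a - b) for row_a, row_b in zip(nxt, m)
                    for a, b in zip(row_a, row_b))
        m = nxt
        if delta < tol:
            break
    return m


def clusters(m, eps=1e-6):
    """Read clusters off the converged matrix: each non-empty row is an
    attractor, and the columns it reaches are the members of its cluster."""
    found = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > eps)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)
```

Here `similarity` is a square matrix of pairwise session similarities with self-loops on the diagonal. A larger inflation exponent generally yields more, smaller clusters.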

Defining new navigational features to describe web crawlers

To describe web visitors, we suggest new features based on the behavioral patterns recorded in the access log files of web servers. For instance, Penalty is a numerical attribute based on the navigational patterns of humans, which involve many back-and-forward movements and loops. Several factors cause such patterns: a view of the site restricted by its link structure while searching for the required information, the “back” and “forward” options in the web browser's history, and disorientation during the visit. Robots, by contrast, can detect where the required information resides after their first crawl of a site and restrict their subsequent requests to specific areas of that site. The Penalty attribute penalizes each back-and-forward navigation or loop, so it is reasonable to expect a larger value for this attribute among human users than among web robots. If interested, please refer to our paper, “A density based clustering approach for web robot detection”, published in the International Conference on Computer and Knowledge Engineering (ICCKE), IEEE (Acceptance rate: 22%).
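The exact definition of Penalty is given in the paper. As a simplified, hypothetical stand-in, the sketch below adds one penalty point each time a session requests a page it has already visited, which captures both back-and-forward moves and loops in a single count.

```python
def penalty(pages):
    """Count revisits in the ordered list of pages requested in one session.

    This is an assumed simplification, not the published formula.
    """
    seen = set()
    score = 0
    for page in pages:
        if page in seen:
            # Returning to an already-visited page: a back/forward move or a loop.
            score += 1
        seen.add(page)
    return score
```

Under this simplification, a human who keeps returning to the home page accumulates points, while a crawler that fetches each URL once scores zero.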

Using Fuzzy Inference System based on Decision Trees for web robot classification

Web administrators should pay special attention to, and closely inspect, web sessions that correspond to web robots, because the traffic of these autonomous systems occupies bandwidth, reduces the performance of web servers, and in some cases threatens the security of human users. In this research, we propose a novel fuzzy algorithm based on decision trees. To overcome the curse of dimensionality and simplify the design of the fuzzy inference system, we use correlation analysis to eliminate some features. A C4.5 decision tree is then used to convert each remaining attribute to a fuzzy variable. Building a decision tree involves choosing, at each level of the tree, the feature with the highest information gain, so attributes that the tree never selects can be dropped, reducing the number of attributes again. Finally, the fuzzy rules are extracted from the C4.5 decision tree and the fuzzy inference model is built.
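The information gain criterion mentioned above can be illustrated with a short sketch. It assumes features that have already been discretized into categories, and the labels and feature values are hypothetical. C4.5 itself normalizes this quantity into a gain ratio and handles continuous attributes with thresholds; only the plain information gain is shown here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    return entropy(labels) - remainder
```

A feature that separates robots from humans perfectly has a gain equal to the entropy of the labels, while a feature unrelated to the class has a gain of zero. The tree picks the highest-scoring feature at each level.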

Selected Publications

Honors and Awards

  • Referee: Computers & Security (Elsevier, Impact Factor: 2.84)
  • Selected as Top Researcher among M.Sc. students of Software Engineering, 2016
  • Ranked 2nd among all M.Sc. students of Software Engineering, 2014
  • Ranked in the top 1% of over 300,000 participants in the nationwide university entrance exam for undergraduate education, 2006

Copyright © Mahdieh Zabihimayvan