Mahdieh Zabihimayvan

Dept. of Computer Science & Engineering, Wright State University


I am a PhD student in computer science at Wright State University, working under the supervision of Dr. Derek Doran as a graduate research assistant in the Web and Complex System Lab (WacS), part of the Kno.e.sis Research Center. My research interests include machine learning, anomaly detection, soft computing, and data characterization.
Through several Web projects during my undergraduate internship and thesis, I became interested in Web networks and systems. Continuing my education as a master's student in software engineering, I was a teaching assistant for the data mining, advanced engineering mathematics in software engineering, and computer performance evaluation and modeling courses for M.S. students, and for the algorithm design course for B.S. students, which deepened my interest in new aspects of Web systems. Focusing on anomaly detection in Web traffic, my master's thesis, A Proposed Algorithm Based on Markov Clustering for Web Robot Detection, proposed a new soft computing method that uses data mining techniques to distinguish human traffic from Web robot traffic on Web servers.

Selected Projects

Web traffic characterization

Understanding the qualities of Web robot traffic is essential for building mechanisms that mitigate the impact of this traffic on Web systems. This project presents an updated characterization of the navigational and session patterns of Web robot traffic across three Web servers in the United States, Europe, and the Middle East, using 30 different features. The results indicate that some features may be fitted by the same heavy-tailed model across the Web servers, while the best-fitting models for other features depend on the Web server. Because Web robots perform different tasks and website administrators set different security policies, some features of Web robot traffic cannot be universally modeled. The paper reporting this project, “Some (Non-)Universal Features of Web Robot Traffic,” has been accepted at the 52nd Annual Conference on Information Sciences and Systems (CISS).
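The model-fitting step described above can be sketched as follows. This is only an illustration: the sample data are synthetic, and the two candidate distributions (Pareto and lognormal) and the Kolmogorov-Smirnov comparison are assumptions for the sketch, not the study's actual feature set or model-selection procedure.

```python
# Sketch: fit candidate heavy-tailed distributions to one traffic
# feature and pick the closest fit. Data and candidates are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-session feature values for one Web server
# (drawn from a classical Pareto with minimum 1 and shape 2.5).
samples = rng.pareto(a=2.5, size=2000) + 1.0

candidates = {"pareto": stats.pareto, "lognorm": stats.lognorm}
results = {}
for name, dist in candidates.items():
    params = dist.fit(samples, floc=0)  # fix location at 0 for stability
    # Kolmogorov-Smirnov statistic: smaller means a closer fit.
    ks = stats.kstest(samples, dist.cdf, args=params).statistic
    results[name] = ks

best = min(results, key=results.get)
print(best, results)
```

Repeating this per feature and per server is what reveals which features share one model across servers and which do not.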

Feature selection based on Fuzzy Rough Set theory

Despite the emergence of Web 2.0 applications and the growing need for well-behaved Web robots, malicious robots pose serious risks to Web sites. Regardless of their behavior, robots may occupy bandwidth and reduce the performance of Web servers. Although many studies have tried to characterize and classify Web visitors, little attention has been paid to feature selection for dynamically choosing the attributes used to describe Web sessions. In this research, a new algorithm based on Fuzzy Rough Set Theory (RST) is therefore proposed to better characterize and cluster the Web visitors of three real Web sites. RST describes how a collection of data may be separated based on a decision boundary and an indiscernibility relation (Pawlak, 1982). This research resulted in “A soft computing approach for benign and malicious web robot detection” (Source code), published in the Expert Systems with Applications journal (impact factor: 3.928).
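The rough-set machinery behind this kind of feature selection can be sketched with a toy example. Note this is the crisp (non-fuzzy) simplification of RST: a feature subset's dependency degree is the fraction of objects whose indiscernibility class agrees on the decision label. The session table and feature names below are made up for illustration.

```python
# Toy sketch of rough-set dependency for feature selection (crisp RST).
from collections import defaultdict

def dependency(rows, labels, features):
    """gamma(features) = |positive region| / |universe|."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in features)
        classes[key].append(i)       # indiscernibility classes
    positive = 0
    for members in classes.values():
        # A class lies in the positive region if all members share a label.
        if len({labels[i] for i in members}) == 1:
            positive += len(members)
    return positive / len(rows)

# Hypothetical Web-session table: binary features, robot/human labels.
rows = [
    {"night": 1, "img_ratio": 0, "head": 1},
    {"night": 1, "img_ratio": 0, "head": 0},
    {"night": 0, "img_ratio": 1, "head": 0},
    {"night": 0, "img_ratio": 1, "head": 0},
]
labels = ["robot", "robot", "human", "human"]
print(dependency(rows, labels, ["night"]))  # separates the labels perfectly
print(dependency(rows, labels, ["head"]))   # leaves most objects ambiguous
```

A greedy selector (e.g., QuickReduct-style) would then keep adding the feature that most increases this dependency; the fuzzy variant replaces the crisp indiscernibility classes with fuzzy similarity relations.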

Comparing Neural Network and DBSCAN in clustering Web users

Today's dependence on the Internet and the emergence of Web 2.0 applications have significantly increased the need for web robots that crawl sites to support services and technologies. Regardless of their advantages, robots may occupy bandwidth and reduce the performance of web servers. Despite a variety of studies, there is no accurate method for classifying huge data sets of web visitors in a reasonable amount of time; moreover, such a technique should be insensitive to the ordering of instances and produce deterministic, accurate results. In this research, we therefore present a density-based clustering approach using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to classify the web visitors of two real, large data sets and compare its efficiency with the performance of the Self-Organizing Map, a neural network technique. For more information about the results of this research, please refer to “Detection of Web site visitors based on fuzzy rough sets,” published in Soft Computing (impact factor: 2.472).
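The DBSCAN side of this comparison can be sketched as follows, assuming scikit-learn. The two synthetic 2-D blobs stand in for "human" and "robot" session feature vectors, and the parameter values (eps, min_samples) are illustrative, not the settings used in the study.

```python
# Minimal DBSCAN sketch on synthetic "session feature" points.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
humans = rng.normal(loc=[0.2, 0.3], scale=0.05, size=(100, 2))
robots = rng.normal(loc=[0.8, 0.9], scale=0.05, size=(100, 2))
X = np.vstack([humans, robots])

# eps: neighborhood radius; min_samples: density needed for a core point.
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)

# DBSCAN labels noise points -1; count only the real clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)
```

Because DBSCAN derives clusters from density rather than from an initialization or presentation order, it naturally gives the deterministic, order-insensitive behavior the paragraph calls for, unlike a Self-Organizing Map, whose result depends on weight initialization and training order.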

Using Markov Clustering for Web robot detection

The MCL algorithm (Van Dongen, 2001) is a powerful method for clustering data points by simulating stochastic flows over an input graph. MCL has seen success in a variety of domains, such as social network analysis, knowledge base enrichment, community detection, and bioinformatics. The algorithm is specified by a k×k column-stochastic matrix M, representing transition probabilities within a complete graph on k nodes. Each node of this graph corresponds to a data point (i.e., a web session), while the transition probabilities (the matrix elements m_ij) specify the strength of the relationship, or degree of similarity, between points. MCL finds a clustering of the nodes by transforming M with iterative applications of three operations, namely expand, inflate, and prune, until a convergence criterion is reached.
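The three operations above can be sketched compactly in NumPy. The toy graph (two triangles joined by a single edge), the inflation exponent r=2, and the pruning threshold are illustrative assumptions; real uses of MCL tune these per dataset.

```python
# Compact sketch of MCL: expand, inflate, prune on a column-stochastic matrix.
import numpy as np

def normalize(M):
    return M / M.sum(axis=0)            # make each column sum to 1

def mcl(A, r=2.0, prune_tol=1e-4, max_iter=100):
    M = normalize(A + np.eye(len(A)))   # add self-loops, column-normalize
    for _ in range(max_iter):
        prev = M.copy()
        M = M @ M                       # expand: flow along longer paths
        M = normalize(M ** r)           # inflate: boost strong flows
        M[M < prune_tol] = 0.0          # prune: drop negligible entries
        M = normalize(M)
        if np.allclose(M, prev, atol=1e-8):
            break
    # Attractor rows with nonzero mass define the clusters.
    found = {}
    for i in range(len(M)):
        if M[i].sum() > 0:
            found[tuple(np.nonzero(M[i])[0])] = True
    return sorted(found)

# Two triangles joined by one edge; MCL should split them apart.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
clusters = mcl(A)
print(clusters)
```

For web robot detection, A would instead hold pairwise similarities between web sessions, so each recovered cluster groups sessions with similar navigational behavior.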

Defining new navigational features to describe web crawlers

To describe web visitors, we suggest new features based on the behavioral patterns recorded in the access log files of web servers. For instance, Penalty is a numerical attribute based on the navigational patterns of humans, which involve many frequent back-and-forward movements and loops. Such patterns arise because human users have a view restricted by the link structure of a site while searching for information, rely on the “back” and “forward” options in the web browser's history, and can become disoriented during their visits. In contrast, after the first crawl of a site, robots can detect where the required information resides and restrict their subsequent requests to specific areas of the site. The Penalty attribute penalizes each back-and-forward navigation or loop, so it is reasonable to expect larger values for this attribute among human users than among web robots. If interested, please refer to our paper, “A density based clustering approach for web robot detection,” published at the International Conference on Computer and Knowledge Engineering (ICCKE), IEEE (acceptance rate: 22%).
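A Penalty-style score can be sketched as follows, assuming a session is simply an ordered list of requested pages. The exact scoring in the paper may differ; here one penalty point is charged for every back-and-forward move (A → B → A) and one for any other revisit (a loop).

```python
# Toy sketch of a Penalty-style navigational feature.
def penalty(session):
    score = 0
    seen = set()
    for i, page in enumerate(session):
        if i >= 2 and page == session[i - 2]:
            score += 1                  # back-and-forward movement
        elif page in seen:
            score += 1                  # loop: revisit via a longer path
        seen.add(page)
    return score

# A wandering human session vs. a robot sweeping distinct pages.
human = ["home", "docs", "home", "docs", "faq", "home"]
robot = ["home", "a", "b", "c", "d", "e"]
print(penalty(human), penalty(robot))
```

As the paragraph predicts, the meandering human session accrues a much larger penalty than the robot's linear sweep.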

Using Fuzzy Inference System based on decision trees for Web robot classification

Web administrators should pay special attention to, and closely inspect, web sessions that correspond to web robots, because the traffic of these autonomous systems occupies bandwidth, reduces the performance of web servers, and, in some cases, threatens the security of human users. In this research, we propose a novel fuzzy algorithm based on decision trees. To overcome the curse of dimensionality and facilitate the design of the fuzzy inference system, we use a correlation analysis to eliminate some features. A C4.5 decision tree is then used to convert each remaining attribute into a fuzzy variable. Since building the tree selects, at each level, the feature with the highest information gain, this step reduces the number of attributes again. Finally, fuzzy rules are extracted from the C4.5 decision tree to build the fuzzy inference model.
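The correlation-analysis step can be sketched as a simple greedy filter: drop one feature from every pair whose absolute Pearson correlation exceeds a threshold. The feature matrix, feature names, and the 0.9 threshold are illustrative assumptions, not the study's actual data or cutoff.

```python
# Sketch of correlation-based feature elimination.
import numpy as np

def correlation_filter(X, names, threshold=0.9):
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j, _ in enumerate(names):
        # Keep a feature only if it is not too correlated with any kept one.
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.01, size=200)   # nearly duplicates feature a
c = rng.normal(size=200)
X = np.column_stack([a, b, c])
print(correlation_filter(X, ["requests", "bytes", "night_ratio"]))
```

The surviving features would then be fed to the C4.5 tree, whose split thresholds supply natural breakpoints for the fuzzy membership functions.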

Selected Publications

Honors and Awards

  • Awarded a fully paid travel to attend the CRA-W Grad Cohort Workshop in San Francisco, CA, April 2018
  • Selected as Top researcher of M.Sc. students of Software Engineering, 2016
  • Best paper award for publishing the paper titled "Detection of Web site visitors based on fuzzy rough sets", 2016
  • Ranked 2nd place among all M.Sc. students of Software Engineering, 2014
  • Ranked Top 1% among over 300,000 participants in a nationwide universities entrance exam for undergraduate education, 2006

Research service

Copyright © Mahdieh Zabihimayvan