Plenary Lecture

On Measuring Distance Between Distributions of Biological and Chemical Data

Professor Sung-Hyuk Cha
Seidenberg School of Computer Science and Information Systems
Pace University
USA
E-mail: scha@pace.edu

Abstract: A similarity or distance measure between two distributions plays an important role in many problems in biology, chemistry, clustering, pattern recognition, statistics, and machine learning. The traditional minimum cost flow problem (the transportation problem) has been used as a distance measure between two distributions, as in the earth mover's distance (EMD). If the distributions have b bins or taxa, the cost matrix is a b × b square matrix. Generic algorithms for computing the EMD, such as the Simplex method or the Hungarian method, are too slow for interactive use, but efficient algorithms are known for several special histogram types: nominal in Θ(b), ordinal in Θ(b), and modulo in O(b²). In this presentation, a variety of special classes of cost matrices, such as star, linear, tree, and ring cost matrices, are formally defined and generalized, and their respective efficient algorithms for computing the EMD are presented. A pendant arc elimination algorithm that computes the EMD with a phylogenetic network in Θ(b) is also demonstrated. Algorithms that test whether a given cost matrix belongs to one of these special topological classes are also described. The main objective of this presentation is to reduce the computational complexity of the EMD by analyzing topological patterns in the cost matrices of various biological and chemical data.
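To make the Θ(b) claims for the nominal and ordinal cases concrete, here is a minimal Python sketch. It is not taken from the lecture: the function names are hypothetical, both histograms are assumed to carry the same total mass, the nominal case is assumed to charge a unit cost between any two distinct bins, and the ordinal case is assumed to charge |i - j| between bins i and j.

```python
from itertools import accumulate


def emd_nominal(p, q):
    """EMD in one pass, assuming unit cost between any two distinct bins.

    With equal total mass, every unit of surplus in one bin must move to
    some deficit bin at cost 1, so the answer is half the L1 difference.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(p, q)) / 2


def emd_ordinal(p, q):
    """EMD in one pass, assuming cost |i - j| between bins i and j.

    The running sum of (p[i] - q[i]) is the mass that must cross the
    boundary between bin i and bin i + 1; summing the absolute values of
    these running sums gives the total transport cost.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    running = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in running)


if __name__ == "__main__":
    p = [3, 0, 0]
    q = [0, 0, 3]
    print(emd_nominal(p, q))  # three units each move at unit cost
    print(emd_ordinal(p, q))  # three units each move across two bins
```

Both functions touch each bin once, so neither builds the b × b cost matrix that a generic transportation solver would need.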

Brief Biography of the Speaker: Dr. S.-H. Cha received his Ph.D. in Computer Science from the State University of New York at Buffalo in 2001 and his B.S. and M.S. degrees in Computer Science from Rutgers, the State University of New Jersey, in 1994 and 1996, respectively. From 1996 to 1998, he worked on medical information systems such as PACS, teleradiology, and telemedicine at the Information Technology R&D Center, Samsung SDS. During his Ph.D. years, he was affiliated with the Center of Excellence for Document Analysis and Recognition (CEDAR), where he was supervised by Prof. Sargur N. Srihari. His major contributions at CEDAR include a dichotomy model to establish the individuality of handwriting, distance measures on histograms and strings, a nearest neighbor search algorithm, and an apriori algorithm. He has been a faculty member of the Computer Science department at Pace University since 2001. His main interests include computer vision, data mining, and pattern matching and recognition. He is a member of AAAI, IEEE and its Computer Society, and NAUN. He has over one hundred publications in highly rated ISI journals and conference proceedings, with an h-index of 19.
