Trent University

Research Interests

Keywords:

  • data mining and data analysis,
  • astroinformatics,
  • data visualization,
  • computational astrophysics,
  • high-performance computing,
  • distributed systems.

During the past decade, data mining has gained a solid foothold in a variety of application domains. At the same time, the volume of available data, stored in an increasing range of formats, is growing at an unprecedented rate, and both data and computational resources are increasingly distributed across geographical locations. Most current data-mining techniques, however, assume that data are homogeneous and held in a single, centralized location. This gap between the assumptions of existing techniques and the realities of modern data opens a wide range of research opportunities.

My short-term research goals include, but are not limited to, the following. I plan to extend the distributed clustering scheme developed for my thesis, which enables scalable, efficient and privacy-preserving data mining of geographically distributed data, to additional clustering techniques and to clustering tasks in sensor networks. In addition, I plan to apply data-mining techniques to open research questions in astronomy (Notes for ADASS 2007 Tutorial on Data Mining in Astronomy, Additional Handout), such as developing a multi-wavelength galaxy classification scheme. This work will use clustering and matrix decomposition techniques, for example to detect three-dimensional structures in data obtained from the Canadian Galactic Plane Survey.
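The distributed clustering scheme mentioned above is not described here, so the following is only a generic sketch of the underlying idea, not the thesis method: a distributed variant of k-means (Lloyd's algorithm) in which each site computes per-cluster sums and counts over its own data and sends only those aggregates to a coordinator. Because a mean decomposes into sums and counts, each round yields the same centroids as centralized k-means on the pooled data. The function names (`local_statistics`, `merge`, `distributed_kmeans`) are hypothetical, and sharing aggregates reduces exposure of raw records but is not by itself a formal privacy guarantee.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]  # (per-cluster sums, per-cluster counts)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_statistics(site_data: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """Runs at one site: per-cluster coordinate sums and point counts.

    Only these aggregates leave the site; the raw points never do.
    """
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in site_data:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def merge(stats: Sequence[Stats], centroids: Sequence[Point]) -> List[Point]:
    """Runs at the coordinator: combines site aggregates into new global centroids."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # A cluster that received no points at any site keeps its previous centroid.
    return [
        tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j] else tuple(centroids[j])
        for j in range(k)
    ]


def distributed_kmeans(
    sites: Sequence[Sequence[Point]], initial: Sequence[Point], iterations: int = 10
) -> List[Point]:
    """Alternates local aggregation at every site with a global merge step."""
    centroids = [tuple(c) for c in initial]
    for _ in range(iterations):
        stats = [local_statistics(data, centroids) for data in sites]
        centroids = merge(stats, centroids)
    return centroids
```

For example, with site A holding (0, 0), (1, 0), (10, 10) and site B holding (0, 1), (11, 10), (10, 11), starting from centroids (0, 0) and (5, 5), the procedure converges to roughly (0.33, 0.33) and (10.33, 10.33), the same result k-means would give on all six points combined.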

Long-term research goals include the development of new data-analysis and data-mining techniques capable of dealing with the specific issues associated with data in particular application domains. For example, astronomical data obtained from multiple surveys are measured at varying resolutions, may be noisy or contain multiple measurements, require techniques for cross-identifying objects detected in different surveys, and are affected by measurement errors and selection effects. These characteristics have direct counterparts in data from other application domains, including health care, economics, bioinformatics, security, and sensor networks, which opens numerous opportunities for research and collaboration.
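Cross-identification in its simplest form is positional matching. The sketch below, which assumes catalogues given as (RA, Dec) pairs in degrees and a fixed match radius in arcseconds, pairs each source in one catalogue with the nearest source in another within that radius, using the haversine form of the great-circle separation. It is a brute-force O(nm) illustration with hypothetical function names; it ignores the measurement errors, differing resolutions and selection effects described above, which is exactly where more careful techniques are needed.

```python
import math
from typing import List, Optional, Sequence, Tuple

Source = Tuple[float, float]  # (right ascension, declination) in degrees


def angular_separation_deg(a: Source, b: Source) -> float:
    """Great-circle separation in degrees (haversine formula, stable at small angles)."""
    ra1, dec1 = map(math.radians, a)
    ra2, dec2 = map(math.radians, b)
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp guards against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(
    cat_a: Sequence[Source], cat_b: Sequence[Source], radius_arcsec: float
) -> List[Optional[int]]:
    """For each source in cat_a, the index of the nearest cat_b source within
    radius_arcsec, or None if no cat_b source lies that close."""
    radius_deg = radius_arcsec / 3600.0
    matches: List[Optional[int]] = []
    for src in cat_a:
        best, best_sep = None, radius_deg
        for j, cand in enumerate(cat_b):
            sep = angular_separation_deg(src, cand)
            if sep <= best_sep:  # inclusive radius; keeps the closest candidate
                best, best_sep = j, sep
        matches.append(best)
    return matches
```

At survey scale, the inner loop would normally be replaced by a spatial index so that only nearby candidates are compared.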

I am also interested in techniques for visualizing data, both in distributed settings and in high dimensions; the parallelization and optimization of existing data-mining techniques; the development and optimization of astrophysical simulation and modeling code; and the integration of data-mining approaches into the Virtual Observatory.

"I don't believe in mathematics." (Albert Einstein)