Rishi Rakesh Sinha

Graduate Research Assistant
Room 2119B
Siebel Center for Computer Science
University of Illinois
at Urbana-Champaign


201 N. Goodwin
Urbana IL 61801, USA
(217) 244 3570
Fax: (217) 265 6494



About Me

I am currently a PhD candidate working with Prof. Marianne Winslett in the Database and Information Systems lab, in the Department of Computer Science at the University of Illinois at Urbana-Champaign.

CV (pdf)


Research Interests

As far back as 1619, Johannes Kepler was using data painstakingly gathered by his mentor Tycho Brahe to advance science through his famous `Three Laws of Planetary Motion.' Since then, progress in sensor, automation and computation technologies has enabled scientists to generate data at incredible rates. A less desirable effect of these large amounts of data is that science is drowning in a sea of data.

My research focuses on developing data management technologies to enable scientists to do science more efficiently.

My current work can be subdivided into three main categories:

  • Format Agnostic Data Management: Scientists have developed such a strong affinity for their purpose-built storage formats that they are very reluctant to move away from them. Yet the data management facilities associated with these storage APIs are fairly primitive: in most cases they lack any sort of indexing, metadata management or concurrency control, and at best they provide in-file buffer and cache management. To allow scientists to concentrate on science, we aim to provide them with a set of format-independent, loosely coupled modules that can sit on top of any format (with a little bit of help from the scientist).
  • Indexing Schemes for Scientific Data: With the large amounts of data being generated due to advances in sensor, automation and computational technologies, looking for science is like looking for a needle in a haystack. What an index provides is a pointer to a small haystack in which to look for the needle. Scientists still need to find their needles themselves, but with indexing support we can divide the haystack into a set of smaller, manageable haystacks and allow scientists to select the appropriate ones. I am building on bitmap index technology and extending it to handle the specific requirements of scientific data, namely high cardinality, limited disk space and the need to return closed objects rather than points.
  • Efficient Storage for Bioinformatics Data: Traditionally, bioinformatics data has been stored in ASCII files, which are easy to use. While this was acceptable when the amount of data was small, the large amounts of data now produced in single resequencing experiments have made the performance of ASCII files a big problem. In this project I am exploring the use of HDF5 for efficiently storing Gene Resequencing, Linkage Disequilibrium and HapMap data.
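
As a rough illustration of the haystack idea in the indexing item above, here is a minimal Python sketch of a binned bitmap index. It is only a toy under stated assumptions, not the scheme used in my work: the function names, the bin edges and the sample values are all hypothetical, and plain Python integers stand in for compressed bitmaps.

```python
from bisect import bisect_right


def build_binned_bitmap_index(values, bin_edges):
    """Build one bitmap per bin (hypothetical helper, for illustration only).

    Each bitmap is a Python int used as a bit set: bit i is set when
    values[i] falls into that bin. bin_edges must be sorted ascending.
    """
    bitmaps = [0] * (len(bin_edges) + 1)
    for row, value in enumerate(values):
        bin_id = bisect_right(bin_edges, value)
        bitmaps[bin_id] |= 1 << row
    return bitmaps


def candidate_rows(bitmaps, bin_edges, low, high):
    """Return rows whose bin overlaps the range [low, high].

    These are only candidates (the "smaller haystack"); rows in the
    boundary bins still have to be checked against the raw data.
    """
    first = bisect_right(bin_edges, low)
    last = bisect_right(bin_edges, high)
    mask = 0
    for bin_id in range(first, last + 1):
        mask |= bitmaps[bin_id]  # OR together the bitmaps of all touched bins
    return [row for row in range(mask.bit_length()) if (mask >> row) & 1]


# Made-up sample data: six readings split into four bins.
temperatures = [12.5, 99.1, 47.0, 3.2, 51.8, 75.4]
edges = [25.0, 50.0, 75.0]

index = build_binned_bitmap_index(temperatures, edges)
candidates = candidate_rows(index, edges, 30.0, 40.0)

# Final check against the raw values removes false positives.
hits = [row for row in candidates if 30.0 <= temperatures[row] <= 40.0]
```

In this toy example the range query touches a single bin, so only one of the six rows has to be read and checked, and that row turns out not to match. The index narrows the search; the scientist (or a final scan) still has to find the needle.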

Recent Publications

2007

  • Maitri: Managing Large Scale Scientific Data. Rishi Rakesh Sinha, Arash Termehchy, Soumyadeb Mitra, Marianne Winslett, John Norris. Demo at CIDR 2007. pdf

2006

  • Maitri: Managing Large Scale Scientific Data. Rishi Rakesh Sinha, Arash Termehchy, Soumyadeb Mitra, Marianne Winslett. Poster paper at MWDBRS 2006. pdf, ppt
  • Multi-Resolution Bitmap Indexes. Rishi Rakesh Sinha, Marianne Winslett. Poster paper at MWDBRS 2006. pdf, ppt
  • Bitmap indexes for large scientific data sets: A case study. Rishi Rakesh Sinha, Soumyadeb Mitra, Marianne Winslett. IPDPS, 2006. pdf, ps

2005

  • Maitri: A Format independent Data Management System for Scientific Data. Rishi Rakesh Sinha, Soumyadeb Mitra, Marianne Winslett. SNAPI workshop at PACT, 2005. pdf, ps
  • An Efficient, Non Intrusive, Log Based I/O Mechanism for Scientific Simulations on Clusters. Soumyadeb Mitra, Rishi R Sinha, Marianne Winslett, Xiangmin Jiao. Cluster 2005, Boston. pdf, ps

2004

  • Context Based Entity Matching and Integration. AnHai Doan et al. Poster at MWDBRS 2004. ppt, pdf
A more detailed list can be found here.


Selected Awards & Honors
Talks
Courses

  • CS 598RPE: Rapid Prototyping and Evaluation, Fall 2005
  • CS 511: Design of Database Systems, Spring 2005
  • CS 598DNR: Machine Learning in Natural Language Processing, Fall 2004
  • CS 423: Operating Systems, Fall 2003
  • CS 433: Computer Architecture, Fall 2003
  • CS 421: Introduction to Compilers and Programming Languages
  • CS 446: Machine Learning, Spring 2003
  • CS 598AD: Hot Topics in Data Integration, Spring 2003
  • CS 598JH: Principles of Data Mining, Fall 2003
  • CS 473: Analysis of Algorithms, Fall 2003
  • A bunch of independent studies and seminars

Misc.

rsinha@uiuc.edu