Rishi Rakesh Sinha
Graduate Research Assistant
Room 2119B, Siebel Center for Computer Science
University of Illinois at Urbana-Champaign
201 N. Goodwin, Urbana, IL 61801, USA
Phone: (217) 244 3570
Fax: (217) 265 6494
About Me
I am currently a PhD candidate working with Prof. Marianne Winslett in
the Database and Information Systems lab in the Department of Computer
Science at the University of Illinois at Urbana-Champaign.
CV (pdf)
Research Interest
Even as far back as 1619, Johannes Kepler was using data painstakingly
gathered by his mentor Tycho Brahe to advance science through his famous
`Three Laws of Planetary Motion.' Since then, progress in sensor,
automation, and computation technologies has enabled scientists to
generate data at incredible rates. A less desirable effect of these
large amounts of data is that science is drowning in a sea of data.
My research focuses on developing data management technologies that
enable scientists to do science more efficiently.
My current work can be subdivided into three main categories:
- Format Agnostic Data Management:
Scientists have developed such a strong affinity for purpose-built
storage formats that they are very reluctant to move away from them.
Yet the data management facilities associated with the storage APIs are
fairly primitive: in most cases they lack any sort of indexing, metadata
management, or concurrency control, and at best they provide in-file
buffer and cache management. To let scientists concentrate on science,
we aim to provide a set of format-independent, loosely coupled modules
that can sit on top of any format (with a little help from the
scientist).
- Indexing Schemes for Scientific Data:
With the large amounts of data being generated due to advances in
sensor, automation, and computational technologies, looking for science
is like looking for a needle in a haystack. An index provides a pointer
to a small haystack in which to look for the needle. Scientists still
need to find their needles, but with indexing support we can divide the
haystack into a set of smaller, manageable haystacks and let scientists
select the appropriate ones. I am building on bitmap index technology
and extending it to handle the specific requirements of scientific data,
namely high cardinality, limited disk space, and the need to return
closed objects rather than points.
- Efficient Storage for Bioinformatics Data:
Traditionally, bioinformatics data has been stored in ASCII files, which
are easy to use. While this was acceptable when the amount of data was
small, single resequencing experiments now produce so much data that the
performance of ASCII files has become a big problem. In this project I
am exploring the use of HDF5 to efficiently store gene resequencing,
linkage disequilibrium, and HapMap data.
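The haystack analogy in the indexing item above can be illustrated with a minimal sketch of a binned bitmap index. This is an illustration only, not the implementation used in this work: the function names, the bin edges, and the choice to treat the first and last bins as open-ended are all assumptions made for the example. Each bin (a "small haystack") gets one bitmap, stored here as a Python integer, marking which rows fall in it. A range query ORs the bitmaps of the bins that overlap the range and then checks only those candidate rows against the raw values.

```python
import math
from bisect import bisect_right


def bin_bounds(edges, k):
    """Bounds of bin k; the first and last bins are open-ended (assumption)."""
    lo = -math.inf if k == 0 else edges[k]
    hi = math.inf if k == len(edges) - 2 else edges[k + 1]
    return lo, hi


def build_binned_bitmaps(values, edges):
    """Return one bitmap per bin; bit i is set if values[i] falls in that bin."""
    nbins = len(edges) - 1
    bitmaps = [0] * nbins
    for row, v in enumerate(values):
        k = bisect_right(edges, v) - 1
        k = min(max(k, 0), nbins - 1)  # out-of-range values go to the end bins
        bitmaps[k] |= 1 << row
    return bitmaps


def range_query(values, edges, bitmaps, lo, hi):
    """Return the sorted row ids whose value v satisfies lo <= v < hi."""
    candidates = 0
    for k, bitmap in enumerate(bitmaps):
        bin_lo, bin_hi = bin_bounds(edges, k)
        if bin_lo < hi and bin_hi > lo:  # bin overlaps the query range
            candidates |= bitmap
    rows = []
    row = 0
    while candidates:
        # Candidate check: bins at the range boundary may hold false positives.
        if candidates & 1 and lo <= values[row] < hi:
            rows.append(row)
        candidates >>= 1
        row += 1
    return rows
```

For example, with the hypothetical values `[0.5, 3.2, 7.9, 2.1, 9.4, 5.0]` and edges `[0, 2.5, 5, 7.5, 10]`, a query for `[2.0, 5.5)` never touches the last bin, so rows 2 and 4 are skipped without reading their values. With high-cardinality data, binning keeps the number of bitmaps small at the cost of this extra candidate check.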
Recent Publications
2007
- Maitri: Managing Large Scale Scientific Data. Rishi Rakesh Sinha,
Arash Termehchy, Soumyadeb Mitra, Marianne Winslett, John Norris. Demo
at CIDR 2007. pdf
2006
- Maitri: Managing Large Scale Scientific Data. Rishi Rakesh Sinha,
Arash Termehchy, Soumyadeb Mitra, Marianne Winslett. Poster paper at
MWDBRS 2006. pdf, ppt
- Multi-Resolution Bitmap Indexes. Rishi Rakesh Sinha, Marianne
Winslett. Poster paper at MWDBRS 2006. pdf, ppt
- Bitmap indexes for large scientific data sets: A case study. Rishi
Rakesh Sinha, Soumyadeb Mitra, Marianne Winslett. IPDPS, 2006. pdf, ps
2005
- Maitri: A Format independent Data Management System for Scientific
Data. Rishi Rakesh Sinha, Soumyadeb Mitra, Marianne Winslett. SNAPI
workshop at PACT, 2005. pdf, ps
- An Efficient, Non Intrusive, Log Based I/O Mechanism for Scientific
Simulations on Clusters. Soumyadeb Mitra, Rishi R Sinha, Marianne
Winslett, Xiangmin Jiao. Cluster 2005, Boston. pdf, ps
2004
- Context Based Entity Matching and Integration. Anhai Doan et al.
Poster at MWDBRS 2004. ppt, pdf
A more detailed list can be found here.
Selected Awards & Honors
Talks
Courses
- CS 598RPE: Rapid Prototyping and Evaluation, Fall 2005
- CS 511: Design of Database Systems, Spring 2005
- CS 598DNR: Machine Learning in Natural Language Processing, Fall 2004
- CS 423: Operating Systems, Fall 2003
- CS 433: Computer Architecture, Fall 2003
- CS 421: Introduction to Compilers and Programming Languages
- CS 446: Machine Learning, Spring 2003
- CS 598AD: Hot Topics in Data Integration, Spring 2003
- CS 598JH: Principles of Data Mining, Fall 2003
- CS 473: Analysis of Algorithms, Fall 2003
- Several independent studies and seminars
Misc.