om muruga
(Time is God and In god I trust)
Arun S. Jagatheesan
Research interests
- Internet and Grid Computing (Distributed Computing)
- Data Grid Management Systems (Collaborative data storage management)
- Information Infrastructure Networks (Infrastructure for XL-size information management)
- Workflow and automation for data storage management
Affiliations/Membership Roles:
- Dataflow specialist (Data Grids), San Diego Supercomputer Center, University of California, San Diego
- Co-Chair, Grid File Systems Working Group, Global Grid Forum
- Lead, SDSC Matrix Project
- (Also the important role of being just another human being who wants to add value to things around me ;-)
LUSciD coLLaboration: The LUSciD coLLaboration is a joint effort between Lawrence Livermore National Laboratory (LLNL), the University of California and the San Diego Supercomputer Center.
The objective of the coLLaboration is to apply advanced scientific data management technologies to improve the conduct of large scale science.
Most of these requirements fall into a category I call the "rebels and misfits of existing technology".
If we analyze the history of computer science (especially data storage), there may be a few accidental discoveries.
But most new technologies are the result of requirements that were trying to push the limits of an existing technology.
These esoteric requirements were usually from high-end users who wanted "more" in terms of performance, throughput or scalability.
The requirements of these high-end users could not be satisfied by the existing technology.
From the existing technology’s perspective, these users could be considered "rebels or misfits of technology" who were pushing the envelope too far and trying to overthrow the existing technology.
Related Links: LLNL, SDSC SRB, What is a Grid?, History of the Grid
LSST Project: The Large Synoptic Survey Telescope (LSST) is a proposed ground-based 8.4-meter, 10 square-degree-field telescope that will provide digital imaging of faint astronomical objects across the entire sky, night after night. The IT challenge here is managing petabytes of data: we might be storing and managing over 150 petabytes of digital data. A Nature editorial cites this project as "Steering the future of Computing". Our work here is twofold: hardware infrastructure and software infrastructure. SDSC is the data access center for LSST, providing the hardware infrastructure (we just worked on estimating its cost). We are also involved in the data management middleware, working on the software for LSST. I started working on this project voluntarily, at the request of LLNL. I now take part in the data management and architecture discussions (usually from a data grid perspective). It is a good group of people to work with.
Related Links: LSST in BBC, LSST in Forbes, LSST in Nature, Data Challenge, NCSA
LLNL GDO: This is not a very big project right now, but I hope it will turn into something big or useful at LLNL. The task here is to design a data management architecture for data sharing between LLNL and its external partners. Apart from the IT perspective, we need to design a pragmatic approach so that users can use the resources without exploiting them. There are several security and access control restrictions that need to be met. All data in GDO are considered public or UNCLASSIFIED (so I can mention this project here). I collaborate with Jeff Long from LLNL on this work.
Related Links: LLNL GDO, SDSC SRB
SDSC Matrix Project: SDSC Matrix is a grid workflow process management system. Matrix provides the protocols and software infrastructure needed by inter-organizational data management services to create, access and manage grid workflow pipelines. Matrix uses the Data Grid Language, which can be used to describe, query and control process-flow pipelines. Matrix provides the software mechanisms to define and execute long-lived datagrid administrative tasks. While data grids like SRB provide logical namespaces to manage unstructured inter-organizational data, Matrix provides mechanisms to map from the logical namespace to the process namespace and vice versa. For example, an insert or delete on a logical namespace could trigger a process that has to run in the datagrid.
Related Links: SDSC Matrix Project Page
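The namespace-to-process mapping described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `LogicalNamespace`, its `on`, `insert` and `delete` methods, and the example paths are all hypothetical names made up for illustration. They are not the Matrix, Data Grid Language or SRB interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical toy model for illustration; NOT the Matrix or SRB API.
Handler = Callable[[str, str], None]  # called with (event, logical_path)


class LogicalNamespace:
    """Maps logical paths to physical locations and fires processes on changes."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}  # logical path -> physical location
        self._triggers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a process to start when `event` happens in the namespace."""
        self._triggers[event].append(handler)

    def insert(self, logical_path: str, physical_location: str) -> None:
        self._entries[logical_path] = physical_location
        self._fire("insert", logical_path)

    def delete(self, logical_path: str) -> None:
        del self._entries[logical_path]
        self._fire("delete", logical_path)

    def _fire(self, event: str, logical_path: str) -> None:
        # Run every process registered for this event, in registration order.
        for handler in self._triggers[event]:
            handler(event, logical_path)


# Usage: namespace changes start (pretend) datagrid processes.
log: List[str] = []
ns = LogicalNamespace()
ns.on("insert", lambda event, path: log.append(f"replicate {path}"))
ns.on("delete", lambda event, path: log.append(f"purge replicas of {path}"))

ns.insert("/home/demo/image.fits", "disk-a:/vol1/0001")
ns.delete("/home/demo/image.fits")
print(log)
```

In a real datagrid the handlers would presumably start long-lived workflow steps rather than append to a list; the list is used here only to keep the sketch self-contained.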
Data Grid Management System (DGMS): Each organization in a data grid needs a system composed of services that enable it to dynamically form or join communities and coordinate the management of inter/intra-organizational data and resources. A DGMS is P2P middleware that provides a logical view of inter/intra-organizational data and resources to its applications. The key difference between a DBMS and a DGMS is that whereas the physical organization of resources (storage) is hidden in a DBMS, a DGMS exposes it to applications as another logical view alongside the data. Grid applications can use the logical layer of distributed data as they do in a DBMS (without worrying about the physical location of data). In addition, they can use the logical view of the shared, distributed and heterogeneous resources in the grid environment. The basic operations in the data grid (like a simple query plan or ingestion of data) can use this logical view of resources (storage) to decide which shared physical resources to use. More protocols and more challenges lie ahead for DGMS.
Related Link: SDSC SRB
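As a rough illustration of the two logical views, the sketch below keeps a catalog of logical data names next to a logical view of storage resources, and lets an ingest operation pick a physical resource from the latter. Everything here is an assumption made for the example: `DataGridView`, `PhysicalResource`, the resource names, and the "most free space" selection policy are hypothetical and are not taken from SRB or any DGMS implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalResource:
    name: str
    free_gb: float


@dataclass
class DataGridView:
    """Hypothetical toy DGMS view: logical data plus logical resources."""

    # Logical resource name -> physical resources behind it.
    resources: Dict[str, List[PhysicalResource]] = field(default_factory=dict)
    # Logical data name -> name of the physical resource holding it.
    catalog: Dict[str, str] = field(default_factory=dict)

    def ingest(self, logical_name: str, size_gb: float, logical_resource: str) -> str:
        """Store data under a logical name, choosing a physical resource."""
        candidates = [
            r for r in self.resources[logical_resource] if r.free_gb >= size_gb
        ]
        if not candidates:
            raise RuntimeError(f"no space left in logical resource {logical_resource!r}")
        # Assumed policy for this sketch: pick the resource with most free space.
        target = max(candidates, key=lambda r: r.free_gb)
        target.free_gb -= size_gb
        self.catalog[logical_name] = target.name
        return target.name

    def locate(self, logical_name: str) -> str:
        """Applications ask by logical name, never by physical path."""
        return self.catalog[logical_name]


grid = DataGridView(
    resources={
        "archive": [
            PhysicalResource("sdsc-tape", free_gb=500.0),
            PhysicalResource("llnl-disk", free_gb=120.0),
        ]
    }
)
chosen = grid.ingest("/survey/night-001", size_gb=50.0, logical_resource="archive")
print(chosen, grid.locate("/survey/night-001"))
```

The application names only the logical resource `archive`; which physical store ends up holding the data is decided inside the grid layer, which is the point the paragraph above makes about the logical view of resources.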
Grid File System (GGF-WG): A standard mechanism to describe and organize file-based data is essential for facilitating access to large amounts of data. The GGF Grid File System Working Group (GFS-WG) will provide specifications of Grid File System Directory Services and the Architecture of Grid File System Services. The GFS standards might serve as the common denominator for different datagrid systems. GFS is a collaborative effort with the Storage Networking Industry Association (SNIA).
My Interests & Activities