Data Grid tutorials with hands-on experience (2008)
Donno, Flavia, Stockinger, Heinz, Puccinelli, Roberto, Stockinger, Kurt
Grid technologies are more and more used in scientific as well as in industrial environments but often documentation and the correct usage are either not sufficient or not too well understood....
Adelmann, Andreas, Gsell, Achim, Oswald, Benedikt, Schietinger, Thomas, Bethel, Wes, Shalf, John, ...
Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on...
Performances of Multi-Level and Multi-Component Compressed Bitmap Indices (2007)
Wu, Kesheng, Stockinger, Kurt, Shoshani, Arie
This paper presents a systematic study of two large subsets of bitmap indexing methods that use multi-component and multi-level encodings. Earlier studies on bitmap indexes are either empirical or...
Efficient Analysis of Live and Historical Streaming Data and its Application to Cybersecurity (2007)
Reiss, Frederick, Stockinger, Kurt, Wu, Kesheng, Shoshani, Arie, Hellerstein, Joseph M.
Applications that query data streams in order to identify trends, patterns, or anomalies can often benefit from comparing the live stream data with archived historical stream data. However, searching...
Using Bitmap Indexing Technology for Combined Numerical and Text Queries (2006)
Stockinger, Kurt, Cieslewicz, John, Wu, Kesheng, Rotem, Doron, Shoshani, Arie
In this paper, we describe a strategy of using compressed bitmap indices to speed up queries on both numerical data and text documents. By using an efficient compression algorithm, these compressed...
Detecting Distributed Scans Using High-Performance Query-Driven Visualization (2006)
Stockinger, Kurt, Bethel, E. Wes, Campbell, Scott, Dart, Eli, Wu, Kesheng
Modern forensic analytics applications, like network traffic analysis, perform high-performance hypothesis testing, knowledge discovery and data mining on very large datasets. One essential strategy...
Accelerating Network Traffic Analytics Using Query-Driven Visualization (2006)
Bethel, E. Wes, Campbell, Scott, Dart, Eli, Stockinger, Kurt, Wu, Kesheng
Realizing operational analytics solutions where large and complex data must be analyzed in a time-critical fashion entails integrating many different types of technology. This paper focuses on an...
FastBit -- Helps Finding the Proverbial Needle in a Haystack (2006)
Wu, Kesheng "John", Stockinger, Kurt, Shoshani, Arie, Bethel, Wes
FastBit is a software package designed to meet the searching and filtering needs of data intensive sciences. In these applications, scientists are trying to find nuggets of information from petabytes...
Bethel, E. Wes, Gosink, Luke, Shalf, John, Stockinger, Kurt, Wu, Kesheng
This work focuses on research and development activities that bridge a gap between fundamental data management technology (index, query, storage and retrieval) and use of such technology in...
High Performance Visualization using Query-Driven Visualization and Analytics (2006)
Bethel, E. Wes, Campbell, Scott, Dart, Eli, Shalf, John, Stockinger, Kurt, Wu, Kesheng
Query-driven visualization and analytics is a unique approach for high-performance visualization that offers new capabilities for knowledge discovery and hypothesis testing. The new capabilities akin...
HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets using Fast Bitmap Indices (2006)
Gosink, Luke, Shalf, John, Stockinger, Kurt, Wu, Kesheng, Bethel, Wes
Minimizing I/O Costs of Multi-Dimensional Queries with Bitmap Indices (2006)
Rotem, Doron, Stockinger, Kurt, Wu, Kesheng
Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work...
Stockinger, Kurt, Rotem, Doron, Shoshani, Arie, Wu, Kesheng
FastBit is an efficient, compressed bitmap indexing technology that was developed in our group. In this report we evaluate the performance of MySQL and FastBit for analyzing the email traffic of the...
HDF5-FastQuery: Accelerating Complex Queries on HDF Datasets Using Fast Bitmap Indices (2005)
Gosink, Luke, Shalf, John, Stockinger, Kurt, Wu, Kesheng, Bethel, Wes
Large scale scientific data is often stored in scientific data formats such as FITS, netCDF and HDF. These storage formats are of particular interest to the scientific user community since they...
Interactive Analysis of Large Network Data Collections Using Query-Driven Visualization (2005)
Bethel, E. Wes, Campbell, Scott, Dart, Eli, Lee, Jason, Smith, Steven A., Stockinger, Kurt, ...
Realizing operational analytics solutions where large and complex data must be analyzed in a time-critical fashion entails integrating many different types of technology. Considering the extreme...
Towards Optimal Multi-Dimensional Query Processing with Bitmap Indices (2005)
Rotem, Doron, Stockinger, Kurt, Wu, Kesheng
Bitmap indices have been widely used in scientific applications and commercial systems for processing complex, multi-dimensional queries where traditional tree-based indices would not work...
Network Traffic Analysis With Query Driven Visualization SC 2005 HPC Analytics Results (2005)
Stockinger, Kurt, Wu, Kesheng, Campbell, Scott, Lau, Stephen, Fisk, Mike, Gavrilov, Eugene, ...
Our analytics challenge is to identify, characterize, and visualize anomalous subsets of large collections of network connection data. We use a combination of HPC resources, advanced algorithms,...
Bitmap Indices for Fast End-User Physics Analysis in ROOT (2005)
Stockinger, Kurt, Wu, Kesheng, Brun, Rene, Canal, Philippe
Most physics analysis jobs involve multiple selection steps on the input data. These selection steps are called cuts or queries. A common strategy to implement these queries is to read all...
RRS: Replica Registration Service for Data Grids (2005)
Shoshani, Arie, Sim, Alex, Stockinger, Kurt
Over the last few years various scientific experiments and Grid projects have developed different catalogs for keeping track of their data files. Some projects use specialized file catalogs, others...
Optimizing Candidate Check Costs for Bitmap Indices (2005)
Rotem, Doron, Stockinger, Kurt, Wu, Kesheng
In this paper, we propose a new strategy for optimizing the placement of bin boundaries to minimize the cost of query evaluation using bitmap indices with binning. For attributes with a large number...
Stockinger, Kurt, Shalf, John, Bethel, Wes, Wu, Kesheng
We describe a new approach to scalable data analysis that enables scientists to manage the explosion in size and complexity of scientific data produced by experiments and simulations. Our approach...
Efficient binning for bitmap indices on high-cardinality attributes (2004)
Rotem, Doron, Stockinger, Kurt, Wu, Kesheng
Bitmap indexing is a common technique for indexing high-dimensional data in data warehouses and scientific applications. Though efficient for low-cardinality attributes, query processing can be...
Improved searching for spatial features in spatio-temporal data (2004)
Scientific data analysis often requires mining large databases or data warehouses to find features in space. One important task is to find regions of interest such as stellar objects in astrophysics...
Evaluation Strategies for Bitmap Indices with Binning (2004)
Stockinger, Kurt, Wu, Kesheng, Shoshani, Arie
Bitmap indices are efficient data structures for querying read-only data with low attribute cardinalities. To improve the efficiency of the bitmap indices on attributes with high cardinalities, we...
UK Grid Simulation with OptorSim (2003)
David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Caitriana Nicholson, Kurt Stockinger
As the computational and data handling requirements of large scientific collaborations grow, Grid computing is rapidly emerging as a feasible solution to these requirements. Optimising the use of...
EU DataGrid Data Management Services (2003)
Diana Bosio, Akos Frohner, Leanne Guy, Peter Kunszt, Erwin Laure, Sophie Lemaitre, ...
this article are as follows:
Simulation of Dynamic Grid Replication Strategies in OptorSim (2003)
William H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger
Computational Grids normally deal with large computationally intensive problems on small data sets. In contrast, Data Grids mostly deal with large computational problems that in turn require...
Next-Generation EU DataGrid Data Management Services (2003)
Diana Bosio, Akos Frohner, Leanne Guy, Peter Kunszt, Erwin Laure, Sophie Lemaitre, ...
this article) is now in its third and final year and within the data management work package we have developed a second generation of data management services that will be deployed in EDG release...
Next-Generation EU DataGrid Data Management Services (2003)
Bosio, Diana, Casey, James, Frohner, Akos, Guy, Leanne, Kunszt, Peter, Laure, Erwin, ...
We describe the architecture and initial implementation of the next-generation of Grid Data Management Middleware in the EU DataGrid (EDG) project. The new architecture stems out of our experience...
Evaluation of an Economy-Based File Replication Strategy for a Data Grid (2003)
William H. Bell, David G. Cameron, Ruben Carvajal-Schiaffino, A. Paul Millar, Kurt Stockinger
Optimising the use of Grid resources is critical for users to effectively exploit a Data Grid. Data replication is considered a major technique for reducing data access cost to Grid jobs. This paper...
Strategies for Processing ad hoc Queries on Large Data Warehouses (2002)
Kurt Stockinger, Kesheng Wu, Arie Shoshani
As data warehousing applications grow in size, existing data organizations and access strategies, such as relational tables and B-tree indexes, are becoming increasingly ineffective. The two primary...
Strategies for Processing ad hoc Queries on Large Data (2002)
Kurt Stockinger, Kesheng Wu, Arie Shoshani
As data warehousing applications grow in size, existing data organizations and access strategies, such as relational tables and B-tree indexes, are becoming increasingly ineffective. The two primary...
Luigi Capozza, Kurt Stockinger, Floriano Zini
One of the major problems in a Data Grid is the optimal distribution and replication of data files in the Grid sites, in order to improve and maintain over time a high overall throughput of Grid jobs...
Giggle: A Framework for Constructing Scalable Replica Location Services (2002)
Ann Chervenak, Ewa Deelman, Ian Foster, Leanne Guy, Wolfgang Hoschek, Adriana Iamnitchi, ...
of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location...
Replica Management in Data Grids (2002)
Leanne Guy, Peter Kunszt, Erwin Laure, Heinz Stockinger, Kurt Stockinger
Providing fast, reliable and transparent access to data to all users within a community is one of the most crucial functions of data management in a Grid environment. User communities are...
Giggle: A Framework for Constructing Scalable Replica Location Services (2002)
Ann Chervenak, Ewa Deelman, Ian Foster, Leanne Guy, Wolfgang Hoschek, Adriana Iamnitchi, ...
This paper makes the following contributions to our understanding of Data Grid systems and data replication: We introduce the notion of an RLS as a distinct component and characterize RLS...
Simulation of Dynamic Grid Replication Strategies in OptorSim (2002)
William H. Bell, David G. Cameron, Luigi Capozza, A. Paul Millar, Kurt Stockinger
Computational Grids normally deal with large computationally intensive problems on small data sets. In contrast, Data Grids mostly deal with large computational problems that in turn require...
Bitmap Indices for Speeding Up High-Dimensional Data Analysis (2002)
Bitmap indices have gained wide acceptance in data warehouse applications and are an efficient access method for querying large amounts of read-only data. The main trend in bitmap index research...
Models for Replica Synchronisation and Consistency in a Data Grid (2002)
Dirk Dullmann, Wolfgang Hoschek, Javier Jaen-Martinez, Ben Segal, Asad Samar, Heinz Stockinger, ...
Data Grids are currently proposed solutions to large scale data management problems including efficient file transfer and replication. Large amounts of data and the world-wide distribution of data...
Towards an Economy-Based Optimisation of File Access (2002)
Mark Carman, Floriano Zini, Luciano Serafini, Kurt Stockinger
We are working on a system for the optimised access and replication of data on a Data Grid. Our approach is based on the use of an economic model that includes the actors and the resources in the...
Giggle: A Framework for Constructing Scalable Replica Location Services (2002)
Ian Foster, Adriana Iamnitchi, Matei Ripeanu, Ann Chervenak, Ewa Deelman, Carl Kesselman, ...
Within high-performance, large-scale wide area computing environments, data replication provides an important mechanism for managing data locality while increasing the reliability of access to...
Design and Implementation of Bitmap Indices for Scientific Data (2001)
Bitmap indices are efficient multi-dimensional index data structures for handling complex ad hoc queries in read-mostly environments. They have been implemented in several commercial database systems...
Models for Replica Synchronisation and Consistency in a Data Grid (2001)
Dirk Dullmann, Wolfgang Hoschek, Javier Jaen-Martinez, Ben Segal, Asad Samar, Heinz Stockinger, ...
Data Grids are currently proposed solutions to large scale data management problems including efficient file transfer and replication. Large amounts of data and the world-wide distribution of data...
Agent-Based Query Optimisation in a Grid Environment (2001)
Luciano Serafini, Heinz Stockinger, Kurt Stockinger, Floriano Zini
The next generation experiments in High Energy Physics are the driving force for setting up an International Data Grid at CERN, the European Organization for Nuclear Research. Hundreds of Petabytes...
Data Management in an International Data Grid Project (2000)
Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger, Kurt Stockinger
In this paper we report on preliminary work and architectural design carried out in the "Data Management" work package in the International Data Grid project. Our aim within a time scale of three...
Improving the Performance of High-Energy Physics Analysis through Bitmap Indices (2000)
Kurt Stockinger, Dirk Duellmann, Wolfgang Hoschek, Erich Schikuta
Bitmap indices are popular multi-dimensional data structures for accessing read-mostly data such as data warehouse (DW) applications, decision support systems (DSS) and on-line analytical processing...
ViMPIOS, a "Truly" Portable MPI-IO Implementation (2000)
Kurt Stockinger, Erich Schikuta
We present ViMPIOS, a novel MPI-IO implementation based on ViPIOS, the Vienna Parallel Input Output System. ViMPIOS inherits the defining characteristics of ViPIOS, which makes it a client-server...
Design and Analysis of Parallel Disk Accesses in ViPIOS (1999)
Kurt Stockinger, Erich Schikuta, Thomas Fuerle, Helmut Wanek
Due to the shift from CPU-bound to I/O bound problems the performance of the disk I/O accesses of parallel programs is a key factor for the success of solution approaches. The Vienna Parallel Input...
On the Implementation of a Portable, Client-Server Based MPI-IO Interface (1999)
Thomas Fuerle, Erich Schikuta, Christoph Loeffelhardt, Kurt Stockinger, Helmut Wanek
In this paper we present the MPI-IO Interface kernel in the Vienna Parallel Input Output System (ViPIOS), which is a client-server based parallel I/O system. Compared to the already existing...
On the Implementation of a Portable, Client-Server Based MPI-IO Interface (1998)
Thomas Fuerle, Erich Schikuta, Christoph Loeffelhardt, Kurt Stockinger, Helmut Wanek
In this paper we present the MPI-IO Interface kernel in the Vienna Parallel Input Output System (ViPIOS), which is a client-server based parallel I/O system. Compared to the already existing...