BioBench: A Benchmark Suite of Bioinformatics Applications (2005)
Albayraktaroglu, Kursad, Jaleel, Aamer, Wu, Xue, Franklin, Manoj, Jacob, Bruce, Tseng, Chau-Wen, ...
Recent advances in bioinformatics and the significant increase in computational power available to researchers have made it possible to make better use of the vast amounts of genetic data that has...
ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps (2004)
Xue Wu, Damayanti Gupta, Chau-wen Tseng
Expressed sequence tags (ESTs) are short transcribed nucleotide sequences that can be used to discover new genes and measure gene expression. Because individual ESTs are short and error-prone, ESTs...
ESTmapper: Efficiently Clustering EST Sequences Using Genome Maps (2004)
Wu, Xue, Lee, Woei-Jyh (Adam), Gupta, Damayanti, Tseng, Chau-Wen
Expressed sequence tags (ESTs) are short transcribed nucleotide sequences that can be used to discover new genes and measure gene expression. Because individual ESTs are short and error-prone, ESTs...
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng, Scott Warren
Fortran D and High Performance Fortran are languages designed to support efficient data-parallel programming on a variety of parallel architectures. The goal of the D Editor is to provide a tool that...
Preliminary Experiences with the (2003)
Fortran D is a version of Fortran enhanced with data decomposition specifications. Case studies illustrate strengths and weaknesses of the prototype Fortran D compiler when compiling linear algebra...
Evaluating the Impact of Memory System Performance on (2003)
Aneesh Aggarwal, Chau-wen Tseng, Donald Yeung
Software prefetching and locality optimizations are techniques for overcoming the gap between processor and memory speeds. Using the SimpleScalar simulator, we evaluate the impact of memory bandwidth...
Dorit Naishlos, Joseph Nuzman, Chau-wen Tseng
Explicit-multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. XMT introduces a computational framework with 1) a simple programming style that relies on...
Evaluating the XMT Parallel Programming Model (2001)
Dorit Naishlos, Joseph Nuzman, Chau-wen Tseng, Uzi Vishkin
Explicit-multithreading (XMT) is a parallel programming model designed for exploiting on-chip parallelism. Its features include a simple thread execution model and an efficient prefix-sum instruction...
Aneesh Aggarwal, Donald Yeung, Chau-wen Tseng
Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we evaluate the impact of memory trends on the effectiveness...
Compiler Optimizations for Eliminating Cache Conflict Misses (2001)
Gabriel Rivera, Chau-wen Tseng
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor...
Improving Locality For Adaptive Irregular Scientific Codes (2001)
An important class of scientific codes access memory in an irregular manner. Because irregular access patterns reduce temporal and spatial locality, they tend to underutilize caches, resulting in...
Enhancing Software DSM for Compiler-Parallelized Applications (2001)
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications.
Efficient Machine-Independent Programming of High-Performance Multiprocessors (2001)
... performance, mainly because the cost of interprocessor communication is too great compared to computation and local memory accesses [74, 77]. To achieve high performance, COSMIC will perform...
A Comparison of Parallelization Techniques for Irregular Reductions (2001)
A large class of scientific applications are comprised of irregular reductions on large data sets. On shared-memory multiprocessors these reductions are typically parallelized by computing partial...
Evaluating the XMT Parallel Programming Model (2001)
Dorit Naishlos, Joseph Nuzman, Chau-wen Tseng
Explicit-multithreading (XMT) is a parallel programming model designed for exploiting on-chip parallelism. Its features include a simple thread execution model and an efficient prefix-sum...
Dorit Naishlos, Joseph Nuzman, Chau-wen Tseng
Explicit-multithreading (XMT) is a parallel programming approach for exploiting on-chip parallelism. As such, XMT introduces a computational framework with 1) a simple programming style that relies...
Locality Optimizations For Adaptive Irregular Scientific Codes (2001)
Irregular scientific codes experience poor cache performance due to their memory access patterns. We examine several data and computation locality transformations including GPART, a new technique...
Aneesh Aggarwal, Chau-wen Tseng, Donald Yeung
Software prefetching and locality optimizations are techniques for overcoming the gap between processor and memory speeds. In this paper, we evaluate the impact of memory trends on the effectiveness...
Tiling Optimizations for 3D Scientific Computations (2001)
Gabriel Rivera, Chau-wen Tseng
Compiler transformations can significantly improve data locality for many scientific programs. In this paper, we show iterative solvers for partial differential equations (PDEs) in three dimensions...
Locality Optimizations For Adaptive Irregular Scientific Codes (2000)
Irregular scientific codes experience poor cache performance due to their memory access patterns. We examine several data and computation locality transformations including GPART, a new technique...
A Comparison of Locality Transformations for Irregular Codes (2000)
Researchers have proposed several data and computation transformations to improve locality in irregular scientific codes. We experimentally compare their performance and present gpart, a new...
Tiling Optimizations for 3D Scientific Computations (2000)
Gabriel Rivera, Chau-wen Tseng
Compiler transformations can significantly improve data locality for many scientific programs. In this paper, we show iterative solvers for partial differential equations (PDEs) in three dimensions...
Evaluating Locality Optimizations For Adaptive Irregular Scientific Codes (2000)
Irregular scientific codes experience poor cache performance due to their memory access patterns. Researchers have proposed several data and computation transformations to improve locality in...
My research is in the field of software support for high-performance computing. My goal is to develop compilation techniques which enable programs to efficiently exploit architectural...
Aneesh Aggarwal, Chau-wen Tseng, Donald Yeung
Software prefetching and locality optimizations are techniques for overcoming the gap between processor and memory speeds. Using the SimpleScalar simulator, we evaluate the impact of memory bandwidth...
Software Support For Improving Locality in Advanced Scientific Codes (2000)
Programs can achieve good performance only if they possess data locality. This paper describes our proposal to develop and evaluate software support for improving locality for advanced scientific...
Improving Locality For Adaptive Irregular Scientific Codes (2000)
Irregular scientific codes experience poor cache performance due to their memory access patterns. In this paper, we examine three issues for locality optimizations for irregular computations. First,...
Aggarwal, Aneesh, Badawy, Abdel-Hameed A., Tseng, Chau-Wen, Yeung, Donald
Software prefetching and locality optimizations are techniques for overcoming the gap between processor and memory speeds. Using the SimpleScalar simulator, we evaluate the impact of memory bandwidth...
Compile-time Synchronization Optimizations for Software DSMs (2000)
Software distributed-shared-memory (DSM) systems provide a desirable target for parallelizing compilers due to their flexibility. However, studies show synchronization and load imbalance are...
Software Support For Improving Locality in Advanced Scientific Codes (2000)
Scientists today rely on powerful computers to perform simulations critical for research and development. Modern microprocessors provide high performance by exploiting data locality with carefully...
A Comparison of Locality Transformations for Irregular Codes (2000)
Researchers have proposed several data and computation transformations to improve locality in irregular scientific codes. We experimentally compare their performance and present GPART, a new...
Efficient Compiler and Run-Time Support for Parallel Irregular Reductions (2000)
Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using...
Software Support For Improving Locality in Scientific Codes (1999)
Hwansoo Han, Gabriel Rivera, Chau-wen Tseng
We propose to develop and evaluate software support for improving locality for advanced scientific applications. We will investigate compiler and run-time techniques needed to achieve high...
Compiler and Run-time Support for Improving Locality in Scientific Codes (extended ) (1999)
Hwansoo Han, Gabriel Rivera, Chau-wen Tseng
Modern microprocessors provide high performance by exploiting data locality with carefully designed multi-level caches. However, advanced scientific computations have features such as...
Improving Locality For Adaptive Irregular Scientific Codes (1999)
An important class of scientific codes access memory in an irregular manner. Because irregular access patterns reduce temporal and spatial locality, they tend to underutilize caches, resulting in...
Improving Locality for Adaptive Irregular Scientific Codes (1999)
An important class of scientific codes access memory in an irregular manner. Because irregular access patterns reduce temporal and spatial locality, they tend to underutilize caches, resulting in...
Locality Optimizations for Multi-Level Caches (1999)
Gabriel Rivera, Chau-wen Tseng
Compiler transformations can significantly improve data locality of scientific programs. In this paper, we examine the impact of multi-level caches on data locality optimizations. We find nearly all...
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes (1999)
Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers...
A Comparison of Compiler Tiling Algorithms (1999)
Gabriel Rivera, Chau-wen Tseng
Linear algebra codes contain data locality which can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low associativity...
Compiler Optimizations for Eliminating Cache Conflict Misses (1998)
Gabriel Rivera, Chau-wen Tseng
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor...
Enhancing Software DSM for Compiler-Parallelized Applications (1998)
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful...
Efficient Machine-Independent Programming of High-Performance Multiprocessors (1998)
mainly because the cost of interprocessor communication is too great compared to computation and local memory accesses [74, 77]. To achieve high performance, COSMIC will perform communication analysis...
Enhancing Software DSM for Compiler-Parallelized Applications (1998)
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful...
Eliminating Barrier Synchronization for Compiler-Parallelized Codes on Software DSMs (1998)
Hwansoo Han, Chau-wen Tseng, Pete Keleher
Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate such systems can provide performance...
Improving the Compiler/Software DSM Interface: Preliminary Results (1998)
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful...
Compiling Fortran D for MIMD Distributed-Memory Machines (1998)
Seema Hiranandani, Ken Kennedy, Chau-wen Tseng
Fortran D, a version of Fortran extended with data decomposition specifications, is designed to provide a machine-independent data-parallel programming model. This paper describes analysis,...
Reducing Synchronization Overhead for Compiler-Parallelized Codes on Software DSMs (1998)
Hwansoo Han, Chau-wen Tseng, Pete Keleher
Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate such systems can provide performance...
Improving Compiler and Run-Time Support for Irregular Reductions (1998)
Compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems or relying on the shared-memory interface supported by...
Improving Compiler and Run-Time Support for Adaptive Irregular Codes (1998)
Irregular reductions form the core of adaptive irregular codes. On distributed-memory multiprocessors, they are parallelized either using sophisticated run-time systems (e.g., CHAOS, PILAR) or the...
Compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems or relying on the shared-memory interface supported by...
Hwansoo Han, Chau-wen Tseng, Pete Keleher
In this paper we investigate a number of compiler techniques for reducing synchronization overhead and load imbalance. Our techniques are evaluated in a prototype compiler/runtime system [5] using the...
Eliminating Conflict Misses for High Performance Architectures (1998)
Gabriel Rivera, Chau-wen Tseng
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. Two data-layout transformations, inter- and intra-variable padding, can eliminate many conflict...
Data Transformations for Eliminating Conflict Misses (1998)
Gabriel Rivera, Chau-wen Tseng
Many cache misses in scientific programs are due to conflicts caused by limited set associativity. We examine two compile-time data-layout transformations for eliminating conflict misses,...
Compile-time Synchronization Optimizations for Software DSMs (1998)
Software distributed-shared-memory (DSM) systems provide a desirable target for parallelizing compilers due to their flexibility. However, studies show synchronization and load imbalance are...
Compiler Optimizations for High Performance Architectures (1997)
Hwansoo Han, Gabriel Rivera, Chau-wen Tseng
We describe two ongoing compiler projects for high performance architectures at the University of Maryland being developed using the Stanford SUIF compiler infrastructure. First, we are investigating...
Compiler Optimizations for Eliminating Cache Conflict Misses (1997)
Gabriel Rivera, Chau-wen Tseng
Limited set-associativity in hardware caches can cause conflict misses when multiple data items map to the same cache locations. Conflict misses have been found to be a significant source of poor...
Improving the Compiler/Software DSM Interface: Preliminary Results (1997)
Current parallelizing compilers for message-passing machines only support a limited class of data-parallel applications. One method for eliminating this restriction is to combine powerful...
Reducing Synchronization Overhead for Compiler-Parallelized Codes on Software DSMs (1997)
Hwansoo Han, Chau-wen Tseng, Pete Keleher
Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate such systems can provide performance...
Data Layout Optimizations for High-Performance Architectures (1997)
padding, transposing, and reindexing array dimensions, and modifying heap al