Brigham Young University
Department of Electrical & Computer Engineering

Koy Rehme's Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

If you have institutional or personal access to the ACM Digital Library, IEEE Xplore, and/or SpringerLink, the DOI links will give you the official versions of papers.


Exposing Parallelism and Locality in a Runtime Parallel Optimization Framework (DOI, PDF)
David A. Penry, Daniel J. Richins, Tyler S. Harris, David Greenland, and Koy D. Rehme
Proceedings of the 2010 ACM International Conference on Computing Frontiers (CF), May 2010.

The widespread use of tens to hundreds of processor cores in commodity systems will require widespread deployment of parallel applications. Despite advances in parallel programming models, it seems unlikely that the average programmer will be able to negotiate the twin shoals of understanding how to map parallelism well on a particular architecture and the likelihood that the particular architecture will not even be known at development time. Furthermore, for many important applications, a good mapping depends upon data or application characteristics not known until runtime.

Runtime parallel optimization has been suggested as a means to overcome these difficulties. For runtime parallel optimization to be effective, parallelism and locality which are expressed in the programming model need to be communicated to the runtime system. We suggest that the compiler should expose this information to the runtime using a representation which is independent of the programming model. We term such a representation an exposed parallelism and locality (EPL) representation. An EPL representation allows a single runtime environment to support many different models and architectures and to perform automatic parallelization optimization.

In order to accomplish these goals, an EPL representation needs to be task-based, multi-relational, hierarchical, and concise. This paper describes these four properties. It also presents an optimizing runtime, ADOPAR, which uses an EPL representation.

An Internal Representation for Adaptive Online Parallelization (PDF)
Koy D. Rehme
Master's Thesis, Department of Electrical and Computer Engineering, Brigham Young University, August 2009.

Future computer processors may have tens or hundreds of cores, increasing the need for efficient parallel programming models. The nature of multicore processors will present applications with the challenge of diversity: a variety of operating environments, architectures, and data will be available and the compiler will have no foreknowledge of the environment until run time. ADOPAR is a unifying framework that attempts to overcome diversity by separating discovery and packaging of parallelism. Scheduling for execution may then occur at run time when diversity may best be resolved.

This work presents a compact representation of parallelism based on the task graph programming model, tailored especially for ADOPAR and for regular and irregular parallel computations. Task graphs can be unmanageably large for fine-grained parallelism. Rather than representing each task individually, similar tasks are grouped into task descriptors. From these, a task descriptor graph, with relationship descriptors forming the edges of the graph, may be represented. While even highly irregular computations often have structure, previous representations have chosen to restrict what can be easily represented, thus limiting full exploitation by the back end. Therefore, in this work, task and relationship descriptors have been endowed with instantiation functions (methods of descriptors that act as factories) so the front end may have a full range of expression when describing the task graph. The representation uses descriptors to express a full range of regular and irregular computations in a very flexible and compact manner.

The representation also allows for dynamic optimization and transformation, which assists ADOPAR in its goal of overcoming various forms of diversity. We have implemented this representation using new compiler intrinsics, enabled ADOPAR schedulers to operate on the described task graph for parallel execution, and demonstrated both the low code-size overhead and the necessity of native schedulers.
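As an illustration of the descriptor idea in the abstract above, here is a minimal sketch in Python. It is not ADOPAR's actual representation or API: the class names, the `instantiate` signatures, and the `expand` helper are all assumptions made for this example. It only shows how a few descriptors with factory-style instantiation functions can stand in for a much larger, partly irregular task graph.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical task identity: (descriptor name, index).
Task = Tuple[str, int]


@dataclass(frozen=True)
class TaskDescriptor:
    """Stands for a whole group of similar tasks (assumed structure)."""
    name: str
    instantiate: Callable[[], Iterable[Task]]  # factory for concrete tasks


@dataclass(frozen=True)
class RelationshipDescriptor:
    """Stands for a whole group of edges between two task groups."""
    src: TaskDescriptor
    dst: TaskDescriptor
    # Factory mapping one source task to the tasks that depend on it.
    instantiate: Callable[[Task], Iterable[Task]]


def expand(tasks: List[TaskDescriptor],
           relations: List[RelationshipDescriptor]):
    """Expand the compact descriptor graph into explicit nodes and edges."""
    nodes = [t for d in tasks for t in d.instantiate()]
    edges = [(s, t)
             for r in relations
             for s in r.src.instantiate()
             for t in r.instantiate(s)]
    return nodes, edges


def build_example(n: int):
    """Two descriptors and one relationship describing 2*n tasks."""
    produce = TaskDescriptor("produce",
                             lambda: [("produce", i) for i in range(n)])
    consume = TaskDescriptor("consume",
                             lambda: [("consume", i) for i in range(n)])

    def deps(src: Task) -> Iterable[Task]:
        # Irregular pattern chosen for illustration: produce[i] feeds
        # consume[i] and, when it exists and differs, consume[2*i].
        _, i = src
        yield ("consume", i)
        if i != 0 and 2 * i < n:
            yield ("consume", 2 * i)

    return [produce, consume], [RelationshipDescriptor(produce, consume, deps)]


if __name__ == "__main__":
    nodes, edges = expand(*build_example(4))
    print(len(nodes), "tasks,", len(edges), "edges")
```

The descriptor graph here stays at two nodes and one edge regardless of `n`, while the expanded graph grows with the problem size. A runtime scheduler could, under these assumptions, call the instantiation functions lazily rather than materializing every task up front.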

SPRI: Simulator Partitioning Research Infrastructure (PDF)
Zhuo Ruan, Koy Rehme, and David A. Penry
Proceedings of the 3rd Workshop on Architectural Research Prototyping (WARP), June 2008.

Using FPGAs as architectural simulation accelerators has been widely discussed in the computer architecture design community. We previously proposed a hybrid SW/HW simulation infrastructure named SPRI (Simulator Partitioning Research Infrastructure), which automatically partitions the general timing model into software and hardware portions for simulation speedup, conforming to the set-based partitioning specification. The SPRI platform takes two main inputs, the partitioning specification and the architectural model, and produces a modified SW architectural binary and a HW-accelerated RTL description that can communicate with each other. Together these form the hybrid SW/HW co-simulator, the final output of SPRI. Various experiments have also been run through the SPRI infrastructure to test its partitioning functionality and API wrapper generation.

An Infrastructure for HW/SW Partitioning and Synthesis of Architectural Simulators (PDF)
David A. Penry, Zhuo Ruan, and Koy Rehme
Proceedings of the 2nd Workshop on Architectural Research Prototyping (WARP), June 2007.

Many researchers are interested in using FPGAs to accelerate architectural simulation. Partitioning of the simulator between hardware and software is an important problem which has not been explored because of the enormous effort required to develop different RTL and communication infrastructure for each potential partition. We are developing a hybrid HW/SW simulation infrastructure which will provide tools for partitioning architectural simulators and synthesizing RTL for the hardware portions. This infrastructure will allow the community to explore and understand the partitioning problem and will eventually lead to automated partitioning algorithms.