Brigham Young University
Department of Electrical & Computer Engineering

BARDD Publications

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

An Internal Representation for Adaptive Online Parallelization
Koy D. Rehme
Master's Thesis, Department of Electrical and Computer Engineering, Brigham Young University, August 2009.

Future computer processors may have tens or hundreds of cores, increasing the need for efficient parallel programming models. The nature of multicore processors will present applications with the challenge of diversity: a variety of operating environments, architectures, and data will be available and the compiler will have no foreknowledge of the environment until run time. ADOPAR is a unifying framework that attempts to overcome diversity by separating discovery and packaging of parallelism. Scheduling for execution may then occur at run time when diversity may best be resolved.

This work presents a compact representation of parallelism based on the task graph programming model, tailored especially for ADOPAR and for regular and irregular parallel computations. Task graphs can be unmanageably large for fine-grained parallelism. Rather than representing each task individually, similar tasks are grouped into task descriptors. From these, a task descriptor graph, with relationship descriptors forming the edges of the graph, may be represented. While even highly irregular computations often have structure, previous representations have chosen to restrict what can be easily represented, thus limiting full exploitation by the back end. Therefore, in this work, task and relationship descriptors have been endowed with instantiation functions (methods of descriptors that act as factories) so the front end may have a full range of expression when describing the task graph. The representation uses descriptors to express a full range of regular and irregular computations in a very flexible and compact manner.

The representation also allows for dynamic optimization and transformation, which assists ADOPAR in its goal of overcoming various forms of diversity. We have successfully implemented this representation using new compiler intrinsics, allowed ADOPAR schedulers to operate on the described task graph for parallel execution, and demonstrated the low code size overhead and the necessity for native schedulers.

Issues in Hybrid Simulator Synthesis
Zhuo Ruan, Koy Rehme, and David A. Penry
Proceedings of the 4th Workshop on Architectural Research Prototyping (WARP), June 2009.

The Simulator Partitioning Research Infrastructure (SPRI) is a project to automate the generation of hybrid architectural simulators. In this paper, we examine the interesting issues and challenges in hybrid simulator synthesis.

Multicore Diversity: A Software Developer's Nightmare
David A. Penry
ACM SIGOPS Operating Systems Review (OSR), April 2009.

Commodity microprocessors with tens to hundreds of processor cores will require the widespread deployment of parallel programs. This deployment will be hindered by the architectural and environmental diversity introduced by multicore processors. To overcome diversity, the operating system must change its interactions with the program runtime, and parallel runtime systems must be developed that can automatically adapt programs to the architecture and usage environment.

SPRI: Simulator Partitioning Research Infrastructure
Zhuo Ruan, Koy Rehme, and David A. Penry
Proceedings of the 3rd Workshop on Architectural Research Prototyping (WARP), June 2008.

Using FPGAs as architectural simulation accelerators has been widely discussed in the computer architecture design community. We previously proposed a hybrid SW/HW simulation infrastructure named SPRI (Simulator Partitioning Research Infrastructure), which automatically partitions the general timing model into software and hardware portions for simulation speedup, conforming to the set-based partitioning specification. The SPRI platform takes two main inputs, the partitioning specification and the architectural model; it then produces a modified SW architectural binary and an HW-accelerated RTL description that can communicate with each other. Together these form the hybrid SW/HW co-simulator, the final output of SPRI. Various experimental cases have also been run through the SPRI infrastructure to test its partitioning functionality and API wrapper generation.

An Infrastructure for HW/SW Partitioning and Synthesis of Architectural Simulators
David A. Penry, Zhuo Ruan, and Koy Rehme
Proceedings of the 2nd Workshop on Architectural Research Prototyping (WARP), June 2007.

Many researchers are interested in using FPGAs to accelerate architectural simulation. Partitioning of the simulator between hardware and software is an important problem which has not been explored because of the enormous effort required to develop different RTL and communication infrastructure for each potential partition. We are developing a hybrid HW/SW simulation infrastructure which will provide tools for partitioning architectural simulators and synthesizing RTL for the hardware portions. This infrastructure will allow the community to explore and understand the partitioning problem and will eventually lead to automated partitioning algorithms.

You Can't Parallelize Just Once: Managing Manycore Diversity
David A. Penry
Position paper for the Workshop on Manycore Computing at ICS'07, June 2007.

One of the greatest challenges for the use of manycore architectures will be the growing diversity of manycore systems. This diversity will come in many forms: architecture, goals, programming languages, pre-parallelization, and dynamism. We argue that the most manageable approach to such diversity is to delay optimization and parallelization until runtime.