An Internal Representation for Adaptive Online Parallelization
Koy D. Rehme
Master's Thesis, Department of Electrical and Computer Engineering,
Brigham Young University, August 2009.
Future computer processors may have tens or hundreds of cores,
increasing the need for efficient parallel programming models. The
nature of multicore processors will present applications with the
challenge of diversity: a variety of operating environments,
architectures, and data will be available, and the compiler will have
no foreknowledge of the environment until run time. ADOPAR is a
unifying framework that attempts to overcome diversity by separating
discovery and packaging of parallelism. Scheduling for execution may
then occur at run time when diversity may best be resolved.
This work presents a compact representation of parallelism based on
the task graph programming model, tailored especially for ADOPAR
and for regular and irregular parallel computations. Task graphs can
be unmanageably large for fine-grained parallelism. Rather than
representing each task individually, similar tasks are grouped
into task descriptors. These become the nodes of a task descriptor
graph whose edges are relationship descriptors.
While even highly irregular computations
often have structure, previous representations have chosen to restrict
what can be easily represented, thus limiting full exploitation by the
back end. Therefore, in this work, task and relationship descriptors
have been endowed with instantiation functions (methods of
descriptors that act as factories) so the front end may have a full
range of expression when describing the task graph. The
representation uses descriptors to express a full range of regular and
irregular computations in a very flexible and compact manner.
The representation also allows for dynamic optimization and
transformation, which assists ADOPAR in its goal of overcoming
various forms of diversity. We have successfully implemented this
representation using new compiler intrinsics, allowed ADOPAR
schedulers to operate on the described task graph for parallel
execution, and demonstrated the low code size overhead and the
necessity for native schedulers.
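
The descriptor idea in the abstract above can be illustrated with a
minimal sketch. It is not the thesis's implementation, which uses
compiler intrinsics; the class names, the `instantiate` field, the
`expand` helper, and the producer/consumer example are all hypothetical
and chosen only for illustration. Each descriptor stands for a whole
group of tasks or edges and carries a factory function that produces
the concrete instances on demand, so an irregular dependence pattern is
written as ordinary code rather than being forced into a fixed template.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical identities: a task is (descriptor name, index),
# an edge is (predecessor task, successor task).
Task = Tuple[str, int]
Edge = Tuple[Task, Task]


@dataclass
class TaskDescriptor:
    """Stands for a group of similar tasks (assumed structure)."""
    name: str
    instantiate: Callable[[], Iterable[Task]]  # factory for concrete tasks


@dataclass
class RelationshipDescriptor:
    """Stands for a group of dependence edges (assumed structure)."""
    source: TaskDescriptor
    target: TaskDescriptor
    instantiate: Callable[[], Iterable[Edge]]  # factory for concrete edges


def expand(task_descs: List[TaskDescriptor],
           rel_descs: List[RelationshipDescriptor]):
    """Expand the compact descriptor graph into an explicit task graph."""
    tasks = [t for d in task_descs for t in d.instantiate()]
    edges = [e for r in rel_descs for e in r.instantiate()]
    return tasks, edges


N = 4  # number of tasks per descriptor in this toy example


def make_tasks(name: str) -> Callable[[], List[Task]]:
    # Regular case: N tasks indexed 0..N-1.
    return lambda: [(name, i) for i in range(N)]


def irregular_edges() -> List[Edge]:
    # Irregular case: consumer i depends on producers i and i // 2.
    return [(("produce", j), ("consume", i))
            for i in range(N)
            for j in sorted({i // 2, i})]


produce = TaskDescriptor("produce", make_tasks("produce"))
consume = TaskDescriptor("consume", make_tasks("consume"))
feeds = RelationshipDescriptor(produce, consume, irregular_edges)

tasks, edges = expand([produce, consume], [feeds])
print(len(tasks), len(edges))  # prints: 8 7
```

Here two task descriptors and one relationship descriptor stand in for
eight tasks and seven edges; a run-time scheduler could call the
factories lazily instead of expanding the whole graph up front.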
Issues in Hybrid Simulator Synthesis
Zhuo Ruan, Koy Rehme, and David A. Penry
Proceedings of the 4th Workshop on Architectural Research Prototyping
(WARP), June 2009.
The Simulator Partitioning Research Infrastructure (SPRI) is a project
to automate the generation of hybrid architectural simulators. In this
paper, we examine the interesting issues and challenges in hybrid
simulator synthesis.
Multicore Diversity: A Software Developer's Nightmare
David A. Penry
ACM SIGOPS Operating Systems Review (OSR), April 2009.
Commodity microprocessors with tens to hundreds of processor cores
will require the widespread deployment of parallel programs. This
deployment will be hindered by the architectural and environmental
diversity introduced by multicore processors. To overcome diversity,
the operating system must change its interactions with the program
runtime, and parallel runtime systems must be developed that can
automatically adapt programs to the architecture and usage environment.
SPRI: Simulator Partitioning Research Infrastructure
Zhuo Ruan, Koy Rehme, and David A. Penry
Proceedings of the 3rd Workshop on Architectural Research Prototyping
(WARP), June 2008.
Using FPGAs as architectural simulation accelerators has been widely
discussed in the computer architecture design community. We previously
proposed a hybrid SW/HW simulation infrastructure named SPRI
(Simulator Partitioning Research Infrastructure) which automatically
partitions the general timing model into the software and hardware
portions for simulation speedup, conforming to the set-based
partitioning specification. The SPRI platform takes two main
inputs: a partitioning specification and an architectural model. It then
produces a modified SW architectural binary and a HW-accelerated RTL
description that can communicate with each other; together these form
the hybrid SW/HW co-simulator, the final output of SPRI. Various
experimental cases have also been run through the SPRI infrastructure to
test its partitioning functionality and API wrapper generation.
An Infrastructure for HW/SW Partitioning and Synthesis of Architectural Simulators
David A. Penry, Zhuo Ruan, and Koy Rehme
Proceedings of the 2nd Workshop on Architectural Research Prototyping
(WARP), June 2007.
Many researchers are interested in using FPGAs to accelerate
architectural simulation. Partitioning of the simulator between
hardware and software is an important problem which has not been
explored because of the enormous effort required to develop different
RTL and communication infrastructure for each potential partition. We
are developing a hybrid HW/SW simulation infrastructure which will
provide tools for partitioning architectural simulators and
synthesizing RTL for the hardware portions. This infrastructure will
allow the community to explore and understand the partitioning problem
and will eventually lead to automated partitioning algorithms.
You Can't Parallelize Just Once: Managing Manycore Diversity
David A. Penry
Position paper for the Workshop on Manycore Computing at ICS'07, June 2007.
One of the greatest challenges for the use of manycore architectures will be
the growing diversity of manycore systems. This diversity will come in many
forms: architecture, goals, programming languages, pre-parallelization,
and dynamicism. We argue that the most manageable approach to such
diversity is to delay optimization and parallelization until runtime.