The Pulled-Macro-Dataflow Model: An Execution Model for Multicore Shared-Memory Computers [abstract] (PDF)
Daniel J. Richins
Masters Thesis, Department of Electrical and Computer Engineering, Brigham Young University, December 2011.

The macro-dataflow model of execution has been used in scheduling heuristics for directed acyclic graphs. Since this model was developed for the scheduling of parallel applications on distributed computing systems, it is inadequate when applied to the multicore shared-memory computers prevalent in the market today.

The pulled-macro-dataflow model is put forth as an alternative to the macro-dataflow model, having been designed specifically to accurately describe the memory bandwidth limitations and request-driven nature of communications characteristic of today's machines. The performance of the common scheduling heuristics DSC and CASS-II are evaluated under the pulled-macro-dataflow model and it is shown that their poor performance motivates the development of a new scheduling heuristic. The Concurrent Tournament Reducer (ConTouR) is developed as a scheduling heuristic which operates well with the pulled-macro-dataflow model.

ConTouR is compared to the existing heuristics Load Balancing and Communication Minimization in scheduling two programs. For both programs, the other reducers are shown to outperform ConTouR.