Abstract
Task-parallel languages are increasingly popular. Many of them provide expressive mechanisms for inter-task synchronization. For example, OpenMP 4.0 will integrate data-driven execution semantics derived from the StarSs research language. Compared to the more restrictive data-parallel and fork-join concurrency models, the advanced features being introduced into task-parallel models enable improved scalability through load balancing, memory latency hiding, mitigation of the pressure on memory bandwidth, and, as a side effect, reduced power consumption. In this paper, we develop a systematic approach to compile loop nests into concurrent, dynamically constructed graphs of dependent tasks. We propose a simple and effective heuristic that selects the most profitable parallelization idiom for every dependence type and communication pattern. This heuristic enables the extraction of inter-band parallelism (cross-barrier parallelism) in a number of numerical computations that range from linear algebra to structured grids and image processing. The proposed static analysis and code generation alleviate the burden of a full-blown dependence resolver to track the readiness of tasks at run time. We evaluate our approach and algorithms in the PPCG compiler, targeting OpenStream, a representative data-flow task-parallel language with explicit inter-task dependences and a lightweight runtime. Experimental results demonstrate the effectiveness of the approach.
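To illustrate the point-to-point inter-task synchronization the paper targets, the following is a minimal sketch in OpenStream's C-based syntax, assuming the stream declarations (`__attribute__((stream))`) and task `input`/`output` clauses documented in the OpenStream literature. The loop bound and values are illustrative only; the paper's contribution is to generate code of this shape automatically from sequential loop nests, rather than writing it by hand.

```c
#include <stdio.h>

int main(void)
{
  /* A stream carrying values between producer and consumer tasks. */
  int x __attribute__ ((stream));

  for (int i = 0; i < 4; ++i) {
    /* Producer task: writes one element to stream x. */
    #pragma omp task output (x)
    x = i * i;

    /* Consumer task: becomes ready as soon as its input element is
       available -- a point-to-point dependence, with no barrier
       separating it from unrelated tasks. */
    #pragma omp task input (x)
    printf ("%d\n", x);
  }

  #pragma omp taskwait
  return 0;
}
```

Because readiness is tracked per stream element rather than by a global barrier, independent producer/consumer pairs can overlap, which is the cross-barrier parallelism the abstract refers to.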
| Original language | English |
|---|---|
| Journal | ACM Transactions on Architecture and Code Optimization |
| DOIs | |
| Publication status | Published - Nov 2014 |
Keywords
- Data-flow, task parallelism, point-to-point synchronization, auto-parallelization, polyhedral framework, polyhedral compiler, tiling, dynamic wavefront