Abstract
Loop tiling is a fundamental optimization for improving data locality. Selecting the right tile size, combined with loop parallelization, can provide additional performance gains on modern Chip MultiProcessor (CMP) architectures. This paper presents a runtime optimization system that automatically parallelizes loops and searches empirically for the best tile sizes on a scalable multi-cluster CMP. The system is built on top of a virtual machine and targets the runtime parallelization and optimization of Java programs. Experimental results show that, despite their overheads, runtime parallelization and tile size searching improve performance for two BLAS kernels and one Lattice-Boltzmann simulation. © 2008 Springer-Verlag Berlin Heidelberg.
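To make the combination of tiling, parallelization, and empirical tile size search concrete, the following is a minimal sketch in plain Java. It is not the paper's system, which performs these steps automatically inside a virtual machine at runtime. The class name `TileSearchSketch`, its method names, and the candidate tile sizes are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative sketch only: hand-written tiling and a naive timing-based
// tile size search. All names here are hypothetical, not from the paper.
class TileSearchSketch {

    // Computes one horizontal band of C += A * B (rows ii .. ii+tile-1),
    // walking the k and j dimensions tile by tile to improve locality.
    // Matrices are n x n, stored row-major in flat arrays.
    public static void multiplyRowBand(double[] a, double[] b, double[] c,
                                       int n, int tile, int ii) {
        int iMax = Math.min(ii + tile, n);
        for (int kk = 0; kk < n; kk += tile) {
            int kMax = Math.min(kk + tile, n);
            for (int jj = 0; jj < n; jj += tile) {
                int jMax = Math.min(jj + tile, n);
                for (int i = ii; i < iMax; i++) {
                    for (int k = kk; k < kMax; k++) {
                        double aik = a[i * n + k];
                        for (int j = jj; j < jMax; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }

    // Parallel tiled multiply: each row band writes a disjoint set of rows
    // of C, so the bands can run concurrently without synchronization.
    public static void tiledMultiplyParallel(double[] a, double[] b, double[] c,
                                             int n, int tile) {
        int bands = (n + tile - 1) / tile;
        IntStream.range(0, bands).parallel()
                 .forEach(t -> multiplyRowBand(a, b, c, n, tile, t * tile));
    }

    // Untiled reference implementation, used to check correctness.
    public static void naiveMultiply(double[] a, double[] b, double[] c, int n) {
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    c[i * n + j] += a[i * n + k] * b[k * n + j];
    }

    // Empirical search: time the kernel once per candidate tile size and
    // return the fastest candidate. A single timing per candidate is an
    // assumption made for brevity.
    public static int searchTileSize(double[] a, double[] b, int n, int[] candidates) {
        int best = candidates[0];
        long bestTime = Long.MAX_VALUE;
        double[] c = new double[n * n];
        for (int tile : candidates) {
            Arrays.fill(c, 0.0);
            long start = System.nanoTime();
            tiledMultiplyParallel(a, b, c, n, tile);
            long elapsed = System.nanoTime() - start;
            if (elapsed < bestTime) {
                bestTime = elapsed;
                best = tile;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int n = 256;
        double[] a = new double[n * n];
        double[] b = new double[n * n];
        Arrays.fill(a, 1.0);
        Arrays.fill(b, 2.0);
        int best = searchTileSize(a, b, n, new int[] {8, 16, 32, 64});
        System.out.println("Fastest tile size in this run: " + best);
    }
}
```

A single timed run per candidate is noisy on a JVM because of JIT warm-up and garbage collection, so a more careful search would repeat each measurement and discard the early runs.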
Original language | English |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publisher | Springer Nature |
Pages | 220-232 |
Number of pages | 12 |
Volume | 5022 |
ISBN (Print) | 9783540695004 |
Publication status | Published - 2008 |
Event | 8th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2008 - Duration: 1 Jul 2008 → … http://dblp.uni-trier.de/db/conf/ica3pp/ica3pp2008.html#ZhaoHLRKW08 http://dblp.uni-trier.de/rec/bibtex/conf/ica3pp/ZhaoHLRKW08.xml http://dblp.uni-trier.de/rec/bibtex/conf/ica3pp/ZhaoHLRKW08 |
Publication series
Name | Lecture Notes in Computer Science |
---|
Conference
Conference | 8th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2008 |
---|---|
Period | 1/07/08 → … |
Keywords
- Automatic parallelization
- Feedback-directed optimization
- Loop tiling
- Multi-cluster CMP