Abstract
Development of application-specific accelerators for deep convolutional neural networks (ConvNets) has mainly focussed on accelerating the computationally intensive layers, that is, the convolutional layers, to improve performance and energy efficiency. Traditional approaches in this space have relied on handcrafted dataflow implementations to leverage the fine-grained parallelism and data-locality properties within these layers. However, ConvNet layers also hold untapped potential for cross-layer data locality.
In our work, we explore a novel approach in the context of deep neural network accelerators: we model the computation as a task-dependency directed acyclic graph (DAG) and propose a memory-aware heuristic based on Heterogeneous Earliest Finish Time (HEFT) for task-graph scheduling on shared-memory systems.
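As a rough illustration of this scheduling scheme, the sketch below implements plain HEFT-style list scheduling (upward ranks, then greedy processor selection) and adds a hypothetical live-memory penalty to the placement step. The `Task` structure, the `mem_penalty` weight, and the form of the penalty are our assumptions for illustration; the paper's actual memory-aware heuristic may differ.

```python
# A minimal sketch of HEFT-style list scheduling over a task DAG, with a
# hypothetical memory term in the processor-selection step. The Task
# fields, mem_penalty weight, and penalty form are illustrative
# assumptions, not the paper's actual heuristic.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    cost: dict                 # per-processor execution time, e.g. {"pe0": 5.0}
    mem_bytes: int             # size of the output buffer this task allocates
    succs: list = field(default_factory=list)  # [(child_task, comm_cost), ...]

def upward_rank(task, memo):
    """Classic HEFT upward rank: mean execution cost plus the maximum over
    successors of (edge communication cost + successor's rank)."""
    if task.name not in memo:
        mean_cost = sum(task.cost.values()) / len(task.cost)
        tail = max((comm + upward_rank(child, memo)
                    for child, comm in task.succs), default=0.0)
        memo[task.name] = mean_cost + tail
    return memo[task.name]

def schedule(tasks, procs, mem_penalty=1e-3):
    """List scheduling: highest upward rank first; each task is placed on
    the processor minimising finish time plus a live-memory penalty.
    Simplified: ignores predecessor finish times and never frees buffers."""
    ranks = {}
    order = sorted(tasks, key=lambda t: upward_rank(t, ranks), reverse=True)
    ready_at = {p: 0.0 for p in procs}   # when each processor becomes free
    live_mem = 0                         # bytes currently allocated
    placement = {}
    for t in order:
        best = min(procs, key=lambda p: ready_at[p] + t.cost[p]
                   + mem_penalty * (live_mem + t.mem_bytes))
        ready_at[best] += t.cost[best]
        live_mem += t.mem_bytes
        placement[t.name] = (best, ready_at[best])
    return placement

# Tiny example: two convolution-slice tasks feeding one pooling-slice task.
pool = Task("pool0", {"pe0": 2.0, "pe1": 3.0}, mem_bytes=1024)
conv0 = Task("conv0", {"pe0": 5.0, "pe1": 4.0}, 4096, succs=[(pool, 1.0)])
conv1 = Task("conv1", {"pe0": 5.0, "pe1": 4.0}, 4096, succs=[(pool, 1.0)])
print(schedule([conv0, conv1, pool], ["pe0", "pe1"]))
```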
Our results show the benefits of task graphs in terms of memory use (23.4% less) over conventional layer-by-layer processing in a simulated environment with the first three layers of LeNet-5. Certain task graphs trade off makespan (10% increase) for memory use (20% decrease). Finally, our exploration of graphs with different slicing configurations for the pooling layer, using memory-aware HEFT versus the original HEFT, reveals that regularly shaped tiles across layers offer better makespan and memory use than tiles with large dimensions along one axis.
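To make the contrast between slicing configurations concrete, here is a small hypothetical sketch of the two tile shapes on a 28x28 feature map (LeNet-5 input scale); the tile sizes are invented for illustration and are not the paper's evaluated configurations.

```python
# Hypothetical illustration of two slicing configurations for an H x W
# feature map: regular square-ish tiles versus strips elongated along one
# axis. Tile shapes and sizes are made up for illustration only.
def slice_tiles(h, w, tile_h, tile_w):
    """Yield (row, col, height, width) tiles covering an h x w feature map."""
    for r in range(0, h, tile_h):
        for c in range(0, w, tile_w):
            yield (r, c, min(tile_h, h - r), min(tile_w, w - c))

# 28x28 map: 7x7 regular tiles vs. 28x4 strips along one axis.
regular = list(slice_tiles(28, 28, 7, 7))    # 16 compact tiles
strips = list(slice_tiles(28, 28, 28, 4))    # 7 tall, narrow strips
print(len(regular), len(strips))             # -> 16 7
```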
Original language | English |
---|---|
Title of host publication | 5th Workshop on design of Low Power EMbedded Systems at Computing Frontiers 2019 |
Pages | 366-372 |
DOIs | |
Publication status | Published - 2019 |
Event | The 16th ACM International Conference, Alghero, Italy. Duration: 30 Apr 2019 → 2 May 2019 |
Conference
Conference | The 16th ACM International Conference |
---|---|
Period | 30/04/19 → 02/05/19 |
Keywords
- Convolutional neural networks
- scheduling
- task-based parallelism
- accelerator systems