Abstract
Power is a limiting factor in the design of embedded processors. For this reason, adding more instruction extensions is not a scalable option. To overcome this issue, we study the effects of replacing the NEON unit of an ARM SoC with an FPGA-like reconfigurable fabric. We measure the gap between the conventional hard-NEON and a soft-NEON implementation, and find that the soft-NEON has overheads of 25.17× in area and 6.23× in latency. This overhead is reduced by exploiting the reconfigurability of the fabric through FPGA-specific optimization techniques. Moreover, we show that instead of implementing the pre-defined NEON instruction set, custom instructions can be loaded onto the reconfigurable fabric using an HLS compilation flow. With this approach, performance gains of over 2.8× have been obtained for some kernels.
Original language | English |
---|---|
Title of host publication | 2016 IEEE 27th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2016 |
Publisher | IEEE |
Pages | 229-230 |
Number of pages | 2 |
Volume | 2016-November |
ISBN (Electronic) | 9781509015030 |
Publication status | Published - 28 Nov 2016 |
Event | 27th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2016 - London, United Kingdom. Duration: 6 Jul 2016 → 8 Jul 2016 |
Conference
Conference | 27th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2016 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 6/07/16 → 8/07/16 |