Parallel Hardware Merge Sorter

Wei Song, Dirk Koch, Mikel Lujan, Jim Garside

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review


Sorting has tremendous usage in the applications that handle massive amount of data. Existing techniques accelerate sorting using multiprocessors or GPGPUs where a data set is partitioned into disjunctive subsets to allow multiple sorting threads working in parallel. Hardware sorters implemented in FPGAs have the potential of providing high-speed and low-energy solutions but the partition algorithms used in software systems are so data dependent that they cannot be easily adopted. The speed of most current sequential sorters still hangs around 1 number/cycle. Recently a new hardware merge sorter broke this speed limit by merging a large number of sorted sequences at a speed proportional to the number of sequences. This paper significantly improves its area and speed scalability by allowing stalls and variable sorting rate. A 32-port parallel merge-tree that merges 32 sequences is implemented in a Virtex-7 FPGA. It merges sequences at an average rate of 31.05 number/cycle and reduces the total sorting time by 160 times compared with traditional sequential sorters.

Original languageEnglish
Title of host publication24th IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2016
Number of pages8
ISBN (Electronic)9781509023561
Publication statusPublished - 16 Aug 2016
Event24th IEEE International Symposium on Field-Programmable Custom Computing Machines - Washington, United States
Duration: 1 May 20163 May 2016


Conference24th IEEE International Symposium on Field-Programmable Custom Computing Machines
Abbreviated titlejump trading
Country/TerritoryUnited States


  • FPGA
  • Merge sort
  • parallel sorting
  • sorting network


Dive into the research topics of 'Parallel Hardware Merge Sorter'. Together they form a unique fingerprint.

Cite this