Large Scale HPC

FPGA acceleration has become a key enabler for solving today's most demanding computational problems. Neural-network deep learning, data mining, cloud computing, and scientific research are just a few fields where traditional servers lack computational power despite consuming a lot of energy. Recent advances in FPGA technology have opened the door to its use in HPC applications.


Aldec's scalable FPGA accelerators are ideal for large-scale HPC applications. The current generation of FPGA accelerator boards features low-power Xilinx® UltraScale™ FPGAs that provide outstanding computational capability with power efficiency not achievable with GPU-based accelerators.


FPGA Accelerators


  • HES-XCVU9P-QDR - low-profile board with PCIe x16 that can be installed directly inside data center servers. On this board the FPGA is paired with high-bandwidth QDR-II+ memories that provide high throughput for algorithm acceleration.
  • HES-XCVU9P-ZU7EV - board with a separate host interface chip (Xilinx Zynq UltraScale+ XCZU7) and a second FPGA (Xilinx UltraScale+ XCVU9P) dedicated entirely to the user's application. Its logic resources include a large number of DSP blocks (6840), making it ideal for DSP and computer vision applications.
  • HES-US-440 - stand-alone board with an external PCIe x8 cable connection. It contains the largest Xilinx Virtex UltraScale device, with a capacity of 5.5 million logic cells, up to 64 GB of DDR4 memory in two modules, and fast RLDRAM. It is intended for accelerating very complex algorithms or those that benefit from a large number of replicated instances of the algorithm kernel.
  • HES-XCKU11P-DDR4 - low-profile board with PCIe x16 that can be installed directly inside servers used for HPC/HFT. It includes a Kintex UltraScale+ FPGA, a family that offers one of the best price/performance/watt balances. Two QSFP-DD ports provide high-bandwidth, low-latency communication (up to 400 Gbps).



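Logic Cells

  • HES-XCVU9P-QDR: 2.5 million
  • HES-XCVU9P-ZU7EV: 2.5 million
  • HES-US-440: 5.5 million
  • HES-XCKU11P-DDR4: 653,000

DSP Blocks

  • HES-XCVU9P-QDR: 6,840
  • HES-XCVU9P-ZU7EV: 6,840
  • HES-US-440: 2,880
  • HES-XCKU11P-DDR4: 2,928

On-chip RAM

  • HES-XCVU9P-QDR: 75.9 Mb BlockRAM, 270 Mb UltraRAM
  • HES-XCVU9P-ZU7EV: 75.9 Mb BlockRAM, 270 Mb UltraRAM
  • HES-US-440: 88.6 Mb BlockRAM
  • HES-XCKU11P-DDR4: 21.1 Mb BlockRAM, 22.5 Mb UltraRAM

Off-chip RAM

  • HES-XCVU9P-QDR: 432 Mb QDR-II (3x 144 Mb); or, in the *-DDR version, 32 Gb DDR4 (2x 16 Gb) and 144 Mb QDR-II
  • HES-XCVU9P-ZU7EV: 32 GB DDR4 (2x 16 GB), 2x 576 Mb RLD3
  • HES-US-440: 32 GB DDR4 (2x 16 GB), 1152 Mb RLD3 (2x 576 Mb)
  • HES-XCKU11P-DDR4: SODIMM DDR4 memory socket, 512 Mb Flash memory, 2x 64 kb I2C EEPROM

Host Interface

  • HES-XCVU9P-QDR: PCI Express x16, gen3
  • HES-XCVU9P-ZU7EV: PCI Express x8, gen3; Zynq UltraScale+ XCZU7
  • HES-US-440: PCI Express x8, gen2; Zynq-7000, XC7Z100
  • HES-XCKU11P-DDR4: PCIe x16 gen3 endpoint or PCIe x8 gen4; 2x QSFP-DD (total up to 400 Gbps)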


Host Interface


Connecting an FPGA accelerator board to a host workstation via PCIe is not trivial and, if done from the ground up, requires extensive knowledge of hardware design. Software developers need a ready-to-use hardware platform that spares them low-level hardware integration. With this use model in mind, Aldec provides the HES Proto-AXI interface, which hides low-level PCI Express implementation details and saves development time. The user receives the HES Proto-AXI IP core, which is based on the AMBA AXI standard and bridges accelerated algorithm kernels to the PCIe bus of the host computer.


HES Proto-AXI has been optimized to achieve data throughput above 2 GB/s for transfers between the host and the HES board. It provides an easy-to-use memory-mapped interface for integration with the compute device, and it can also be easily converted to a streaming AXI interface. HES Proto-AXI also simplifies the use of on-board external memories such as DDR3, DDR4, or QDR-II: it contains the appropriate controller and provides memory access through the same AXI interface.
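
To put the throughput figure in perspective, the sketch below estimates how long a host-to-board transfer takes and whether offloading a task is likely to pay off once transfer time is counted. It is only a back-of-the-envelope model: the 2 GB/s value is used as an assumed sustained rate, and the function names and the simple "transfer in, compute, transfer out" model are illustrative assumptions, not part of any Aldec tool.

```c
#include <assert.h>

/* Assumed sustained host<->board throughput in bytes per second.
 * 2 GB/s is the lower bound quoted in the text; measure your own system. */
#define ASSUMED_THROUGHPUT_BPS 2.0e9

/* Estimated seconds needed to move `bytes` at `bytes_per_sec`. */
static double transfer_seconds(double bytes, double bytes_per_sec)
{
    assert(bytes_per_sec > 0.0);
    return bytes / bytes_per_sec;
}

/* Returns 1 when the offloaded path (input transfer + kernel run +
 * output transfer) is expected to beat the host-only run time. */
static int offload_pays_off(double in_bytes, double out_bytes,
                            double kernel_sec, double host_sec)
{
    double total = transfer_seconds(in_bytes, ASSUMED_THROUGHPUT_BPS)
                 + kernel_sec
                 + transfer_seconds(out_bytes, ASSUMED_THROUGHPUT_BPS);
    return total < host_sec;
}
```

For example, moving 1 GB in and 1 GB out at the assumed rate costs about one second in total, so a kernel that runs in 0.5 s only helps if the host-only version takes longer than 1.5 s.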


Quick Integration

An algorithm can be converted for the FPGA directly from C using Xilinx High-Level Synthesis (HLS) or similar tools and then easily integrated with the HES Proto-AXI infrastructure. The provided high-level C API is easy to use on either Linux or Windows, and there is no need to develop low-level PCIe drivers.

An example HPC design flow is based on the Xilinx Vivado HLS tool, which compiles C code directly into a hardware description language (HDL) for implementation in the FPGA. The flow is divided into five stages and, as described below, is well integrated with the Aldec HPC platform components.




The program or algorithm to accelerate is partitioned into two parts: one designated for acceleration and the other running on the host. The partitioning can be based on profiling results that identify computationally intensive pieces of C code. Next, the Xilinx Vivado HLS tool converts the C code to Verilog or VHDL RTL code suitable for further automatic processing (synthesis and implementation in the FPGA). The user should choose to include an AMBA AXI interface in the RTL code, which is required for the next stage.
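
As an illustration of this stage, the sketch below shows what a small kernel candidate might look like once it has been isolated from the host code and annotated for Vivado HLS. The INTERFACE and PIPELINE pragmas are standard Vivado HLS directives; the function name, the MAX_N bound, the bundle names, and the choice of AXI master ports with an AXI4-Lite control interface are assumptions made for this example, so check the HES Proto-AXI documentation for the interface style it actually expects.

```c
#include <assert.h>

#define MAX_N 4096  /* hypothetical upper bound on the vector length */

/* Kernel candidate: out[i] = alpha * a[i] + b[i] for i in [0, n). */
void scale_add_kernel(const float *a, const float *b, float *out,
                      float alpha, int n)
{
/* Arrays are accessed through an AXI master port; their base addresses,
 * the scalars, and the block-level control signals are exposed as
 * registers on an AXI4-Lite interface. */
#pragma HLS INTERFACE m_axi     port=a      offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=b      offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi     port=out    offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=a      bundle=control
#pragma HLS INTERFACE s_axilite port=b      bundle=control
#pragma HLS INTERFACE s_axilite port=out    bundle=control
#pragma HLS INTERFACE s_axilite port=alpha  bundle=control
#pragma HLS INTERFACE s_axilite port=n      bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    /* Range check: catches misuse when the code runs as plain software. */
    assert(n >= 0 && n <= MAX_N);

    for (int i = 0; i < n; i++) {
#pragma HLS PIPELINE II=1
        out[i] = alpha * a[i] + b[i];
    }
}
```

An ordinary C compiler ignores the HLS pragmas, so the same source can be compiled and tested on the host before it is passed to the HLS tool.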



Once the HDL code is available, it needs to be integrated with Aldec HES Proto-AXI by connecting it to the AMBA AXI ports. An HDL editor such as the one in Aldec's Riviera-PRO is sufficient for this stage. Concurrently, the main application intended to run on the host computer is modified to replace calls to algorithm functions with counterparts that use the FPGA via the HES Proto-AXI API.
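
One way to structure the host-side change is to give the software routine and its accelerated counterpart the same signature and select between them in one place. The sketch below shows that pattern only. Every name in it is hypothetical, and the accelerated version is a stub that reuses the software path, because the actual HES Proto-AXI API calls are not described here and are deliberately not guessed.

```c
#include <stddef.h>

/* Common signature shared by the software and accelerated versions. */
typedef void (*scale_add_fn)(const float *a, const float *b, float *out,
                             float alpha, size_t n);

/* Original host implementation, kept as a reference and fallback. */
static void scale_add_sw(const float *a, const float *b, float *out,
                         float alpha, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = alpha * a[i] + b[i];
}

/* HYPOTHETICAL accelerated counterpart. A real version would send a and b
 * to the board, start the kernel, and read out back through the vendor
 * API. Those calls are omitted, so this stub reuses the software path. */
static void scale_add_accel_stub(const float *a, const float *b, float *out,
                                 float alpha, size_t n)
{
    scale_add_sw(a, b, out, alpha, n);
}

/* Pick the implementation once; callers only use the returned pointer. */
static scale_add_fn select_scale_add(int board_available)
{
    return board_available ? scale_add_accel_stub : scale_add_sw;
}
```

Keeping the software version behind the same function pointer also gives a convenient reference for comparing results once the accelerated path is wired in.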



Before running the whole project on the FPGA accelerator board, you can check it for integration and connectivity mistakes using Aldec's high-performance Riviera-PRO simulator and the HES Proto-AXI simulation model included in the Large Scale HPC solution.



The last stage is automatic synthesis and implementation in the Xilinx Vivado environment, which generates the FPGA bitstream and configuration files for your main application.



Aldec provides a run-time environment that makes using FPGA accelerator boards straightforward. The PCI Express device driver is installed, and accelerator board housekeeping functions are included in the Proto-AXI API library linked with your program. When you launch your main application on the host computer, the FPGA is configured automatically, so no special knowledge of FPGA operation or programming is required. This makes it a very convenient environment for software developers.


Main Features

  • Choice of several FPGA accelerator boards to match project requirements
  • Scalability through support for multiple-board configurations
  • Supports hot-reconfiguration of FPGA
  • Integrated with FPGA development and verification environment


Solution Contents

  • HES-HPC FPGA Accelerator board
  • HES Proto-AXI host interface module and software stack
  • AXI Bus Functional Model (BFM) for RTL simulation
  • Riviera-PRO high performance HDL simulator
  • Reference designs, technical documentation, tutorials and white papers
  • Integration services