Large Scale HPC

Neural networks, deep learning, data mining, cloud computing, and scientific research are just a few of the fields where traditional servers lack computational power despite consuming a lot of energy. Recent advances in FPGA technology have opened the door to its use in HPC applications.

 

Aldec’s scalable, FPGA-based accelerators are ideal for Large Scale HPC applications. The current generation of FPGA boards features low-power Xilinx® UltraScale™ FPGAs, which provide outstanding computational capabilities with a power efficiency not achievable with GPU-based accelerators.

 

Large Scale HPC Accelerators

 

 

HES-HPC-HFT-XCVU9P

Logic Cells: 2.5 Million
DSP Blocks: 6840
On-chip RAM: 75.9 Mb BlockRAM, 270 Mb UltraRAM
Off-chip RAM: 432 Mb QDR-II (3x 144 Mb); or in *-DDR version: 32 Gb DDR4 (2x 16 Gb), 144 Mb QDR-II
Host Interface: PCI Express x16, gen3

HES-HPC-DSP-XCVU9P

Logic Cells: 2.5 Million
DSP Blocks: 6840
On-chip RAM: 75.9 Mb BlockRAM, 270 Mb UltraRAM
Off-chip RAM: 32 GB DDR4 (2x 16 GB), 2x 576 Mb RLD3
Host Interface: PCI Express x8, gen3; Zynq UltraScale+ XCZU7

HES-US-440

Logic Cells: 5.5 Million
DSP Blocks: 2880
On-chip RAM: 88.6 Mb BlockRAM
Off-chip RAM: 32 GB DDR4 (2x 16 GB), 1152 Mb RLD3 (2x 576 Mb)
Host Interface: PCI Express x8, gen2; Zynq-7000, XC7Z100

 

Host Interface

 

Connecting an FPGA board to a host workstation via PCIe is not trivial and, if done from the ground up, requires extensive knowledge of hardware design. Software developers need a ready-to-use hardware platform that spares them low-level hardware integration work. With this use model in mind, Aldec provides the HES Proto-AXI interface, which hides low-level PCI Express implementation details and saves development time. The user receives the HES Proto-AXI IP core, which is based on the AMBA AXI standard and bridges accelerated algorithm kernels to the PCIe bus of the host computer.

 

The HES Proto-AXI has been optimized for high data throughput, above 2 GB/s, for transfers between the host and the HES board. It provides an easy-to-use memory-mapped interface for integration with the compute device, and it can also be easily converted to a streaming AXI interface. HES Proto-AXI also facilitates the use of external on-board memories such as DDR3, DDR4, or QDR-II: it contains the appropriate controller and provides memory access through the same AXI interface.
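As a rough illustration of what the quoted throughput means in practice, the sketch below estimates the host-to-board transfer time for a buffer. It assumes a sustained rate of exactly 2 GB/s (the stated figure is a lower bound) and ignores setup latency and driver overhead. The function name is hypothetical and is not part of the HES Proto-AXI API.

```c
#include <stddef.h>

/* Assumed sustained throughput in bytes per second (2 GB/s, decimal).
 * The rate quoted for HES Proto-AXI is "above 2 GB/s", so this is a floor. */
#define ASSUMED_BYTES_PER_SEC 2.0e9

/* Hypothetical helper: estimated time in seconds needed to move `bytes`
 * bytes between host and board, ignoring latency and driver overhead. */
double estimate_transfer_seconds(size_t bytes)
{
    return (double)bytes / ASSUMED_BYTES_PER_SEC;
}
```

Under this assumption a buffer of 10^9 bytes takes about half a second to transfer, a cost that has to be weighed against the compute time the accelerator saves.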

 

Quick Integration

An algorithm can be compiled for the FPGA directly from C using Xilinx High-Level Synthesis (HLS) or similar tools and then easily integrated with the HES Proto-AXI infrastructure. The accompanying high-level C API is easy to use on both Linux and Windows, and there is no need to develop low-level PCIe drivers.

An example HPC design flow is based on the Xilinx Vivado HLS tool, which compiles C source code directly to a hardware description language (HDL) for implementation in the FPGA. The flow is divided into five stages and, as you will see, it is well integrated with the Aldec HPC platform components.

 

 

Convert

The program or algorithm to accelerate is partitioned into two parts: one designated for acceleration and the other that runs on the host. The partitioning can be based on profiling results that identify the computationally intensive pieces of C code. Next, the Xilinx Vivado HLS tool is used to convert the C code to Verilog or VHDL RTL code suitable for further automatic processing (synthesis and implementation in the FPGA). The user should choose to include an AMBA AXI interface in the RTL code, as it is required for the next stage.
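A minimal sketch of what the accelerated part might look like before conversion is shown below: a vector-add kernel annotated with Vivado HLS interface pragmas so that the generated RTL exposes AXI ports. The kernel, its name, the fixed size N, and the bundle names are illustrative assumptions. Which AXI interface types HES Proto-AXI expects on the kernel is not stated here, so the choice of m_axi and s_axilite is also an assumption.

```c
/* Illustrative vector-add kernel for Vivado HLS.
 * An ordinary C compiler ignores the HLS pragmas (possibly with a warning),
 * so the same function can be built and checked on the host. */
#define N 1024  /* fixed problem size, chosen only for this example */

void vadd(const int *a, const int *b, int *out)
{
/* Assumed choice: AXI4 master ports for the data buffers. */
#pragma HLS INTERFACE m_axi port=a   offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=b   offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem
/* Assumed choice: AXI4-Lite slave for buffer addresses and block control. */
#pragma HLS INTERFACE s_axilite port=a      bundle=control
#pragma HLS INTERFACE s_axilite port=b      bundle=control
#pragma HLS INTERFACE s_axilite port=out    bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control

    for (int i = 0; i < N; i++) {
/* Ask the tool to start one loop iteration per clock cycle if possible. */
#pragma HLS PIPELINE II=1
        out[i] = a[i] + b[i];
    }
}
```

Keeping the kernel as plain C means it can first be validated on the host with ordinary test data and only then handed to the HLS tool.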

 

Integrate

Once the HDL code is available, it needs to be integrated with Aldec HES Proto-AXI, that is, connected to its AMBA AXI ports. An HDL editor such as the one in Aldec’s Riviera-PRO is sufficient for this stage. Concurrently, the main application intended to run on the host computer is modified to replace calls to the algorithm functions with their counterparts that use the FPGA via the HES Proto-AXI API.
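The host-side change described above can be sketched as follows: a software function and a counterpart with the same signature that routes the work through the board. The actual HES Proto-AXI API is not shown in this document, so every board_* function here is a hypothetical stand-in, backed by an in-process buffer so that the sketch runs without hardware.

```c
#include <stddef.h>
#include <string.h>

/* Software stand-in for board memory; a real design would use the board. */
#define BOARD_MEM_BYTES 4096
static unsigned char fake_board_mem[BOARD_MEM_BYTES];

/* HYPOTHETICAL transfer calls. The real Proto-AXI API names and
 * signatures are not given here and will differ. */
static void board_write(size_t offset, const void *src, size_t len)
{
    memcpy(fake_board_mem + offset, src, len);
}

static void board_read(size_t offset, void *dst, size_t len)
{
    memcpy(dst, fake_board_mem + offset, len);
}

/* HYPOTHETICAL "start kernel and wait" call, emulated in software. */
static void board_run_scale(size_t in_off, size_t out_off, size_t n, int factor)
{
    for (size_t i = 0; i < n; i++) {
        int v;
        memcpy(&v, fake_board_mem + in_off + i * sizeof v, sizeof v);
        v *= factor;
        memcpy(fake_board_mem + out_off + i * sizeof v, &v, sizeof v);
    }
}

/* Original host-only implementation. */
void scale_sw(const int *in, int *out, size_t n, int factor)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * factor;
}

/* Counterpart with the same signature: copy in, run, copy out. */
void scale_hw(const int *in, int *out, size_t n, int factor)
{
    size_t bytes = n * sizeof(int);
    if (2 * bytes > BOARD_MEM_BYTES) {   /* does not fit: fall back to software */
        scale_sw(in, out, n, factor);
        return;
    }
    board_write(0, in, bytes);               /* input buffer at offset 0 */
    board_run_scale(0, bytes, n, factor);    /* output placed right after input */
    board_read(bytes, out, bytes);
}
```

Because both functions share a signature, the main application only needs to change which one it calls, and the software version remains available as a reference for checking results.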

 

Simulate

Before running the whole project on the FPGA board, you can check it for integration and connectivity mistakes using Aldec’s high-performance Riviera-PRO simulator and the HES Proto-AXI simulation model included in the Large Scale HPC solution.

 

Configure

The last stage before running is automatic synthesis and implementation in the Xilinx Vivado environment, which generates the FPGA bitstream and configuration files for your main application.

 

Run

Aldec provides a run-time environment that makes using FPGA boards straightforward. The PCI Express device driver is installed, and accelerator board housekeeping functions are included in the Proto-AXI API library linked with your program. When you launch your main application on the host computer, the FPGA is configured automatically, so no special knowledge of FPGA operation or programming is required. This makes it a very convenient environment for software developers.

 




Printed version of site: www.aldec.com/en/solutions/hpc/large_scale_hpc