Large Scale HPC

Neural network deep learning, data mining, cloud computing and scientific research are just a few fields where traditional servers lack computational power despite consuming a lot of energy. Recent, tremendous advances in FPGA technology have opened the door for its use in HPC applications. Aldec’s scalable, FPGA-based accelerators are ideal for Large Scale HPC applications. Today’s generation of FPGA boards features low-power Xilinx® UltraScale™ FPGAs that provide outstanding computational capabilities with a power efficiency not achievable with GPU-based accelerators.

Quick Integration & Short Bring-up

Software developers need a ready-to-use hardware platform that spares them low-level hardware integration work. With this use model in mind, Aldec provides a complete software stack that includes a PCI Express driver and an easy-to-use software API, available for both Linux and Windows hosts. In this environment, the algorithm can be compiled into the FPGA directly from C using Xilinx High Level Synthesis (HLS) or similar tools.

On the hardware side, such algorithms are converted to a synthesizable IP core, called a Compute Device, that exposes a standard interface. The most popular interface is AMBA AXI, which is why Aldec provides the Proto-AXI host interface module; it can be easily integrated with the Compute Device obtained from the HLS compilation process.

The Aldec Proto-AXI has been optimized to achieve data throughput above 2 GB/s for transfers between the Host and the HES board. It provides an easy-to-use memory-mapped interface for integration with the Compute Device, and it can also be easily converted to a streaming AXI interface. The wide 256-bit local data bus runs at 125 MHz to ensure high bandwidth for data transfers between the Compute Device and internal or external memory, or between Compute Devices implemented in two FPGAs.
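The local data bus figures quoted above imply a peak bandwidth that can be worked out directly. The short sketch below does that arithmetic; it assumes one full-width transfer per clock cycle, which is an idealization and not a figure stated by Aldec, so sustained bandwidth in a real design may be lower. The function and variable names are illustrative only.

```python
# Peak bandwidth of the local data bus, from the width and clock quoted in the text.
BUS_WIDTH_BITS = 256          # local data bus width
BUS_CLOCK_HZ = 125_000_000    # 125 MHz bus clock


def peak_bandwidth_bytes_per_s(width_bits: int, clock_hz: int) -> int:
    """Upper bound on bandwidth, ASSUMING one full-width transfer per clock cycle."""
    return (width_bits // 8) * clock_hz


local_bus_peak = peak_bandwidth_bytes_per_s(BUS_WIDTH_BITS, BUS_CLOCK_HZ)
print(f"Local bus peak: {local_bus_peak / 1e9:.1f} GB/s")  # Local bus peak: 4.0 GB/s
```

Under that assumption the local bus peaks at 32 bytes × 125 MHz = 4 GB/s, which leaves headroom over the 2 GB/s host transfer rate quoted above.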
The external memory controller is also embedded in the Aldec Proto-AXI module to relieve the end user of low-level hardware implementation details. More information is available in the following technical document: Getting started with Aldec HES7ProtoAXI.

Main Features

· HES-HPC FPGA-Based Accelerator board
  - Kintex-UltraScale XCKU085 / XCKU115
    · Up to 1,451K Logic Cells & 5,520 DSP Slices
  - Zynq-7000 XC7Z035 / XC7Z045 / XC7Z100
  - Host PHY: PCI Express x8, USB, Ethernet
  - 2x DDR4 16GB, 4x RLD3 576Mb
  - PCIe, 2x QSFP+, USB 3.0, SATA, 2x Samtec Firefly, ADC/DAC
· RTL bring-up environment
· RTL Porting Services

Solution Contents

· HES-HPC FPGA-Based Accelerator board
· Proto-AXI host interface module and software stack
· AXI Bus Functional Model (BFM) for RTL simulation
· Riviera-PRO™ RTL simulator
· Technical documentation, tutorials and white papers
· Full support for Xilinx Integrated Logic Analyzer debug via JTAG
· Full support for RTL Porting Services

An example HPC design flow is based on the Xilinx Vivado HLS software for direct compilation from C to FPGA. The program or algorithm to accelerate is partitioned into two parts: one designated for acceleration and the other that runs on the host. The partitioning can be based on profiling results that identify the pieces of C code that are computationally intensive.

Once the C code to accelerate in the FPGA is identified, the Xilinx Vivado HLS tool converts it from C to Verilog or VHDL RTL code suitable for further automatic processing (synthesis and implementation). Other tools can also be used instead of Vivado HLS, for example Cyber Workbench from Aldec partner NEC. The RTL code then needs to be integrated with Aldec Proto-AXI. Concurrently, the main host application is modified to replace some functions with Proto-AXI API calls.
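The profiling-based partitioning described above can be sized up with a rough, Amdahl-style estimate before any code is moved to the FPGA. The sketch below is not part of Aldec's tooling; the function name, its parameters and the sample numbers are hypothetical and serve only to show how the profiled share of run time limits the overall gain.

```python
def estimated_speedup(accel_fraction: float, kernel_speedup: float,
                      transfer_fraction: float = 0.0) -> float:
    """Amdahl-style estimate of whole-program speedup after offloading.

    accel_fraction    -- share of the original run time spent in the code
                         selected for acceleration (taken from profiling)
    kernel_speedup    -- how much faster that code runs in the FPGA
                         (hypothetical input, not a measured Aldec figure)
    transfer_fraction -- added host<->board transfer time, expressed as a
                         share of the original run time
    """
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    if transfer_fraction < 0.0:
        raise ValueError("transfer_fraction must not be negative")
    # New run time, normalized so that the original run time equals 1.
    new_time = (1.0 - accel_fraction) + accel_fraction / kernel_speedup + transfer_fraction
    return 1.0 / new_time


# Illustrative inputs only: the larger the profiled share, the larger the gain.
for share in (0.5, 0.9, 0.99):
    print(f"{share:.0%} offloaded, 10x kernel: {estimated_speedup(share, 10.0):.2f}x overall")
```

The estimate makes the case for profiling first: a kernel that accounts for only half of the run time cannot deliver even a 2x overall gain, however fast the FPGA implementation is.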
Before running the whole project on the FPGA board, you can check it for integration or connectivity mistakes by using Aldec’s high-performance Riviera-PRO simulator together with the Proto-AXI co-simulation plug-in and the AXI BFM. The last stage is automatic synthesis and implementation in the Xilinx Vivado environment, which generates the FPGA bitstream and configuration files for your main application.

Aldec provides a run-time environment that makes using FPGA boards seamless. FPGA devices are configured automatically when you launch your main application on the host, so the process does not require any special knowledge of FPGA operation or programming, which makes it an ideal environment for software developers.