Accelerating Simulation of Vivado Designs with HES

Table of Contents

Introduction

Complex FPGA Design Verification Challenge

Vivado Example Design

Simulating Example Design

Testbench for Simulation

Simulation Acceleration with HES™

HES-DVM Design Setup Flow

Preparing Xilinx Design for Acceleration

Design Compilation

Preparing Xilinx MIG Wrapper

Reusing EDIF netlists for Xilinx IPs

Mapping MIG Wrapper as external memory

Acceleration Benchmark

Conclusions

Introduction

With FPGA designs growing as fast as Moore's law, verification is becoming a bigger challenge and HDL simulation is quickly becoming a major bottleneck. This application note shows how to accelerate simulation using Aldec's HES-DVM to fix the simulation bottleneck.

Complex FPGA Design Verification Challenge

The FPGA design and verification “ecosystem” changes rapidly to keep pace with the fast-growing size of FPGA devices. The largest Xilinx Virtex UltraScale chips provide 4.4 million logic cells or, by another metric, an equivalent gate count of 50 million.

To enable an efficient design process for Virtex-7 and newer FPGAs, Xilinx provides the Vivado Design Suite (Vivado for short). Besides supporting a classical HDL design flow, it provides system-level design tools such as IP Integrator, System Generator, and High-Level Synthesis, which are well suited to large and complex designs. With IP cores taken from libraries or generated automatically from algorithmic descriptions, it is possible to build a complex system in a week instead of months. Xilinx has also adopted the AXI standard for interconnecting IPs, which improves reusability and further accelerates the design process. Vivado provides configurable AXI interconnect IPs that allow hierarchical networks on chip (NoC) to be designed and evaluated quickly.

Verification has always taken a significant share of the project schedule, with HDL simulation being the main stage of that process, and that has not changed. Xilinx bundles a native HDL simulator with Vivado. It is good enough for simulating designs that target the smallest FPGA devices, but for larger ones it becomes a bottleneck. This is probably one of the reasons why Vivado allows users to choose a third-party simulator from another EDA vendor, for example Aldec's Riviera-PRO.

With such big designs, even the fastest third-party simulators spend hours on test scenarios that are not very long. In such cases many engineers are tempted to skip simulation and jump right into FPGA prototyping. With both front-end and back-end design flows conveniently integrated in Vivado, it is very easy to launch synthesis and place & route jobs and get the first FPGA bit-file within two hours. Reality sets in when the FPGA on the board is programmed and does not do what they expect. Even if they were careful enough to hook up Xilinx ChipScope, they quickly discover that the selected debug probes are not adequate. It's time to return to HDL simulation.

If you have had a similar experience, you should definitely consider HDL simulation acceleration. For years Aldec has provided emulation solutions based on Xilinx FPGAs that have been used successfully with ASIC designs. The recent rapid growth of FPGA design sizes now calls for a similar upgrade to the FPGA simulation environment. This application note presents simulation acceleration of a design created entirely with the Vivado Design Suite.

Vivado Example Design

The benefits of using simulation acceleration can be seen even with a small portion of a larger design. A good example is the memory subsystem that can be configured in many ways but is obviously present in every design. It can provide access to an off-chip DDR3 SDRAM memory module for multiple sub-systems in an FPGA.

The DDR3 controller and AXI Interconnect would be the main blocks of such a memory subsystem as shown in Figure 1.

  1. DDR3 SDRAM controller

  2. AXI Interconnect

Figure 1. DDR3 Memory Subsystem

It can be built using the Vivado IP Integrator and Diagram Editor as shown in Figure 2[1].

Figure 2. DDR3 Memory Subsystem in the Vivado Diagram Editor

The two main IP blocks in this block diagram are:

1. Memory Interface Generator (MIG) - provides the DDR3 controller and an AXI slave port as the application interface.

2. AXI Interconnect - provides 3 independent AXI4 slave ports with different data sizes and arbitrates access to shared DDR3 memory.

Figure 3. AXI Interconnect and MIG blocks

For the sake of verification we have added some additional blocks:

3. AXI Traffic Generators - represent other sub-systems and inject AXI transfers into two out of three slave ports of AXI Interconnect.

Figure 4. Traffic Generators

4. AXI Protocol Checker - to monitor the AXI bus between the AXI Interconnect and MIG controller. The protocol checker provides AXI bus status information.

Figure 5. AXI Protocol Checker

Note that all blocks (1-4) mentioned above are synthesizable. The Diagram Editor tool in Vivado is used to draw interconnections. Some ports of instantiated IPs are made external. They are as follows:

Figure 6. External IO Ports

Simulating Example Design

The design shown in Figure 2 can be simulated with Aldec's Riviera-PRO simulator which can be selected in the Vivado Project Settings dialog box. Since Riviera-PRO will be used standalone, we set the "Generate scripts only" option.

Figure 7. Setting Riviera-PRO as the default simulator in the Vivado Project Settings

Next it is required to generate HDL source code using the option "Create HDL Wrapper..." and then call "Run Simulation" in the Flow Navigator (Figure 8) to create a simulation folder with all required scripts.

Figure 8. Running simulation from Vivado

The simulation scripts generated by Vivado can be used to compile the HDL wrapper of the diagram and all IP blocks used in it. There is, however, no testbench which can be used to generate stimulus. This should be developed separately.

Testbench for Simulation

The testbench should drive all inputs like clocks, resets, and generate traffic on one of the AXI slave ports which has been made external (S02_AXI). It also should connect the Design Top to a simulation model of the DDR3 memory. There are two embedded AXI traffic generators in the design and their start/stop/irq ports should be controlled by the testbench as well. Finally the testbench should monitor the AXI Protocol checker outputs (pc_status, pc_asserted) and contain self-checking functionality. Figure 9 shows a block diagram of such a testbench.

Note: Vivado provides example projects for some IP. Such projects contain necessary simulation models and testbenches. In this design we have reused the DDR3 simulation model and simulation wrapper of the MIG IP.

Figure 9. Testbench Block Diagram

Once the testbench is ready, we can run simulation and then move to accelerated simulation to see if there is any speedup.

Simulation Acceleration with HES™

Aldec has been providing HES™ - Hardware Emulation Solutions for many years now. During that time HES has evolved to address the most sophisticated design requirements and customer requests. Simulation acceleration is only one example of how HES can be used[2]. It further splits into two use models:

  • Signal-level and cycle accurate

  • Transaction-level and loosely timed or untimed

In this application note we're demonstrating the former as it can be used with any kind of testbench. Please note, however, that in the case of transaction-level simulation, the acceleration ratio is orders of magnitude higher, therefore, an investment into transaction-level testbenches will result in higher return on investment.

Figure 10. Signal-level simulation acceleration

Aldec provides the DVM™ tool to automate the entire process of design compilation and implementation for HES boards. It also contains co-simulation interface libraries for various HDL simulators to enable simulation acceleration with other EDA vendors. However, the combination of Riviera-PRO and HES is best optimized for speed as both tools are from the same vendor and Aldec could develop direct hooks in the Riviera-PRO simulation kernel to squeeze every millisecond. Simulators from other vendors are supported via standard PLI/VHPI interfaces.

HES-DVM Design Setup Flow

The design setup flow (Figure 11) for simulation acceleration is fully automated with the DVM tool, which integrates with Vivado synthesis and implementation utilities.

Figure 11. Design Setup Flow for Acceleration using DVM™

The DVM has many features and those used with the current example design are as follows:

  • VHDL & SystemVerilog support - allows compiling any kind of design

  • EDIF netlist support - allows reusing Vivado out of context synthesis

  • External memory - mapping large memories (like DDR) to external off-chip memories

The outcome of DVM setup flow is a set of files containing:

  • HDL Wrapper for the module/design added to HES (Verilog in this case)

  • FPGA bitfiles (fpga_*.bit)

  • Simulator script (Simulate_HES.do)

  • Other files for mapping hardware signals and debug configurations

The HES board is seamlessly integrated with the simulator. It connects to the host workstation via the PCI Express interface. Aldec provides all necessary drivers and HDL simulator interfaces. Running accelerated simulation is as easy as executing the generated simulator script (Simulate_HES.do).

Preparing Xilinx Design for Acceleration

Let's review the details of design setup for acceleration, focusing on the steps that are specific to designs created in the Xilinx Vivado Design Suite.

Design Compilation

Unlike with Riviera-PRO, it is not necessary to compile all design and library sources in DVM. Instead, we take advantage of the Vivado out-of-context (OOC) compilation flow, in which each IP instantiated in the design can be synthesized separately.

Once the block diagram is ready and validated, the OOC files can be generated using the “Generate Output Products ...” option in the drop-down menu of the block diagram source file, as shown in Figure 12.

Figure 12. Generate Output Products in Vivado

In the “Generate Output Products” pop-up window, the “Out of context per IP” option should be selected as shown in Figure 13. Output products generated this way contain two types of files that are of interest during the DVM setup flow:

  1. Verilog stub files, the black-box source for each IP. Their names contain the "_stub" suffix (*_stub.v).

  2. Synthesis checkpoints (*.dcp) which are archives containing synthesized netlists (EDIF file).

Figure 13. Generating Out of Context Products in Vivado

The required IP stubs and DCP checkpoints can be found under the sources directory of the Vivado project. For example:

./x_axi_mig.srcs/sources_1/bd/x_axi_mig/ip/

Each IP is stored in a separate directory, and the respective stub and DCP files can be found using the following find commands in Linux:

find *.srcs/ -type f -name "*_stub.v"
find *.srcs/ -type f -name "*.dcp"
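The same search can be scripted when the list of IPs is long. The following is a minimal Python sketch, not part of the DVM or Vivado tooling: the function names are hypothetical, and it assumes that a stub named <ip>_stub.v and a checkpoint named <ip>.dcp belong to the same IP, which matches the file names shown in this note but is not guaranteed for every project.

```python
from pathlib import Path


def collect_ooc_products(project_dir):
    """Return {ip_name: {"stub": Path or None, "dcp": Path or None}}.

    Mirrors the two `find *.srcs/ -type f -name ...` commands, run from
    the Vivado project root.
    """
    products = {}
    for srcs_dir in sorted(Path(project_dir).glob("*.srcs")):
        for path in sorted(srcs_dir.rglob("*")):
            if not path.is_file():
                continue
            if path.name.endswith("_stub.v"):
                ip_name, kind = path.name[: -len("_stub.v")], "stub"
            elif path.suffix == ".dcp":
                ip_name, kind = path.stem, "dcp"
            else:
                continue
            # Assumption: stub and checkpoint of one IP share a base name.
            entry = products.setdefault(ip_name, {"stub": None, "dcp": None})
            entry[kind] = path
    return products


def missing_products(products):
    """List IP names for which either the stub or the checkpoint is absent."""
    return sorted(
        name
        for name, entry in products.items()
        if entry["stub"] is None or entry["dcp"] is None
    )
```

Calling collect_ooc_products(".") from the project root and then missing_products() on the result gives a quick check that every IP was generated out of context before moving on to the DVM setup.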

In addition to the IP stub files, it is also required to compile all wrappers generated from the block diagram and any other custom HDL source files that were added to the Vivado project.

Preparing Xilinx MIG Wrapper

The Xilinx Memory Interface Generator IP is handled differently from other IPs in the DVM tool. Instead of using the EDIF netlist from the DCP checkpoint, we map it to the external memory model with an AXI interface that is delivered with the DVM tool. The advantage of this mapping is that the Xilinx MIG is replaced with a compound DDR controller that is already connected to the on-board DDR3 memory.

Since the memory model from the DVM library provides only a pure AXI slave interface, an appropriate wrapper must be created to emulate the behavior of the other, non-AXI signals. It is convenient to start with the MIG stub source file and modify it to achieve the required functionality. The MIG stub contains the following interfaces and ports:

| Interface or Port | Prefix | Remarks |
|---|---|---|
| AXI Slave | s_axi_* | Should be connected with the Aldec AXI Slave wrapper. |
| DDR | ddr_* | Outputs should be tied off and bidirectional buses left unconnected. |
| APP Interface | app_* | These are unused outputs and can be tied off to Low. |
| OUT: ui_clk | | The simplest implementation can be just a pass-through of the input clock sys_clk_i. To be more accurate, it can be modeled as a gated input clock sys_clk_i with the gate signal enabling ui_clk a number of cycles after sys_rst goes High. |
| OUT: ui_clk_sync_rst | | This output system reset is active High. It starts in a High state when sys_rst is asserted Low and is deasserted a number of cycles after sys_rst goes High. |
| OUT: mmcm_locked | | Indicates that MMCM calibration is complete inside the Xilinx MIG. It is asserted High some time after sys_rst deassertion. |
| OUT: init_calib_complete | | Indicates that the DDR line calibration is complete. This signal is asserted High some time after mmcm_locked. |

Note: The reference implementation of the emulation wrapper for the MIG controller is available within the example project delivered with the HES-DVM software.
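The ordering of the non-AXI status outputs described in the table can be checked with a small cycle-level model before writing the wrapper in HDL. The following Python sketch is only an illustration of that ordering and is not the reference wrapper shipped with HES-DVM: the class name and all three delay values are hypothetical (the text only says "a number of cycles" and "some time"), and releasing ui_clk_sync_rst after mmcm_locked is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MigStatusModel:
    """Cycle-level model of the MIG wrapper status outputs.

    All delays are hypothetical and counted in sys_clk_i cycles.
    """
    lock_delay: int = 4     # sys_rst release -> mmcm_locked High
    rst_delay: int = 8      # sys_rst release -> ui_clk_sync_rst Low
    calib_delay: int = 16   # mmcm_locked High -> init_calib_complete High
    cycles_out_of_reset: int = 0

    def step(self, sys_rst):
        """Advance one sys_clk_i cycle. sys_rst is active Low."""
        if sys_rst == 0:
            self.cycles_out_of_reset = 0
        else:
            self.cycles_out_of_reset += 1
        n = self.cycles_out_of_reset
        return {
            # Active-High reset output, held while sys_rst is Low.
            "ui_clk_sync_rst": int(n < self.rst_delay),
            "mmcm_locked": int(n >= self.lock_delay),
            "init_calib_complete": int(n >= self.lock_delay + self.calib_delay),
        }


model = MigStatusModel()
model.step(sys_rst=0)  # hold reset for one cycle
for cycle in range(1, 25):
    signals = model.step(sys_rst=1)
    if signals["init_calib_complete"]:
        print("init_calib_complete asserted", cycle, "cycles after reset release")
        break
```

A testbench that waits for init_calib_complete before starting traffic, as in the benchmark scenario later in this note, only depends on this ordering, so the actual counter values in an HDL wrapper can be kept short to save simulation time.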

Reusing EDIF netlists for Xilinx IPs

Once all sources are compiled and the design is elaborated in the DVM tool, the next stage is to run synthesis as shown in Figure 11. Before synthesis is launched, EDIF netlists must be set for all Xilinx IPs that were compiled as black-box stubs. These EDIF netlists are included in the DCP checkpoint files, which are regular ZIP archives, so any standard ZIP utility can be used to extract the EDIF netlist from a DCP. The EDIF file has the same base name as the DCP and the extension .edf. For example, the following DCP file is generated for the AXI crossbar IP: "x_axi_mig_xbar_0.dcp"

The EDIF netlist can be extracted in Linux with the following command:

unzip x_axi_mig_xbar_0.dcp x_axi_mig_xbar_0.edf

In the DVM you can find the instance of this module in the design structure and set it as EDIF using the extracted netlist.
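When many IPs need their netlists extracted, the unzip step can be scripted. The sketch below uses Python's standard zipfile module and relies only on what is stated above: a DCP is a ZIP archive and the EDIF member has the same base name with the .edf extension. The function name extract_edif is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_edif(dcp_path, out_dir="."):
    """Extract <base>.edf from <base>.dcp and return the path of the netlist."""
    dcp_path = Path(dcp_path)
    edif_name = dcp_path.with_suffix(".edf").name
    with zipfile.ZipFile(dcp_path) as archive:
        if edif_name not in archive.namelist():
            raise FileNotFoundError(
                f"{edif_name} not found in {dcp_path.name}"
            )
        # Equivalent to: unzip <base>.dcp <base>.edf
        return Path(archive.extract(edif_name, path=out_dir))
```

For the crossbar example, extract_edif("x_axi_mig_xbar_0.dcp") writes x_axi_mig_xbar_0.edf to the current directory, and the explicit error makes it obvious when a checkpoint does not contain a netlist with the expected name.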

Mapping MIG Wrapper as external memory

Unlike other Xilinx IP cores, the MIG should be mapped to external memory instead of being set as EDIF. The DVM provides a library of external memory models and one of them has an AXI Slave interface. This model should be used for the MIG wrapper.

An additional advantage of using the DVM external memory model is that it provides a debugging interface, so memory contents can be viewed and changed at simulation runtime.

Acceleration Benchmark

For benchmarking, the testbench was orchestrated as follows:

  1. Start simulation and wait for DDR3 controller initialization & calibration

  2. Start AXI traffic generators

  3. Generate 200 write and read burst transfers on AXI port S02_AXI. (Each burst transfer length is 64. In other words, the testbench writes and reads 100 KB of data from the memory subsystem.)
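The 100 KB figure in step 3 can be reproduced with a few lines of arithmetic. The Python sketch below takes the burst count and burst length from the scenario; the 8 bytes per beat (a 64-bit data bus on S02_AXI) is an assumption, chosen because it is the width that makes the stated numbers agree, and it is not given in this note.

```python
BURSTS = 200           # write (and read) bursts, from the test scenario
BEATS_PER_BURST = 64   # burst length, from the test scenario
BYTES_PER_BEAT = 8     # assumption: 64-bit data bus on S02_AXI


def bytes_per_direction(bursts=BURSTS, beats=BEATS_PER_BURST,
                        beat_bytes=BYTES_PER_BEAT):
    """Payload moved in one direction (written, or read back)."""
    return bursts * beats * beat_bytes


total = bytes_per_direction()
print(total, "bytes =", total / 1024, "KB")
```

Under that assumption each burst carries 512 bytes, and 200 bursts add up to 102,400 bytes, which is the 100 KB written and then read back.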

The same test scenario was used during pure Riviera-PRO simulation and then during simulation accelerated with HES. Wall-clock time was measured for both simulations.

| | Riviera-PRO | Riviera-PRO + HES |
|---|---|---|
| Wall-clock time | 1537 seconds | 3 seconds |
| Acceleration ratio | | 512 |
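The acceleration ratio is the quotient of the two wall-clock times. A minimal Python sketch of that calculation, using the measured values from the table (the function name is hypothetical):

```python
def acceleration_ratio(baseline_seconds, accelerated_seconds):
    """Ratio of pure-simulation time to accelerated-simulation time."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated time must be positive")
    return baseline_seconds / accelerated_seconds


ratio = acceleration_ratio(1537, 3)
print(f"{ratio:.1f}x")
```

The exact quotient is slightly above 512, and the table reports it truncated to a whole number.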

Workstation and software used for benchmarking:

Workstation:

| Component | Value |
|---|---|
| CPU | Intel® Core™ i7-3770K CPU @ 3.50GHz |
| RAM | 32 GB |
| HES Board | HES7XV4000BP_REV2, contains 2x Virtex-7 2000T FPGAs |

Software:

| Component | Value |
|---|---|
| OS | Linux CentOS 6, x86_64 |
| Simulator | Riviera-PRO 2017.02 |
| Design env | Vivado 2016.4 |
| Acceleration env | HES-DVM 2017.02 |

Conclusions

Simulation technology is not scaling as fast as design gate counts, so it becomes a real bottleneck in the design verification process. Slow simulation is now experienced even with large FPGA designs. Xilinx Vivado provides a very powerful design environment and rich libraries of ready-to-use IP cores. Their simulation models, based on RTL or gate-level netlists, are often encrypted and slow down simulation.

In this application note we have presented a benchmark showing that simulation speed can be improved significantly. Accelerating just a small memory subsystem in HES, which is usually only a fraction of the whole design, reveals the big potential of FPGA-based simulation acceleration. The key to success, however, lies in design setup automation and in how easily the FPGA boards can be connected with the simulator. Over the years Aldec has established a leading position in that field by providing a high level of automation and seamless integration with all major HDL simulators.

References

[1] Xilinx provides rich resources of training and tutorial videos. Visit the Xilinx website for more information.

[2] Visit the Aldec website for more information: https://aldec.com/en/solutions/hardware_emulation_solutions
