The Convergence of Emulation and Prototyping

Krzysztof Szczur, Hardware Verification Products Manager

During the development of a system on chip (SoC), hardware emulation and FPGA prototyping play distinct and essential roles.

 

● Emulation is used to verify that a design meets its functional requirements: the hardware is emulated, while the environment in which it must operate is simulated using a testbench.

 

● FPGA prototyping is more of a ‘validation’ exercise, coming later in the SoC development lifecycle, and is far closer to real-world operation. For example, instead of capturing individual frames from an emulated HDMI port, you would drive real hardware with a real video stream.

 

In the past, chip design emulators were large systems in their own right and incorporated custom processors (ASICs/ASSPs) – and they are still used for emulating very large SoCs. The table below (figure 1) contrasts traditional emulation and FPGA prototyping.

 

Criterion | Traditional emulation | FPGA prototyping
Cost | High | Moderate
Capacity | Several billion gates | Up to 1 billion gates (multiple FPGAs)
Typical clock frequency | 1-5 MHz | >10 MHz
Design compilation automation | Fully automated | Low level of automation
Design compilation time and turnaround | Fast | Slow
Hardware debugging | Excellent (full visibility, like an HDL simulator) | Poor (on-chip logic analyzer, single FPGA only)
Host interface | Yes (interoperability with simulators, virtual platforms, C++/SystemC) | No (has to be custom made)
External connectivity | Limited | Excellent
Size | Large to very large (not portable) | Small to medium (portable)

Figure 1: Traditional emulation and FPGA prototyping – both essential for SoC development and both with strengths and weaknesses.

 

As per the above table, FPGA prototyping’s strengths include high speed and great external connectivity. Let’s discuss both.

 

High speed

Compared to simulation, emulation is much faster but even so it can, for example, take hours for an OS to boot. An FPGA prototyping platform is faster still and an OS, like Linux, can boot in a matter of minutes.

 

However, it is not uncommon for hardware problems to be revealed when new driver code is being exercised by the software team. In this respect, the FPGA prototyping environment is not convenient for tracing hardware bugs, and the software team may need to spend hours with the design verification team to reproduce the same problem in a simulation environment.

 

Connectivity

SoC designs contain several interfaces to interact with the external world. Thankfully, FPGA prototyping boards provide hundreds or even thousands of I/O via standardized connectors like FMC, making them ideal for expansion. In most cases, the design runs at a clock speed fast enough to connect to external devices directly. In other instances, speed-adapters can be implemented as IPs in the same FPGA as the design under test.

 

Traditional emulators, on the other hand, were not designed to connect to peripherals. There are no direct I/Os available, so if it is necessary to connect external devices, dedicated speed-bridge hardware is required.

 

Needless to say, current SoC projects with many interfaces, often requiring high-bandwidth and high-speed I/O, quickly exceed the capabilities of the emulator’s expansion hardware.

 

Emulation’s strengths

As per figure 1, emulation has some advantages over FPGA prototyping, including:

  • Host interface (interoperability with simulators, virtual platforms etc.)
  • Automated and fast design compilation
  • Excellent hardware debugging capabilities

 

For these reasons, emulation is used much earlier in the design verification cycle. And it is not uncommon to see emulation used by design verification teams for simulation acceleration of blocks or subsystems long before the full integration of the SoC.

 

At such an early stage of design verification, bugs will no doubt be found, and this is where emulation’s debugging capabilities prove their worth. Visibility of signals and memory data, just as in an HDL simulator, breakpoints, and full control of design clocks are the features that make the debugging process so efficient.

 

When the bugs are examined and diagnosed, designers can correct the HDL code and then expect a fast and automated design compilation so that the design can be emulated again.

 

Convergence

Traditional emulation and FPGA prototyping have unique and distinct advantages, and large organizations (working on multiple designs of a billion-plus gates) spend a great deal of money on both. And, as ‘time is money’, there is also the cost of preparing infrastructure, interfaces and test environments that are not interchangeable between the emulator and the FPGA prototyping platform.

 

Wouldn’t it be great if, for designs of a certain size, there were a single platform that could be used for emulation and FPGA prototyping - and spare us duplicating lots of work?

 

As per figure 1, FPGA prototyping platforms can cater for up to a billion ASIC gates. For SoC designs with fewer than a billion gates, FPGA prototyping platforms can, with a few additions, be used for emulation purposes. These ‘additions’ are:

  • Host interface and SCE-MI;
  • Automated design setup; and
  • Hardware debugging capabilities

 

Let’s discuss each.

 

Host interface and SCE-MI

A host interface is needed to connect the design with the verification environment running on the host, and the FPGA prototyping board should provide a high-speed, high-bandwidth and low-latency interface. Thankfully, today's larger FPGAs provide one or more of the interfaces shown in figure 2.

 

Host interface | Rate
PCI Express 4.0 (x8 link) | 128 Gbit/s
QSFP+ (4 channels) | 40 Gbit/s
Ethernet 10GBASE-X | 10 Gbit/s
USB 3.0 | 5 Gbit/s

Figure 2: Common host interfaces in today’s larger FPGAs.
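For context, PCI Express 4.0 runs at 16 GT/s per lane, so the x8 link listed above offers 8 × 16 = 128 Gbit/s of raw bandwidth before encoding and protocol overhead.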

 

Of the above, PCI Express is particularly popular, not just for its high bandwidth but also for its versatility: it is often used as the common bus in the chipsets of modern host computers (with USB or Ethernet bridging to it). See figure 3.

 

Figure 3: PCI Express is popular for host interface.

 

On the software side, it is necessary to develop a PCI Express device driver and, on top of that, a plug-in library for a given verification tool.

 

If there is another interface in the design (AMBA AXI, for example) that needs to be driven from the testbench, one needs to add a bridge that provides a protocol layer. This complicates the host interface development, particularly if multiple interfaces need to share a single PCIe host connection.
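To make that sharing concrete, below is a minimal C++ sketch of one way several logical interfaces could be multiplexed over a single host link by tagging each message with a channel ID. The framing format and names are purely illustrative; they are not the protocol of any particular emulator or of the SCE-MI standard.

    #include <cstdint>
    #include <cstring>
    #include <vector>

    // One logical interface (AXI bridge, UART, debug channel, ...) per channel ID.
    struct ChannelFrame {
        uint16_t channel_id;   // which logical interface this message belongs to
        uint32_t length;       // payload length in bytes
    };

    // Serialize one message so it can be written to the single PCIe DMA stream.
    std::vector<uint8_t> pack_frame(uint16_t channel_id,
                                    const std::vector<uint8_t>& payload) {
        ChannelFrame hdr{channel_id, static_cast<uint32_t>(payload.size())};
        std::vector<uint8_t> out(sizeof(hdr) + payload.size());
        std::memcpy(out.data(), &hdr, sizeof(hdr));
        if (!payload.empty())
            std::memcpy(out.data() + sizeof(hdr), payload.data(), payload.size());
        return out;
    }

On the hardware side, a small demultiplexing block would then route each frame to the corresponding transactor.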

 

However, a transaction-level host interface, based on Accellera’s SCE-MI standard, is a practical solution. It enables the rapid development of transactors that can be used for HDL simulation and then for emulation (see figure 4).

Figure 4: The same testbench infrastructure can run much longer test sequences because design simulation is accelerated by 10 to 100x.

 

The SCE-MI standard provides three message transport use models (macro-based, pipe-based and function-based – see figure 5) that can easily be used in a variety of verification environments and with HDL or C/C++ languages.

Figure 5: The SCE-MI standard provides three interfaces.
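To give a flavour of the function-based use model, here is a minimal sketch of the host (C++) side of a transactor call implemented over SystemVerilog DPI. The function names are hypothetical, and the SCE-MI runtime is assumed to handle the transport between the host and the FPGA.

    #include <cstdint>
    #include <cstdio>

    // Imported by the HDL transactor, e.g. with
    //   import "DPI-C" function void host_put_result(input int unsigned data);
    // The emulator's runtime transports the call from the hardware to the host.
    extern "C" void host_put_result(uint32_t data) {
        std::printf("result word from DUT: 0x%08x\n", data);  // hand to scoreboard
    }

    // Declared on the HDL side with export "DPI-C"; the host testbench calls it
    // to push the next stimulus word into the transactor running in hardware.
    extern "C" void dut_put_stimulus(uint32_t data);

Because the same entry points also work in an HDL simulator, a testbench written this way can move between simulation and emulation unchanged.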

 

The user is not concerned with SCE-MI infrastructure implementation details, such as the underlying PCIe controllers or drivers, and can instead spend time designing transactors and the testbench. An additional benefit is that such testbench components are then reusable across other projects and even different simulators and emulation platforms.

 

Automated Design Setup

Users planning to acquire a bare FPGA prototyping board should also plan to hire a couple of experienced FPGA design engineers who know how to implement the design so that it works. That advice applies especially if the design is large and needs to be partitioned into multiple FPGAs.

 

Design setup flow has four major steps (the grey boxes in figure 6).

Figure 6: Design set up steps.

 

There are different tools dedicated to each step, and the end user must become an expert in all of them, which takes time. Indeed, verification engineers wouldn’t be able to prepare a design emulation setup for an FPGA without the help of a dedicated FPGA team. Toolchain integration is difficult and expensive to create and maintain, too. This difficulty increases the overall design setup time (plus the time needed for debug-fix turnarounds) to unacceptable levels.

 

Incremental design compilation is a way to decrease the mean turnaround time. The toolchain for FPGA setup should contain a synthesis tool – or integrate with a third-party one – and provide incremental synthesis, so that if a bug is fixed in a single module only that module needs to be resynthesized. Even the first synthesis of the entire design can be quite fast if it is run as multiple parallel jobs submitted to a compute farm.
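As a rough illustration of the decision behind incremental compilation, the C++ sketch below (with hypothetical names, and std::async standing in for submission to a compute farm) resynthesizes only the modules whose source hash changed since the previous run and launches those jobs in parallel.

    #include <future>
    #include <map>
    #include <string>
    #include <vector>

    using HashMap = std::map<std::string, std::string>;  // module -> source hash

    // Modules that are new or whose RTL changed since the last compilation.
    std::vector<std::string> stale_modules(const HashMap& previous,
                                           const HashMap& current) {
        std::vector<std::string> stale;
        for (const auto& [module, hash] : current) {
            auto it = previous.find(module);
            if (it == previous.end() || it->second != hash)
                stale.push_back(module);
        }
        return stale;
    }

    void incremental_synthesis(const HashMap& previous, const HashMap& current) {
        std::vector<std::future<void>> jobs;
        for (const auto& module : stale_modules(previous, current)) {
            jobs.emplace_back(std::async(std::launch::async, [module] {
                // A real flow would invoke the synthesis tool for `module` here.
                (void)module;
            }));
        }
        for (auto& job : jobs) job.get();  // wait for all partial netlists
    }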

 

The next challenge faced during design setup for FPGA prototyping relates to the ASIC-specific design considerations and code templates found in the RTL code. For instance, if there are any ASIC memory macro cells or external DDR memories that can’t be synthesized, the FPGA backend team has to develop FPGA-compliant models. This is an additional R&D and verification effort, but it can be avoided if the FPGA prototyping platform vendor provides a memory mapping utility.

 

Another common problem solved by EDA tools is the mapping of gated clocks, which have to be converted to clock enables (CEs) in order to work reliably in an FPGA. See figure 7.

Figure 7: Clock enables are needed for implementation in an FPGA.

When design compilation is done – synthesis and mapping of the RTL to an FPGA netlist – the next challenge is to fit the design into a single FPGA or, more often, to partition it and map it to multiple FPGAs in the prototyping platform.

 

Ideally, the same tool would provide partitioning functionality – so the user need not be overly concerned about how to implement chip-to-chip interconnections or whether to use multiplexers, for example.

 

A smart tool will provide a simple interface that allows FPGA resources and interconnections to be monitored and that also allows partitions to be created (automatically or with user guidance). The type of the interconnection should be set automatically by the tool to maximize emulation performance on a given FPGA board – see figure 8.

Figure 8: Emulation infrastructure contains I/O serializers and an emulation controller that should be generated and integrated with design partitions automatically.

Finally, when the partitioning is done, the tool should generate implementation scripts and run FPGA vendor place & route software.

 

Maintaining implementation constraints for timing and I/O pin assignments should be done automatically as well. Each FPGA implementation process can be submitted to a compute farm, as is the case during synthesis, so that all bit-files are implemented simultaneously. This allows the total implementation time to be reduced significantly. In future iterations, the tool should be smart enough to determine which FPGA needs to be re-implemented and which bit-files should remain unchanged.

 

Hardware debugging

FPGA debugging has improved considerably during recent years with, for example, the addition of an on-chip logic analyzer function provided by most FPGA vendors. However, it is still a long way from the capabilities of an emulator.

 

The first and most important requirement is to have 100% of signals available for debugging: any design signal can be selected for probing without having to re-run the design setup or respin the synthesis/implementation cycle.

 

Next, the ability to control clocks and to stop them when a hardware breakpoint is hit is something that verification and design engineers would appreciate.

 

Finally, a memory back-door interface that provides data upload and download while the emulation is running is essential for debugging heterogeneous SoC designs containing RISC-V or ARM processors and various compute accelerators such as DPUs and TPUs.
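To give a feel for what such debug hooks might look like from the host side, here is a minimal sketch of a hypothetical control API; the class and method names are invented for illustration and do not correspond to any specific vendor tool.

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    class EmulationDebugPort {
    public:
        virtual ~EmulationDebugPort() = default;

        // Stop and resume the emulated design clocks, e.g. when a hardware
        // breakpoint triggers, so signal and memory state can be inspected.
        virtual void stop_clocks() = 0;
        virtual void resume_clocks() = 0;

        // Arm a hardware breakpoint on a named RTL signal reaching a value.
        virtual void set_breakpoint(const std::string& signal, uint64_t value) = 0;

        // Back-door memory access while emulation is running: upload a boot image
        // for a RISC-V/ARM core, or read back a buffer produced by an accelerator.
        virtual void write_memory(const std::string& mem_instance, uint64_t addr,
                                  const std::vector<uint8_t>& data) = 0;
        virtual std::vector<uint8_t> read_memory(const std::string& mem_instance,
                                                 uint64_t addr, std::size_t len) = 0;
    };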

 

All debugging features should be correlated with the design’s HDL/RTL view, so that the required signals and memory instances are easy to find, and data captured from the hardware is converted to the abstract data types defined in the HDL source code.

Figure 9: Advanced debugging features ensure 100% signal availability, and RTL-level viewers improve productivity when diagnosing problems encountered during emulation.

 

If such advanced debugging capabilities are needed, look no further than the EDA tool marketplace, noting that the FPGA technology and the interoperability of the tools in the FPGA toolchain are important factors. For example, debug (probe) instrumentation has to be done in the RTL code so that the synthesis tool does not trim or optimize away the probed signals. Also, elements of the debugging infrastructure – such as the on-chip logic analyzer(s), multi-FPGA debug synchronization, and the storage for samples or the bus interface for sending data to the host – have to be integrated and optimized to provide an appropriate level of performance and reliability.

 

Emulation and prototyping combined

 

Reusing the same hardware platform for both emulation and prototyping brings significant cost savings, and the new verification techniques it enables are even more appealing.

 

It is possible to divide the design into subsystems and run some of them in the prototyping mode while others run on the emulation ‘side’.

 

For example, a CPU subsystem can run at 100 MHz while a custom processor IP or multimedia subsystem runs at emulation speed, with the full visibility and host/testbench connection that are available in emulation.

Figure 10: A combined emulation and prototyping platform enables efficient hardware-software co-verification.

 

The CPU subsystem running at 100 MHz or more is great news for software developers, as the whole platform and the software debugging tools are very responsive, and booting the OS takes just a few minutes.

 

At the same time, if a hardware problem is encountered, it can be diagnosed by hardware designers on the same platform on which it was detected, meaning there may be no need to reproduce the problem in a simulator just to see what’s going on.

 

Let’s consider another example: emulating a video processor IP that receives an input data stream from the host computer but displays the output directly on a monitor through the HDMI PHY on a daughter card. See figure 11.

 

Figure 11: Above, just one example of how real hardware can be driven from IP that is being emulated.

 

Understandably, the subsystems will run in different clock domains, just as they will in the final SoC. They must be connected properly through CDC synchronizers because they are driven from real-world, asynchronous clocks. It’s a bridge that needs crossing anyway, so you may as well do it now.

 

Summary

Physical prototyping in FPGA is essential if you want to validate how your SoC design interfaces with the real world, and it provides the software team with a great, high-speed platform. Not so great is visibility into the hardware, unless the FPGA prototyping platform also offers some of the functions and features for which emulators are best known.

Convergence delivers the ability to reuse the simulation testbench in FPGA testing, plus test automation and control from the host PC that allow users to run thousands of regression tests. Overall project time is reduced, as are costs.

 

Remember figure 1? Here it is again…

 

Criterion | Traditional emulation | FPGA prototyping
Cost | High | Moderate
Capacity | Several billion gates | Up to 1 billion gates (multiple FPGAs)
Typical clock frequency | 1-5 MHz | >10 MHz
Design compilation automation | Fully automated | Low level of automation
Design compilation time and turnaround | Fast | Slow
Hardware debugging | Excellent (full visibility, like an HDL simulator) | Poor (on-chip logic analyzer, single FPGA only)
Host interface | Yes (interoperability with simulators, virtual platforms, C++/SystemC) | No (has to be custom made)
External connectivity | Limited | Excellent
Size | Large to very large (not portable) | Small to medium (portable)

Figure 1 (repeated): Traditional emulation and FPGA prototyping – both essential for SoC development and both with strengths and weaknesses.

 

So, let’s conclude with a modification to that table – and see where a combined solution would fit.

 

Criterion | Traditional emulation | Combined emulation/prototyping | FPGA prototyping
Cost | High | Moderate | Moderate
Capacity | Several billion gates | Up to 1 billion gates (multiple FPGAs) | Up to 1 billion gates (multiple FPGAs)
Typical clock frequency | 1-5 MHz | 1-10 MHz (emulation domain), >10 MHz (prototyping domain) | >10 MHz
Design compilation automation | Fully automated | Fully automated | Low level of automation
Design compilation time and turnaround | Fast | Moderate | Slow
Hardware debugging | Excellent (full visibility, like an HDL simulator) | Excellent (full visibility, like an HDL simulator) | Poor (on-chip logic analyzer, single FPGA only)
Host interface | Yes (interoperability with simulators, virtual platforms, C++/SystemC) | Yes (interoperability with simulators, virtual platforms, C++/SystemC) | No (has to be custom made)
External connectivity | Limited | Excellent | Excellent
Size | Large to very large (not portable) | Small to medium (portable) | Small to medium (portable)

Figure 12: The strengths and weaknesses of traditional emulation and FPGA prototyping with, between them, the clear benefits of a combined emulation/prototyping platform.

 

As can be seen, an FPGA prototyping platform with emulation capabilities is a ‘best of both worlds’ solution. What’s not to like?

Krzysztof joined Aldec in 2001 and was a key member of the team that developed HES-DVM™, Aldec's FPGA-based simulation acceleration and emulation technology. He has worked in the fields of HDL IP-core verification, testbench automation and design verification for DO-254 compliance, gaining practical experience and a deep understanding of design verification methodologies, emulation and physical prototyping. As Hardware Verification Products Manager, Krzysztof cooperates with key customers and Aldec's R&D to overcome complex design verification challenges using Aldec hardware tools and solutions. Krzysztof graduated as M.Eng. in Electronic Engineering (EE) at the AGH University of Science and Technology in Krakow, Poland.
