How To Implement A Real-time Human Detection Application at the Edge Using Zynq UltraScale+ MPSoC Device

Date: Jun 5, 2020

Type: In the News

Henderson, Nevada, USA – June 5th, 2020 – The ability to perform real-time, low-latency and deterministic processing at the edge is increasingly important for a range of applications, from autonomous vehicles to vision guided robotics and intelligent surveillance systems.

Processing at the edge is required for four main reasons: availability, latency, security and determinism. If wireless communications are used to reach a cloud service where the processing is performed, the connection to that service cannot be guaranteed, because service outages or signal blackspots created by buildings or natural vegetation may occur. Sending sensitive data to the cloud for processing and decision making also increases the latency and decreases the determinism of the response, making it unsuitable for real-time, safety-critical decisions.

Edge processing addresses the availability, latency and determinism challenges. However, it presents a challenge of its own: the computational power available at the edge is normally much lower than that available in the cloud.

In this article, we will address the low-power and high-performance challenges of an edge processing system by implementing a real-time human detection application using a Zynq UltraScale+ MPSoC device on an Aldec TySOM-3A-ZU19EG embedded development board.

Zynq UltraScale+ MPSoC Heterogeneous System on Chip (SoC) Device

Heterogeneous System on Chip (SoC) devices, like the Zynq UltraScale+ MPSoC, enable the user to address the challenges of implementing low-latency, deterministic processing at the edge. Unlike traditional processor-based solutions, heterogeneous SoCs are divided into two elements: a processing system, which contains high-performance ARM processor cores, and programmable logic, which is based on the latest Xilinx FPGA fabric.

The flexibility provided by the programmable logic IO structures enables the implementation of any-to-any interfacing. This frees the designer from the IO constraints enforced by application specific standard part (ASSP) devices. For example, thanks to the programmable logic’s IO flexibility, several MIPI interfaces can be implemented as both TX and RX, supporting a range of data rates, data lanes and data types. The flexible implementation of this logic enables the solution to be upgraded as sensor technology evolves.

Beyond the IO structures, the massively parallel nature of programmable logic itself enables the implementation of a true image processing pipeline, in which every stage operates concurrently. The pipeline is implemented entirely within the programmable logic, so pixel data passes directly from one stage to the next and there is no need to store the intermediate results of each stage in DDR memory. This reduces latency and increases determinism, as the pipeline does not have to compete for access to a shared system resource such as DDR.
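The streaming idea can be illustrated with a small software model. The sketch below is ordinary C++, not FPGA code, and every name in it (BoxFilter3, Threshold, run_pipeline) is hypothetical and chosen only for illustration. It assumes an 8-bit grayscale pixel stream and two simple stages. Each stage keeps at most a couple of pixels of state and hands its result straight to the next stage, so no full frame is ever stored between stages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stage 1: 3-tap horizontal box filter (hypothetical example stage).
// Its only state is the two previous pixels, which in hardware would be
// a small shift register rather than a frame buffer in DDR.
class BoxFilter3 {
public:
    std::uint8_t push(std::uint8_t px) {
        const unsigned sum = static_cast<unsigned>(prev2_) + prev1_ + px;
        prev2_ = prev1_;
        prev1_ = px;
        return static_cast<std::uint8_t>(sum / 3);
    }

private:
    std::uint8_t prev1_ = 0;  // pixel seen one step ago
    std::uint8_t prev2_ = 0;  // pixel seen two steps ago
};

// Stage 2: binary threshold (hypothetical example stage), stateless.
class Threshold {
public:
    explicit Threshold(std::uint8_t level) : level_(level) {}
    std::uint8_t push(std::uint8_t px) const { return px >= level_ ? 255 : 0; }

private:
    std::uint8_t level_;
};

// Software model of the pipeline: each pixel flows through both stages
// before the next pixel is read. In programmable logic the stages would
// run concurrently on successive pixels; this loop only models the data flow.
std::vector<std::uint8_t> run_pipeline(const std::vector<std::uint8_t>& pixels,
                                       std::uint8_t level) {
    BoxFilter3 blur;
    Threshold thresh(level);
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t px : pixels) {
        out.push_back(thresh.push(blur.push(px)));
    }
    // One output pixel per input pixel: nothing was held back in a frame store.
    assert(out.size() == pixels.size());
    return out;
}
```

Calling run_pipeline with the pixels {0, 0, 90, 90, 90, 0} and a threshold level of 60 yields {0, 0, 0, 255, 255, 255}. The output vector exists only so the result can be inspected in software; in a hardware pipeline the final stage would feed the next consumer directly.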

When developing solutions for a heterogeneous SoC, we can leverage embedded Linux running on the processing system, while development of the programmable logic can take advantage of vendor-provided IP and, increasingly, high-level synthesis, which allows functions written in C/C++ to be implemented in programmable logic.
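As a rough illustration of the kind of C/C++ that high-level synthesis works from, the sketch below converts packed RGB pixels to grayscale. It is not taken from the application described here. The function name rgb_to_gray, the 0x00RRGGBB packing, the MAX_PIXELS bound and the integer luma weights are all assumptions made for the example. The PIPELINE pragma is a Vitis HLS directive requesting one loop iteration per clock cycle. An ordinary C++ compiler ignores it, and whether a tool actually meets that target depends on the interfaces and device.

```cpp
#include <cassert>
#include <cstdint>

// Assumed upper bound on pixels per call (hypothetical value); synthesis
// tools generally want loop bounds they can analyse.
constexpr int MAX_PIXELS = 1920;

// Converts pixels packed as 0x00RRGGBB (an assumed layout) to 8-bit gray
// using integer weights that sum to 256: (77*R + 150*G + 29*B) >> 8.
void rgb_to_gray(const std::uint32_t rgb[], std::uint8_t gray[], int n) {
    assert(n >= 0 && n <= MAX_PIXELS);
    for (int i = 0; i < n; ++i) {
// Vitis HLS directive; ignored when compiled as plain software.
#pragma HLS PIPELINE II=1
        const unsigned r = (rgb[i] >> 16) & 0xFFu;
        const unsigned g = (rgb[i] >> 8) & 0xFFu;
        const unsigned b = rgb[i] & 0xFFu;
        gray[i] = static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b) >> 8);
    }
}
```

Because the function is plain C++, it can be compiled and checked on a workstation before it is handed to a synthesis tool. With these weights, white (0x00FFFFFF) maps to 255, black to 0 and pure red (0x00FF0000) to 76.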

One of the more exciting total system solutions is the use of system-optimizing compilers such as SDSoC and Vitis, which enable functions to be accelerated by moving them from the processing system to the programmable logic, leveraging the OpenCL framework. This enables system architects to further leverage the programmable logic to accelerate algorithms. This is especially true for Deep Neural Network applications, thanks to the Xilinx Deep Learning Processor Unit (DPU).

For the rest of this article, please visit Medium.com.