
Mastering Scalable Radar Signal Processing with AMD Versal Devices

WEBINAR RECORDING ON-DEMAND


Key takeaways

  • Advanced Signal Processing: Learn how to enhance the efficiency and performance of radar signal processing using AI engines and the Space-Time Adaptive Processing (STAP) algorithm.
  • Scalable Design Solutions: Understand the challenges and solutions for developing scalable radar systems, including integration with AMD Versal devices.
  • Phased Development Strategy: Explore strategies for phased development to achieve quick and effective performance demonstrations.
  • Live Demonstration: Witness practical demonstrations showcasing design and performance metrics in real-time.
  • Integration and Scalability: Gain knowledge on integrating AMD Versal devices into radar systems for optimal performance and how to scale designs by incorporating additional devices.
  • AI and MATLAB Integration: See practical examples of how AI engines and MATLAB can be used for advanced radar signal processing.

Transcript

Timestamps:

  • [2:02] About Fidus
  • [4:00] Presentation: Mastering Scalable Radar Signal Processing with AMD Versal Devices
  • [4:26] Targeting AMD VC1902 Device
  • [4:34] Radar Demonstration Platform
  • [5:33] Scalable Design Architecture
  • [6:57] Phase One: Rapid Development
  • [8:00] Algorithm Implementation – Space-Time Adaptive Processing (STAP)
  • [9:51] DSP Functions on the CPI Data Cuboid at a Glance
  • [12:02] Space-Time Processor Components
  • [12:51] System Model
  • [14:26] AI Engine Matrix Multiplication
  • [16:08] Design Decision – AI Engines
  • [17:18] Design Decision – Scalar Engines
  • [19:03] Hardware in the Loop Weight Application
  • [19:40] Results of Filtering
  • [21:13] Live Demo
  • [25:35] Conclusion
  • [26:21] Live Q&A Session


[0:00] Introduction

Welcome everyone to today’s webinar on mastering scalable radar signal processing with AMD Versal devices. We’re thrilled to have you here today as we explore advanced techniques and strategies for leveraging the power of Versal in scalable radar signal processing. Whether you’re a seasoned engineer or just starting out in the field, today’s session is designed to equip you with the knowledge and insights needed to enhance your radar signal processing projects’ efficiency and effectiveness. We’re going to be sharing our firsthand experience working with Versal devices to help you achieve scalable, high-performance solutions.

[0:37] Housekeeping Items

Before we begin, let’s go over some quick housekeeping items to ensure a smooth experience throughout today’s webinar. Please note that we will have a live Q&A session at the end, so we encourage you to submit your questions at any time during the webinar using the Q&A button on your control panel. Feel free to type them in as they arise, and we will address them at the conclusion. If you encounter any technical issues or need assistance, just send a message to the chat, and our webinar admin team is ready to help you.

[2:02] About Fidus Systems

Company Overview

So, who is Fidus? We were founded in 2001, and we have more than 150 people. We are a North American electronic system design services company, serving all industries and markets. We have three brick-and-mortar locations: two in Canada and one in the US in Silicon Valley. We’ve been working with FPGA SoCs since 2002, and it is safe to say that more than 80% of our projects have included FPGA content. We do an incredible amount of work with AMD and collaborate on many different projects. We are proud of a repeat-customer rate close to 95%; our customers return year after year with new projects and new business, a testament to the quality and efficiency of our work in helping improve their time to market.

Our Services

As a full-service electronic systems design firm, our professionals and engineers cover a multitude of service disciplines, including FPGA design, embedded software, hardware, signal and power integrity, ASIC RTL design, and verification. We help our customers by supplying some of these expert skill sets directly or by managing complete projects from start to finish.

[3:12] Introducing Our Speakers

Before we dive into today’s content, I’d like to introduce our distinguished speakers:

  • Dhimiter Qendri: A Senior Embedded Software Designer at Fidus Systems with extensive experience working on Versal devices. He has contributed significantly to various projects on this device.
  • Jason Timpe: A Radar/EW System Architect at AMD/Xilinx, bringing a wealth of expertise in radar and electronic systems. He will provide unique insights into the capabilities of AMD Versal devices.
  • Bachir Berkane: A System and Algorithm Architect at Fidus Systems, and a well-published author on verification, circuit design, and system architecture. He will join us for the Q&A session.

[4:00] Presentation Agenda

Today, we will present a radar reference design for the VCK190 evaluation board with the AMD VC1902 Versal core device. This is a joint design developed by Fidus and AMD. The presentation will cover the following:

  1. Challenges in Creating a Scalable Radar Demonstration Platform: Discussing inherent problems and our approach to the reference design.
  2. Scalable Design: How we implement scalable high-performance solutions.
  3. Phase One Performance Demonstration: Initial development phase and demonstration of phase one performance.
  4. Algorithm Implementation: Detailed look at the space-time adaptive processing algorithm

[4:26] Targeting AMD VC1902 Device

Today we’re going to be presenting a radar reference design for the VCK190 evaluation board with the AMD VC1902 Versal core device. This is a joint design developed between Fidus and AMD. In this presentation, we will discuss some of the problems inherent in creating a scalable radar demonstration platform. And then we’ll go into our approach for the reference design and how Fidus was able to implement the design quickly. Finally, at the end, I will demonstrate the design.

[4:34] Radar Demonstration Platform

One of the problems we have when we talk about demonstrating signal processing capabilities for radar platforms is that most requirements for the design are really focused on the interaction between the radar and the target: how the radar performs in terms of range resolution, and the ability to avoid jamming or to filter out clutter. This defines how the radar performs at the system level. On the other hand, when we talk about demonstrating signal processing capabilities, whether in the effectiveness of our algorithms or in the compute power of a device such as the VC1902, our specification is at a lower level: how much bandwidth we can process, the sample rates we can operate at, the bit widths or data types we can support, and latency. So, if we want to show how changes at the algorithm level are beneficial, we need to tie the performance of the signal processing to improvements at the system level. How we can do this is key to our reference design.

[5:33] Scalable Design Architecture

Secondly, we wanted to have a design that can scale. In this particular design, we are targeting the VC1902 device from AMD. This device is a Versal core device with AI engines on it, so we are going to focus on using them to improve our signal processing capabilities. But we want this design to be able to scale up and down to other devices in the portfolio. For example, AMD has recently announced the VP2502 and VP2802, which are Versal Premium devices that also have AI engines. The Premium devices have higher I/O bandwidth than the core devices thanks to their GTH transceivers. From a radar perspective, that allows us to connect to many more antenna elements and pull more data into the device for computation. These Premium devices also have a significant number of DSP58 blocks in the fabric, in addition to the AI engines, so there is a lot more signal processing horsepower, even compared to the VC1902 and certainly compared to previous generations of Virtex devices. Ideally, we want a reference design that can scale up to take advantage of more powerful devices in the future. On the other hand, we have the AI Edge devices, which are smaller, with fewer AI engines, and are really mid-range devices. We would also like the reference design to be able to scale down so that we could show the performance that can be achieved even on a SWaP-constrained platform.

 

[6:57] Phase One: Rapid Development

Finally, we really want to be able to show a phased development. For each one of these phases, we want to demonstrate where we are in the development and show that we are meeting our requirements. In the industry today, it is very important to be able to quickly demonstrate new performance capabilities, both to win the contract and to show the customer that you are meeting the milestones and that the design is going to work in the end. Similarly, our approach to the reference design was to do it in phases. Where we are right now in that development is that we are going to demonstrate phase one performance. The focus for phase one was on rapid development: creating a quick demonstration platform that shows the capabilities of the hardware and shows that our algorithm is going to perform as expected. It also sets us up to solve the various problems that I outlined in the first two sections.

In phase two of the design, we want to increase the capabilities of the system, extend the algorithm, and build a more fully functional system. As we move on to later phases, we can add more and more capability into our system. But at each step, we want to be able to demonstrate that performance and show our customers that we’re moving in the right direction.

[8:00] Algorithm Implementation – Space-Time Adaptive Processing (STAP)

The algorithm that we chose to implement is space-time adaptive processing for radars. If I have a radar on the ground, and I’m trying to detect a moving target, then reflections back from the stationary environment are clutter that is going to cloud my perception of the target. But because the clutter is not moving, I can use Doppler analysis to filter out anything that has zero motion. And so, it is easy enough to remove that clutter from the scene and focus on moving targets. This becomes more challenging if the radar is instead installed on a moving platform, such as an aircraft, which is looking for moving targets on the ground. Now, because of the motion of the platform itself, the background is no longer stationary but has a relative motion. So, the ground in front of me is going to look like it’s moving towards me. And the ground behind me is going to look like it’s falling away from me. My clutter return has an angle-dependent Doppler shift. So, I want to do my filtering in both space and time. I want to do my beamforming in the spatial dimension. But I also need to account for this Doppler shift, which is also related to the angle.
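As a rough illustration of that angle dependence, the Doppler shift of a stationary clutter patch seen from a moving platform follows 2·v·cos(θ)/λ, where θ is the angle between the platform velocity and the line of sight. The sketch below uses made-up platform and radar numbers, not figures from the webinar:

```python
import math

def clutter_doppler(v_platform, wavelength, azimuth_deg):
    """Doppler shift (Hz) of a stationary clutter patch, as seen from a
    platform moving at v_platform (m/s), for a given angle between the
    velocity vector and the line of sight to the patch."""
    theta = math.radians(azimuth_deg)
    return 2.0 * v_platform * math.cos(theta) / wavelength

# Illustrative numbers (not from the webinar): 100 m/s platform, 10 GHz radar.
wl = 3e8 / 10e9  # 0.03 m wavelength
for az in (0, 90, 180):
    print(az, round(clutter_doppler(100.0, wl, az), 1))
```

Ground directly ahead (0°) shows the largest positive shift, broadside clutter (90°) sits near zero Doppler, and ground behind the platform (180°) appears to recede, which is exactly why the clutter ridge is angle-dependent and a purely temporal filter cannot remove it.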

On the left, you can see a return without the STAP processing, where the clutter return is smeared across the radar pattern. On the right, you see an ideal filter that would reject that clutter, giving us a better pattern. Notice that we have also introduced a jammer into the system. This is a barrage jammer, so you can see that it is broadcasting across a range of frequencies, but it is at a fixed location relative to my platform. And so, we would also like our STAP algorithm to be able to filter out that jammer if possible. Ideally, our space-time adaptive processing will create this filter and then apply it so that I remove these interfering signals and the clutter and focus on my target. This processing also has to be adaptive, because as my platform moves or as the environment changes, the specific filter that I need will change. So, the algorithm has to adapt as time goes on.

[9:51] DSP Functions on the CPI Data Cuboid at a Glance

When we look at how to do space-time adaptive processing, basically we are going to create a radar data cube with the number of antennas in one dimension, which is our spatial dimension. The second dimension is the number of range samples: from a given antenna, I send out a pulse and I sample the return, and that provides the range to the target. Then I am going to use a number of coherent pulses to perform my data processing, and this provides the third dimension of my cube. If we look at the data cube, our STAP processing is going to filter in the spatial and Doppler dimensions, as shown by the green slice in this diagram. But there are many other types of radar processing that all use different portions of the same data cube. So, in general, we can create a radar architecture that gathers the data cube, selects some subset of it, and does processing on that subset. Then I am going to use the information from that processing to make some decisions about what I do next at the overall system level.

The architecture is well suited to several different DSP functions that we can perform on our radar data, which speaks to the scalability of our design and our ability to do different things. In our reference design, even though we are focused on space-time adaptive processing, we want to structure the design in such a way that it can easily be adapted to other types of processing as well. And again, when we talk about scalability and adding performance, we can think of how a device like the VP2802 would allow us to add additional antenna elements or possibly operate at a higher sample rate. In both cases, this is just growing the size of the cube, so it is easy to modify our design to handle the larger cube and understand the data and computation requirements needed to do it. With this architecture, we see that an ACAP is very well suited to the task: there are hardened memory controllers that allow us to collect the data cube, we have the AI engines to do the signal processing that we need, and we have the processing system for decision-making based on our results.
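To make the cube concrete, here is a minimal Python sketch of the 6 × 10 × 200 cube that the design uses (six antennas, ten pulses, 200 range samples) and of extracting a space-time snapshot for one range cell, the kind of slice STAP operates on. The indexing convention and helper name are ours, not from the reference design:

```python
# Dimensions matching the example discussed in the webinar:
# 6 antennas, 10 coherent pulses, 200 range samples per pulse.
N_ANT, N_PULSE, N_RANGE = 6, 10, 200

# cube[a][p][r] holds the complex sample from antenna a, pulse p, range bin r.
# Synthetic placeholder values; real samples come from the receive chain.
cube = [[[complex(a, p + r) for r in range(N_RANGE)]
         for p in range(N_PULSE)]
        for a in range(N_ANT)]

def space_time_snapshot(cube, range_bin):
    """Extract the STAP slice for one range cell: all antennas x all
    pulses, flattened into a 60-element space-time vector."""
    return [cube[a][p][range_bin]
            for a in range(N_ANT) for p in range(N_PULSE)]

snap = space_time_snapshot(cube, 42)
print(len(snap))  # 60 = N_ANT * N_PULSE
```

Other radar functions (pulse compression, Doppler processing, CFAR) would simply slice the same cube along different axes, which is what makes the gather-then-select architecture reusable.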

[12:02] Space-Time Processor Components

Looking in more detail at the space-time adaptive processing and what it’s doing, we see that it takes several of the other slices and uses them to calculate the effect of the environment on our radar by calculating the covariance matrix. We can then use this along with our steering vector to create the weights which will undo the effects of the environment and leave only our targets. We apply this filter to the slice of interest to remove the clutter and jammer and reveal the target.
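For readers who want the math behind the weights, a common STAP formulation computes w = R⁻¹s from a covariance matrix R estimated over training slices and a steering vector s. The pure-Python sketch below illustrates that calculation on a toy 3-element problem; note that in phase one the actual weight calculation was done in MATLAB, so this is a sketch of the underlying math, not the device implementation:

```python
def sample_covariance(snapshots):
    """Estimate R = (1/K) * sum of x x^H over K training snapshots,
    each snapshot a list of complex space-time samples."""
    K, n = len(snapshots), len(snapshots[0])
    R = [[0j] * n for _ in range(n)]
    for x in snapshots:
        for i in range(n):
            for j in range(n):
                R[i][j] += x[i] * x[j].conjugate() / K
    return R

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting
    (complex-valued); stands in for the matrix-inversion step."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

# Toy 3-element example (illustrative only; the real problem is 60-dimensional).
training = [[1 + 0j, 0.5j, 0j],
            [0j, 1 + 0j, -0.5j],
            [0.2j, 0j, 1 + 0j]]
steering = [1 + 0j, 1 + 0j, 1 + 0j]
weights = solve(sample_covariance(training), steering)  # w = R^-1 * s
```

The adaptivity mentioned above comes from re-estimating R as the platform and environment change, which in turn changes the weights.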

I mentioned earlier the phased approach. For phase one, we focused on the weight application portion of the algorithm. This is just matrix multiplication, and we know that the AI engines in the VC1902 are very good at that, so it is a good way to see some quick benefits in our design. It serves as a good starting point for leveraging the power of the AI engines to do the space-time adaptive processing. For phase one, we implemented only the weight application and the data selection, with the weight calculation being done externally in MATLAB.

[12:51] System Model

I mentioned at the beginning how we wanted to tie our algorithm performance up to system-level performance. In order to do that, we have to be able to model the environment and the rest of the system to test our implementation within the context of the larger environment. A good way to do that is to leverage the STAP example design from MathWorks. We took that design as our starting point because MATLAB has modeled the transceivers, the antennas, the platform, the jammer, and the clutter. This allows us to focus on the signal processing algorithm that we want to implement into the VC1902. At the system level, we can make adjustments to the model to fit the system specifications that we have to meet. Then we can tune the algorithm in MATLAB to make sure that we’re meeting our system-level performance. When we take the algorithm and we implement it on the AI engines, we’ve got something to compare against, and a way to verify that the implementation in hardware is going to behave correctly at the system level and achieve the desired performance. Likewise, we can iterate this process to make the problem more difficult and gradually improve our system, demonstrating each step along the way how our changes at the hardware or algorithm level are translating to system-level improvements.

To start with, on the right, you can see the parameters that we’re implementing. Initially, we’re just building off of the MathWorks STAP example. So, it’s a pretty small example, but it does give us a quick way to get started as part of our phase one design. With that, I’m going to turn it over to Dhimiter to provide some more details on the design and how it works.

[14:26] AI Engine Matrix Multiplication

Thank you, Jason. The main task implemented during this phase was the application of the STAP weights. This operation requires multiplying the conjugated complex weight vector with a sub-cube. Recall that the cube dimensions represent six antennas, ten pulses, and 200 samples. The conjugated weight vector was reshaped from a 6×10 matrix into a 60×1 vector, and each sub-cube slice was reshaped into a 200×60 matrix. This complex matrix multiplication was partitioned between ten AI Engine kernels running concurrently on separate tiles. The output from each kernel is a single 200×1 vector. The kernel results were then sent to the PS via GMIO and concatenated into a single vector, which was packetized and sent to the MATLAB client via TCP/IP.
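The per-kernel arithmetic can be sketched as a plain complex matrix-vector product. The data below is a made-up stand-in for the Simulink-generated samples; on the device the ten slices run concurrently on separate AI Engine tiles, whereas this sketch runs them serially and then mimics the PS-side concatenation:

```python
def apply_weights(sub_matrix, w_conj):
    """One kernel's share of the work: multiply a reshaped 200x60
    sub-cube slice by the 60x1 conjugated weight vector -> 200x1."""
    return [sum(row[k] * w_conj[k] for k in range(len(w_conj)))
            for row in sub_matrix]

# Hypothetical stand-in data (real samples come from the Simulink model).
w_conj = [complex(1, -(k % 3)) for k in range(60)]
slices = [[[complex(r + s, k) for k in range(60)] for r in range(200)]
          for s in range(10)]

# Ten slices, one per AI Engine tile on the device; serial here.
results = [apply_weights(S, w_conj) for S in slices]
combined = [y for vec in results for y in vec]  # PS-side concatenation
print(len(combined))  # 10 kernels x 200 elements each = 2000
```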

To expedite the development of the sub-matrix multiplication for the filter application, the Level 2 General Matrix Multiplication (GeMM) AI Engine kernel from the Vitis DSP libraries, version 2021.1, was used. From a software architecture standpoint, there are two main areas of design decisions: algorithm partitioning and data movement methodology. Initially, we started implementing a custom kernel using the AI Engine intrinsics API. Due to project timing constraints, it was decided instead to leverage the Vitis DSP libraries’ General Matrix Multiplication kernel and adapt it to the application at hand. The current implementation of the General Matrix Multiplication kernel has constraints on the dimensions of the input matrices A and B; in particular, it does not support vector-to-matrix multiplication. To get around this constraint, the weight vector was duplicated to form a 60×2 matrix that conforms with the kernel’s input matrix requirements.
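The vector-duplication workaround is easy to see in miniature. In this sketch, `gemm` is an ordinary Python matrix multiply standing in for the library kernel (it is not the Vitis DSP API), and the row count is shrunk from 200 to 4 for brevity:

```python
def gemm(A, B):
    """Plain complex matrix multiply, standing in for a GeMM kernel
    that requires a true matrix for each operand."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[r][k] * B[k][c] for k in range(inner)) for c in range(cols)]
            for r in range(rows)]

# The kernel cannot do matrix x vector, so duplicate the 60-element
# weight vector into a 60x2 matrix and keep only one result column.
w = [complex(1, -(k % 5)) for k in range(60)]
B = [[wk, wk] for wk in w]                                  # 60x2
A = [[complex(r, k) for k in range(60)] for r in range(4)]  # 4x60 stand-in
C = gemm(A, B)                  # 4x2; the two columns are identical
result = [row[0] for row in C]  # the effective matrix-vector product
```

The cost of the trick is one redundant column of multiply-accumulates, traded for reuse of an existing, validated kernel.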

[16:08] Design Decision – AI Engines

Regarding data movement, there are two interfacing methods to the AI Engine kernels: PLIO and GMIO interfaces. When using IP blocks on the PL side that communicate with the kernels using AXI-Stream interface, the PLIO interface can be used. The other interface is the GMIO interface, which uses memory-mapped attributes that make connections from the PS DDR memory to the AI Engine data memories directly via the Network on Chip (NoC) without going through the PL side. In this case, it was decided that due to the relatively small size of the cube, the GMIO interface was optimal for the scenario under consideration.

[17:18] Design Decision – Scalar Engines

Let’s take a quick look at the host user-space software architecture. It leverages the Boost asynchronous I/O libraries to implement a TCP server that communicates with a MATLAB TCP client program running in the simulation. The aim of the MATLAB script is to send the weights and STAP cube data and to receive the STAP results from the applied weights. To accomplish this, a custom serial framing protocol was implemented to send the STAP cube and weights and to receive the incoming results from the AI kernels. The incoming data is parsed based on framing headers responsible for demarcating the sub-cube frames and weights. The outgoing result data is likewise framed and sent via TCP/IP to the MATLAB client for comparison.

On startup, the host application takes a command-line configuration for TCP/IP port assignment. Once the TCP server running on the Cortex-A72 application processor is ready, it waits for a connection from the MATLAB TCP client. The cube frames and weights are then received, parsed, and written to the GMIO-allocated memory. The next step is initializing and running the AI Engine graph. Once the STAP results are available, the data from the ten kernels is concatenated, framed with the appropriate flags, and sent back to the MATLAB TCP client for comparison with the MATLAB golden reference design.
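The webinar does not spell out the frame format, so the sketch below invents a minimal header (a magic word, a frame type, and a payload length) purely to illustrate the idea of demarcating cube, weight, and result frames on a byte stream; none of these field values are from the actual protocol:

```python
import struct

# Hypothetical frame layout: 4-byte magic, 1-byte frame type,
# 4-byte payload length, then the raw payload bytes.
MAGIC = 0xF1D05000
FRAME_WEIGHTS, FRAME_CUBE, FRAME_RESULT = 1, 2, 3

def frame(frame_type, payload):
    """Prepend the header to a payload."""
    return struct.pack(">IBI", MAGIC, frame_type, len(payload)) + payload

def parse(buf):
    """Split one frame back into (type, payload); raises on a bad magic."""
    magic, ftype, length = struct.unpack_from(">IBI", buf, 0)
    assert magic == MAGIC, "bad frame"
    return ftype, buf[9:9 + length]

# Payloads carry interleaved (re, im) float32 samples.
samples = (1.0, -2.0, 0.5, 0.25)
msg = frame(FRAME_WEIGHTS, struct.pack(">4f", *samples))
ftype, body = parse(msg)
```

A length-prefixed header like this is what lets the receiver demarcate frames on a TCP stream, which delivers bytes without message boundaries.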

[19:03] Hardware in the Loop Weight Application

This solution was implemented as a hardware-in-the-loop application, with a VCK190 development board running the weight application on data received from the live Simulink model running on the MATLAB side. Both the sub-cube data and the weights are single-precision complex floating-point numbers created in the MATLAB simulation. The data is received via TCP/IP, the packet is saved in a pre-allocated section of DDR memory, and both the space-time matrix and the weights are sent to the ten AI Engine matrix multiplication kernels running concurrently. The engine graph is initialized, and once the computation is complete, the resulting vectors are sent back to the PS DDR section via GMIO. The final step is reorganizing the results and sending them back to the MATLAB client side for comparison with the golden ADPCA MATLAB model. Back to you, Jason.

[19:40] Results of Filtering

Thank you, Dhimiter. These pictures show the expected performance of the STAP algorithm. What we see on the left is the return from the six antennas on a single pulse before the STAP filter is applied. In our model, the platform is flying at 1,000 meters, and so most of what we see is ground return, and we cannot distinguish the target, which is located at 1,732 meters. However, on the right, we see the result of the application of our STAP filter, which is the combined return that should very clearly show the target at 1,732 meters.

With that, I will run the demonstration. I have a VCK190 evaluation card here with a serial connection to my PC, so we will watch it boot up. The Fidus design is on the SD card installed on the VCK190, so that is how we’re booting the VC1902. Once it boots up, we can configure the Ethernet so that MATLAB will be able to connect to the board.

[21:13] Live Demo

Now we start the host program that runs on the processing system of the VC1902. Once we do this, the host program is waiting for the data cube and weights to be sent from MATLAB. I also have MATLAB running. We have a script in MATLAB that is going to set up the Ethernet connection to the board. Then, it will start the Simulink block diagram. It will then run the Simulink design to generate the radar data. The Simulink design is simulating the transmission of the ten coherent radar pulses, as well as calculating the return reflecting off the clutter and the target. It will also simulate the interference from the jammer. It will model the behavior of our six antennas and create the data cube needed for signal processing.

Once the data cube is constructed, MATLAB also calculates the weights through its own STAP algorithm. Then the weights as well as the data cube are sent via Ethernet down to the VCK190 board. Then, as we said, the algorithm running in the AI engines takes those weights that were calculated by MATLAB, takes the data cube, and applies the weights to the cube, calculating the single return. Once that return is passed back to MATLAB through the Ethernet, MATLAB can now compare that return to the return it calculated using its own STAP algorithm. So we have overall system-level performance that we are modeling in Simulink, and we can show that the algorithm running on the hardware meets all our requirements.

The goal in later phases will be to scale this design up to be able to perform STAP on larger data cubes and also to do the weight calculation in the hardware as well. One of the downsides of our current approach is that the Simulink design does take a while to run, so we’re limited in the number of data cubes that we can process because of that. In the future, we will create data cubes in advance and then push that data through the Ethernet quickly, rather than running the Simulink design every time.

What we see in the results is the raw return here, where you can see the interference from the clutter and the jammer. You can also see over here the weight filter calculated by MATLAB. In the upper left is the output of our STAP algorithm: the red is the data that came back from the AI engines, and the blue is the data that was calculated by MATLAB. When we look at the error, we have essentially zero error between the two, which indicates very clearly that our algorithm, as implemented in the AI engines, is performing correctly. So, as we modify our system design or modify our algorithm in MATLAB, we can verify that our implementation in hardware continues to track the performance. Even if we do not modify the algorithm in MATLAB but instead make improvements directly in the implementation, we can still test against various scenarios as modeled by our Simulink design, changing the behavior of the target, the clutter, or the jammer, or maybe adding more jammers, to make sure that we are going to meet our system-level specifications.
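The hardware-versus-golden comparison described above boils down to an element-wise error check. A minimal sketch of that check follows; the tolerance and data here are illustrative, not the project's actual pass criteria:

```python
def max_abs_error(hw, golden):
    """Worst-case deviation between hardware results and the golden
    reference, compared element by element."""
    return max(abs(h - g) for h, g in zip(hw, golden))

# Hypothetical returns: a golden-model vector and a hardware vector
# that differs only by a tiny float round-off.
golden = [complex(k, -k) * 0.1 for k in range(200)]
hw = [g + 1e-7j for g in golden]

err = max_abs_error(hw, golden)
assert err < 1e-5, "implementation diverged from the golden model"
```

Re-running this check after every algorithm or hardware change is what lets the phased approach demonstrate, at each step, that the implementation still tracks the system-level model.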

[25:35] Conclusion

Alecia: Thank you, Jason and Dhimiter, for that comprehensive presentation. Now, let’s welcome Bachir Berkane to join us for the live Q&A session. We’ve already received some questions from the audience. Please continue to submit your questions using the Q&A function.

[26:21] Live Q&A Session

Question 1

David: Could you elaborate on the status of subsequent phases, where all elements of STAP are implemented in the AI Engines? Also, is the full or reduced STAP considered?

Bachir Berkane: I can take that one. As mentioned, the implementation of the STAP ADPCA pipeline followed a phased approach. The main aim of the STAP algorithm is to identify targets in the presence of clutter and jamming. The first phase focused on the weight vector application: applying the weight vector to a slice of the STAP cube, the test cell data, to obtain the statistics used to determine target presence. This was implemented using the AI engines on the Versal VC1902 device. For Phase Two, we implemented the complete STAP algorithm in the engines, focusing on the low-complexity, low-precision ADPCA, which is a reduced-rank method, and during Phase Three a polyphase analyzer bank was also implemented.

Question 2

Audience Member: Could you walk us through the implementation of the Phase Two data path from a high level of abstraction?

Bachir Berkane: In Phase Two, we implemented the full STAP ADPCA, focusing on low complexity and low precision. All operations are performed in floating-point; the synthetic data is generated in MATLAB in C16 format and converted just before the operations. We implemented the complete pipeline, including the covariance matrix calculation, matrix inversion, and weight application, using the AI engines.

Question 3

Audience Member: What part of the ADPCA pipeline was offloaded to the AI engines on the Versal VC1902?

Dhimiter Qendri: We implemented the complete STAP pipeline on AI engines. Our estimates showed around 460 DSP slices would be needed for similar operations on the programmable logic side. AI engines offer an advantage by running at 1.25 GHz, reducing concerns with timing closure issues.

Question 4

Audience Member: How do AI engines stack up compared to DSP58 blocks on Versal for this application?

Dhimiter Qendri: AI engines offer significant advantages over DSP58 blocks for this application. The AI engines provide efficient computation and can handle complex operations at high speed. They run at 1.25 GHz, which reduces the effort required for timing closure compared to using DSP58 blocks. For the phase two implementation, we leveraged AI engines to perform the entire STAP ADPCA algorithm, including matrix multiplications, covariance matrix calculations, and other necessary operations.

[43:40] Alecia: Thank you for your detailed answers. This concludes our Q&A session. The recorded presentation will be sent via email to everyone who registered. For further questions or consultations, please visit our website at fidus.com and book a meeting to discuss your upcoming projects. Thank you for attending, and we look forward to working with you.

