Fidus Systems is an SDSoC development environment-qualified Xilinx Alliance Member, and a Xilinx Premier Design Services member, offering electronic product development and design services.
Below are the candid memoirs of Dessislav Valkov, Fidus Team Leader. Enjoy!
Use the SDSoC development environment to move an open source AES-256 encryption algorithm from ‘C’ into hardware, to facilitate the comparison of software execution time vs hardware execution time on Xilinx’s Zynq® SoC.
Avnet® Zedboard™ (Note: although Zynq contains dual ARM® Cortex™-A9 cores, I only made use of a single core, running at 667MHz)
OS: Windows 7 (see story)
Software: Xilinx SDSoC 2015.2 development environment
A few years ago we had the chance to work with what was, at the time, a brand new tool from Xilinx called HLS (High Level Synthesis). The goal of HLS was to compile native C/C++ to synthesizable HDL, thus enabling software developers to take advantage of the benefits of FPGAs. Various companies had tried making these types of tools in the past, albeit with mixed success. We concluded that HLS worked well, although it still had a steep learning curve for a software developer and thus typically still required some hand-holding by an FPGA specialist.
Today, SDSoC has embedded the design flow into an Eclipse-based IDE, allowing software designers to target hardware in a familiar and much more abstracted environment. The tool can also intelligently partition the algorithm into software and hardware, select the interfaces between the application and the translated HDL functions, and finally build a Linux or bare metal (just to mention the big two) SD card image automatically. All of this strives to make the whole design process seamless to a SW developer.
“Here’s what I did”–
1. Obtained and installed SDSoC 2015.2 from Xilinx
a. It requires a separate license which had to be installed as well using the standard Xilinx license manager.
b. It is worth mentioning that the tool seemed to have some stability issues on Red Hat 6.6, so with great reluctance I had to install it on Windows 7, where it performed as expected. It is a brand new tool, so that probably makes some sense, although I was expecting the opposite.
2. Configuring the environment is straightforward and similar to the other Eclipse-based SDK tools from Xilinx.
a. First I had to configure the Linux TCF Agent to connect to my Zedboard IP; establishing the debugging communication channel.
b. After that, I had to set up the debug configuration with the TCF Agent and the local and remote .elf file locations. Interestingly, sometimes the tool found the three settings automatically, but most often I had to do it manually. I also noted that the Zedboard DHCP always picked an IP already in use by some other machine, so I had to assign it manually after each reboot. I didn’t look into this, so it’s probably just me.
3. After experimenting with some of the example projects provided with the tool, I was ready to tackle the mission code. To be fair, it was a really refreshing experience. Everything worked as promised in the three YouTube tutorials (see links below). How often does that happen?
4. It was decided that we should try to optimize the same code we tried back then on Vivado HLS. After downloading the freely available AES-256 from the web, coding a simple top level calling function, and trying to compile it with SDSoC, it became clear that there are some new rules which should be followed. The pure C++ compilation was completing without any errors, but when I assigned the AES-256 function to be implemented in hardware, SDSoC complained about a couple things:
a. It could not find the body of the AES function, because it was in a separate file from the calling function. Obviously SDSoC wants to have all functions dedicated for implementation in HW in the same file. An easy fix.
b. Next, SDSoC didn’t like the function parameters, which were declared as pointers to arrays:
void call_aes_rtl(uint8_t * key, uint8_t * message, uint8_t * cipher);
This is understandable: a function implemented in hardware must have rigidly defined parameters, because hardware cannot adapt on the fly to pointers to arrays of potentially different sizes. The C++ compiler didn’t have problems deriving the array sizes from the code, but the SDSoC compiler needed something more explicit in the parameter declaration:
void call_aes_rtl(uint8_t key[32], uint8_t message[32], uint8_t cipher[32]);
Although to a software guy this might not be a very common way of passing arrays, this was exactly what was needed. Thanks to the example projects it was easy to find out what the tool was expecting.
Compiling HLS can be quite involved. For example, back in the day, we had to define the function’s parameters as ports using the special HLS #pragma properties. This told the HLS compiler exactly how to implement every parameter as a port: Master/Slave port, AXI-Lite, AXI-FIFO, AXI-ACP, etc. SDSoC can also use the #pragma for fine tuning, but even without additional effort it immediately recognized the ports and picked the best fit. In our case, SDSoC picked AXI-FIFO for each of the three ports, since each had to transfer an array of 32 elements. I was relieved at how well SDSoC completed this task.
c. Running SDSoC is quite intuitive and software-like in the way it handles debugging, code stepping, and variable updates on the active platform. In addition to the standard SDDebug and SDRelease configurations, Xilinx has added a new one called SDEstimate. SDEstimate can offer insight into the speed improvements one could expect by pushing a function into hardware, prior to undertaking the actual HDL compilation, testing, and benchmarking.
d. To be fair, as an HDL designer I could not stop myself from optimizing the C code just a little. In the original code, the AES function was working with the three arrays directly in the memory, with many reads and writes occurring during the message encryption/decryption cycles. My background told me that when moved to hardware these unnecessary accesses over the AXI interfaces will be very detrimental to the total performance, so I decided to copy the three arrays locally to the AES function, thus limiting the access over the AXI-FIFO interface only to the initial vectors loading and result unloading – three arrays of 32 elements each.
e. Then I had to copy the already prepared SDcard image containing a light Linux distribution, together with the files needed to run my Linux application. Just drag and drop the SDcard folder to the SDcard, insert the SDcard to the Zedboard, power it and see Linux booting on the COM port (configured 115200,8,1,N).
For Linux to boot on a Zedboard in the pre-SDSoC/early HLS times, I had to do quite a few things manually. First, generate the device tree. Then clone the Xilinx Linux Git repository, and configure and build the kernel. Then clone the Buildroot/BusyBox Git repository, and configure and build the file system with the applications we might want to use. Not to forget packaging our C++ application binary into the file system. Finally, configure and compile the u-boot bootloader. Now, with SDSoC, all of this is just copy and paste to the SD card, which saves a lot of time and typing. Not to mention that the stock SDSoC Linux distribution comes with a persistent file system, SSH, and a CGI-Perl web server, which is so handy.
5. With the system now running, I was able to quickly figure out that the SDSoC targeted hardware was running roughly 7x faster (550 us vs 3752 us), at only 143MHz in the programmable logic fabric, compared to the algorithm in software on the 667MHz processor. Check out the benchmarks below.
AES-256 in SW:
Delta time 3752 us = (new timestamp 555258 us) – (old timestamp 551506 us)
Calculated cipher \„NÔo˜^]jO”Ç×
AES-256 as HDL:
Delta time 550 us = (new timestamp 555840 us) – (old timestamp 555290 us)
Calculated cipher \„NÔo˜^]jO”Ç×
And frankly, this was purely a software coded AES-256 algorithm compared to the identical code implemented in the programmable logic, with near zero design effort, and definitely no significant attempts at internal algorithm optimization. Pretty powerful stuff. Pretty cool too.
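For anyone who wants to check the arithmetic, here is a minimal sketch that turns the timestamps from the log above into a speedup figure. The struct and function names are hypothetical and chosen only for illustration. The per-clock figure is a rough normalization of my own, assuming the 667MHz and 143MHz clocks quoted in the article; it is not something the tool reports.

```cpp
#include <cstdint>

// One benchmark run, with timestamps in microseconds as printed in the log.
struct Run {
    std::uint64_t start_us;  // "old timestamp"
    std::uint64_t end_us;    // "new timestamp"
};

// Elapsed time of a run in microseconds ("Delta time" in the log).
constexpr std::uint64_t delta_us(Run r) { return r.end_us - r.start_us; }

// Wall-clock speedup of the hardware run over the software run.
constexpr double speedup(Run sw, Run hw) {
    return static_cast<double>(delta_us(sw)) / static_cast<double>(delta_us(hw));
}

// Rough ratio of clock cycles spent (software cycles / hardware cycles),
// assuming each side ran at a constant clock for the whole measurement.
constexpr double per_clock_speedup(Run sw, Run hw, double sw_mhz, double hw_mhz) {
    return speedup(sw, hw) * (sw_mhz / hw_mhz);
}

// Values copied from the benchmark output.
constexpr Run kSoftware{551506, 555258};
constexpr Run kHardware{555290, 555840};
```

With these values, `speedup(kSoftware, kHardware)` comes out at about 6.8, which is the "7x" quoted in the text, and `per_clock_speedup(kSoftware, kHardware, 667.0, 143.0)` comes out at about 31.8. Both numbers include whatever data transfer overhead was inside the timed region, so treat them as indicative only.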
This 7x optimization in speed is significantly better compared to the original HLS-only implementation improvement from a few years back. In both cases the C code interface was handled with as minimal effort as possible and with no further attempts to improve performance with directed HLS #pragma driven optimizations.
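The interface and local-copy changes described in steps 4b and 4d can be sketched roughly as follows. This is an illustration under stated assumptions, not the code I actually ran: `call_aes_rtl` and the 32-element arrays come from the article, while `kLen` and `transform_local` are hypothetical names, and the XOR inside `transform_local` is only a placeholder standing in for the real AES-256 rounds.

```cpp
#include <cstdint>

using std::uint8_t;

constexpr int kLen = 32;  // array length stated in the article

// Placeholder for the AES-256 rounds. This is NOT encryption; it only
// stands in for "many reads and writes on the working arrays".
static void transform_local(const uint8_t key[kLen], uint8_t state[kLen]) {
    for (int i = 0; i < kLen; ++i) {
        state[i] = static_cast<uint8_t>(state[i] ^ key[i]);
    }
}

// Fixed-size array parameters give the hardware compiler explicit sizes.
void call_aes_rtl(uint8_t key[kLen], uint8_t message[kLen], uint8_t cipher[kLen]) {
    uint8_t key_local[kLen];
    uint8_t state[kLen];

    // Copy in: each input array is read exactly once, in order.
    for (int i = 0; i < kLen; ++i) key_local[i] = key[i];
    for (int i = 0; i < kLen; ++i) state[i] = message[i];

    // All intermediate work touches only the local copies.
    transform_local(key_local, state);

    // Copy out: the result array is written exactly once, in order.
    for (int i = 0; i < kLen; ++i) cipher[i] = state[i];
}
```

The point of the pattern is that the function parameters are touched only in the copy-in and copy-out loops, so traffic over the interface stays at one pass per array regardless of how many times the algorithm revisits its working data.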
SDSoC was really quick and easy: It took me 3 days to implement in hardware a pre-existing AES-256 algorithm written in C++. Really though, most of that time was spent on learning SDSoC features and configurations, which thankfully were very consistent with the other Eclipse-based tools. Not bad for a newbie. SDSoC does seem to be all that!
1. Looking ahead:
a. I want to check out if I can get things running even faster! Up the fabric speed, optimize, etc.
b. I want to see if SDSoC supports Partial Reconfiguration (PR) and Isolation Design Flow (IDF) using Xilinx IVT tool (http://www.xilinx.com/applications/isolation-design-flow.html). If so, things should be way easier than before. Before SDSoC we used Vivado HLS to implement different C++ HLS and RTL code in the same reconfigurable partition with the twist of IDF. It was quite doable, but not trivial. Today SDSoC offers the whole Vivado IPI project, another very nice and handy feature which hopefully simplifies PR and IDF flows as well.
2. On the videos above, you can see that there are example projects using the famous OpenCV libraries (http://docs.opencv.org/doc/tutorials/tutorials.html). OpenCV basically gives you the power of processing images and video in a standard and rich framework. Really impressive stuff. And in the earlier versions of the tool there were example projects using these libraries, but for some reason they were removed from the standard distribution of SDSoC. This is my Christmas wish – Xilinx, please, put them back in.
About Fidus Systems
Fidus Systems provides Electronic Product Development and Consulting Services to companies across a wide range of industries. Focusing on high-speed, complex designs, Fidus enables your success with multiple design centers, a large full-time staff, and flexible business models.
Fidus provides high-speed, high complexity, electronic product development and consulting services across a wide range of industries. As a Xilinx® Alliance Program Premier Design Services Member, Fidus designs Xilinx solutions to enable customer products. Xilinx highlights: Vivado®, 7-series, Zynq®, Partial Reconfiguration, HLS, SDR, mixed signal, JESD204B, 4k+ video/broadcast, emulation, and FMC development.
By leveraging in-house expert knowledge, and utilizing industry leading tools, Fidus delivers excellence in Hardware, FPGA/DSP, Signal Integrity, Embedded Software, RF/Wireless, and PCB Layout. Fidus is proud to be selected and recognized as Premier Design Services Member for Xilinx North America.
Since 2001, Fidus has delivered over 1000 products/projects for more than 300 customers.