Devices like the Xilinx Zynq or the Microchip PolarFire let us combine, in a single design, a high-level OS like Linux with fast programmable logic (FPGA). For signal processing, or even data science, it is very attractive to acquire an event at rates of several tens of megasamples per second and, hundreds of microseconds later, have all that data available in DDR to be processed by a Python or C++ algorithm. This kind of device allows us to perform processing at the edge. Obviously, the APU inside these devices is limited in processing power, which is tied to its power consumption, but in most cases it will be enough. On the other hand, when high-performance processing is needed, these devices allow us to preprocess the data to reduce what has to be transmitted to powerful servers. In both cases, we need to store data on the same board where it is acquired, and the speed of the interface used to transfer data from DDR to non-volatile memory determines how soon we are ready to acquire the next event.
Zynq devices have many interfaces, including gigabit interfaces on the GTX, GTH or GTR ports. These ports can act as SATA, USB 3.0 or PCIe interfaces, and it is the last one that we will use to communicate with an NVMe SSD.
Before starting with the design, we have to pay attention to the SSD we are going to connect. In this case I will use the Trenz TE0802 board, which has an M.2 connector. SSDs for this connector come with either a SATA or a PCI Express interface. In this case I will use a PCI Express SSD.
The Vivado design for this project is very simple: it only contains the Zynq processing system.
In the Zynq configuration, we have to configure GT Lane0 as the PCIe interface on this board. If you are using a different board, you have to verify which lane is used. Also, M.2 connectors can carry up to 4 gigabit lanes and we will use only one, so both read and write speeds will be reduced by at least a factor of 4.
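To put the single-lane limitation in numbers, the following sketch estimates the theoretical bandwidth of a PCIe link from its generation and lane count. It assumes that the PS transceivers run the link at PCIe Gen2 (5 GT/s with 8b/10b encoding), which is my understanding of the Zynq UltraScale+ PS PCIe controller, and that the SSD itself is a Gen3 x4 device. The function name is hypothetical, and packet overhead is ignored, so real throughput will be lower.

```python
# Rough upper bound of PCIe link bandwidth in one direction.
# Packet overhead (TLP/DLLP headers, flow control) is ignored.

# (transfer rate in GT/s, line-encoding efficiency) per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_mb_s(generation: int, lanes: int) -> float:
    """Return the theoretical bandwidth of the link in MB/s."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    bits_per_second = rate_gt_s * 1e9 * efficiency * lanes
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    # Assumed link of this design (Gen2 x1) against wider or faster links.
    for generation, lanes in ((2, 1), (2, 4), (3, 4)):
        bandwidth = pcie_bandwidth_mb_s(generation, lanes)
        print(f"Gen{generation} x{lanes}: {bandwidth:.0f} MB/s")
```

Under these assumptions a single Gen2 lane tops out at about 500 MB/s, so the SSD datasheet figures cannot be reached on this board regardless of the software.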
Once the configuration is done, we have to verify the design, generate the bitstream and export the hardware to use it in the Petalinux build.
Once the hardware is exported, we can build our Petalinux distribution. To enable the packages needed to manage an NVMe SSD, we first have to modify the kernel configuration by executing the next command
petalinux-config -c kernel
After a while, a menu will open. There, we have to navigate to
Device Drivers > NVME Support
and select the options NVM Express block device and NVMe Target support.
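As a quick check that the selection was saved, the sketch below scans the text of a kernel .config file for the two options. It assumes that these menu entries map to the Kconfig symbols CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET; the function names are hypothetical.

```python
import re

# Kconfig symbols assumed to correspond to the two menu entries.
REQUIRED = ("CONFIG_BLK_DEV_NVME", "CONFIG_NVME_TARGET")


def parse_kconfig(text: str) -> dict:
    """Map every CONFIG_ symbol set in a kernel .config to its value."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_nvme_options(text: str) -> list:
    """Return the required symbols that are neither built in (y) nor modules (m)."""
    options = parse_kconfig(text)
    return [name for name in REQUIRED if options.get(name) not in ("y", "m")]
```

On a running target whose kernel exposes its configuration (CONFIG_IKCONFIG_PROC), the text could be read with `gzip.open("/proc/config.gz", "rt").read()`; otherwise the .config file from the kernel build directory can be used.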
With the kernel driver enabled, we will configure the rootfs.
petalinux-config -c rootfs
Now, on the menu, ensure that pciutils is selected.
Filesystem packages > console > utils > pciutils
Now, add fdisk, blkid and util-linux
Filesystem packages > base > util-linux
Then exit and save.
Finally, we can build and package Petalinux and, using an SD card, boot the board.
The next step is to ensure that the SSD is detected by Petalinux by executing the next command
At the bottom of the output, after all the DDR modules, we will find the disk.
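The pciutils package selected earlier provides lspci, which offers an independent check that the SSD has enumerated on the PCIe bus. The sketch below filters the output of `lspci -nn` for PCI class 0108 (mass storage, non-volatile memory), the class code NVMe controllers report. The function names are hypothetical, and the exact text of each line depends on the pciutils version and its ID database.

```python
import subprocess

# PCI class 01 (mass storage), subclass 08 (non-volatile memory),
# as printed in brackets by `lspci -nn`.
NVME_CLASS = "[0108]"


def find_nvme_controllers(lspci_output: str) -> list:
    """Return the lines of `lspci -nn` output that describe an NVMe controller."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if NVME_CLASS in line
    ]


def run_lspci() -> str:
    """Run `lspci -nn` on the target and return its output as text."""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    controllers = find_nvme_controllers(run_lspci())
    if controllers:
        print("\n".join(controllers))
    else:
        print("No NVMe controller found on the PCIe bus")
```

An empty result points to a problem below the block layer (lane configuration, link training or the driver) rather than a partitioning issue.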
In my case, the disk already has a partition, but a new disk needs one to be created. To do that, we first check whether the SSD has been detected as a block device.
Then, with the device name that appears, we have to create a partition.
Type n to create a new partition, then p and 1 to make it the first primary partition, and leave the rest of the options at their defaults. Finally, type w to write the partition table to the disk. Now, when we list the block devices again (for example with lsblk), we can see the SSD and the partition created.
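The same check can be done from a script. The following sketch lists the NVMe block devices and their partitions by walking sysfs; it assumes the usual Linux layout in which each partition appears as a subdirectory of its disk under /sys/block. The function name is hypothetical.

```python
import os


def list_nvme_devices(sys_block: str = "/sys/block") -> dict:
    """Map each NVMe namespace (e.g. nvme0n1) to a sorted list of its partitions."""
    devices = {}
    for disk in sorted(os.listdir(sys_block)):
        if not disk.startswith("nvme"):
            continue  # skip RAM disks, SD cards, loop devices, ...
        disk_dir = os.path.join(sys_block, disk)
        # Partitions are subdirectories named after the disk (nvme0n1p1, ...).
        devices[disk] = sorted(
            entry
            for entry in os.listdir(disk_dir)
            if entry.startswith(disk + "p")
            and os.path.isdir(os.path.join(disk_dir, entry))
        )
    return devices


if __name__ == "__main__":
    for disk, partitions in list_nvme_devices().items():
        print(disk, partitions if partitions else "(no partitions)")
```

A disk that appears with an empty partition list has been detected but still needs the partitioning step described above.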
Last, we will perform a speed test on our device using the dd command. To perform a write test, the command we have to execute is the next.
time dd if=/dev/urandom of=/dev/nvme0n1p1 bs=2M count=1000
and for the read test
time dd if=/dev/nvme0n1p1 of=/dev/null bs=2M count=1000
These commands return the time spent in the write and read operations.
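Since the data will eventually be handled from Python or C++, it can be useful to repeat the measurement from the application side. The sketch below is a minimal dd-like sequential test, not a replacement for a real benchmark tool. It generates the random block once, outside the timed region, so the cost of producing random data does not count against the disk. The function names are hypothetical. Pointing it at a raw partition destroys its contents, and reading back a file that was just written may be served from the page cache rather than from the SSD.

```python
import os
import time


def sequential_write(path: str, block_size: int, count: int) -> float:
    """Write `count` blocks of `block_size` bytes to `path`; return MB/s."""
    block = os.urandom(block_size)  # generated once, before timing starts
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # wait until the data has left the page cache
    elapsed = time.perf_counter() - start
    return block_size * count / elapsed / 1e6


def sequential_read(path: str, block_size: int, count: int) -> float:
    """Read up to `count` blocks of `block_size` bytes from `path`; return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for _ in range(count):
            data = f.read(block_size)
            if not data:
                break  # reached the end of the file or device
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / elapsed / 1e6


if __name__ == "__main__":
    # Scratch file by default; a mounted SSD path is the intended target.
    target = "speed_test.bin"
    print(f"write: {sequential_write(target, 2 * 2**20, 100):.1f} MB/s")
    print(f"read:  {sequential_read(target, 2 * 2**20, 100):.1f} MB/s")
```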
For the write operation, the board spent 32.698 seconds moving 2MB 1000 times, so the speed can be calculated as 1000*2MB/32.698s = 61MB/s
The read operation took 9.319 seconds for the same amount of data, so the speed can be calculated as 1000*2MB/9.319s = 215MB/s
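The arithmetic can be reproduced with a few lines of Python. The sketch below computes the throughput both with 2MB taken as 2,000,000 bytes, as in the calculation above, and as 2 MiB (2,097,152 bytes), which to my knowledge is how dd interprets bs=2M. The function name is hypothetical.

```python
MB = 10**6   # decimal megabyte
MIB = 2**20  # binary mebibyte


def throughput_mb_s(block_bytes: int, count: int, seconds: float) -> float:
    """Return the throughput in MB/s (decimal) for `count` blocks moved in `seconds`."""
    return block_bytes * count / seconds / MB


if __name__ == "__main__":
    # Times measured in the dd tests above.
    for label, seconds in (("write", 32.698), ("read", 9.319)):
        decimal = throughput_mb_s(2 * MB, 1000, seconds)
        binary = throughput_mb_s(2 * MIB, 1000, seconds)
        print(f"{label}: {decimal:.1f} MB/s with 2 MB blocks, "
              f"{binary:.1f} MB/s with 2 MiB blocks")
```

The difference between the two interpretations is about 5%, which does not change the conclusions that follow.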
If we compare these numbers with the datasheet figures of the Corsair MP510 (3000MB/s read, 2400MB/s write), they may look disappointing, but the SSD only reaches its maximum performance when using all 4 of its lanes. Also, the test was run on Petalinux, so the speed depends on how the operating system manages the read and write operations.
There are several ways to improve these results, but we are limited by the board in some cases, and by the device in others. If we want to keep the device, an XCZU2EG in this case, we will notice that it has a total of 4 GT lanes, but these lanes are shared with USB 3.0, DisplayPort and Gigabit Ethernet. If we don't need these other gigabit peripherals, we can dedicate all the GT lanes to the M.2 connector and improve the performance of the SSD by up to a factor of 4.
The other way to improve the performance is to use the gigabit transceivers of the PL instead of those of the PS. Xilinx 7 series families have different kinds of gigabit transceivers, according to their speed and the part of the device they are connected to. In this case, the transceivers of this part are connected to the PS, so they are managed entirely by the operating system, which is a bottleneck in terms of performance. If we take a look at the Zynq UltraScale+ product table, we will notice that devices from the ZU4 upwards have GTH and GTY transceivers on the PL, which reach speeds of up to 32Gbps. Zynq MPSoC is not the only family with gigabit transceivers: in the Zynq SoC product table we can see that the XC7Z012 and XC7Z015 have 4 GTP transceivers with a bandwidth of up to 6.25Gbps. If we use the PL as the interface to the SSD, we have to keep in mind that the gigabit transceivers implement only the physical interface, and we will need an IP to implement the protocol.