Simulating DSP algorithms using Verilog.

When you think about DSP algorithms, sure you think of applications such as MATLAB, Octave or Python. In the same way, when you think about simulating HDL, to your head comes words like Verilog or VHDL, but, what happens when we mix these worlds? When we have to simulate a DSP algorithm, implemented into an FPGA, the fact that for the simulation we are going to need analog and floating point signals, could make us search complex, or at least more complex, test engines like cocotb or UVM, assuming that simulating this kind of algorithms in raw Verilog is difficult or simply, Verilog has not enough power, which is completely false. In my job, I follow the maxim Keep it as simple as possible, so, if we know how to use Verilog, or VHDL, we are going to try to avoid the use of other different tools, and in the case of DSP algorithms, Verilog has enough power to generate floating point signal, calculate sines and cosines, and also compare signals. In case of we need to calculate an IIR filter or a FIR filter, although Verilog can also calculate them, they are based on sines and cosines, I use to add Python to my test benches because of how simple is modify and manage different kind of files.

Verilog is not MATLAB, which is obvious, but we can declare real signals in a testbench and also calculate trigonometrical expressions like sine using $sin and cosines using $cos, which are the base of DSP. Signals like the sampling frequency fs, or the base frequency for the test signal f0 and even the test signal signal will be declared as real. Also, since the signal will be based on sine and cosine signals, we need to declare an angle signal, and also an angle_increment signal, which will be used to increase the angle at a speed related to the base frequency and the sampling frequency. In the code below also you can see a signal called prescaler that is used as a prescaler between the simulation time and the sampling frequency.

Once we have all the signals declared, we are going to initialize all the signals and calculate the values for all of them. The prescaler signal is the relation between the frequency of the simulation, defined in the timescale, and the sampling frequency, defined in fs. Regarding the angle_increment, it can be calculated as $\pi$ divided by the number of samples per period.

Each point of the test signal is calculated inside a forever statement with a delay corresponding with the prescaler. In addition to the points of the real signal, inside the loop is also assigned the digital signal, in this case, signal_q15, an 18-bit signal with a Q15 format, and also a data valid signal is set, and then, one clock cycle later is cleared. I

real f0; /* fundamental frequency of the signal */
real fs; /* sampling frequency */

real nsamples; /* number of samples per sampling period */
real angle; /* angle of the signal */
real angle_increment; /* angle increment according sampling frequency */
real signal; /* signal to use */

real prescaler; /* number of ticks per sampling period */

reg [17:0] signal_q15; /* signal in 18 bit q15 format */
reg signal_dv; /* signal data valid for the filter */

/* stimulus generation */
initial begin
  /* initial conditions of the test */
  f0 = 30000;
  fs = 500000; /* 500ksps */
  signal = 0;
  angle = 0;

  prescaler = 1/(fs*0.00000001); 
  nsamples = fs/f0; /* number of samples */

  angle_increment = 2*3.14159/nsamples; /* angle increment per fs period */

  signal_dv <= 1'b0;

  forever begin
    
    #(prescaler-`clkcycle)
    angle = angle + angle_increment;
    signal = $sin(angle) + $cos(angle*3 + 2) + $sin(angle*4+0.2);

    signal_q15 = signal * (2**15);
    signal_dv <= 1'b1;
    #(`clkcycle)
    signal_dv <= 1'b0;

  end
end

To help in the test bench development, I use to declare at the top of the code some definitions with the most used delays. This helps me if we need to change the timescale, or the clock frequency.

`define delay_1ms 100000
`define delay_10ms 1000000
`define delay_100ms 10000000

`define clkcycle 2

When we have the stimulus signals declared, now it’s time to execute the test. In this case, the test is manually checked, this means that the correct behavior of the DSP algorithm will be check by us, so the test flow only will have the declaration of the .vcd file, followed by a delay which is the duration of the test, and finally a $finish() statement to finish the test.

/* test flow */
initial begin
  
  /* create vcd file and save all signals */
  $dumpfile("test_result.vcd");
  $dumpvars();

  #(`delay_100ms);

  $finish();
end

Now, I have added a module that implements a 16th-order FIR filter. You can find this code on Github. This filter has an AXI4 Stream interface and uses only one multiplier using folding, so we need to take care of the ratio between the clock frequency and the sampling frequency. You can find more information about these kind of filters here.

axis_fir1_16_v1_0 #(
.inout_width(18),
.inout_decimal_width(15),
.coefficient_width(17),
.coefficient_decimal_width(15),
.internal_width(18),
.internal_decimal_width(15)
) dut (
.aclk(aclk),
.resetn(resetn),
.s_axis_tdata(signal_q15),
.s_axis_tlast(1'b0),
.s_axis_tvalid(signal_dv),
.s_axis_tready(),
.m_axis_tdata(signal_filtered),
.m_axis_tlast(),
.m_axis_tvalid(),
.m_axis_tready(1'b1),
.b0(),
.b1(),
.b2(),
.b3(),
.b4(),
.b5(),
.b6(),
.b7(),
.b8(),
.b9(),
.b10(),
.b11(),
.b12(),
.b13(),
.b14(),
.b15(),
.b16()
);

usually, when we are simulating our HDL code, we are in a design phase of the project, so it is common test the algorithm in some differnets situations. In this case, I want to test the code of the FIR filter with different sets of coefficients. To perform this test I have two different options. The first one, change the set of coefficients manually for each test, or the second one, use an external script that modify the coefficients automatically for different filter calculated. This second option is my preferred because we can test many different situations spending less time.

For this test, I am going to use Python to generate the set of coefficients, and then replace in the test file the coefficients. To do this, first, we need to add a mark in the test code to help the Python script to find the code that has to replace. I have used the mark /* !COEFF_INIT */ to indicate that the coefficients start here, and the mark /* !COEFF_END */ to indicate that the coefficients end here.

axis_fir1_16_v1_0 #(
.inout_width(18),
.inout_decimal_width(15),
.coefficient_width(17),
.coefficient_decimal_width(15),
.internal_width(18),
.internal_decimal_width(15)
) dut (
.aclk(aclk),
.resetn(resetn),
.s_axis_tdata(signal_q15),
.s_axis_tlast(1'b0),
.s_axis_tvalid(signal_dv),
.s_axis_tready(),
.m_axis_tdata(signal_filtered),
.m_axis_tlast(),
.m_axis_tvalid(),
.m_axis_tready(1'b1),
/* !COEFF_INIT */
.b0(-17'd100),
.b1(-17'd164),
.b2(-17'd221),
.b3(17'd0),
.b4(17'd837),
.b5(17'd2394),
.b6(17'd4340),
.b7(17'd5983),
.b8(17'd6628),
.b9(17'd5983),
.b10(17'd4340),
.b11(17'd2394),
.b12(17'd837),
.b13(17'd0),
.b14(-17'd221),
.b15(-17'd164),
.b16(-17'd100)
/* !COEFF_END */
);

Now, in Python we are going to calculate the FIR filter using the firwincommand from the signal package. The coefficients are stored in the variable taps, and later they are codified in the expected format, 18-bit Q15.

# define filter
n = 16

fs = 500e3
fn = 100e3

wn = fn/fs

# compute filter taps
taps = firwin(n+1,wn)

# quantize filter taps
width = 17
decimal_width = 15

taps_q = (n+1)*[0]

for index in range(0,n+1):
    taps_q[index] = int(taps[index] * 2**(decimal_width))

Now, we are gong to generate the Verilog code that we are going to use in the testbench. In this code we also have to add the marks because later, when we replace the text, these marks will be replaced.

# create verilog code
code = "/* !COEFF_INIT */\n"

for index in range(0,n+1):
    if taps_q[index] < 0:
        code += f'  .b{index}(-{width}\'d{-taps_q[index]})'
    else:
        code += f'  .b{index}({width}\'d{taps_q[index]})'
        
    if index != n:
        code += ','
    code += '\n' 
code += "  /* !COEFF_END */"

Now, I like to create a backup file of the code with the extension .bak. Then, using the command re.sub, we can find the marks in the file, and finally, replace the text and generate a new file with the new coefficients.

file = "../test/test.sv"

# Backup file generation
os.rename(file,f'{file}.bak')

f = open(f'{file}.bak', "r")

# Read the file
file_content = f.read()
f.close()

# Replace the content
file_content_new = re.sub('/\* !COEFF_INIT \*/?(.*?)/\* !COEFF_END \*/', code, file_content, flags=re.DOTALL)

# Rewrite the file
f = open(file,"w")
f.write(file_content_new)
f.close()

Now, when the testbench is complete, here you can find the script run.sh that executes the test using the simulator Icarus.

iverilog ../test/test.sv ../source/axis_fir1_16_v1_0.v  -s test -o ./sim.vvp
vvp sim.vvp
rm ./sim.vvp

When the test finishes, the .vcd file can be opened with GTKWave

In this article, we have seen how using “only” Verilog, we can generate real signals with different harmonics, and also a way to test different configurations of the filter using Python. Is important to say that I am not saying that you don’t have to use tools like UVM or cocotb, or that they are useless, I am saying that you don’t need them to simulate this kind of modules. If you are feeling comfortable using these tools, go ahead, but make it because you like it, not because you need it. In my case, I am very comfortable with Verilog, and I have learned how to extract all we need from Verilog, many times checking AMD documents about the Vivado synthesizer because not all the Verilog commands are supported by all the synthesizers.

Simulating DSP algorithms using Verilog.

Related posts

Undersampling. Playing with Nyquist and Shannon.

Implementing and ADC inside the FPGA

Solving a Boost converter in FPGA