**MyHDL by Example**

**Note:** This design requires MyHDL 0.5.1

On this page we will present the design of a sine and cosine computer.

You will learn about the following MyHDL features and techniques:

- using the
`intbv`

class to model negative numbers elegantly - how the Verilog convertor takes care of the tricky issues with the signed representation in Verilog
- how to take advantage of the fact that only code inside generator functions gets converted to Verilog
- how to use co-simulation to verify the Verilog code generated by the convertor

We will assume that the input angle is represented in radians, and that it is in the range between `-π/2`

and `π/2`

. Other angles can be handled by adding a factor `π`

first (modulo `2π`

) and changing the sign of the sine and cosine result.

The floating point numbers are represented as integers. Technically, this means rounding them to a fixed number of bits, and then multiplying them with the factor `2**F`

, where `F`

is the number of bits after the point.

Both the sine and the cosine of the input angle will be computed. The interface of the module looks as follows:

def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset): """ Sine and cosine computer. This module computes the sine and cosine of an input angle. The floating point numbers are represented as integers by scaling them up with a factor corresponding to the number of bits after the point. Ports: ----- cos_z0: cosine of the input angle sin_z0: sine of the input angle done: output flag indicating completion of the computation z0: input angle; -pi/2 <= z0 <= pi/2 start: input that starts the computation on a posedge clock: clock input reset: reset input """

We will first write a unit test for the design. The idea is to use the `cos`

and `sin`

functions from the `math`

module to compute the expected results on a number of input angles, and to compare them with the outputs from the design under test. Here is the code:

from math import pi, sin, cos, log import random from myhdl import * from SineComputer import SineComputer, SineComputer_v def bench(fractionSize, errorMargin, nrTests=100): """ Test bench for SineComputer. fractionSize: number of bits after the point errorMargin: margin for rounding errors on result nrTests: number of tests vectors """ # scaling factor to represent floats as integers M = 2**fractionSize # maximum angle ZMAX = int(round(M*pi/2)) # error margin shorthand D = errorMargin # signals cos_z0 = Signal(intbv(0, min=-D, max=M+D)) sin_z0 = Signal(intbv(0, min=-M-D, max=M+D)) z0 = Signal(intbv(0, min=-ZMAX, max=ZMAX+1)) done = Signal(False) start = Signal(False) clock = Signal(bool(0)) reset = Signal(True) # design under test dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset) # clock generator @always(delay(10)) def clockgen(): clock.next = not clock # test vector setup testAngles = [-pi/2, -pi/4, 0.0, pi/4, pi/2] testAngles.extend([random.uniform(-pi/2, pi/2) for i in range(nrTests)]) # actual test @instance def check(): yield clock.negedge reset.next = False for z in testAngles: yield clock.negedge z0.next = int(round(M*z)) start.next = True yield clock.negedge start.next = False yield done.posedge exp_cos_z0 = int(round(cos(z)*M)) exp_sin_z0 = int(round(sin(z)*M)) assert abs(cos_z0 - exp_cos_z0) < D assert abs(sin_z0 - exp_sin_z0) < D raise StopSimulation return dut, clockgen, check

As the design will perform its calculations in finite precision, there will be rounding errors. Therefore, the comparisons between expected and actual results are performed using an error margin. We may want to use the test bench for designs with different characteristics; therefore, the error margin is made a parameter.

The precision is specified in terms of the number of bits after the point, using the parameter `fractionSize`

.

To represent the numbers, we use the `intbv`

class, which is basically an integer-like type with bit-vector capabilities. Note that we constrain the `intbv`

instances by specifying the range of valid integer values, not by a bit width. For high-level, algorithmic work, this is much easier. Moreover, it enables fine-grained range error checking at run-time.

By filling in the parameters of the `bench`

function, we can construct an actual test bench that runs a simulation, as follows:

def test_bench(): fractionSize = 18 errorMargin = fractionSize tb = bench(fractionSize, errorMargin) sim = Simulation(tb) sim.run()

A function with a name starting with `test_`

is automatically recognized and run by a unit testing framework such as `py.test`

.

To implement the design, we will use the Cordic algorithm, a very popular algorithm to compute trigonometric functions in hardware. On this page, we are mainly interested in the mechanical characteristics of the algorithm and their hardware implications. For more information and background on the algorithm itself, please consult other sources, such as this paper by Ray Andraka.

The Cordic algorithm is an iterative algorithm based on vector rotations over elementary angles. The algorithm normally operates in one of two modes. In rotation mode, it rotates a vector (x_{0}, y_{0}) in the Cartesian plane over an input angle z_{0}. The Cordic equations for this mode are:

x_{i +1} = x_{i} - y_{i}d_{i}2^{-i}

y_{i +1} = y_{i} - x_{i}d_{i}2^{-i}

z_{i +1} = y_{i} - d_{i}tan^{-i}(2^{-i})

where

d_{i} = -1 if z_{i} < 0, else +1.

These equations can be implemented with relatively simple hardware. This is the characteristic that makes the Cordic algorithm attractive. In particular, multiplications with a factor 2^{-i} are simply shift-right operations. Also, the arctangent values tan^{-i}(2^{-i}) can be precomputed and stored in a small look-up table.

The Cordic equations can be used for a variety of computations. For our purposes, it can be shown that

x_{n} = cos z_{0}

y_{n} = sin z_{0}

for the following initial conditions:

x_{0} = 1 / A_{n}

y_{0} = 0

where

A_{n} = Product[ sqrt(1 + 2^{-2i}) ] with i = 0 … n-1

The Cordic algorithm can be implemented in many ways, with various characteristics and advantages. On this page, we will implement a parallel, iterative processor, which is a fairly straightforward mapping of the equations into a bit-parallel data path and a state machine.

from math import atan, sqrt, ceil, floor, pi from myhdl import * t_State = enum("WAITING", "CALCULATING") def SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset): """ Sine and cosine computer. This module computes the sine and cosine of an input angle. The floating point numbers are represented as integers by scaling them up with a factor corresponding to the number of bits after the point. Ports: ----- cos_z0: cosine of the input angle sin_z0: sine of the input angle done: output flag indicated completion of the computation z0: input angle start: input that starts the computation on a posedge clock: clock input reset: reset input """ # angle input bit width W = len(z0) # angle input z0 represents number between -pi/2 and pi/2 # scaling factor corresponds to the nr of bits after the point M = 2 ** (W-2) # nr of iterations equals nr of significant input bits N = W-1 # calculate X0 An = 1.0 for i in range(N): An *= (sqrt(1 + 2**(-2*i))) # X0 X0 = int(round(M*1/An)) # tuple with elementary angles angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)]) # iterative cordic processor @instance def processor(): x = intbv(0, min=sin_z0.min, max=sin_z0.max) y = intbv(0, min=sin_z0.min, max=sin_z0.max) z = intbv(0, min=z0.min, max=z0.max) dx = intbv(0, min=sin_z0.min, max=sin_z0.max) dy = intbv(0, min=sin_z0.min, max=sin_z0.max) dz = intbv(0, min=z0.min, max=z0.max) i = intbv(0, min=0, max=N) state = t_State.WAITING while True: yield clock.posedge, reset.posedge if reset: state = t_State.WAITING cos_z0.next = 1 sin_z0.next = 0 done.next = False x[:] = 0 y[:] = 0 z[:] = 0 i[:] = 0 else: if state == t_State.WAITING: if start: x[:] = X0 y[:] = 0 z[:] = z0 i[:] = 0 done.next = False state = t_State.CALCULATING elif state == t_State.CALCULATING: dx[:] = y >> i dy[:] = x >> i dz[:] = angles[int(i)] if (z >= 0): x -= dx y += dy z -= dz else: x += dx y -= dy z += dz if i == N-1: cos_z0.next = x sin_z0.next = y state = t_State.WAITING done.next = True else: i += 1 return processor

The actual computation is done by the `processor`

generator. Note that outside the generator function, we calculate some data such as the `X0`

constant, and the look-up table of elementary arctangents, represented by the `angles`

tuple.

The internal number variables are represented by `intbv`

instances. The dual nature of this class comes in very handy. On the one hand, we can constrain the instances as integer subtypes by specifying the valid integer range at construction time. On the other hand, we can access their two's complement representation as a bit vector, for example for slicing or right-shifting.

It seems obvious that a type that unifies the integer and the bit vector views should be very useful for hardware design. One would therefore expect a similar feature in other HDLs. However, I believe that it is actually a unique capability offered by MyHDL. Other HDLs seem to try to solve the issues by creating more and more integer and bit-vector like types. In MyHDL, a single type does it all - the `intbv`

class.

`py.test`

confirms that this is a valid impementation:

> py.test ============================= test process starts ============================= testing-mode: inprocess executable: /usr/local/bin/python (2.4.2-final-0) using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown> test_SineComputer.py[1] . ================== tests finished: 1 passed in 0.47 seconds ===================

Now that we have a working design (and only now!) we can attempt to convert it to Verilog automatically. In MyHDL, this is done by using the `toVerilog`

function. For example, consider the instantiation of the design under test in the test bench:

# design under test dut = SineComputer(cos_z0, sin_z0, done, z0, start, clock, reset)

To convert the design instance to Verilog, this line can be replaced by the following:

# design under test dut = toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset)

As before, the `dut`

object is a simulatable design instance, but as a side effect of the instantiation, an equivalent Verilog module file will be generated. The Verilog output is as follows:

module SineComputer ( cos_z0, sin_z0, done, z0, start, clock, reset ); output signed [19:0] cos_z0; reg signed [19:0] cos_z0; output signed [19:0] sin_z0; reg signed [19:0] sin_z0; output done; reg done; input signed [19:0] z0; input start; input clock; input reset; always @(posedge clock or posedge reset) begin: _SineComputer_processor reg [5-1:0] i; reg [1-1:0] state; reg signed [20-1:0] dz; reg signed [20-1:0] dx; reg signed [20-1:0] dy; reg signed [20-1:0] y; reg signed [20-1:0] x; reg signed [20-1:0] z; if (reset) begin state = 1'b0; cos_z0 <= 1; sin_z0 <= 0; done <= 0; x = 0; y = 0; z = 0; i = 0; end else begin // synthesis parallel_case full_case casez (state) 1'b0: begin if (start) begin x = 159188; y = 0; z = z0; i = 0; done <= 0; state = 1'b1; end end 1'b1: begin dx = $signed(y >>> $signed({1'b0, i})); dy = $signed(x >>> $signed({1'b0, i})); // synthesis parallel_case full_case case (i) 0: dz = 205887; 1: dz = 121542; 2: dz = 64220; 3: dz = 32599; 4: dz = 16363; 5: dz = 8189; 6: dz = 4096; 7: dz = 2048; 8: dz = 1024; 9: dz = 512; 10: dz = 256; 11: dz = 128; 12: dz = 64; 13: dz = 32; 14: dz = 16; 15: dz = 8; 16: dz = 4; 17: dz = 2; default: dz = 1; endcase if ((z >= 0)) begin x = x - dx; y = y + dy; z = z - dz; end else begin x = x + dx; y = y - dy; z = z + dz; end if ((i == (19 - 1))) begin cos_z0 <= x; sin_z0 <= y; state = 1'b0; done <= 1; end else begin i = i + 1; end end endcase end end endmodule

With this example, some interesting features of the Verilog convertor can be illustrated.

It is important to realize that the conversion occurs on a design *instance*. This means that the code has already been *elaborated* by the Python interpreter. Therefore, the convertor works on the simulatable data structure, which is a (hierarchical) list of generators. This means that only the source code of generator functions is converted. The pleasant consequence is that the restrictions of the “convertible subset” apply only to the code inside generator functions, not to any code outside them.

For example, consider how the look-up table of elementary arctangents is set up in the `SineComputer`

design:

# tuple with elementary angles angles = tuple([int(round(M*atan(2**(-i)))) for i in range(N)])

This line uses things like a list comprehension and a call to the trigonometric function `atan`

from the `math`

library, At this point, this is beyond the scope of the convertible subset, and it may stay like that forever. But as this code is outside a generator function, it doesn't matter.

Inside the `processor`

function, the lookup-up table is used as follows:

z[:] = angles[int(i)]

This is just fine for the convertor. Note how this single-line lookup is expanded into a case statement in the Verilog output.

Note that we have been talking about the convertible subset, and not about the “synthesizable subset”. The reason is that the convertible subset is much less restrictive. You can find more information about the convertible subset in the MyHDL manual.

Obviously, MyHDL code intended for synthesis also has to take synthesis-related restrictions into account. But again, these restrictions only apply to the code inside generator functions, because only that code is actually converted.

The described behavior is a unique feature of the MyHDL design flow. Outside generator functions, you can use Python's full power to describe designs. As long as the code inside them obeys the constraints of the convertible subset, the design instance can always be converted to Verilog. And if that code also obeys the constraints of the synthesizable subset, the result will be synthesizable Verilog.

One important feature of the convertor is that it handles the details of signed and unsigned representations automatically. When writing synthesizable code, a MyHDL designer can use a high-level view for integer operations by using the `intbv`

type, and rely on the underlying two's complement representation to do the right thing automatically. In contrast, a Verilog designer is forced to deal with low-level representational issues explicitly. This can become very tricky, especially with negative numbers and the signed representation.

First of all, note in the Verilog output that the convertor infers which variables have to be declared as `signed`

. This is the easy part.

Now consider the following line in the MyHDL code of `SineComputer`

:

dx[:] = y >> i

I believe it's quite clear what this is supposed to do. With the underlying two's complement representation, it works for positive and negative values of `y`

.

Now consider how this has been translated into Verilog:

dx = $signed(y >>> $signed({1'b0, i}));

The convertor has to deal with several potential pitfalls. The fundamental problem is that Verilog uses an unsigned interpretation by default, which is the opposite from what you should do to get the naturally expected results.

First, the convertor uses arithmetic shift (`>>>`

) instead of bit-wise shift (`>>`

), to have the sign bit shifted in instead of a zero.

Secondly, the second (unsigned) operand is typecasted to `$signed`

, after adding a sign bit. Otherwise, Verilog will interprete all operands in a mixed expression as unsigned. Having said that, the Verilog literature seems to indicate that a shift operation is an exception to this rule. But the convertor doesn't take risks and inserts the typecast as a general measure. It may be redundant in this case.

Finally, the whole expression is typecasted to `$signed`

. Actually, I would have expected that this typecast would not be necessary - after all, we are shifting a signed number. However, my current simulator (Cver) tells me that it is. It may be wrong, I don't know. Anyway, the cast is an additional safety net.

The message you should get from this discussion is the following: working with the signed representation in Verilog is tricky. I believe that writing the code in a natural, high-level way in MyHDL, and letting the convertor take care of the low-level representation issues, is a better option. (Of course, we still need to make sure that the convertor gets it right, which is hard enough.)

Clearly we will want to verify that the Verilog output from the convertor is correct. For this purpose, MyHDL supports co-simulation with Verilog.

To set up a co-simulation, we need to create a Cosimulation object for the Verilog design. The Verilog convertor makes this task easier. In addition to the Verilog code for the design itself, it also generates a Verilog test bench stub that defines an interface between the Verilog design and a Cosimulation object.

The following function creates a Cosimulation object for our design:

def SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset): toVerilog(SineComputer, cos_z0, sin_z0, done, z0, start, clock, reset) cmd = "cver -q +loadvpi=myhdl_vpi:vpi_compat_bootstrap " + \ "SineComputer.v tb_SineComputer.v" return Cosimulation(cmd, **locals())

We start by doing the Verilog conversion itself first. Then, we define the command to start up the Verilog simulator. This is of course simulator-specific. The command shown is for the open-source Cver simulator. It loads a vpi module that defines the interface between the MyHDL simulator and the Verilog simulator. Also, both the Verilog code for the design and the test bench stub are compiled.

The Cosimulation object is then constructed with the command as its first parameter, followed by a number of keyword arguments. The keyword arguments make the link between signal names declared in the Verilog test bench stub and signals in the MyHDL code. When the signal names in MyHDL and Verilog are identical we can use a little trick and simply pass the local namespace dictionary `locals()`

to the constructor.

In the test bench code, we replace the instantiation of the MyHDL module with this function:

dut = SineComputer_v(cos_z0, sin_z0, done, z0, start, clock, reset)

When running py.test again, we now run a co-simulation between the MyHDL test bench and the Verilog code:

============================= test process starts ============================= testing-mode: inprocess executable: /usr/local/bin/python (2.4.2-final-0) using py lib: /usr/local/lib/python2.4/site-packages/py <rev unknown> test_SineComputer.py[1] Copyright (c) 1991-2005 Pragmatic C Software Corp. All Rights reserved. Licensed under the GNU General Public License (GPL). See the 'COPYING' file for details. NO WARRANTY provided. . ================== tests finished: 1 passed in 1.07 seconds ===================

So the test bench confirms that the Verilog code is correct.

**Historical note** When this example was developed with MyHDL 0.5, this test failed, showing that the convertor had bugs. It turned out that the handling of signed variables with shift operations had not been implemented and tested properly. Therefore, this example has been the trigger to fix these bugs and develop MyHDL 0.5.1.

To confirm synthesizablity, this example was synthesized with Xilinx ISE and targeted to a Spartan FPGA. For detailed information, you can review the synthesis report.

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported