# The RTL Model of a Reconfigurable Pipelined MCM 

Arya p kumar ${ }^{1}$, Mercy Mathew ${ }^{2}$<br>${ }^{1}$ M-TECH Student, Department of ECE, Believers Church Caarmel Engineering College, Kerala -689711,India<br>${ }^{2}$ Asst.Professor, Department of ECE, Believers Church Caarmel Engineering College, Kerala -689711, India


#### Abstract

This work introduces a reconfigurable constant multiplier as multiplication with constant coefficients plays an important role in digital signal processing(DSP).Placing embedded multipliers into the fabric of field programmable gate arrays(FPGAs) reduces the performance gap that were formed between that of FPGAs and application specific integrated circuits(ASICs).This work consists of two parts.firstly,the register transfer level(RTL) model of the architecture proposed by Moller et.al is constructed by using a ripple carry adder structure.Secondly,RTL model of the proposed architecture is constructed by using carry look ahead adder.The aim is to compare the performance of the two models after synthesizing and mapping them on a FPGA device library.The RTL models were constructed by using verilog hardware description language. The functional simulation and verification are performed using Modelsim simulator.


Key Words: RTL, ModelSim, MCM, CSD

## 1. INTRODUCTION

Constant multiplication can be implemented by using multiplier less additions, subtractions and bit-shifts. This is easy because of the fact that bit shifts can be realized as wires on ICs and special properties in the constant number representations can be alone exploited. A multiple constant multiplier (MCM) is used in a plethora of applications including, digital, filtering, FFT computation, matrix multiplications, transforms and block data manipulations. They are widely used in DSP algorithms performed on video, audio, speech and image data. A number of citation related to variable MCM structures are available [1],[2].The MCM design approaches basically falls under four different categories. They are the Common Sub expression Elimination (CSE) algorithms, Digit based recording algorithm, and Graph based algorithm and Hybrid algorithms.

The Canonical Signed Digit (CSD) is a popular digit based recoding algorithm. The digit based recording algorithms are simple, extremely fast, requires low computational cost and they are linear in number of bits and can be easily applied to constants with large bit length. However their performance is usually worst. To achieve better performance, a modified CSD algorithm is used. The CSE algorithm are the evolved version of digit based recording algorithm. The basic idea of this algorithm is to convert the multiplier constants on to a convenient number system such as CSD. The second step is to find common sub-patterns in the representation of
constants. The performance of the algorithm truly depends on number representation.

Further, even though the considered CSE problem is nondeterministic polynomial time complete its optimal solution in general does not provide optimal MCM solutions. The graph based algorithm uses the bottom top approach to iteratively construct graph representing a multiplier block. The graph construction is determined by a heuristic that determines the next graph vertex to add to the graph. This algorithm offers more degrees of freedom, by not being restricted to a particular representation of constants or a predefined graph topology and produce solutions with the lowest number of operations. Among the graph based algorithms, the Bull Horrocks (BH) algorithm, Bull Horrocks modified (BHM) algorithm and n-dimensional reduced adder graph (RAG-n) are popular. The Hybrid algorithm combines two or more algorithm from different classes to obtain a MCM structure. Among all these algorithms, it is evident that the RAG-n yields the solution for an MCM with the smallest number of add/subtract operations.

## 2. EXISTING ARCHITECTURE

Moller et.al proposed a pipelined MCM architecture consisting of registers, adders, subtractors and bit-shifters to implement a three-constant multiplier. This architecture is built on the RPAG algorithm, having a 16 bit input data. These three constants are supposed to be the normalized filter coefficients of a typical FIR filter. The main problem in implementing and computing convolution is speed, area and power which affect any DSP system. Speeding up convolution using a Hardware Description Language for design entry not only increases (improves) the level of abstraction, but also opens new possibilities for using programmable devices. Today, most DSPs suffer from limitations in available address space, or the ability to interface with surrounding systems. The use of high speed FPGAs, together with DSPs, can often increase the system bandwidth, by providing additional functionality to the general purpose DSPs.

The selected set of multiplier constants in the above figure $1912_{10}, 1111_{10}, 1331_{10}$.


Fig -1: Single constant multiplier fused between the constants 1912, 1111, 1331.

These three constants are supposed to be the normalized filter coefficients of a typical FIR filter. In this, input $x$ is a 16 bit variable in which they are stored in a register. The entire figure can be divided into three sections as it switches between three different constants sections. In left hand section a multiplier is placed in which three selection lines are used. Here multiplication is employed by repeated additions, subtractions and bit shifts. The selection line of the multiplexer determines which constants are produced at the output. Enabling of selection line 0 will produce a constant $1912_{10}$ at the output for further multiplication process. Similarly, the selection line 1 and 2 will produce constants $1111_{10}$ and $1331_{10}$ at the output. As three constants are present, the switching between the given set of constants of multipliers during run-time apart from using large generic multipliers is necessary to realize hardware efficient run-time adaptable filters [2], [3], [4], DCT as well as FFT implementations. This multiplier uses the suitable constants from the set of constant $\{912,1111,1331\}$.The adder used in above architecture is ripple carry adder.

## 3. PROPOSED ARCHITECTURE

Since three selection lines are present in the existing architecture which switches between three constants. Here a fourth selection line is added which gives another constant as the output. The four selection lines used are $00,01,10,11$. First it checks whether the first bit of the four selection lines are 0 or 1 .Ifthe first bit is 0 , then it checks the second bit to be 0 or 1.If the second bit is 0 , it will enable selection line 1 and as a result ,the coefficient 1912 is generated at the output. And the ripple carry adder is used in fig- 1 which is replaced by carry look ahead adder. And the performance between the multiplier is compared. Switching between single and that of multiple constants outputs are obtained by the inclusion of multipliers among different stages. It is based on
a most favorable algorithm which fuses previously optimized pipelined constant multipliers generated by an obtainable heuristic called RPAG. For such applications, two to six constants or coefficients are common in a coefficient set and switching between coefficients are done during run-time and is obtained by placing multiplexers keen on a number of constant multiplication circuits. This will leads to the reuse of repeated partial circuits and hence required resources. The problem is to obtain the most possible solution during the insertion of multiplexers among different stages of multiplications. Large routing delays in high speed applications can be often reduced by providing pipelining. Pipeline registers are placed following every stages which includes registers in the multiplexer stages. The wires can be connected with a left shift and a sign. The value vector associated with each operation corresponds to the halfway or output factors for a particular multiplexer configuration. A switchable adder/subtractor is depicted as an adder with an extra sign vector input.


Fig -2: Multiple constant multiplier switching between the constants 1912, 1111, 1331, 1092.

## 4. RESULTS

Since ripple carry adders are replaced by carry look ahead adder, significant variations are noticed.

Improvement in maximum clock frequency in model 2 when compared to model 1 is about $3.53 \%$. Improvement in throughput in model 2 when compared to model 1 is about $3.53 \%$. Improvement in multiplication per second in model 2 when compared to model 1 is about $3.53 \%$. Decrease in latency time in model 1 when compared to model2 is $3.53 \%$.In power utilization; model 2 consumes $1.65 \%$ more
power when compared to model 1.This is not so significant. There is no significant difference in resource utilization.


Chart-1: Graph of reconfigurable multiplier switching between constants 1912,1111,1331,1092.

## 4. CONCLUSION

The heuristic is necessary as the search space is getting too large with increasing numbers of operations in the fused circuits. Using the heuristic with its controllable search width was able to find close-to-optimal or even optimal fusion solutions within a feasible run-time. The RTL models were constructed by using verilog hardware description language. The functional simulation and verification are performed using ModelSim simulator. After comparing both these models, the second model was found to be faster with more throughputs, less latency and having moderate resource utilization.

## REFERENCES

[1] Konrad M"oller, Martin Kumm, Marco Kleinlein," Reconfigurable Constant Multiplication for FPGAs, " 2016 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Digital Technology.
[2] D. Kornack and P. Rakic, "Cell Proliferation without Neurogenesis in Adult Primate Neocortex," Science, vol. 294, Dec. 2001, pp. 2127-2130, doi:10.1126/science.1065467.
[3] M. Young, The Technical Writer's Handbook. Mill Valley, CA: University Science, 1989.
[4] R. Nicole, "Title of paper with only first word capitalized," J. Name Stand. Abbrev., in press.
[5] K. Elissa, "Title of paper if known," unpublished.
[6] L. Aksoy, P. Flores, and J. Monteiro, "Multiplierless Design of Folded DSP Blocks," ACM Transactions on Design Automation of ElectronicSystems, vol. 20, no. 1, pp. 1-24, Nov. 2014 [3] P. Lowenborg and H. Johansson, "Minimax Design of Adjustable-Bandwidth Linear-Phase FIR Filters," Circuits and Systems I: Regular Papers, IEEE Transactions on, vol. 53, no. 2, 2006.
[7] M. Garrido, F. Qureshi, and O. Gustafsson, "LowComplexity Multiplierless Constant Rotators Based on Combined Coefficient Selection and Shift-and-Add Implementation (CCSSI)," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 61, no. 7, pp. 20022012, July 2014.
[8] P. R. Cappello and K. Steiglitz, "Some Complexity Issues in Digital Signal Processing," Acoustics, vol. 32, no. 5, pp. 1037-1041, Oct. 1984.

