Design of Power and Area Efficient Approximate Multipliers

Dr. Y. L. Ajay Kumar

Dr. Y. L. Ajay Kumar, Associate Professor, Anantha Lakshmi Institute of Technology & Sciences, Anantapuram

Abstract Approximate computing can decrease the design complexity with an increase in performance and power efficiency for error resilient applications. This brief deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38%, respectively, compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous works. Performance of the proposed multipliers is evaluated with an image processing application, where one of the proposed models achieves the highest peak signal to noise ratio.

INTRODUCTION

In applications such as multimedia signal processing and data mining, which can tolerate error, exact computing units are not always necessary; they can be replaced with their approximate counterparts. Research on approximate computing for error-tolerant applications is on the rise. Adders and multipliers form the key components in these applications. In [1], approximate full adders are proposed at the transistor level and utilized in digital signal processing applications. Truncation is also employed in fixed-width multiplier designs, where a constant or variable correction term is then added to compensate for the quantization error introduced by the truncated part [2], [3]. Approximation techniques in multipliers focus on the accumulation of partial products, which is crucial in terms of power consumption. A broken array multiplier is implemented in [4], where the least significant bits of the inputs are truncated while forming partial products to reduce hardware complexity. The multiplier proposed in [4] saves a few adder circuits in partial product accumulation.

Most students of Electronics Engineering are exposed to Integrated Circuits (ICs) at a very basic level, involving SSI (small scale integration) circuits like logic gates or MSI (medium scale integration) circuits like multiplexers, parity encoders, etc. But there is a much bigger world out there, involving miniaturisation at levels so great that a micrometer and a microsecond are literally considered huge! This is the world of VLSI - Very Large Scale Integration. This article aims to introduce Electronics Engineering students to the possibilities and the work involved in this field.

VLSI stands for "Very Large Scale Integration". This is the field which involves packing more and more logic devices into smaller and smaller areas. Thanks to VLSI, circuits that would have taken boardfuls of space can now be put into a small space a few millimetres across! This has opened up a big opportunity to do things that were not possible before. VLSI circuits are everywhere ... your computer, your car, your brand new state-of-the-art digital camera, your cell-phone, and what have you. All this involves a lot of expertise on many fronts within the same field, which we will look at in later sections.

VLSI has been around for a long time; there is nothing new about it ... but as a side effect of advances in the world of computers, there has been a dramatic proliferation of tools that can be used to design VLSI circuits. Alongside this, obeying Moore's law, the capability of an IC has increased exponentially over the years in terms of computation power, utilisation of available area, and yield. The combined effect of these two advances is that people can now put diverse functionality into ICs, opening up new frontiers. Examples are embedded systems, where intelligent devices are put inside everyday objects, and ubiquitous computing, where small computing devices proliferate to such an extent that even the shoes you wear may actually do something useful like monitoring your heartbeat! These two fields are kind of related, and getting into their description can easily lead to another article.

RELATED WORKS

The array multiplier is an efficient layout of a combinational multiplier. The multiplication of two binary numbers can be performed in one micro-operation by a combinational circuit that forms all the product bits at once. This makes it a fast way of multiplying two numbers, since the only delay is the time for the signals to propagate through the gates that form the multiplication array. In an array multiplier, consider two binary numbers A and B, of m and n bits respectively. There are mn summands, produced in parallel by a set of mn AND gates. The worst-case delay of the array multiplier is (2n+1) td.
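The summand generation described above can be sketched in a few lines of Python. This is only a behavioural illustration, not the Verilog design used in this work: the function names `array_multiply` and `worst_case_delay` are hypothetical, and td is taken to be a single gate-stage delay, which is an assumption.

```python
def array_multiply(a: int, b: int, m: int, n: int) -> int:
    """Multiply an m-bit A by an n-bit B by summing the mn AND-gate summands."""
    # Each summand a_i AND b_j corresponds to one AND gate in the array.
    summands = [
        [((a >> i) & 1) & ((b >> j) & 1) for i in range(m)]
        for j in range(n)
    ]
    product = 0
    for j, row in enumerate(summands):
        for i, bit in enumerate(row):
            # The summand a_i AND b_j carries weight 2^(i + j).
            product += bit << (i + j)
    return product


def worst_case_delay(n: int, t_d: float) -> float:
    """Worst-case delay (2n + 1) * td quoted in the text (td assumed per stage)."""
    return (2 * n + 1) * t_d
```

For example, `array_multiply(13, 11, 4, 4)` accumulates 16 summands and returns the exact product of the two 4-bit operands.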

Array Multiplier

Full Adder

The hardware requirement, in terms of the number of full adders (FA) and the length of the final adder (FAL), for different sizes of array multiplier is obtained in the manner given below.
The conventional array multiplier uses full adders in its reduction phase. The bottleneck of the full adder is its high power consumption due to XOR gates. As shown in fig. 2, the conventional full adder consists of two XOR gates in the critical path of the sum, and one XOR gate, one AND gate and one OR gate in the critical path of the carry.

Delay = 2 XOR
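The gate structure of the conventional full adder can be modelled as follows. This is a minimal behavioural sketch in Python rather than the synthesized netlist; the function name `full_adder` is chosen here for illustration.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Conventional full adder built from two XOR, two AND and one OR gate."""
    p = a ^ b                    # first XOR, shared by sum and carry
    s = p ^ cin                  # second XOR: the sum path is 2 XOR deep
    cout = (p & cin) | (a & b)   # carry path: XOR, then AND, then OR
    return s, cout
```

The sum output passes through both XOR gates, which matches the 2 XOR delay stated for the sum path.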

**MUX based Full adder**

In order to reduce power and area, the conventional full adder in the reduction phase of the array multiplier is replaced by a modified full adder [9]. In the MUX based full adder, the full adder is implemented using 4:1 multiplexers, as shown in fig. 3. Implementing the MUX based full adder in the reduction phase of the array multiplier achieves a power reduction. Note that one 4:1 MUX can be built from three 2:1 MUXes. The critical path delay can be written as shown below. The array multiplier can be made more efficient by further reducing the critical path delay, which can be achieved by using the proposed full adder.
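One common way to realise a full adder with 4:1 multiplexers is sketched below. Since fig. 3 is not reproduced here, the wiring is an assumption: the inputs a and b drive the select lines, and the carry-in (or its complement, or a constant) drives the data inputs. The helper names `mux2`, `mux4` and `mux_full_adder` are hypothetical.

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def mux4(s1: int, s0: int, d0: int, d1: int, d2: int, d3: int) -> int:
    """4:1 multiplexer composed of three 2:1 multiplexers."""
    low = mux2(s0, d0, d1)     # selected when s1 = 0
    high = mux2(s0, d2, d3)    # selected when s1 = 1
    return mux2(s1, low, high)


def mux_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder using one 4:1 MUX for the sum and one for the carry."""
    ncin = cin ^ 1  # complement of the carry-in
    s = mux4(a, b, cin, ncin, ncin, cin)   # sum = a XOR b XOR cin
    cout = mux4(a, b, 0, cin, cin, 1)      # carry = majority(a, b, cin)
    return s, cout
```

In this arrangement every output passes through two 2:1 MUX stages, which is the kind of path the critical delay expression would count.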

**PROPOSED ARCHITECTURE**

Implementation of a multiplier comprises three steps: generation of partial products, a partial product reduction tree, and finally a vector merge addition that produces the final product from the sum and carry rows generated by the reduction tree. The second step consumes more power. In this brief, approximation is applied in the reduction tree stage.
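The three stages can be illustrated with a small exact model in Python. This is a sketch under stated assumptions: the column-wise reduction with full adders shown here is a generic carry-save scheme, not necessarily the exact tree used in the proposed design, and the name `multiply_three_stage` is hypothetical.

```python
def multiply_three_stage(a: int, b: int, n: int = 8) -> int:
    """Exact n-bit unsigned multiply split into the three stages in the text."""
    # Stage 1: partial product generation, grouped by column weight.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Stage 2: reduction tree. Full adders compress three bits of one
    # column into a sum bit (same column) and a carry bit (next column)
    # until at most two bits remain in every column.
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            while len(col) >= 3:
                x, y, z = col.pop(), col.pop(), col.pop()
                nxt[k].append(x ^ y ^ z)
                nxt[k + 1].append((x & y) | (z & (x ^ y)))
            nxt[k].extend(col)  # leftover bits pass through unchanged
        cols = nxt

    # Stage 3: vector merge addition of the remaining two rows.
    row_sum = row_carry = 0
    for k, col in enumerate(cols):
        if len(col) > 0:
            row_sum |= col[0] << k
        if len(col) > 1:
            row_carry |= col[1] << k
    return row_sum + row_carry
```

Most of the adder cells sit in stage 2, which is consistent with the reduction tree being the natural target for approximation.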

Inexact hardware circuits, in contrast to software approximations, offer transistor reduction, lower dynamic and leakage power, lower circuit delay, and opportunity for downsizing. Motivated by the limited research on approximate multipliers compared with the extensive research on approximate adders, and specifically by the lack of approximation techniques targeting partial product generation, we omit the generation of some partial products. By reducing the number of partial products that must be accumulated, we decrease the area, power, and depth of the accumulation tree.
An 8-bit unsigned multiplier is used for illustration to describe the proposed method in approximation of multipliers.
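The effect of omitting some partial products can be sketched behaviourally in Python. Which partial products are omitted is not specified in the text, so the parameter `skipped_rows` and the function name `perforated_multiply` are purely illustrative assumptions; whole rows are dropped here for simplicity.

```python
def perforated_multiply(a: int, b: int, n: int = 8, skipped_rows=()) -> int:
    """Approximate n-bit unsigned multiply that omits selected partial product rows."""
    total = 0
    for j in range(n):
        if j in skipped_rows:
            continue  # this row of partial products is never generated
        if (b >> j) & 1:
            total += a << j  # row j is A AND b_j, weighted by 2^j
    return total


# Example: dropping the two least significant rows of an 8-bit multiplier.
exact = 200 * 103
approx = perforated_multiply(200, 103, skipped_rows=(0, 1))
error = exact - approx  # equals 200 * (103 & 0b11), the contribution of the dropped rows
```

With no rows skipped the function returns the exact product, so the error is entirely attributable to the omitted rows.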

Proposed Full Adder

From a statistical point of view, the partial product \( a_{m,n} \) has a probability of 1/4 of being 1, since it is the AND of two input bits, each assumed equally likely to be 0 or 1. In the columns containing more than three partial products, the partial products \( a_{m,n} \) and \( a_{n,m} \) are combined to form propagate and generate signals.
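These probabilities can be checked by exhaustive enumeration. The sketch below assumes independent, uniformly distributed input bits, and it assumes the usual definitions of the combined signals (propagate as the OR and generate as the AND of the two partial products), which the text does not spell out. The function name `signal_probabilities` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def signal_probabilities() -> dict:
    """Probability of each signal being 1 over all equally likely input bits."""
    counts = {"a_mn": 0, "propagate": 0, "generate": 0}
    total = 0
    # x_m, x_n are bits of one operand; y_m, y_n are bits of the other.
    for x_m, x_n, y_m, y_n in product((0, 1), repeat=4):
        a_mn = x_m & y_n              # partial product a_{m,n}
        a_nm = x_n & y_m              # partial product a_{n,m}
        counts["a_mn"] += a_mn
        counts["propagate"] += a_mn | a_nm   # assumed: propagate = OR
        counts["generate"] += a_mn & a_nm    # assumed: generate = AND
        total += 1
    return {name: Fraction(c, total) for name, c in counts.items()}
```

Under these assumptions the generate signal is 1 far less often than either partial product alone, which is what allows simpler logic to be used for it.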

RESULTS AND DISCUSSIONS

The proposed and the existing multiplier designs are developed using Verilog HDL for 8 and 16 bits. The functionality of the 32-bit proposed array multiplier is verified through simulations using the Quartus tool. The simulation waveform of the 8-bit array multiplier using the proposed full adder is shown in fig. 1. All the multiplier designs are synthesized in Synopsys Design Compiler using SAED 90 nm CMOS technology. The synthesis results indicate an average power reduction of 29.94% for 8-bit and 44.97% for 16-bit designs, compared to existing multiplier architectures. An average area reduction of 45.38% for 16-bit and 46.13% for 32-bit designs is also achieved. The average delay is reduced by 11.8% for 16-bit and 29.45% for 32-bit designs compared to the existing architectures.

CONCLUSIONS

In this paper, a modified full adder using multiplexers and an XOR gate is proposed. By incorporating the modified full adder in the reduction stage of the Wallace tree multiplier, average power, area and delay reductions of 35.45%, 40.75% and 15.65%, respectively, are achieved compared to existing methods. The synthesis results confirm that the proposed Wallace tree multiplier is suitable for low power and small area applications.

REFERENCES


