# Review of High Performance Advance Multiplier-Accumulator (MAC) Unit

Rashmi D. Rathour<sup>1</sup>, Pravin W. Jaronde<sup>2</sup>

<sup>1</sup>Student, Department Of Electronics and Telecommunication, Datta Meghe Institute of Engineering Technology and Research, Wardha (M.S.), India

<sup>2</sup>Assistant Professor, Department Of Electronics and Telecommunication, Datta Meghe Institute of Engineering Technology and Research, Wardha (M.S.), India

*Abstract-* In digital communication, the Digital Signal Processor (DSP) is an important block that performs signal processing operations such as convolution, the Discrete Cosine Transform (DCT), the Fourier transform, and so on. Every digital signal processor contains a MAC unit. The MAC unit performs multiplication and accumulation repeatedly in order to carry out the continuous and complex operations of digital signal processing. The MAC unit also has clock and reset inputs that control its operation. Many researchers have focused on the design of advanced MAC unit architectures for complex numbers so as to minimize resource utilization and delay.

*Index Terms*- Complex number, digital systems, floating-point number, Multiplier-Accumulator (MAC) unit, Vedic mathematics.

## I. INTRODUCTION

Digital systems have become an essential part of the modern era. Analog systems have largely been replaced by digital systems because digital systems operate at high speed while requiring less space and energy. This shift followed the large contribution of digital systems, which are now commonly used in every field, including industry. Given their rapid development, digital systems are undeniably important both now and for future development. The Multiplier-Accumulator (MAC) unit supports a large number of digital signal processing (DSP) applications. It also furnishes the microcontroller with signal processing ability for applications such as servo and audio control. The MAC is an execution unit in the processor.

The general MAC architecture consists of a conventional multiplier, an adder and an accumulator, where the multiplier output is added to the previous MAC output result by an accumulate adder.

The Multiply-Accumulate (MAC) unit is extensively used in microprocessors and digital signal processors for data-intensive applications such as filtering, convolution, and inner products. Most digital signal processing methods use transforms such as the discrete cosine transform (DCT), the discrete wavelet transform (DWT), or FFT/IFFT computations, all of which can be efficiently accelerated by dedicated MAC units. Because these computations are essentially accomplished by repeated multiplication and addition, the speed of the multiplication and addition determines the execution speed and performance of the entire computation. Since the multiplier has the longest inherent delay among the basic operational blocks in a digital system, it determines the critical path. The function of the MAC unit is given by the equation:

$$F = \sum_{i} A_i B_i$$
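The sum-of-products behaviour can be illustrated with a minimal behavioural sketch in Python. The class name `MacUnit` and the helper `mac_sum` are hypothetical names introduced here for illustration only; the model ignores bit widths, overflow and timing, and simply treats each call to `step` as one clock cycle.

```python
class MacUnit:
    """Behavioural model of a multiply-accumulate unit (illustrative only)."""

    def __init__(self):
        self.acc = 0  # accumulator register

    def reset(self):
        """Clear the accumulator, as a reset input would."""
        self.acc = 0

    def step(self, a, b):
        """One clock cycle: multiply the operands and add to the accumulator."""
        self.acc += a * b
        return self.acc


def mac_sum(a_values, b_values):
    """Compute F = sum_i A_i * B_i by stepping the MAC once per operand pair."""
    mac = MacUnit()
    for a, b in zip(a_values, b_values):
        mac.step(a, b)
    return mac.acc
```

For example, `mac_sum([1, 2, 3], [4, 5, 6])` accumulates 4, then 14, then 32.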

Complex number operations are the backbone of many Digital Signal Processing (DSP) algorithms, especially for multimedia applications such as 3D graphics, which depend on a large number of multiplications. They are also time-critical components in radar, satellite and digital modulation applications. A direct complex number multiplication requires four real number multiplications, with the operands represented as fixed-point or floating-point complex numbers.
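The four-multiplication form follows from $(a + jb)(c + jd) = (ac - bd) + j(ad + bc)$. A minimal sketch, assuming integer (fixed-point-like) operands and using the hypothetical helper names `complex_mul4` and `complex_mac`, is given below.

```python
def complex_mul4(ar, ai, br, bi):
    """Multiply (ar + j*ai) by (br + j*bi) using four real multiplications."""
    real = ar * br - ai * bi
    imag = ar * bi + ai * br
    return real, imag


def complex_mac(pairs):
    """Accumulate the complex products of an iterable of ((ar, ai), (br, bi))."""
    acc_r, acc_i = 0, 0
    for (ar, ai), (br, bi) in pairs:
        prod_r, prod_i = complex_mul4(ar, ai, br, bi)
        acc_r += prod_r
        acc_i += prod_i
    return acc_r, acc_i
```

Each complex MAC step in this sketch therefore costs four real multiplications and four real additions or subtractions (two inside the product, two for accumulation).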

Floating point arithmetic is an important research topic because floating point numbers are widely used in many applications and, compared to fixed point numbers, offer better resolution and accuracy. Floating point numbers represent real numbers in binary format. Since computer memory is limited, numbers cannot be stored with infinite precision; whether binary or decimal fractions are used, the number must be truncated or rounded at some point. Representing a number in floating point format is more robust and efficient than a fixed-point representation.
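The effect of finite precision can be observed with a short sketch that uses Python's standard `struct` module to round a value to IEEE 754 single precision (binary32) and to split its encoding into sign, exponent and fraction fields. The helper names `to_binary32` and `binary32_fields` are introduced here for illustration only.

```python
import struct


def to_binary32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def binary32_fields(x):
    """Return (sign, biased exponent, fraction) of the binary32 encoding of x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction
```

A value such as 0.5 is exactly representable and survives the round trip unchanged, whereas 0.1 has no finite binary expansion, so `to_binary32(0.1)` differs from the double precision value `0.1`.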



Fig. 1. Architecture of MAC unit

It is a well-known fact that the speed of the MAC is governed by the speed of the multiplier. Multipliers occupy a large area, have long latency and consume considerable power. Because the multiplier contributes the largest delay among the basic operational blocks in a digital system, it determines the critical path. A fused MAC unit executes faster than a basic MAC unit. The use of the MAC unit in DSP applications is not limited to multiplication and addition; it also performs division, squaring and square-root operations well. It can be used in digital filters.

## II. LITERATURE REVIEW

In paper [1], two possible architectures are proposed for a Vedic real multiplier based on the Urdhva Tiryakbhyam (vertically and crosswise) sutra of Indian Vedic mathematics, and an expression for the path delay of an N×N Vedic real multiplier with the minimum path delay architecture is developed. The minimum path delay architecture of the Vedic real multiplier is then used to implement a complex multiplier. The architectures for the four-multiplier and three-multiplier solutions of the complex multiplier for $32 \times 32$ bit complex number multiplication are coded in VHDL and implemented using Xilinx ISE 13.4 and ModelSim 5.6. Finally, the results are compared with those of the four and three real multiplier solutions using conventional Booth and array multipliers.
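The vertically and crosswise idea can be sketched at the bit level: for each output column, all bit products whose positions add up to that column are summed together with the incoming carry. The Python sketch below is a software illustration only, not the architecture of [1]. The function names are hypothetical, and the three-multiplier formulation shown is one common variant, since the exact form used in [1] is not reproduced here.

```python
def urdhva_multiply(a, b, width):
    """Multiply two unsigned width-bit integers column by column."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result = 0
    carry = 0
    for k in range(2 * width - 1):
        # Sum the crosswise bit products whose positions add up to k.
        column = carry
        for i in range(max(0, k - width + 1), min(k, width - 1) + 1):
            column += a_bits[i] & b_bits[k - i]
        result |= (column & 1) << k  # low bit of the column is the result bit
        carry = column >> 1          # the rest moves to the next column
    result |= carry << (2 * width - 1)
    return result


def complex_mul3(ar, ai, br, bi):
    """Complex product using three real multiplications (one common variant)."""
    k1 = br * (ar + ai)
    k2 = ar * (bi - br)
    k3 = ai * (br + bi)
    return k1 - k3, k1 + k2  # (real part, imaginary part)
```

The three-multiplier form trades one multiplication for three extra additions or subtractions, which is attractive when the multiplier dominates area and delay.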

In paper [2], the authors synthesized and verified IEEE 754 single and double precision high speed floating point multipliers using VHDL on a Xilinx Virtex-5 FPGA. The Urdhva Tiryakbhyam sutra (method) was selected for the design of the mantissa multiplier. In addition, the proposed design handles underflow, overflow and rounding conditions. High speed is achieved by reducing the carry propagation delay with a carry save adder in the implementation of four multipliers (27 × 27 bit for double precision and 12 × 12 bit for single precision).

Paper [3] proposes a floating point multiplier that manages overflow, underflow and rounding. The proposed and conventional floating point multipliers based on Vedic mathematics are to be coded in Verilog, then synthesized and simulated using the ISE Simulator. A Xilinx Virtex-6 FPGA is to be used for hardware realization and verification. The paper proposes to compare the resource utilization and timing performance of the proposed multiplier with those of existing designs. An IEEE 754 multiplier based on the Vedic Urdhva Tiryagbhyam sutra is developed to cover both single precision and double precision floating point formats.
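The main steps of a single precision floating point multiplication (sign XOR, exponent addition with bias correction, significand multiplication, normalization and rounding) can be sketched as follows. This is a simplified software model under stated assumptions and not the design of [2] or [3]: it handles only normal operands and normal results, raises an exception on overflow or underflow instead of producing infinities or subnormals, uses round-to-nearest-even, and relies on Python's built-in integer multiplication where the surveyed designs use a Vedic mantissa multiplier. All function names are hypothetical.

```python
import struct


def f32_to_bits(x):
    """Return the binary32 encoding of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Return the float value of a 32-bit binary32 encoding."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fp32_multiply(x, y):
    """Multiply two normal binary32 values (normal results only)."""
    xb, yb = f32_to_bits(x), f32_to_bits(y)
    sign = (xb >> 31) ^ (yb >> 31)            # sign of the product
    ex, ey = (xb >> 23) & 0xFF, (yb >> 23) & 0xFF
    if ex in (0, 255) or ey in (0, 255):
        raise ValueError("only normal operands are handled in this sketch")
    mx = (xb & 0x7FFFFF) | 0x800000           # 24-bit significand, hidden 1
    my = (yb & 0x7FFFFF) | 0x800000
    exponent = ex + ey - 127                  # remove one copy of the bias
    product = mx * my                         # up to 48 bits
    if product >> 47:                         # significand product in [2, 4)
        shift = 24
        exponent += 1
    else:                                     # significand product in [1, 2)
        shift = 23
    mantissa = product >> shift               # keep the top 24 bits
    remainder = product & ((1 << shift) - 1)  # discarded low bits
    half = 1 << (shift - 1)
    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and mantissa & 1):
        mantissa += 1
        if mantissa >> 24:                    # rounding carried out of 24 bits
            mantissa >>= 1
            exponent += 1
    if exponent >= 255:
        raise OverflowError("result exceeds the binary32 range")
    if exponent <= 0:
        raise ArithmeticError("result underflows the normal binary32 range")
    return bits_to_f32((sign << 31) | (exponent << 23) | (mantissa & 0x7FFFFF))
```

In a hardware design the 24 × 24 bit significand product is the dominant cost, which is why the surveyed papers concentrate on the mantissa multiplier.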

Paper [4] proposes a novel fixed point complex number multiply-accumulate circuit for use in real-time digital signal processing applications. The proposed architecture consists of a multiplier-cum-accumulator that can be used as a multiplier as well as a MAC. The depth of the multiplier-cum-accumulator unit therefore remains $O(\log_2 n)$ for the Wallace tree based multiplier-cum-accumulator and $O(n)$ for the Braun multiplier based multiplier-cum-accumulator, and hence a separate accumulator with depth $O(\log_2 n)$ can be avoided. Without pipelining, the proposed architecture achieves an improvement factor of 32.4% for the Wallace tree based and 19.1% for the Braun multiplier based fixed point complex number MAC using a 45 nm technology library. With pipelining, the same architecture achieves an improvement factor of 14.6% for the Wallace tree based and 12.2% for the Braun multiplier based fixed point complex number MAC.
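The idea of avoiding a separate accumulator stage can be illustrated by treating the accumulator value as one more row in the partial product array and reducing all rows with carry-save adders (3:2 compressors), as in a Wallace-style tree. The Python sketch below is an illustration of that general idea for unsigned operands only; it is not the circuit of [4], and the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def reduce_rows(rows):
    """Reduce a list of rows to at most two using layers of 3:2 compressors."""
    rows = list(rows)
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3  # rows that form complete triples
        for i in range(0, full, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])     # leftover rows pass through
        rows = next_rows
    return rows


def multiply_accumulate(a, b, acc, width):
    """Return acc + a * b for unsigned operands with a merged accumulator."""
    partial_products = [a << i for i in range(width) if (b >> i) & 1]
    rows = reduce_rows(partial_products + [acc])  # accumulator is one more row
    return sum(rows)  # single final carry-propagate addition
```

Because the accumulator enters the same reduction tree as the partial products, only one carry-propagate addition remains at the end.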

In paper [5], a high speed complex multiplier design (ASIC) using Vedic mathematics is presented. Using the Vedic formulas, the partial products and sums are generated in one step, which reduces the carry propagation from LSB to MSB. The functionality of the circuits was checked, and performance parameters such as propagation delay and dynamic power consumption were calculated with Spectre SPICE simulation using standard 90 nm CMOS technology. The propagation delay of the resulting (16, 16) × (16, 16) complex multiplier is only 4 ns and it consumes 6.5 mW of power. The authors achieved almost 25% improvement in speed over earlier reported complex multipliers, e.g. parallel adder and distributed arithmetic (DA) based architectures.

Paper [6] proposes a low power pipelined MAC architecture that incorporates a 16×16 multiplier using the Baugh-Wooley algorithm with a high performance multiplier tree, together with clock gating of the idle pipeline stages to reduce power consumption. By clock gating independent pipeline stages of the MAC architecture, the authors show that the power dissipation of the proposed MAC architecture is lower than that of existing low power MAC units with the same performance. Simulations show that the power consumption of the proposed architecture is 30% to 80% less than that of other contemporary MAC architectures, without compromising computation performance.
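The Baugh-Wooley technique handles two's complement operands by inverting the partial product bits that involve exactly one sign bit and adding fixed correction constants, so that every row can be summed as an unsigned quantity. The following Python sketch models a commonly used form of the algorithm for n-bit operands; it illustrates the arithmetic only and is not the multiplier tree or pipeline of [6]. The function name is hypothetical.

```python
def baugh_wooley_multiply(a, b, n):
    """Multiply two n-bit two's complement integers, Baugh-Wooley style."""
    # Python's >> on negative ints is arithmetic, so this yields the
    # two's complement bits of each operand.
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    total = 0
    # Ordinary partial products of the magnitude bits.
    for i in range(n - 1):
        for j in range(n - 1):
            total += (a_bits[i] & b_bits[j]) << (i + j)
    # Product of the two sign bits.
    total += (a_bits[n - 1] & b_bits[n - 1]) << (2 * n - 2)
    # Partial products involving exactly one sign bit are inverted.
    for i in range(n - 1):
        total += (1 - (a_bits[i] & b_bits[n - 1])) << (i + n - 1)
        total += (1 - (a_bits[n - 1] & b_bits[i])) << (i + n - 1)
    # Correction constants at bit positions n and 2n - 1.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1           # keep the 2n-bit result
    if total >> (2 * n - 1):              # reinterpret as a signed value
        total -= 1 << (2 * n)
    return total
```

Since all rows are non-negative, the same adder tree used for unsigned multiplication can reduce them without sign extension.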

In paper [7], an optimized coprocessor unit designed specifically for executing DSP applications is proposed. It can be used as a coprocessor for the ACORN ARM processor. The coprocessor comprises one MAC unit, a control unit, a 32-bit output register and register files for storing the input values and coefficients. The coprocessor is designed to execute an FIR filter. A Vedic multiplier and a Booth multiplier have been used in the MAC unit, and the two are compared in terms of power, speed and area.
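An FIR filter maps naturally onto a MAC unit, since each output sample is $y[n] = \sum_k h[k]\,x[n-k]$, i.e. one multiply-accumulate step per filter tap. A minimal direct-form sketch is given below; the function name `fir_filter` is hypothetical, samples before the start of the input are assumed to be zero, and the model says nothing about the register organization of the coprocessor in [7].

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    outputs = []
    for n in range(len(samples)):
        acc = 0  # accumulator is cleared before each output sample
        for k, h in enumerate(coefficients):
            if n - k >= 0:  # inputs before the first sample count as zero
                acc += h * samples[n - k]  # one multiply-accumulate step
        outputs.append(acc)
    return outputs
```

Feeding a unit impulse such as `[1, 0, 0]` through the filter returns the coefficients themselves, which is a convenient sanity check.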

## III. CONCLUSION

The key objective for the proposed MAC unit is to design an advanced floating point complex MAC using a Vedic multiplier, and to compare the Vedic, Booth and conventional multipliers in terms of the computation required to generate the partial products and to add them to obtain the final multiplication result. The basic building blocks of the MAC unit are identified, and each block is analyzed for its performance.

### REFERENCES

- [1] K. Deergha Rao, Ch. Gangadhar, Praveen K. Korrai, "FPGA Implementation of Complex Multiplier Using Minimum Delay Vedic Real Multiplier Architecture", 2016.
- [2] S. S. Mahakalkar, S. L. Haridas, "Design of High Performance IEEE754 Floating point multiplier using Vedic mathematics", 2014.
- [3] Soumya Havaldar, K. S. Gurumurthy, "Design Of Vedic IEEE 754 Floating Point Multiplier", 2016.
- [4] Mohamed Asan Basiri M, Noor Mahammad Sk, "An Efficient Hardware Based MAC Design in Digital Filters with Complex Numbers", 2014.
- [5] Prabir Saha, Arindam Banerjee, Partha Bhattacharyya, Anup Dandapat, "High Speed ASIC Design of Complex Multiplier Using Vedic Mathematics", 2014.
- [6] Rakesh Warrier, C.H. Vun, Wei Zhang, "A Low-Power Pipelined MAC Architecture using Baugh-Wooley based Multiplier", in IEEE Global Conference on Consumer Electronics, 2014.
- [7] Rahul Narasimhan. A, R. Siva Subramanian, "High Speed Multiply-Accumulator Coprocessor Realized for Digital Filters", in IEEE International Conference on Electrical, Computer & Communication Technologies, 2015.