# Rounding Based Approximation Multiplier for High-Speed Yet Energy Efficient Digital Signal Processing

SK Seema<sup>1</sup>, SK Jafar Ameen<sup>2</sup>

<sup>1</sup>PG Student, DSCE, Quba College of Engineering & Technology <sup>2</sup>Assistant professor, DSCE, Quba College of Engineering & Technology

Abstract— In digital signal processing, multiplication is a critical fundamental arithmetic operation. The design of an approximation multiplier looks to be a potential option for many error-resilient applications to lower the power consumption of an embedded system. Applying the approximation to the arithmetic units can be performed at different design abstraction levels including circuit, logic, and architecture levels, as well as algorithm and software layers. The approximation may be performed using different techniques such as allowing some timing violations (e.g., voltage over scaling or overclocking) and function approximation methods (e.g., modifying the Boolean function of a circuit). In the category of function approximation methods, a number of approximating arithmetic building blocks, such as adders and multipliers, at different design levels have been suggested. In this work, we focus on proposing a high speed low power/energy yet approximate multiplier appropriate for error resilient DSP applications. The proposed approximate multiplier, which is also area efficient, is constructed by modifying the conventional multiplication approach at the algorithm level assuming rounded input values. We call this rounding-based approximate (RoBA) multiplier. The proposed multiplication approach is applicable to both signed and unsigned multiplications for which three optimized architectures are presented.

Index Terms— Arithmetic Units, Function Approximation, Boolean Function Modification, Voltage Over-Scaling, Overclocking, Algorithm-Level Approximation, Logic-Level Approximation, Circuit-Level Approximation

#### I. INTRODUCTION

The high level of integration in present-day VLSI technology has made it possible to place multiple intricate devices on a single chip. Furthermore, the use of the digital domain is imperative for sustaining power in analogue circuit techniques. In numerous applications, multipliers are essential components because they significantly affect the overall performance of a circuit in terms of power consumption and delay. Broadly speaking, two approaches can improve the overall efficiency of a multiplier with respect to power dissipation, latency, and area. The first depends on efficient use of the multiplier function, whereas the second depends on selecting an appropriate logic circuit to implement it.

#### 1.1 Motivation for Low-Power Applications

In recent years, considerable research has been directed toward VLSI systems with low power consumption, with the aim of supporting a diverse range of computational systems. Since the advent of VLSI technology, various portable devices that depend on low power, such as handheld communication devices, laptops, and personal digital assistants, have been designed and produced for diverse applications. The demanding goals of high chip density and high throughput must, in most cases, be met together with low energy consumption. The complexity of such systems has been limited mainly by the size and weight of their batteries, and battery technology has not kept pace with device and circuit technology.

These bottlenecks pose multiple challenges for engineers who design VLSI circuits for low-power applications. Work in this field has therefore sought circuits that combine high speed, maximum throughput, minimal chip area, and low power consumption. According to Moore's law, the number of transistors in a densely packed integrated circuit doubles approximately every two years. However, the progress of integrated circuit technology has exceeded Moore's law to a significant extent. This growth is exemplified by the capacity of flash storage chips, which has increased by 100 times in the past eight years. As a result, flash memory has become the predominant storage medium. The fundamental basis of most signal-processing techniques has been the mathematical operation of addition. A low-power compressor is of utmost importance in the development of high-performance digital signal processing (DSP) systems that employ finite impulse response (FIR) filters, as well as in the implementation of cryptography algorithms such as the Advanced Encryption Standard (AES).

Therefore, it is imperative to increase the speed of the compressor across all components. The development of low-power VLSI systems requires a low-power compressor as a crucial component. Compressor design has focused primarily on speed and throughput, which are expected to affect the performance of digital signal processing (DSP) systems in the coming decades. The swift expansion of mobile communications and other portable device technologies has made power consumption a crucial design obstacle. Various power-saving techniques have been devised at different system design levels, including the circuit, device, and architectural levels. One of the most challenging aspects is preserving performance characteristics while reducing power consumption. Researchers have therefore focused on compressors that reduce energy consumption across multiple phases of the design process.

#### 1.2 Approximate Computing as a Field of Research

In its broadest interpretation, approximate computing pertains to the deliberate incorporation of imprecision into a computational process with the aim of enhancing its efficiency. This section presents an overview of approximation methodologies utilised across various tiers of the computational hierarchy. The paper delves into various techniques employed at different levels of software, architecture, and circuitry.

#### II. LITERATURE SURVEY

The authors of the cited study constructed imprecise 4:2 compressor and multiplier circuits intended as building blocks for approximate computing systems. Unlike conventional designs that use AND-OR and XOR logic, the proposed compressor uses a single majority gate. The majority gate is also important as a primary logic unit in various emerging nanotechnologies that favour majority logic. The proposed circuits were developed in FinFET technology and analysed with HSPICE at 7 nm. The findings suggest that the imprecise compressor outperforms previous designs, with average improvements of 32%, 68%, 78%, and 66% in the respective measures. Furthermore, the resulting approximate multiplier is applied to image multiplication, which is regarded as an important tool for image enhancement. Based on the HSPICE and MATLAB results, the inexact multiplier proposed in that study offers a superior balance between precision and design efficiency for approximate computation.

Another work proposed a 15-4 compressor that uses 5-3 compressors as its central module. Four different approximate 5-3 compressors are used inside the 15-4 compressor to achieve low power consumption and high throughput, and the outcomes were examined in all four cases. The 15-4 compressor was then used to implement a 16x16-bit multiplier. The simulation results indicate that multipliers built with the proposed approximate compressors may outperform those built with accurate 15-4 compressors in power-related aspects. The proposed multipliers also exhibit a comparatively high pass rate when compared with other existing approximate multipliers. The proposed multiplier is applied to image processing, where the peak signal-to-noise ratio of the resulting image is computed.

Another work gave a probabilistic proof that the longest carry chain in an n-bit adder is about the logarithm of n, and used this result to develop an imprecise adder with restricted carry propagation to increase processing speed. In a related design, approximation is achieved by partitioning the adder into two sections: an accurate section and an approximate, error-prone section.
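The accurate/approximate partition described above can be illustrated with a small software model. This is a minimal sketch under stated assumptions: the lower `k` bits are combined with a bitwise OR (one well-known way to avoid carries), and the upper bits are added exactly. Whether the cited design uses this exact scheme is not stated here, and the names `split_approx_add`, `width`, and `k` are hypothetical.

```python
def split_approx_add(a: int, b: int, width: int = 8, k: int = 4) -> int:
    """Add two unsigned `width`-bit values with an approximate lower part.

    Assumption: the lower k bits use bitwise OR (no carry chain) and no
    carry is passed from the lower part into the upper part.
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in `width` unsigned bits")
    if not (0 <= k <= width):
        raise ValueError("k must be between 0 and width")

    low_mask = (1 << k) - 1
    low = (a | b) & low_mask            # approximate section: OR instead of add
    high = ((a >> k) + (b >> k)) << k   # accurate section: exact addition
    return high | low                   # result has up to width + 1 bits


if __name__ == "__main__":
    for x, y in [(53, 74), (15, 1), (200, 100)]:
        print(x, y, "exact:", x + y, "approx:", split_approx_add(x, y))
```

The result is exact whenever the lower parts have no overlapping 1 bits, and the error is bounded by the weight of the lower section, which is why such a split is attractive for error-resilient workloads.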

Other authors propose approximate full adder cells that use simplified logic functions and fewer transistors. These cells are then used to build approximate adders. Although adders intended for constructing approximate multipliers are available, their use in different tree designs, and the extent to which their error grows in multi-operand addition, remains unclear.

#### 2.1 Summary

Based on the analysis of the existing literature, two distinct categories of solutions have been identified: technological and architectural. The technical methodology introduces a novel technique for producing circuits, while the architectural alternative adjusts the circuits to conform to the requisite scaling technology modifications. The literature contains a plethora of documented solutions to address design challenges related to low supply voltages. In order to address various issues and challenges without introducing additional intricacy, novel technological breakthroughs and innovative circuit designs have been devised. The literature extensively discusses various design modifications aimed at effectively optimizing and managing performance factors such as power, latency, PDP, among others, for the given application.

#### III. EXISTING METHOD

Multiplication is an important operation in modern electronic circuits. Multiplication-based algorithms are widely used in Digital Signal Processing (DSP) implementations. Digital multipliers are essential elements in high-performance devices such as microcontrollers, digital signal processors, and FIR filters. Because the multiplier is typically the most area-intensive and slowest component, it is often used as a metric for evaluating the performance of the system. Hence, improving the multiplier's speed and reducing its area is a major constraint in the design process.

The multiplier has served as a basic building block in the development of energy-efficient processors, and its architecture affects the overall performance of the system. This matters most in signal processing applications, where the multiplier usually lies on the critical delay path and many DSP computations pass through it, so it determines the effectiveness of the algorithm. Multiplier speed is therefore more significant in DSP processors than in conventional processors, and the efficiency of the multiplier is of utmost importance to any computing device. For this reason, digital multipliers have long been an important area of study in DSP. Limiting the power available to the multiplier, however, reduces the performance of DSP processors.

#### 3.1 Basic Multiplier Concepts

Multiplication is a fundamental operation in many signal-processing algorithms. Multipliers have a large area, significant latency, and high power consumption. As a result, the implementation of low-power VLSI systems requires the design of appropriate low-power multipliers.



Figure 3.1 Architecture of conventional Multiplier

A multiplier takes two operands, a multiplicand and a multiplier, and produces their product. In the initial step, the multiplicand is multiplied bit by bit with the multiplier to obtain the intermediate results (partial products). This step is the most crucial, since it is the most complex and affects the rate at which the overall multiplier combines the partial products into the outcome. Shift-and-add is the simplest way to perform multiplication. For operands that are M and N bits wide, it takes M cycles on an N-bit adder, and M partial products are added together.
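The shift-and-add procedure can be sketched in a few lines of Python. This is an illustrative software model rather than a hardware description; the function name `shift_add_multiply` and the parameter `m_bits` (the width of the multiplier operand) are assumptions introduced for the example.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, m_bits: int) -> int:
    """Multiply two unsigned integers with the shift-and-add method.

    One loop iteration models one cycle: the multiplier bit selects
    whether the shifted multiplicand is added to the running sum.
    """
    if multiplicand < 0 or not (0 <= multiplier < (1 << m_bits)):
        raise ValueError("operands must be unsigned and fit their widths")

    product = 0
    for i in range(m_bits):                 # M cycles for an M-bit multiplier
        if (multiplier >> i) & 1:           # partial product is nonzero only for a 1 bit
            product += multiplicand << i    # shift by bit position, then accumulate
    return product


if __name__ == "__main__":
    print(shift_add_multiply(20, 20, 8))    # 8-bit operands, as in the 8 X 8 designs
```

Each iteration contributes one partial product, so the loop count equals the number of multiplier bits.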

#### 3.2 Conventional multiplier in VLSI

In most digital circuits, the multiplier consumes the majority of the power, and it introduces delay if it is not properly optimized. There are various multipliers, each with its own algorithm and architecture. Each has its own performance characteristics, and each can be tuned further to improve them. Researchers have invented and refined many kinds of multipliers, and several traditional designs are addressed in this section.

#### 3.2.1 Array multiplier

An array multiplier is a digital logic circuit that multiplies two binary numbers using an array of full and half adders. It is the typical conventional multiplier and employs the standard add-and-shift technique. Each partial product is obtained by multiplying the multiplicand by one bit of the multiplier. The partial products are then shifted according to their bit position and added together. The number of partial products equals the number of multiplier bits.



Figure 3.2 Conventional array multiplier with CSA [62]



Figure 3.3 4x4 Wallace tree multiplication

Figure 3.4 shows the schematic of the Wallace tree multiplier. The carry-save adder (CSA) is intended to reduce the worst-case path delay.



Figure 3.4 Wallace Tree Multiplier using 3:2 Compressor (Parmar et al 2013)
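To show why carry-save reduction shortens the critical path, the following word-level sketch models a 3:2 compressor and applies it repeatedly until only two rows remain. It is a simplified model under stated assumptions: a real Wallace tree is organised column by column in hardware, and the function names here are hypothetical.

```python
from typing import List, Tuple


def compress_3_2(x: int, y: int, z: int) -> Tuple[int, int]:
    """Bitwise full adder over whole words: returns (sum_word, carry_word).

    No carry ripples between bit positions; carries are only shifted left.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1
    return sum_word, carry_word


def carry_save_sum(rows: List[int]) -> int:
    """Reduce rows three at a time until two remain, then add them once."""
    rows = list(rows)
    while len(rows) > 2:
        reduced: List[int] = []
        for i in range(0, len(rows) - 2, 3):          # full groups of three
            reduced.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = reduced
    return sum(rows)                                   # single carry-propagate addition


def tree_multiply(a: int, b: int, bits: int) -> int:
    """Unsigned multiply: form partial products, then reduce them carry-save."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(bits)]
    return carry_save_sum(partials)


if __name__ == "__main__":
    print(tree_multiply(13, 11, 4))
```

Only the final two-row addition needs carry propagation, which is the step the compressor stages are meant to postpone.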

#### IV. PROPOSED METHOD

In this work, we propose a high-speed, low-power yet approximate multiplier appropriate for error-resilient DSP applications. The proposed approximate multiplier, which is also area efficient, is constructed by modifying the conventional multiplication approach at the algorithm level, assuming rounded input values. We call this the rounding-based approximate (RoBA) multiplier. The proposed multiplication approach is applicable to both signed and unsigned multiplications, for which three optimized architectures are presented. The efficiencies of these structures are assessed by comparing their delays, power and energy consumptions, energy-delay products (EDPs), and areas with those of some approximate and accurate (exact) multipliers. The contribution of this work can be summarized as presenting a new scheme for RoBA multiplication by modifying the conventional multiplication approach.

#### 4.1 Multiplication Algorithm of RoBA Multiplier

The main idea behind the proposed approximate multiplier is to exploit the ease of operation when the numbers are powers of two ($2^n$). To elaborate on the operation of the approximate multiplier, first, let us denote the rounded values of the inputs A and B by Ar and Br, respectively.
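The rounding idea can be sketched in software as follows. The sketch assumes the algebraic identity A×B = (Ar − A)×(Br − B) + Ar×B + Br×A − Ar×Br, with the first term dropped so that only shifts, one addition, and one subtraction remain. The equations themselves are not reproduced in this text, so this form should be read as an assumption. The function names and the tie-breaking rule (a value exactly halfway between two powers of two rounds up) are also illustrative choices.

```python
def round_to_pow2(x: int) -> int:
    """Round a positive integer to the nearest power of two.

    Assumption: ties (for example 3 or 6) round up to the larger power.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    low = 1 << (x.bit_length() - 1)     # largest power of two <= x
    high = low << 1                     # next power of two
    return low if (x - low) < (high - x) else high


def roba_multiply_unsigned(a: int, b: int) -> int:
    """Approximate a * b as Ar*B + Br*A - Ar*Br using shifts only."""
    if a < 0 or b < 0:
        raise ValueError("unsigned operands only")
    if a == 0 or b == 0:
        return 0
    sa = round_to_pow2(a).bit_length() - 1   # log2(Ar)
    sb = round_to_pow2(b).bit_length() - 1   # log2(Br)
    # Each product with a power of two becomes a left shift.
    return (b << sa) + (a << sb) - (1 << (sa + sb))


if __name__ == "__main__":
    for a, b in [(20, 20), (8, 5), (3, 7)]:
        approx = roba_multiply_unsigned(a, b)
        print(a, b, "exact:", a * b, "approx:", approx, "error:", a * b - approx)
```

The error of this sketch equals the omitted term (Ar − A)×(Br − B), so the result is exact whenever either operand is already a power of two.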

#### 4.2 Hardware Implementation of RoBA Multiplier

Based on (4.2), we provide the block diagram for the hardware implementation of the proposed multiplier in Fig. 4.1 where the inputs are represented in two's complement format. First, the signs of the inputs are determined, and for each negative value, the absolute value is generated.
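The sign-handling stages described above (detect the sign, take the absolute value, and restore the sign at the output) can be modelled as a thin wrapper around any unsigned multiplier core. This is a minimal sketch under stated assumptions: inputs are n-bit two's complement words, the output is a 2n-bit two's complement word, and `unsigned_core` is a hypothetical placeholder for the multiplier block, not a block named in the paper.

```python
from typing import Callable, Tuple


def sign_and_magnitude(word: int, bits: int) -> Tuple[bool, int]:
    """Split an n-bit two's complement word into (is_negative, absolute value)."""
    mask = (1 << bits) - 1
    word &= mask
    negative = bool(word >> (bits - 1))               # sign bit is the MSB
    magnitude = ((~word + 1) & mask) if negative else word
    return negative, magnitude


def signed_multiply_datapath(
    a_word: int,
    b_word: int,
    bits: int,
    unsigned_core: Callable[[int, int], int],
) -> int:
    """Model the datapath: sign detect, absolute value, multiply, sign set."""
    a_neg, a_mag = sign_and_magnitude(a_word, bits)
    b_neg, b_mag = sign_and_magnitude(b_word, bits)
    product = unsigned_core(a_mag, b_mag)             # exact or approximate core
    out_mask = (1 << (2 * bits)) - 1
    if a_neg != b_neg:                                # signs differ: negate result
        product = (~product + 1) & out_mask
    return product & out_mask


if __name__ == "__main__":
    exact = lambda x, y: x * y                        # stand-in core for the demo
    print(hex(signed_multiply_datapath(0xFD, 0x07, 8, exact)))  # -3 * 7
```

Passing an approximate core instead of the exact stand-in leaves the sign stages unchanged, which is the point of separating them from the multiplication itself.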

# © September 2024 | IJIRT | Volume 11 Issue 4 | ISSN: 2349-6002



Figure 4.1: Block diagram for the hardware implementation of the proposed multiplier.

#### V. RESULTS AND DISCUSSION

The 8 X 8 multipliers described above were synthesized and simulated, then implemented on an FPGA board based on the Xilinx Spartan 7 XC7S15-1FTGB196C device. The multiplier inputs are connected to the input switches of the FPGA board, and the multiplier outputs are connected to its LEDs. Xilinx ISE is used to implement the existing and proposed multipliers on the FPGA board.



Figure 5.1 Simulation results of 8X8 Array Multiplier

A multiplication using the array multiplier with inputs X and Y as 8 bits each is depicted in Figure 5.1. The output result of the multiplication process is represented by P.

[Waveform capture: two input signals with value 20 and an output signal with value 400 at the cursor; the remaining trace values are not recoverable.]

Figure 5.2 Simulation results of 8X8 Booth Multiplier



Figure 5.4 Simulation results of 16x16 proposed ROBA Multiplier

#### VI. CONCLUSION & FUTURE SCOPE

In this project, we proposed a high-speed yet energy-efficient approximate multiplier called the RoBA multiplier. The proposed multiplier, which had high accuracy, was based on rounding the inputs to the form $2^n$. In this way, the computationally intensive part of the multiplication was omitted, improving speed and energy consumption at the price of a small error. The proposed approach was applicable to both signed and unsigned multiplications. Three hardware implementations of the approximate multiplier, one for unsigned and two for signed operations, were discussed. The efficiencies of the proposed multipliers were evaluated by comparing them with those of some accurate and approximate multipliers using different design parameters. The results revealed that, in most (all) cases, the RoBA multiplier architectures outperformed the corresponding approximate (exact) multipliers.
