# Design of a Delay Buffer Using Gated Driver Tree for Low Power Applications

Ms.V Lakshmi<sup>1</sup>, R. Renuka<sup>2</sup>, S. Pushpa<sup>3</sup>, S. Shalina<sup>4</sup>, S. Fayaz<sup>5</sup>, V. Tharun<sup>6</sup>

<sup>1</sup>Assistant Professor, PBR Visvodaya Institute of Technology and Science

<sup>2,3,4,5,6</sup> UG Student, PBR Visvodaya Institute of Technology and Science

Abstract— This paper presents circuit design of a low-power delay buffer. The proposed delay buffer uses new techniques to reduce its power consumption. Since delay buffers are accessed sequentially, it adopts a ring-counter addressing scheme. In the ring counter, double-edge-triggered (DET) flip-flops are utilized to reduce the operating frequency by half and the C-element gated-clock strategy is proposed. A novel gated-clock-driver tree is then applied to further reduce the activity along the clock distribution network. Moreover, the gated-driver-tree idea is also employed in the input and output ports of the memory block to decrease their loading, thus saving even more power.

## I. INTRODUCTION

Portable multimedia and communication devices have experienced explosive growth recently. Longer battery life is one of the crucial factors in the widespread success of these products. As such, low-power circuit design for multimedia and wireless communication applications has become very important. In many such products, delay buffers (line buffers, delay lines) make up a significant portion of the circuits [1]-[3]. Such serial-access memory is needed for temporary storage of signals that are being processed, e.g., the delay of one line of video signals, the delay of signals within fast Fourier transform (FFT) architectures [4], and the delay of signals in a delay correlator [2]. Currently, most circuits adopt static random access memory (SRAM) plus some control/addressing logic to implement delay buffers. For shorter delay buffers, shift registers can be used instead. The former approach is convenient since SRAM compilers are readily available and are optimized to generate memory modules with low power consumption, high operating speed, and a compact cell size. The latter approach is also convenient since shift registers can be easily synthesized, though they may consume considerable power due to unnecessary data movement.

Previously, a simplified and thus lower-power sequential addressing scheme for SRAM-based delay buffers was proposed in [5]. A ring counter is used to point to the target words to be written and read. Since the ring counter is made up of an array of D-type flip-flops (DFFs) triggered by a global clock signal, and all but one of the DFFs hold a value of "0," it is possible to disable the clock signal to most DFFs. Such a gated-clock ring counter was implemented in [6] to build a low-power first-in-first-out (FIFO) memory.

In this paper, we propose to use double-edge-triggered (DET) flip-flops instead of traditional DFFs in the ring counter to halve the operating clock frequency. A novel approach using C-elements instead of R–S flip-flops in the control logic for generating the clock-gating signals is adopted to avoid increasing the loading of the global clock signal. In addition to gating the clock signal going to the DET flip-flops in the ring counter, we also propose to gate the drivers in the clock tree.

This technique greatly decreases the loading on the clock distribution network of the ring counter and thus the overall power consumption. The same technique is applied to the input driver and output driver of the memory part of the delay buffer. In a delay buffer based on an SRAM cell array, such as the one in [6], the read/write circuitry operates through bit lines that act as data buses. In the proposed delay buffer, we use a tree hierarchy for the read/write circuitry of the memory module. For the write circuitry, in each level of the driver tree, only the one driver along the path leading to the addressed memory word is activated. Similarly, a tree of multiplexers and gated drivers makes up the read circuitry of the proposed delay buffer. Simulation results show the effectiveness of the above techniques in power reduction.

## II. EXISTING TECHNIQUE



Fig.1. Existing Block of Memory Organization

Fig.1 shows the memory organization of the existing delay buffer. Its blocks and their functions are as follows.

#### A. Input Buffer

This block receives data inputs from external sources and temporarily stores them before processing.

#### B. Memory Block

The input buffer transfers data to the memory block, which acts as a temporary storage unit, managing data flow and ensuring smooth processing.

#### C. Output Buffer

Once data is processed in the memory block, it moves to the output buffer, which temporarily holds the data before sending it to the output.

#### D. Ring Counter Block

This block is connected to the memory block and provides its addressing. The ring counter points to the memory word to be written and read, so that the words are accessed sequentially in a fixed cyclic order.

#### E. Working

Data flows from the input buffer to the memory block. The ring counter helps in managing data placement and retrieval within memory. Processed data is sent to the output buffer before being delivered to the final destination. This system is typically used in memory management and buffering applications to optimize data flow and processing efficiency.
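The data flow described above can be illustrated with a small behavioral model. The following Python sketch is only an illustration under stated assumptions: the class name `RingCounterDelayBuffer`, the read-before-write ordering within a cycle, and the four-word length are hypothetical choices, not details taken from the existing design.

```python
class RingCounterDelayBuffer:
    """Behavioral model of a delay buffer addressed by a one-hot ring counter.

    Assumption: each cycle reads the addressed word, then overwrites it.
    """

    def __init__(self, length):
        if length < 1:
            raise ValueError("length must be at least 1")
        self.memory = [0] * length             # memory block, one word per entry
        self.ring = [1] + [0] * (length - 1)   # one-hot ring counter, token at word 0

    def step(self, data_in):
        """One clock cycle: read the addressed word, overwrite it, advance the token."""
        addr = self.ring.index(1)              # the single '1' selects the word
        data_out = self.memory[addr]           # sample written `length` cycles earlier
        self.memory[addr] = data_in            # store the new sample in the same word
        self.ring = [self.ring[-1]] + self.ring[:-1]  # rotate the token by one position
        return data_out


buf = RingCounterDelayBuffer(4)
outputs = [buf.step(x) for x in range(1, 9)]
print(outputs)  # [0, 0, 0, 0, 1, 2, 3, 4]
```

In this model each input sample reappears at the output after as many cycles as there are memory words, which is the delay-line behavior the blocks above are meant to provide.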

## III. PROPOSED METHODOLOGY



Fig.2. Block Diagram for Proposed Delay Buffer

The proposed delay buffer adopts several power-reduction techniques. These circuit techniques are mainly designed to decrease the loading on high fan-out nets, e.g., the clock and the read/write ports.

#### A. Gated Driver Tree



Fig.3. Gated driver tree

A Gated Driver Tree is a technique used in digital circuit design to reduce power consumption by controlling the distribution of the clock signal. In large circuits, the clock signal is distributed through a network known as a clock tree. By incorporating gating logic into this network, the clock signal can be selectively enabled or disabled for different parts of the circuit. This approach is particularly beneficial in systems with significant idle times or predictable periods of inactivity within specific modules.
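The effect of gating can be made concrete by counting how many drivers switch per access. The sketch below is a simplified model, assuming a balanced tree with a fixed fan-out (binary by default) in which only the driver on the path to the addressed word is enabled at each level, as described for the write circuitry in the Introduction. The function names `active_drivers` and `driver_counts`, the tree depth, and the fan-out are illustrative assumptions and are not taken from Fig.3.

```python
def active_drivers(address, depth, fanout=2):
    """Return, per tree level, the index of the one driver enabled for `address`.

    Level 0 is the root driver; level `depth - 1` drives the memory words.
    Assumption: a balanced tree with `fanout ** depth` words.
    """
    words = fanout ** depth
    if not 0 <= address < words:
        raise ValueError("address out of range")
    path = []
    for level in range(depth):
        span = fanout ** (depth - level)   # words covered by one driver at this level
        path.append(address // span)       # driver whose span contains the address
    return path


def driver_counts(depth, fanout=2):
    """Drivers switching per access: (ungated tree, gated tree)."""
    ungated = sum(fanout ** level for level in range(depth))  # every driver toggles
    gated = depth                                             # one driver per level
    return ungated, gated


print(active_drivers(5, 3))   # [0, 1, 2]
print(driver_counts(5))       # (31, 5)
```

Under these assumptions, a binary tree serving 32 words switches 5 drivers per access instead of 31, which is the kind of loading reduction the gated driver tree targets.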

#### B. Modified Ring Counter



Fig.4. Modified Ring Counter

The modified ring counter consists of multiple blocks with control logic, likely reducing power consumption and improving timing efficiency. The presence of AND gates and clock dividers suggests that the counter might:

- Operate with multiple clock phases.
- Use conditional enable signals for optimized switching.
- Reduce unnecessary transitions to minimize power usage.
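To give a feel for how conditional enables reduce clock activity, the following Python sketch counts how many flip-flops receive a clock edge per cycle when a ring counter is partitioned into equally sized blocks. The partitioning, the enable rule (a block is clocked while it holds the "1", and the next block is also clocked when the "1" is about to cross into it), and the names `clocked_flip_flops` and `average_clocked` are assumptions made for illustration; they are not read off Fig.4.

```python
def clocked_flip_flops(token_pos, length, block_size):
    """Number of flip-flops receiving a clock edge in the current cycle."""
    blocks = length // block_size
    current = token_pos // block_size
    enabled = {current}
    # The next block must also be clocked when the token sits in the last
    # flip-flop of the current block, so that it can capture the token.
    if token_pos % block_size == block_size - 1:
        enabled.add((current + 1) % blocks)
    return len(enabled) * block_size


def average_clocked(length, block_size):
    """Average number of clocked flip-flops per cycle over one full rotation."""
    if length % block_size:
        raise ValueError("length must be a multiple of block_size")
    total = sum(clocked_flip_flops(p, length, block_size) for p in range(length))
    return total / length


print(average_clocked(32, 32))  # 32.0 (a single block is equivalent to no gating)
print(average_clocked(32, 4))   # 5.0
```

With these illustrative numbers, a 32-stage counter split into blocks of four clocks five flip-flops per cycle on average rather than all 32.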

#### C. Double-Edge-Triggered (DET) Flip-Flops



Fig.5. Double edge triggered flip-flops

Double-edge-triggered (DET) flip-flops are utilized to reduce the operating frequency by half. A DET flip-flop captures its input on both the rising and the falling edge of the clock, so the same data rate can be sustained with a clock running at half the frequency. In this paper, we propose to use DET flip-flops instead of traditional DFFs in the ring counter to halve the operating clock frequency. DET flip-flops are becoming a popular technique for low-power designs since they effectively enable a halving of the clock frequency.
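The difference between single-edge and double-edge triggering can be sketched behaviorally. The Python model below is a minimal illustration, not the circuit of Fig.5: the class names `DFlipFlop` and `DETFlipFlop` and the sample data are hypothetical, and timing details such as setup and hold are ignored.

```python
class DFlipFlop:
    """Single-edge flip-flop: captures D on rising clock edges only."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        if self._prev_clk == 0 and clk == 1:   # rising edge
            self.q = d
        self._prev_clk = clk
        return self.q


class DETFlipFlop:
    """Double-edge-triggered flip-flop: captures D on every clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        if clk != self._prev_clk:              # rising or falling edge
            self.q = d
        self._prev_clk = clk
        return self.q


data = [1, 0, 1, 1, 0, 1, 0, 0]
clock = [1, 0, 1, 0, 1, 0, 1, 0]   # four clock periods for eight data samples

dff, det = DFlipFlop(), DETFlipFlop()
dff_out = [dff.tick(c, d) for c, d in zip(clock, data)]
det_out = [det.tick(c, d) for c, d in zip(clock, data)]
print(dff_out)  # [1, 1, 1, 1, 0, 0, 0, 0]
print(det_out)  # [1, 0, 1, 1, 0, 1, 0, 0]
```

With the same half-rate clock, the DET model follows every input sample, while the single-edge model captures only every second one.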

#### D. C-Element



Fig.6. C- Element

The Muller C-element, or Muller C-gate, is a commonly used asynchronous logic component originally designed by David E. Muller. It applies logical operations on its inputs and has hysteresis. The output of the C-element reflects the inputs when the states of all inputs match. The output then remains in this state until all the inputs transition to the other state. This model can be extended to the asymmetric C-element, in which some inputs affect the operation in only one of the transitions (positive or negative).
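The hold behavior described above can be captured in a few lines. The following Python sketch models a symmetric C-element only; the class name `CElement` and the input sequence are illustrative, and the asymmetric variant is not modeled.

```python
class CElement:
    """Symmetric Muller C-element: the output follows the inputs only when
    they all agree, and otherwise holds its previous value."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if len(inputs) < 2:
            raise ValueError("a C-element needs at least two inputs")
        if all(v == 1 for v in inputs):
            self.out = 1          # all inputs high: output goes high
        elif all(v == 0 for v in inputs):
            self.out = 0          # all inputs low: output goes low
        return self.out           # inputs disagree: previous state is kept


c = CElement()
sequence = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]
trace = [c.update(a, b) for a, b in sequence]
print(trace)  # [0, 0, 1, 1, 0, 0]
```

The output rises only once both inputs are high and falls only once both are low, which is the hysteresis property mentioned above.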

## IV. RESULTS AND DISCUSSION

#### A. Before Optimization simulation results:

Fig.7 shows the simulation results of the delay buffer before optimization.



Fig.7. Simulation Results of Existing Method

The timing diagram of the existing delay buffer shows the following signals:

- Clock (clk): The primary clock signal synchronizing the system.
- Enable (enable): The control signal allowing data flow.
- data\_input[7:0]: The input data being processed.
- data\_output[7:0]: The output data after passing through the buffer.
- buf\_driver\_in[7:0]: Data entering the buffer driver.
- ring\_counter\_out[3:0]: The ring counter's output, managing the sequence of data transfer.
- dpram\_out[7:0]: The data stored in the dual-port RAM before output.

#### B. Before Optimization power report:

Fig.8 shows the power consumption of the delay buffer design before applying the gated driver tree technique.



Fig.8. Power report of Existing Method

The total power consumption of the existing method was 4.549 W, broken down as follows:

1. Most of the dynamic power was used by signal transitions (2.798 W, 63%) and logic operations (1.591 W, 36%).
2. The power was mainly dynamic (98% of the total), meaning that switching activity was the major contributor to power usage.

#### C. After Optimization simulation results:

Fig.9 shows the simulation results of the delay buffer after optimization.



Fig.9. Simulation Results of Proposed Method

The timing diagram of the proposed delay buffer shows the following signals:

- clk: System clock signal.
- en (Enable): Activation signal for data processing.
- Initialize1 & Initialize2: Initialization signals for the system.
- data\_input[7:0]: Input data.
- data\_output[7:0]: Final processed output.
- gated\_driver\_in[7:0]: Input data after passing through the gated driver.
- ring\_counter\_out[31:0]: Ring counter sequence controlling buffer operations.
- dpram\_out[7:0]: Data stored in memory before reaching output.

#### D. After Optimization power report:

Fig.10 shows the power consumption of the delay buffer design after applying the gated driver tree technique.



Fig.10. Power report of Proposed Method

The total power consumption was reduced significantly to 0.346 W, broken down as follows:

1. The power consumed by signals dropped to 0.01 W (4% of the dynamic power), and logic power dropped to 0.002 W (1%).
2. The majority of the dynamic power (95%) was consumed by I/O operations (0.252 W).
3. This shows that the gated driver tree technique effectively reduced dynamic power consumption.

## V. CONCLUSION

In this paper, we compared the traditional delay buffer design with a new method called the Gated Driver Tree (GDT) to reduce power consumption. The existing buffers consume more power because they are always active, even when there is no need to switch signals. Our proposed method solves this problem by adding gate control, which turns off the signal path when it is not needed. This helps in saving both dynamic and leakage power. The proposed buffer uses less power while maintaining good performance. The new design also works well for longer interconnects and in systems where power saving is very important, such as mobile or battery-operated devices.

Table 1: Comparison of on-chip power between existing and proposed methods

| Existing method, power (W) | Proposed method, power (W) |
|----------------------------|----------------------------|
| 4.549                      | 0.346                      |
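As a rough worked check of the comparison in Table 1, the relative saving can be computed directly from the two reported totals. The symbols $P_{\text{existing}}$ and $P_{\text{proposed}}$ are introduced here only for this calculation.

$$
\frac{P_{\text{existing}} - P_{\text{proposed}}}{P_{\text{existing}}}
= \frac{4.549 - 0.346}{4.549}
= \frac{4.203}{4.549}
\approx 0.924
$$

This corresponds to a reduction of about 92.4% in total on-chip power, or roughly a 13-fold decrease ($4.549 / 0.346 \approx 13.1$).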

## REFERENCES

- [1] W. Eberle et al., "80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band," IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1829–1838, Nov. 2001.
- [2] M. L. Liou, P. H. Lin, C. J. Jan, S. C. Lin, and T. D. Chiueh, "Design of an OFDM baseband receiver with space diversity," IEE Proc. Commun., vol. 153, no. 6, pp. 894–900, Dec. 2006.
- [3] G. Pastuszak, "A high-performance architecture for embedded block coding in JPEG 2000," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 9, pp. 1182–1191, Sep. 2005.
- [4] W. Li and L. Wanhammar, "A pipeline FFT processor," in Proc. Workshop Signal Process. Syst. Design Implement., 1999, pp. 654–662.
- [5] E. K. Tsern and T. H. Meng, "A low-power video-rate pyramid VQ decoder," IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1789–1794, Nov. 1996.
- [6] N. Shibata, M. Watanabe, and Y. Tanabe, "A current-sensed high-speed and low-power first-in-first-out memory using a wordline/bitline-swapped dual-port SRAM cell," IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 735–750, Jun. 2002.
- [7] I. E. Sutherland, "Micropipelines," Commun. ACM, vol. 32, no. 6, pp. 720–738, Jun. 1989.
- [8] R. Hossain, L. D. Wronski, and A. Albicki, "Low power design using double edge triggered flip-flops," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 2, pp. 261–265, Jun. 1994.
- [9] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr, "SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage reduction," IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 895–901, Apr. 2005.