A Generalized Algorithm and Reconfigurable Architecture for Efficient and Scalable Orthogonal Approximation of DCT

C. Thejaswi¹, G. Ramanjaneya Reddy²

¹M.Tech., PG Scholar, Gouthami Institute of Technology & Management for Women, Proddatur
²Assistant Professor, Gouthami Institute of Technology & Management for Women, Proddatur

Abstract- Approximation of the Discrete Cosine Transform (DCT) is useful for reducing its computational complexity without significant impact on its coding performance. Most existing algorithms for approximating the DCT target only small transform lengths, and some of them are non-orthogonal. This paper presents a generalized recursive algorithm to obtain an orthogonal approximation of the DCT, in which an approximate DCT of length \( N \) is derived from a pair of DCTs of length \( N/2 \) at the cost of \( N \) additions for input preprocessing. We perform recursive sparse matrix decomposition and make use of the symmetries of the DCT basis vectors to derive the proposed approximation algorithm. The proposed algorithm is highly scalable for hardware as well as software implementation of DCTs of higher lengths, and it can make use of an existing approximation of the 8-point DCT to obtain an approximate DCT of any power-of-two length. We demonstrate that the proposed approximation of the DCT provides comparable or better image and video compression performance than the existing approximation methods. It is shown that the proposed algorithm involves lower arithmetic complexity than the other existing approximation algorithms. We present a fully scalable reconfigurable parallel architecture for the computation of the approximate DCT based on the proposed algorithm. One interesting feature of the proposed design is that it can be configured for the computation of a 32-point DCT, or for parallel computation of two 16-point DCTs or four 8-point DCTs, with a marginal control overhead. The proposed architecture is found to offer many advantages in terms of hardware complexity, regularity, and modularity.

I. INTRODUCTION

VLSI stands for "Very Large Scale Integration", the field concerned with packing more and more logic devices into smaller and smaller areas. Circuits that once would have filled an entire board can now be put into a space a few millimeters across. This has opened up opportunities to do things that were not possible before. VLSI circuits are everywhere: in computers, cars, state-of-the-art digital cameras, and cell phones. All this involves a lot of expertise on many fronts within the same field, which we will look at in later sections. VLSI has been around for a long time, but as a side effect of advances in the world of computers, there has been a dramatic proliferation of tools that can be used to design VLSI circuits. Alongside, in keeping with Moore's law, the capability of an IC has increased exponentially over the years in terms of computation power and utilization of available area. The combined effect of these two advances is that designers can now put diverse functionality into ICs, opening up new frontiers. Examples are embedded systems, where intelligent devices are put inside everyday objects, and ubiquitous computing, where small computing devices proliferate to such an extent that even the shoes you wear may do something useful, such as monitoring your heartbeat.

Integrated circuit (IC) technology is the enabling technology for a whole host of innovative devices and systems that have changed the way we live. VLSI systems are much smaller and consume less power than the discrete components used to build electronic systems before the 1960s. Integration allows us to build systems with many more transistors, allowing much more computing power to be applied to solving a problem. Integrated circuits are also much easier to design and manufacture and are more reliable than discrete systems; that makes it possible to develop special-purpose systems that are more efficient than general-purpose computers for the task at hand.

Electronic systems now perform a wide variety of tasks in daily life. Electronic systems in some cases have replaced mechanisms that operated mechanically, hydraulically, or by other means; electronics are usually smaller, more flexible, and easier to service. In other cases electronic systems have created totally new applications. Electronic systems perform a variety of tasks, some of them visible, some more hidden:

- Personal entertainment systems such as portable MP3 players and DVD players perform sophisticated algorithms with remarkably little energy.
- Electronic systems in cars operate stereo systems and displays. They also control fuel injection systems, adjust suspensions to varying terrain, and perform the control functions required for Anti-lock Braking (ABS) Systems.
- Digital electronics compress and decompress video, even at high definition data rates, on-the-fly in consumer electronics.
- Low-cost terminals for Web browsing still require sophisticated electronics, despite their dedicated function.
- Personal computers and workstations provide word-processing, financial analysis, and games. Computers include both Central Processing Units (CPUs) and special-purpose hardware for disk access, faster screen display, etc.
- Medical electronic systems measure bodily functions and perform complex processing algorithms to warn about unusual conditions.

The availability of these complex systems, far from overwhelming consumers, only creates demand for even more complex systems. The growing sophistication of applications continually pushes the design and manufacturing of integrated circuits and electronic systems to new levels of complexity. Perhaps the most amazing characteristic of this collection of systems is its variety: as systems become more complex, we build not a few general-purpose computers but an ever wider range of special-purpose systems. Our ability to do so is a testament to our growing mastery of both integrated circuit manufacturing and design, but the increasing demands of customers continue to test the limits of design and manufacturing.

The FPGA industry sprouted from Programmable Read-Only Memory (PROM) and Programmable Logic Devices (PLDs). PROMs and PLDs could both be programmed in batches in a factory or in the field (field programmable); however, the programmable logic was hard-wired between logic gates. In the late 1980s, the Naval Surface Warfare Center funded an experiment proposed by Steve Casselman to develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful, and a patent related to the system was issued in 1992. Some of the industry’s foundational concepts and technologies for programmable logic arrays, gates, and logic blocks are founded in patents awarded to David W. Page and LuVerne R. Peterson in 1985. Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field programmable gate array, the XC2064, in 1985. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. It boasted a mere 64 Configurable Logic Blocks (CLBs), with two 3-input Look Up Tables (LUTs). More than 20 years later, Freeman was inducted into the National Inventors Hall of Fame for his invention.

Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s, when competitors sprouted up and eroded a significant part of its market share. By 1993, Actel was serving about 18 percent of the market. The 1990s were an explosive period for FPGAs, both in sophistication and in volume of production. In the early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs had found their way into consumer, automotive, and industrial applications. FPGAs got a glimpse of fame in 1997, when Adrian Thompson, a researcher working at the University of Sussex, merged genetic algorithm technology and FPGAs to create a sound recognition device. Thompson’s algorithm configured an array of 10 x 10 cells in a Xilinx FPGA chip to discriminate between two tones, utilizing analogue features of the digital chip. The application of genetic algorithms to the configuration of devices like FPGAs is now referred to as evolvable hardware.

A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete "system on a programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which combined a reconfigurable CPU architecture on a single chip called the SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA's logic fabric. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable logic architecture. The Actel SmartFusion devices incorporate an ARM Cortex-M3 hard processor core (with up to 512kB of flash and 64kB of RAM) and analog peripherals, such as a multi-channel ADC and DACs, into their flash-based FPGA fabric. In 2010, an extensible processing platform was introduced for FPGAs that fused features of an ARM high-end microcontroller (hard-core implementations of a 32-bit processor, memory, and I/O) with an FPGA fabric to make FPGAs easier for embedded designers to use. By incorporating the ARM processor-based platform into a 28 nm FPGA family, the extensible processing platform enables system architects and embedded software developers to apply a combination of serial and parallel processing to address the challenges they face in designing today's embedded systems, which must meet ever-growing demands to perform highly complex functions. By designing in a familiar ARM environment, embedded designers can benefit from the time-to-market advantages of an FPGA platform compared to the more traditional design cycles associated with ASICs. An alternative to using hard macro processors is to use soft processor cores that are implemented within the FPGA logic. MicroBlaze and Nios II are examples of popular soft-core processors.

As previously mentioned, many modern FPGAs can be reprogrammed at "run time", and this is leading to the idea of reconfigurable computing or reconfigurable systems: CPUs that reconfigure themselves to suit the task at hand. Additionally, new, non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip.

II. LITERATURE SURVEY

A New Distributed Arithmetic Architecture (NEDA) has been presented as a low-power optimized architecture based on the distributed arithmetic paradigm. In addition to low power consumption, NEDA offers high speed and reduced area. In NEDA, the inner product computational module has been proved, mathematically, to require only additions. Moreover, a minimum number of additions is used by exploiting the redundancy in the adder array. Such properties have made the NEDA unit a basic computational module for high-performance DSP architectures. A case study of a NEDA-based DCT architecture is analyzed, and savings exceeding 88% are achieved for the DCT implementation.

Distributed Arithmetic (DA) was introduced about two decades ago and has since enjoyed widespread application in VLSI implementations of Digital Signal Processing (DSP) architectures. A significant percentage of commercial DSP chips employ the DA approach. Most of these applications are arithmetic intensive, with Multiply/Accumulate (MAC) being the predominant operation. The main advantage of the DA approach is that it speeds up the multiply process by pre-computing all the possible product values and storing these values in a ROM. The input data can then be used to directly address the memory and retrieve the result. Unfortunately, the size of the ROM grows exponentially as the number of inputs and the internal precision increase, because all possible combinations of bit patterns exhibited by the input signals have to be accommodated. In practice, DA often appears in the form of multiple ROMs of much smaller size coupled with conventional MAC structures.

A class of practical fast algorithms has been introduced for the Discrete Cosine Transform (DCT). For an 8-point DCT, only 11 multiplications and 29 additions are required. A systematic approach is presented for generating the different members of this class, all having the same minimum arithmetic complexity. The structure of many of the published algorithms can be found in members of this class. An extension of the algorithm to longer transformations is presented. The resulting 16-point DCT requires only 31 multiplications and 81 additions, which is, to the authors' knowledge, fewer than required by previously published algorithms.

One promising way to reduce the computational complexity of the DCT is to identify redundant computations and eliminate them. In one such study, the authors present a new method to predict zero-quantized DCT coefficients for efficient implementation of intra-frame video encoding by identifying such redundant computations. Traditional methods use the Gaussian statistical model of residual pixels to predict all-zero or partial-zero blocks. The proposed method is based on two key ideas. First, the bounds of the DCT coefficients are derived from the intermediate signals of the Loeffler DCT algorithm instead of calculating the Sum of Absolute Difference (SAD) of residual pixels. Second, the sufficiency conditions are suitably chosen to predict the zero-quantized coefficients, reducing the arithmetic complexity without degrading the video quality. Simulation results are found to validate the analytical model and show that the proposed prediction eliminates more redundant computations than the existing methods. Moreover, the authors have derived a pipelined VLSI architecture of the proposed prediction scheme which saves more than 63% and 91% of the multiplications of the second stage of the one-dimensional DCT for high and low bit-rate intra-video encoding, respectively.

A systematic method for developing a binary version of a given transform by using the Walsh-Hadamard Transform (WHT) has also been proposed. The resulting transform approximates the underlying transform very well, while maintaining all the advantages and properties of the WHT. The method is successfully applied to develop a Binary Discrete Cosine Transform (BDCT) and a Binary Discrete Hartley Transform (BDHT). It is shown that the resulting BDCT corresponds to the well-known sequency-ordered WHT, whereas the BDHT can be considered as a new Hartley-ordered WHT. Specifically, the properties of the proposed Hartley-ordering are discussed, and a shift-copy scheme is proposed for a simple and direct generation of the Hartley-ordering functions. For software and hardware implementation purposes, a unified structure for the computation of the WHT, BDCT, and BDHT is proposed by establishing an elegant relationship between the three transform matrices. In addition, a spiral-ordering is proposed to graphically obtain the BDHT from the BDCT and vice versa. The application of these binary transforms in image compression, encryption, and spectral analysis clearly shows the ability of the BDCT (BDHT) to approximate the DCT (DHT) very well.

A low-complexity 8-point orthogonal approximate DCT has been introduced. The proposed transform requires no multiplications or bit-shift operations. The derived fast algorithm requires only 14 additions, fewer than any existing DCT approximation. Moreover, in several image compression scenarios, the proposed transform could outperform the well-known signed DCT, as well as state-of-the-art algorithms.

Another orthogonal approximation for the 8-point DCT has been introduced. The proposed transformation matrix contains only zeros and ones; multiplications and bit-shift operations are absent. Close spectral behavior relative to the DCT was adopted as the design criterion. The proposed algorithm is superior to the signed discrete cosine transform. It could also outperform state-of-the-art algorithms in low and high image compression scenarios, while exhibiting a comparable computational complexity.

An efficient 8x8 sparse orthogonal transform matrix is proposed for image compression by appropriately introducing some zeros in the 8x8 Signed Discrete Cosine Transform (SDCT) matrix. An algorithm for its fast computation is also developed.
It is shown that the proposed transform provides a 25% reduction in the number of arithmetic operations with a performance in image compression that is much superior to that of the SDCT and comparable to that of the approximated discrete cosine transform. Advances in video compression technology have been driven by ever-increasing processing power available in software and hardware. The emerging High Efficiency Video Coding (HEVC) standard aims to provide a doubling in coding efficiency with respect to the H.264/AVC high profile, delivering the same video quality at half the bit rate. In this paper, complexity-related aspects that were considered in the standardization process are described. Furthermore, profiling of reference software and optimized software gives an indication of where HEVC may be more complex than its predecessors and where it may be simpler. Overall, the complexity of HEVC decoders does not appear to be
significantly different from that of H.264/AVC decoders; this makes HEVC decoding in software very practical on current hardware. HEVC encoders are expected to be several times more complex than H.264/AVC encoders and will be a subject of research in years to come. The High Efficiency Video Coding (HEVC) standard is the most recent joint video project of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) standardization organizations, working together in a partnership known as the Joint Collaborative Team on Video Coding (JCT-VC). The first edition of the HEVC standard is expected to be finalized in January 2013, resulting in an aligned text that will be published by both ITU-T and ISO/IEC. Additional work is planned to extend the standard to support several additional application scenarios, including extended-range uses with enhanced precision and color format support, scalable video coding, and 3-D/stereo/multiview video coding. Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video and H.264/MPEG-4 Advanced Video Coding (AVC) standards. The two standards that were jointly produced have a particularly strong impact and have found their way into a wide variety of products that are increasingly prevalent in our daily lives. Throughout this evolution, continued efforts have been made to maximize compression capability and improve other characteristics such as data loss robustness, while considering the computational resources that were practical for use in products at the time of anticipated deployment of each standard. 
The major video coding standard directly preceding the HEVC project was H.264/MPEG-4 AVC, which was initially developed in the period between 1999 and 2003, and then was extended in several important ways from 2003 to 2009. H.264/MPEG-4 AVC has been an enabling technology for digital video in almost every area that was not previously covered by H.262/MPEG-2 Video and has substantially displaced the older standard within its existing application domains. It is widely used for many applications, including broadcast of High Definition (HD) TV signals over satellite, cable, and terrestrial transmission systems, video content acquisition and editing systems, camcorders, security applications, Internet and mobile network video, Blu-ray Discs, and real-time conversational applications such as video chat, video conferencing, and telepresence systems. However, an increasing diversity of services, the growing popularity of HD video, and the emergence of beyond-HD formats (e.g., 4k×2k or 8k×4k resolution) are creating even stronger needs for coding efficiency superior to H.264/MPEG-4 AVC’s capabilities. The need is even stronger when higher resolution is accompanied by stereo or multiview capture and display. Moreover, the traffic caused by video applications targeting mobile devices and tablet PCs, as well as the transmission needs of video-on-demand services, are imposing severe challenges on today’s networks. An increased desire for higher quality and resolutions is also arising in mobile applications. HEVC has been designed to address essentially all existing applications of H.264/MPEG-4 AVC and to particularly focus on two key issues: increased video resolution and increased use of parallel processing architectures. The syntax of HEVC is generic and should also be generally suited for other applications that are not specifically mentioned above. 
As has been the case for all past ITU-T and ISO/IEC video coding standards, in HEVC only the bit stream structure and syntax is standardized, as well as constraints on the bit stream and its mapping for the generation of decoded pictures. The mapping is given by defining the semantic meaning of syntax elements and a decoding process such that every decoder conforming to the standard will produce the same output when given a bit stream that conforms to the constraints of the standard. This limitation of the scope of the standard permits maximal freedom to optimize implementations in a manner appropriate to specific applications (balancing compression quality, implementation cost, time to market, and other considerations). However, it provides no guarantees of end-to-end reproduction quality, as it allows even crude encoding techniques to be considered conforming. To assist the industry community in learning how to use the standard, the standardization effort not only includes the development of a text specification document, but also reference software source code as an example of how HEVC video can be encoded and decoded. The draft reference
software has been used as a research tool for the internal work of the committee during the design of the standard, and can also be used as a general research tool and as the basis of products. A standard test data suite is also being developed for testing conformance to the standard.
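
The ROM-based distributed arithmetic (DA) idea summarized earlier in this survey can be illustrated with a small software model. The following Python sketch is not taken from the NEDA work or from any cited design; it assumes unsigned 8-bit inputs and integer coefficients, and the names `build_lut` and `da_inner_product` are hypothetical. It shows how an inner product can be computed with table look-ups, shifts, and additions only, and why the table size grows as 2^K for K inputs.

```python
def build_lut(coeffs):
    """Precompute the sum of every subset of coefficients (the 'ROM').

    Bit j of the address selects coeffs[j], so the table has 2**K entries
    for K coefficients. This exponential growth is the ROM-size problem
    mentioned in the text.
    """
    k = len(coeffs)
    return [
        sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
        for addr in range(1 << k)
    ]


def da_inner_product(coeffs, xs, bits=8):
    """Inner product sum(c * x) for unsigned `bits`-bit inputs, no multiplies.

    For each bit plane b, the b-th bits of all inputs form a ROM address;
    the looked-up partial sum is weighted by 2**b and accumulated.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        acc += lut[addr] << b  # shift-and-add replaces multiplication
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]       # illustrative integer coefficients
    xs = [17, 200, 3, 255]       # illustrative unsigned 8-bit samples
    print(da_inner_product(coeffs, xs))
    print(sum(c * x for c, x in zip(coeffs, xs)))  # direct computation
```

Both printed values agree, since the bit-plane sums reproduce the direct inner product. Signed inputs would need an extra sign-bit correction step, which this sketch leaves out.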

III. PROPOSED SYSTEM

The Discrete Cosine Transform (DCT) is widely used in image and video compression. Since the DCT is computationally intensive, several algorithms have been proposed in the literature to compute it efficiently. Recently, significant work has been done to derive approximations of the 8-point DCT that reduce its computational complexity. The main objective of the approximation algorithms is to eliminate multiplications, which consume most of the power and computation time, while still obtaining a meaningful estimate of the DCT. Haweel has proposed the signed DCT (SDCT) for 8x8 blocks, in which the basis vector elements are replaced by their sign. Other authors have obtained a good estimate of the DCT by replacing the basis vector elements by 0, ±1/2, and ±1. In the same vein, Bayer and Cintra have proposed two transforms that use 0 and ±1 as elements of the transform kernel, and have shown that their methods perform better than that earlier method, particularly in low- and high-compression-ratio scenarios.

The need for approximation is greater for higher-size DCTs, since the computational complexity of the DCT grows nonlinearly. Moreover, modern video coding standards such as High Efficiency Video Coding (HEVC) use DCTs of larger block sizes (up to 32x32) in order to achieve higher compression ratios. However, the design strategy used in H.264/AVC cannot be extended to larger transform sizes such as 16-point and 32-point. Besides, several image processing applications, such as tracking and simultaneous compression and encryption, require higher DCT sizes. In this context, Cintra has introduced a new class of integer transforms applicable to several block lengths. Cintra et al. have also proposed a new 16x16 matrix for approximation of the 16-point DCT, and have validated it experimentally. Recently, two new transforms have been proposed for 8-point DCT approximation: Cintra et al. have proposed a low-complexity 8-point approximate DCT based on integer functions, and Potluri et al. have proposed a novel 8-point DCT approximation that requires only 14 additions. On the other hand, Bouguezel et al. have proposed two methods for a multiplication-free approximate form of the DCT. The first method is for lengths 16 and 32 and is based on an appropriate extension of the integer DCT. Also, a systematic method for developing a binary version of the high-size DCT (BDCT) by using the Sequency-Ordered Walsh-Hadamard Transform (SO-WHT) has been proposed. This transform is a permuted version of the WHT which approximates the DCT very well and maintains all the advantages of the WHT.

A scheme for approximating the DCT should have the following features: (i) it should have low computational complexity; (ii) it should have low error energy, in order to provide compression performance close to that of the exact DCT, and should preferably be orthogonal; (iii) it should work for higher DCT lengths, to support modern video coding standards and other applications such as tracking, surveillance, and simultaneous compression and encryption. However, the existing DCT algorithms do not satisfy all three requirements together. Some of the existing methods are deficient in terms of scalability, generalization to higher sizes, and orthogonality. We intend to maintain orthogonality in the approximate DCT for two reasons. Firstly, if the transform is orthogonal, we can always find its inverse, and the kernel matrix of the inverse transform is obtained by simply transposing the kernel matrix of the forward transform. This feature could be used to compute the forward and inverse DCT with similar computing structures. Secondly, in the case of orthogonal transforms, similar fast algorithms are applicable to both forward and inverse transforms.

In this paper, we propose an algorithm to derive approximate forms of the DCT which satisfy all three features. We obtain the proposed approximate form of the DCT by recursive decomposition of the sparse DCT matrix. It is observed that the proposed algorithm involves less arithmetic complexity than the existing DCT approximation algorithms. The proposed approximate forms of the DCT of different lengths are orthogonal, and result in lower error energy than the existing algorithms for DCT approximation. The decomposition process allows generalization of the proposed transform to higher-size DCTs. Interestingly, the proposed algorithm is easily scalable for hardware as well as software implementation of DCTs of higher lengths, and it can make use of the best of the existing approximations of the 8-point DCT. Based on the proposed algorithm, we have proposed a fully scalable, reconfigurable, and parallel architecture for approximate DCT computation. One interesting feature of the proposed design is that the structure for the computation of the 32-point DCT can be configured for parallel computation of two 16-point DCTs or four 8-point DCTs. The proposed algorithm is found to be better than the existing methods in terms of energy compaction and hardware complexity.
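
The recursive structure described above (a length-N transform built from a pair of length-N/2 transforms plus N additions of input preprocessing) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' exact kernel: it assumes that the preprocessing is the usual sum/difference butterfly between mirrored samples, that the same N/2-point transform is reused for both halves, and that the outputs are interleaved as even and odd coefficients. The exact orthonormal 8-point DCT is used only as a stand-in for whichever 8-point approximation is chosen, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.cos((2 * i + 1) * k * np.pi / (2 * n))
    c[0, :] *= 1 / np.sqrt(2)
    return c * np.sqrt(2 / n)


def approx_dct(x, base8):
    """Recursive approximate DCT of a length-N vector (N = 8 * 2**m).

    Assumed structure: N additions/subtractions of preprocessing, then two
    N/2-point transforms whose outputs are interleaved.
    """
    n = len(x)
    if n == 8:
        return base8 @ x
    half = n // 2
    mirrored = x[::-1][:half]            # x[N-1], x[N-2], ..., x[N/2]
    a = x[:half] + mirrored              # N/2 additions
    b = x[:half] - mirrored              # N/2 subtractions
    y = np.empty(n)
    y[0::2] = approx_dct(a, base8)       # even-indexed coefficients
    y[1::2] = approx_dct(b, base8)       # odd-indexed coefficients (approximated)
    return y / np.sqrt(2)                # scaling keeps the transform orthonormal


def transform_matrix(n, base8):
    """Kernel matrix obtained by transforming each unit vector."""
    return np.column_stack([approx_dct(e, base8) for e in np.eye(n)])


if __name__ == "__main__":
    base8 = dct_matrix(8)                # stand-in for an 8-point approximation
    t32 = transform_matrix(32, base8)
    # Orthogonality: the inverse kernel is just the transpose.
    print(np.allclose(t32 @ t32.T, np.eye(32)))
```

With an orthogonal 8-point base, the kernel produced by this structure remains orthogonal at every length, which is why the inverse transform can reuse a similar computing structure with the transposed kernel.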

The numbers of additions required by the proposed 16-, 32-, and 64-point DCT approximations are 60, 152, and 368, respectively. More generally, the arithmetic complexity of the \( N \)-point approximate DCT is equal to \( N(\log_2 N - 1/4) \) additions, which agrees with these three counts and with the recursion in which each doubling of the length costs \( N \) additions plus two half-length transforms. Moreover, since the structures for the computation of DCTs of different lengths are regular and scalable, the computational time for the DCT coefficients can be expressed in terms of the addition time. The numbers of arithmetic operations involved in the proposed DCT approximation of different lengths and in the existing competing approximations are shown in Table 1. It can be seen that the proposed method requires the lowest number of additions and does not require any shift operations. Note that a shift operation does not involve any combinational components and requires only rewiring during hardware implementation. However, it contributes indirectly to the hardware complexity, since shift-add operations increase the bit-width, which leads to higher hardware complexity of the arithmetic units that follow the shift-add operation. We also note that all the considered approximation methods involve significantly less computational complexity than the exact DCT algorithms. According to the Loeffler algorithm, the exact DCT computation requires 29, 81, 209, and 513 additions along with 11, 31, 79, and 191 multiplications for 8-, 16-, 32-, and 64-point DCTs, respectively. Pipelined and non-pipelined designs of the different methods are developed, synthesized, and validated using an integrated logic analyzer. The validation is carried out using the Digilent EB of Spartan6-LX45. We have used 8-bit inputs, and we have allowed the output size to increase (without any truncation).
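
The addition counts quoted above can be cross-checked with a short script. The sketch below assumes the recursion stated in this paper (two half-length transforms plus \( N \) preprocessing additions). The 8-point base cost of 22 additions is not stated in the text; it is inferred here from 60 = 2 x 22 + 16 and should be read as an assumption. The Loeffler figures are the ones listed in the paragraph above, and the function names are hypothetical.

```python
import math

# Exact-DCT operation counts quoted in the text (Loeffler algorithm).
LOEFFLER_ADDS = {8: 29, 16: 81, 32: 209, 64: 513}
LOEFFLER_MULS = {8: 11, 16: 31, 32: 79, 64: 191}


def approx_additions(n, base8_adds=22):
    """Additions for the recursive approximation of length n.

    base8_adds=22 is an inferred value (from the 16-point count of 60),
    not a figure given in the text.
    """
    if n == 8:
        return base8_adds
    return 2 * approx_additions(n // 2, base8_adds) + n


def closed_form(n):
    """Closed form N * (log2(N) - 1/4) implied by the recursion above."""
    return n * (math.log2(n) - 0.25)


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, approx_additions(n), closed_form(n),
              LOEFFLER_ADDS[n], LOEFFLER_MULS[n])
```

For 16, 32, and 64 points the recursion and the closed form both give 60, 152, and 368 additions, with no multiplications, against the exact-DCT counts in the last two columns.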

Proposed reconfigurable architecture for approximate DCT of lengths and 16.

As specified in the recently adopted HEVC standard, DCTs of different lengths, such as 16 and 32, are required in video coding applications. Therefore, a given DCT architecture should be reusable for DCTs of different lengths, instead of requiring separate structures for different lengths. We propose here such reconfigurable DCT structures, which can be reused for the computation of DCTs of different lengths.
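
The reuse idea can be modelled behaviourally in software. The following Python sketch is only an illustration of the configuration options named in this paper (one 32-point DCT, two 16-point DCTs, or four 8-point DCTs on the same 32 inputs); it does not describe the actual datapath, multiplexers, or control signals of the proposed hardware. It assumes the sum/difference preprocessing stage sketched for the recursive algorithm, omits output scaling, and uses the exact 8-point DCT as a stand-in for the shared 8-point unit. All names are hypothetical.

```python
import numpy as np

_k = np.arange(8).reshape(-1, 1)
_i = np.arange(8).reshape(1, -1)
DCT8 = np.cos((2 * _i + 1) * _k * np.pi / 16)  # unscaled 8-point DCT kernel (stand-in)


def unit8(v):
    """The shared 8-point unit; four copies serve every configuration."""
    return DCT8 @ v


def transform(x):
    """Length-8/16/32 transform built from 8-point units and butterfly stages."""
    n = len(x)
    if n == 8:
        return unit8(x)
    half = n // 2
    mirrored = x[::-1][:half]
    y = np.empty(n)
    y[0::2] = transform(x[:half] + mirrored)   # preprocessing stage enabled
    y[1::2] = transform(x[:half] - mirrored)
    return y


def reconfigurable_dct(x32, mode):
    """mode = 32: one 32-point DCT; 16: two 16-point; 8: four 8-point.

    Smaller modes simply bypass the outer preprocessing stages, so the same
    four 8-point units are busy in every configuration.
    """
    if mode not in (8, 16, 32):
        raise ValueError("mode must be 8, 16 or 32")
    x32 = np.asarray(x32, dtype=float)
    if x32.shape != (32,):
        raise ValueError("expected 32 input samples")
    blocks = x32.reshape(32 // mode, mode)
    return np.concatenate([transform(b) for b in blocks])


if __name__ == "__main__":
    x = np.arange(32.0)
    for mode in (32, 16, 8):
        print(mode, reconfigurable_dct(x, mode)[:4])
```

In every mode the model consumes 32 samples and produces 32 coefficients, which mirrors the claim that throughput is preserved when the structure is reconfigured for shorter transforms.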

IV. RESULTS

This section evaluates the performance of the proposed modified least mean square (LMS) algorithm and shows the simulation results. The first result shows the output of the LMS adaptive filter with delay; there is some delay in the output of the filter. The second result shows the output of the LMS adaptive filter without delay: once the clock input has been applied, the output of the adaptive filter is obtained without delay. ModelSim is the tool used here to check the performance of the LMS adaptive filter. It is a complete HDL simulation environment that makes it possible to verify the source code and the functional and timing models using a test bench.

Table: Design Summary

V. CONCLUSION

In this paper, we have proposed a recursive algorithm to obtain an orthogonal approximation of the DCT, in which an approximate DCT of length \( N \) is derived from a pair of DCTs of length \( N/2 \) at the cost of \( N \) additions for input preprocessing. The proposed approximate DCT has several advantages, such as regularity, structural simplicity, lower computational complexity, and scalability. Comparison with recently proposed competing methods shows the effectiveness of the proposed approximation in terms of error energy, hardware resource consumption, and compressed image quality. We have also proposed a fully scalable reconfigurable architecture for approximate DCT computation, in which the structure for the computation of the 32-point DCT can be configured for parallel computation of two 16-point DCTs or four 8-point DCTs.

REFERENCES


[17] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Edirisurya, “Improved 8-point approximate DCT for image and video compression requiring only 14


