This study looks into the difficulties of using the Distributed Arithmetic (DA) method for Finite Impulse Response (FIR) filters on Field-Programmable Gate Arrays (FPGAs). It focuses on how coefficients are represented, balancing precision using fixed-point arithmetic and quantization. The research explores ways to optimize memory use, aiming to store data more efficiently within FPGA resources and reduce memory needs. To speed up computations, it examines how to make accessing lookup tables faster and suggests improvements in design. The study also considers how to manage FPGA resources effectively, balancing latency, throughput, and resource use. It looks at specific improvements to the DA method for FIR filters to make better use of resources and enhance performance. Overall, this work provides insights and solutions for algorithm design, memory use, lookup table speed, and FPGA architecture to make DA-based FIR filter implementations on FPGAs more efficient.
Introduction
I. INTRODUCTION
Digital signal processing (DSP) relies heavily on Finite Impulse Response (FIR) filters for essential tasks. Designing these filters efficiently is crucial, and one
innovative method is using an Improved Differential Evolution Algorithm (DA) to optimize filter coefficients. This combined approach aims to enhance filter performance by reducing passband ripples and stopband distortions. The process starts with simulating the improved DA algorithm in MATLAB or Python to optimize the filter coefficients according to specific criteria. Once optimized, the next step is to implement these coefficients on an FPGA (Field-Programmable Gate Array) using hardware description languages (HDL) like Verilog or VHDL. This step involves creating a hardware architecture that realizes the FIR filter's functionality on the FPGA platform.
The process is iterative and includes testing, validation, and refinement to ensure the filter meets stringent specifications. It combines principles of DSP, optimization techniques, and FPGA design skills to achieve a high-performance FIR filter implementation.
The workflow includes the following key components:
FIR Filter Design using DA: This involves designing the FIR filter using the Differential Evolution algorithm.
DA Algorithm Optimization: This stage is dedicated to refining the filter's coefficients using the Differential Evolution algorithm.
FPGA Implementation & Hardware: This involves implementing the designed FIR filter onto FPGA hardware, including the necessary configurations and connections for execution.
Input/Output Interface Block: This represents the interfaces for inputting data into the filter and retrieving the processed output signals.
This synergy between DSP and evolutionary algorithms, facilitated by tools like MATLAB or Python, and the subsequent FPGA implementation using HDL, ensures an efficient and high-performing FIR filter.
In recent years, the trend of implementing digital signal processing (DSP) functions, like FIR filters, in Field Programmable Gate Arrays (FPGAs) has grown due to their wide applications in video, audio signal processing, and telecommunications. Traditional FIR filters require multiple multiply-and-accumulate (MAC) blocks, which are resource-intensive and complex to implement in FPGAs. To address this, a multiplier-less architecture called Distributed Arithmetic (DA) has been developed.
DA decomposes MAC operations into a series of lookup table (LUT) accesses and summations, significantly reducing the need for multipliers. However, DA's traditional implementation requires large LUTs, especially for high-order filters, leading to impractical memory sizes. This challenge is mitigated by splitting large LUTs into smaller ones, using offset binary coding, and exploiting symmetrical properties to reduce memory size.
Despite these optimizations, DA still faces challenges as the number of filter taps increases, particularly when the number of taps is a prime number. To overcome these issues, the proposed hardware-efficient DA architecture further reduces LUT sizes and modifies filter structures to enhance speed and reduce resource usage. This new architecture, implemented on a 4VLX40FF668 FPGA device and synthesized with ISE 7.1, demonstrates improved speed and reduced resource consumption compared to previous DA implementations.
A. Objective & Goal
Traditionally, implementing a K-tap FIR filter directly on an FPGA involves using K multiply-and-accumulate (MAC) blocks. These blocks are demanding in terms of resources and logic complexity. To overcome this challenge, a multiplier-free architecture known as distributed arithmetic (DA) has been introduced.
DA substitutes the multiplication operations in the MAC process with a sequence of lookup table (LUT) accesses and additions, employing a bit-serial approach to calculate the inner product of two vectors within a fixed number of cycles. The initial DA architecture stores every possible binary combination of the filter coefficients in a memory or LUT. However, as the number of taps (L) increases, the memory size required grows exponentially, making it impractically large.
To address this, DA employs 2's complement binary representation, allowing pre-computation and efficient storage of data in LUTs. This characteristic makes DA particularly suitable for LUT-based FPGA architectures. However, DA faces a significant issue: the LUT capacity required increases exponentially with the number of filter taps. Specifically, DA implementations need 2^K words of memory, where K is the number of taps in the filter. This requirement becomes even more challenging if K is a prime number, leading to higher hardware resource consumption.
In summary, DA provides an efficient method for implementing FIR filters in FPGAs, but the exponential increase in LUT requirements with higher filter orders remains a major obstacle.
II. LITERATURE SURVEY
In their paper published on August 7, 2017, Sumbal Zahoor, Shahzad Naseem, and Wei Meng (Reviewing Editor) propose a novel approach for designing and implementing efficient FIR digital filters. The paper discusses various design methods for FIR filters using different window functions. It examines and compares bandpass filters of orders 38 and 48, concluding that the Kaiser window yields superior results.[1]
In April 2017, S. R. Reddy and P. Jayakrishnan presented their work on designing FIR filters using an improved distributed arithmetic (DA) approach for high-speed ground-penetrating radar (GPR) systems. Their paper introduces an enhanced DA method to implement high-order digital FIR filters with reduced logical delay and hardware usage. The process begins with the design of a parallel DA, which is then improved through LUT decomposition. The improved DA FIR filters are subsequently implemented on a Xilinx Kintex-7 FPGA chip and used to process radar signals in high-speed GPR systems.[2]
Bo Hong proposes the implementation of FIR filters on FPGA using the DA-OBC algorithm, emphasizing that digital filters are fundamental components in numerous digital signal processing systems. These filters have extensive applications in communication,
image processing, and pattern recognition. The hardware implementation of FIR filters utilizes FPGA chips, employing the Kaiser window for design. The unit impulse response, h(n), is calculated using MATLAB software.[3]
In 2005, Qi Yue and colleagues proposed a low-power FIR filter design based on standard cells. Their paper compares three low-power schemes for the multi-hierarchy pipeline design of fixed-point finite impulse response (FIR) digital filters. They adopt an optimal Canonical Signed Digit (CSD) encoding method to minimize the number of adders and subtractors in the design. Additionally, they designed a 16-bit, 16-tap low-pass FIR filter to evaluate the performance of the three different algorithms.[4]
In March 2014, Kumari K. Dhobi, Dr. K. R. Bhatt, and Dr. Y. B. Shukla proposed the FPGA implementation of FIR filters using various algorithms in their retrospective study. They noted that the distributed arithmetic (DA) structure is easy to implement on FPGAs due to pre-calculated results stored in LUTs, simplifying the design process. They highlighted FPGA as an effective solution for realizing digital signal processing filters. Their simulation results indicated that the Kaiser window method is the most effective windowing technique.[5]
A. Background
Traditional implementations of the finite impulse response (FIR) filter equation typically rely on L multiply-accumulate (MAC) units. However, integrating multipliers directly using FPGA logic fabric incurs high costs in terms of logic complexity and area utilization, particularly with larger filter sizes. While modern FPGAs offer dedicated DSP blocks to mitigate this issue, challenges persist in reducing area and complexity for very large filters.
An alternative approach is to decompose the MAC operations into a sequence of lookup table (LUT) accesses and summations, known as distributed arithmetic (DA). This strategy offers a more efficient utilization of resources and can be particularly advantageous in FPGA implementations.
III. IMPLEMENTATION OF FIR FILTER USING DISTRIBUTED ARITHMETIC
A. Distributed Arithmetic FIR Filter Architecture
The Distributed Arithmetic (DA) architecture stands out as one of the most renowned methods for implementing Finite Impulse Response (FIR) filters. It addresses the computation of the inner product equation, particularly advantageous when the filter coefficients are known beforehand, as is the case with FIR filters.
An FIR filter of length K is described as:
Conclusion
This paper introduces innovative Distributed Arithmetic (DA) architectures tailored for high-order filters. These architectures achieve remarkable reductions in memory usage by halving it at each iteration of LUT reduction, albeit with a minor trade-off in system frequency. Furthermore, the paper proposes partitioning high-order filters into smaller groups, enabling a reduction in LUT size. To achieve high-speed implementation of FIR filters, the paper advocates for adopting a full-parallel version of the DA architecture. Through successful implementation, the paper demonstrates the effectiveness of these DA architectures. Specifically, it showcases the efficient realization of a 70-tap full-parallel DA filter using both an original and a modified DA architecture on a 4VLX40FF668 FPGA device. This implementation underscores the hardware efficiency of the proposed DA architectures for FPGA-based applications.
References
[1] Cui W. “Design and Analysis of High Performance FPGA System Timing. Microcomputer Applications”.2014(2)
[2] Fan K.Y et al.”Optimization of FIR filter design scheme based on FPGA”.Research and Exploration in Laboratory.2014,33(5):91- 95.DOI:10.3969/j.issn.1006-7167.2014.05.023.
[3] Li X&Y.T Jiang. “The Digital Filter with Finite Unit Impulse Response Achieved by DSP Window Function Method”. Journal of Science and Economic Market. 2011
[4] Lv W et al. “Optimization Design of FIR Filters Based on FPGA. TV Technology”. 2014,05:71-73.
[5] Qu S.R&J.C P. “A resource optimization algorithm for FIR filter implemented on FPGA.Electronic Design Engineering”. 21(14):147-150.DOI:10.3969/j.issn.1674-6236.2013.14.045.
[6] Wang Z&Q.N.Zhou. “Realization of high order FIR filter based on the improved DA algorithm”.Modern Electronics Technique.2014(4):8-12.DOI:10.3969/j.issn.1004-373X. 04.003.
[7] Yang W.M et al. “FPGA implementation of the distributed structure FIR filter”.Electronic teaching journal.
[8] Zhu X.X, j.Cai&W.Lu. “The design of the filter based on the optimization DA algorithm and its FPGA implementation”. Application of Electronic Technique.41(2):59-60,64.DOI: 10.16157/j.issn.0258-7998.2015.02.012
[9] Cui L&Z.X Zhang. “Realization and comparison of FIR filter based on FPGA design”. Electronic Design Engineering”.20(20):168-170.DOI:10.3969/j.issn.1674- 6236.2012.20.060.
[10] Yong Du. “The implementation of Digital filter based on Matlab and FPGA (Altrra/Verilog)”. Beijing: Electronic Industry Press, 2015.03