# INTERNATIONAL JOURNAL FOR RESEARCH IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY

Volume: 6 Issue: II Month of publication: February 2018 DOI: www.ijraset.com Call: 08813907089 E-mail ID: ijraset@gmail.com

ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 6.887 Volume 6 Issue II, February 2018- Available at www.ijraset.com

### Reliable Low Power Multiplier Design Using Reduced Precision Redundancy by Wallace Architecture

P. Lakshmi Neeraja<sup>1</sup>, Ch. Rajesh Babu<sup>2</sup>

<sup>1</sup>M. Tech, VLSI & ES, Department of ECE, <sup>2</sup>Assistant Professor (PhD), Department of ECE, Godavari Institute of Engineering and Godavari Institute of Engineering and Rajahmundry, A.P

Abstract: In this paper, we propose a reliable low-power multiplier design that adopts the Wallace multiplier architecture to build the reduced-precision replica (RPR) redundancy block. The proposed Wallace multiplier design can meet the demands of high precision, low power consumption, and area efficiency. By using the partial product terms of the input correction vector and the minor input correction vector to lower the truncation errors, the hardware complexity of the error compensation circuit can be simplified. In a 12 × 12 bit Wallace multiplier, the total real-time delay and power consumption of our Wallace design can be reduced by 22% compared with the state-of-the-art algorithmic noise tolerant (ANT) design.

Keywords: Wallace multiplier, reduced-precision replica (RPR), voltage overscaling (VOS).

#### I. INTRODUCTION

The rise of portable and wireless computing systems in recent years drives the need for ultralow-power systems. To lower the power dissipation, supply voltage scaling is widely used as an effective low-power technique, since the power consumption in CMOS circuits is proportional to the square of the supply voltage [1].
However, in deep-submicrometer process technologies, noise interference problems have made it difficult to design reliable and efficient microelectronic systems; hence, design techniques that enhance noise tolerance have been widely developed [2]-[10]. An aggressive low-power technique, referred to as voltage overscaling (VOS), was proposed in [4] to lower the supply voltage beyond the critical supply voltage without sacrificing throughput. However, VOS leads to severe degradation in signal-to-noise ratio (SNR). A novel technique [2] combined a VOS main block with a reduced-precision replica (RPR), which combats soft errors effectively while achieving significant energy savings. This design concept was further extended to system level in [10].

Another improvement in the multiplier is made by reducing the number of partial products generated. The multiplier used in this work is one such design: it scans three bits of the multiplier operand at a time (radix-4 recoding, as in modified Booth multipliers) to reduce the number of partial products. These three bits are the two bits of the current pair and a third bit taken from the high-order bit of the adjacent lower-order pair. After each triplet of bits is examined, the triplet is converted into a set of five control signals that the adder cells in the array use to control the operations they perform. To speed up the multiplication, this recoding performs several steps of the multiplication at once. It follows from the basics of this multiplication that the addition is skipped when successive bits of the multiplier operand are the same: if three consecutive bits are the same, no addition is performed. In most cases, therefore, the delay of this multiplication is smaller than that of an array multiplier. The reason is that the array multiplier needs a large number of adder cells, which consume considerable power.
However, the RPR designs of [5]-[7] are built in a customized manner and are not easily adopted or repeated. The RPR designs of [8] and [9] operate very fast, but their hardware is too complex. As a result, the RPR design of [2] is still the most popular because of its simplicity. However, adopting the RPR of [2] still costs extra area overhead and power consumption. In this paper, we further propose a simple approach that uses the Wallace tree design to replace the fixed-width RPR block in [2]. With the Wallace method, the computation error is corrected with lower power consumption and lower area overhead, and the critical path delay is not increased. As a result, the design can be realized with smaller circuit area, lower power consumption, and lower critical supply voltage.

#### II. EXISTING TECHNIQUE

The existing technique uses a fixed-width RPR to replace the full-width RPR block in the ANT design [2], as shown in Fig. 2. This not only provides higher computation precision, lower power consumption, and lower area overhead in the RPR, but also yields higher SNR, better area efficiency, lower operating supply voltage, and lower power consumption in realizing the ANT design. The fixed-width RPR-based ANT design is demonstrated in an ANT multiplier. Fixed-width designs are usually applied in DSP applications to avoid unbounded growth of the bit width. Truncating the n least significant bits (LSBs) of the output is a common way to construct a fixed-width DSP with n-bit input and n-bit output.
The hardware complexity and power consumption of a fixed-width DSP are usually about half those of the full-length one. However, truncating the LSB part results in rounding error, which needs to be compensated precisely. Many works [13]–[22] have been presented to reduce the truncation error with a constant correction value [13]–[15] or with a variable correction value [16]–[22]. The circuit that compensates with a constant correction value can be simpler than that for a variable correction value; however, the variable correction approaches are usually more precise.

Fig. 2.1 Existing ANT design with fixed-width RPR.

In [6]-[10], the compensation method compensates the truncation error between the full-length multiplier and the fixed-width multiplier. In the fixed-width RPR of an ANT multiplier, however, the error that needs to be corrected is the overall truncation error of the main DSP (MDSP) block. Unlike [6]-[10], our compensation method compensates the truncation error between the full-length MDSP multiplier and the fixed-width RPR multiplier. Nowadays, many fixed-width multiplier designs are applied to full-width multipliers. The error compensation algorithm makes use of probability, statistics, and linear regression analysis to find the approximate compensation value [9]. To save hardware complexity, the compensation vector formed by the partial product terms with the largest weight in the least significant segment is directly injected into the fixed-width RPR, which does not need extra compensation logic gates [10].
To further lower the compensation error, we also consider the impact of the truncated products with the second most significant bits on the error compensation. Compared with the full-width RPR design in [10], the proposed fixed-width RPR multiplier not only performs with higher SNR but also with lower circuit area and lower power consumption.

Precise Error Compensation Vector for Fixed-Width RPR Design: In the ANT design, the function of the RPR is to correct the errors occurring in the output of the MDSP and to maintain the SNR of the whole system while the supply voltage is lowered. Using a fixed-width RPR to realize the ANT architecture not only lowers circuit area and power consumption but also accelerates the computation compared with the conventional full-length RPR. However, a large truncation error must be compensated, because many hardware elements in the LSB part of the MDSP are cut off. The MDSP of the n-bit ANT Baugh-Wooley array multiplier has two unsigned n-bit inputs, X and Y. Under VOS, there are a number of input-dependent soft errors in the MDSP output $y_a[n]$; however, the RPR output $y_r[n]$ is still correct, since the critical path delay of the replica is smaller than $T_{samp}$ [4]. Therefore, $y_r[n]$ is used to detect errors in the MDSP output $y_a[n]$. Error detection is accomplished by comparing the difference $|y_a[n] - y_r[n]|$ against a threshold Th. Once the difference between $y_a[n]$ and $y_r[n]$ is larger than Th, the output $\hat{y}[n]$ is $y_r[n]$ instead of $y_a[n]$. As a result, $\hat{y}[n]$ is expressed as

$$\hat{y}[n] = \begin{cases} y_a[n], & \text{if } |y_a[n] - y_r[n]| \leq Th \\ y_r[n], & \text{if } |y_a[n] - y_r[n]| > Th \end{cases}$$

Th is determined by

$$Th = \max_{\forall\, \text{input}} |y_o[n] - y_r[n]|$$

where $y_o[n]$ is the error-free output signal.
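The decision rule and the threshold definition can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the paper's circuit: the replica is assumed to multiply just the upper k bits of each n-bit operand, the names `replica`, `threshold`, and `ant_select` are hypothetical, and an 8-bit multiplier with a 4-bit replica is used so that Th can be found by exhaustive enumeration.

```python
def replica(x, y, n, k):
    """Reduced-precision replica y_r: product of the upper k bits of each
    n-bit operand (an assumed model of the RPR), rescaled to full weight."""
    shift = n - k
    return ((x >> shift) * (y >> shift)) << (2 * shift)


def threshold(n, k):
    """Th = max over all inputs of |y_o - y_r|, with y_o = x * y error free."""
    return max(
        abs(x * y - replica(x, y, n, k))
        for x in range(1 << n)
        for y in range(1 << n)
    )


def ant_select(y_a, y_r, th):
    """Keep the main-block output y_a unless it deviates from y_r by more than Th."""
    return y_a if abs(y_a - y_r) <= th else y_r


n, k = 8, 4
th = threshold(n, k)
x, y = 200, 100
y_r = replica(x, y, n, k)
y_ok = x * y                # error-free main-block output
y_bad = y_ok ^ (1 << 15)    # hypothetical VOS soft error that flips the MSB

print(ant_select(y_ok, y_r, th))   # the main-block output is kept
print(ant_select(y_bad, y_r, th))  # the replica output is substituted
```

Because Th is the largest deviation the replica itself can show against an error-free result, a correct $y_a[n]$ is never rejected; only deviations larger than the replica's own imprecision are replaced.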
In this way, the power consumption can be greatly lowered while the SNR can still be maintained without severe degradation.

Fig. 2.2. 12 × 12 bit ANT multiplier implemented with the six-bit fixed-width replica redundancy block.

#### A. Main Block

1) Fixed-Width Modified Booth Multipliers: To reduce the truncation error, the partial product matrix of Booth multiplication is first slightly modified, and an effective error compensation function is then derived that makes the error distribution more symmetric about, and more concentrated around, zero error. This gives the fixed-width modified Booth multiplier very small mean and mean-square errors. The Baugh-Wooley $12\times12$ array multiplier is further replaced by a signed Booth multiplier.

Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's complement representation, including an implicit bit below the least significant bit, $y_{-1} = 0$. For each bit $y_i$, for i running from 0 to N-1, the bits $y_i$ and $y_{i-1}$ are considered. Where these two bits are equal, the product accumulator P is left unchanged. Where $y_i = 0$ and $y_{i-1} = 1$, the multiplicand times $2^i$ is added to P; and where $y_i = 1$ and $y_{i-1} = 0$, the multiplicand times $2^i$ is subtracted from P. The final value of P is the signed product. The representations of the multiplicand and product are not specified; typically, these are both also in two's complement representation, like the multiplier, but any number system that supports addition and subtraction will work as well. As stated here, the order of the steps is not determined.
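The add/subtract rule just described can be written down directly as a short Python sketch. The function name `booth_multiply` and the choice to process the bits from i = 0 upward are illustrative assumptions; Python integers stand in for the accumulator P, so no fixed register width is modelled.

```python
def booth_multiply(multiplicand, y, n_bits):
    """Radix-2 Booth multiplication.

    `y` is the multiplier, a signed integer that fits in `n_bits` bits of
    two's complement; `multiplicand` may be any integer.
    """
    y_bits = y & ((1 << n_bits) - 1)   # two's complement bit pattern of y
    p = 0                              # product accumulator P
    prev = 0                           # implicit bit y_{-1} = 0
    for i in range(n_bits):
        cur = (y_bits >> i) & 1
        if cur == 0 and prev == 1:
            p += multiplicand << i     # add multiplicand * 2^i
        elif cur == 1 and prev == 0:
            p -= multiplicand << i     # subtract multiplicand * 2^i
        # equal bits: P is left unchanged
        prev = cur
    return p


print(booth_multiply(7, -3, 4))
print(booth_multiply(-6, 5, 4))
```

Runs of equal bits cost no additions at all, which is where the scheme saves work compared with adding one shifted multiplicand per 1 bit.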
Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by $2^i$ is then typically replaced by incremental shifting of the P accumulator to the right between steps; low bits can be shifted out, and subsequent additions and subtractions can then be done just on the highest N bits of P. There are many variations and optimizations on these details.

#### B. Fixed Width RPR

In the ANT design, the function of the RPR is to correct the errors occurring in the output of the MDSP while lowering the required power. Using a fixed-width RPR to realize the ANT architecture not only lowers circuit area and power consumption but also accelerates the computation. The fixed-width RPR uses a replica of the MDSP output computed with reduced-precision operands. In the MDSP of the n-bit ANT Baugh-Wooley array multiplier, the two unsigned n-bit inputs X and Y can be expressed as

$$X = \sum_{i=0}^{n-1} x_i \cdot 2^i, \quad Y = \sum_{j=0}^{n-1} y_j \cdot 2^j$$

The multiplication result P is the summation of the partial products $x_i y_j$, which is expressed as

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i y_j \cdot 2^{i+j}$$

The (n/2)-bit unsigned full-width Baugh-Wooley partial product array can be divided into four subsets: the most significant part (MSP), the input correction vector [ICV $(\beta)$], the minor ICV [MICV $(\alpha)$], and the least significant part (LSP). Only the MSP is kept in the fixed-width RPR; the other three parts, ICV $(\beta)$, MICV $(\alpha)$, and LSP, are therefore called the truncated part. The truncated ICV $(\beta)$ and MICV $(\alpha)$ are the most important of these because they carry the highest weights. Therefore, they can be applied to construct the truncation error compensation algorithm.
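As a rough illustration of the partial product sum and the four-way partition, the Python sketch below enumerates the terms for n = 12. Several points are assumptions made for the sketch rather than statements from the paper: the replica array is taken to contain the products $x_i y_j$ with $i, j \geq n/2$; the ICV is taken to be the column (equal $i + j$) that contains $x_{n-1}y_{n/2}$ and the MICV the column that contains $x_{n-2}y_{n/2}$, the first terms named for the error compensation circuit; and the helper names are hypothetical.

```python
def bit(v, i):
    """Bit i of the unsigned integer v."""
    return (v >> i) & 1


def product_from_partial_products(x, y, n):
    """P as the double sum of x_i * y_j * 2^(i+j) over all n x n partial products."""
    return sum((bit(x, i) & bit(y, j)) << (i + j)
               for i in range(n) for j in range(n))


def partition(n):
    """Split the assumed (n/2) x (n/2) replica array into MSP, ICV, MICV, LSP by column."""
    half = n // 2
    icv_col = (n - 1) + half            # column of x_{n-1} y_{n/2}
    parts = {"MSP": [], "ICV": [], "MICV": [], "LSP": []}
    for i in range(half, n):
        for j in range(half, n):
            col = i + j
            if col > icv_col:
                name = "MSP"
            elif col == icv_col:
                name = "ICV"
            elif col == icv_col - 1:    # column of x_{n-2} y_{n/2}
                name = "MICV"
            else:
                name = "LSP"
            parts[name].append((i, j))
    return parts


def beta(x, y, n):
    """beta: the sum of all partial products in the ICV column."""
    return sum(bit(x, i) & bit(y, j) for i, j in partition(n)["ICV"])


parts = partition(12)
print({name: len(terms) for name, terms in parts.items()})
print(beta(0xFFF, 0xFFF, 12))
```

Under these assumptions the 36 terms of the six-bit replica array split into 15 MSP, 6 ICV, 5 MICV, and 10 LSP terms, so the two correction vectors together cover only 11 partial products.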
To evaluate the accuracy of a fixed-width RPR, we use the difference between the (n/2)-bit fixed-width RPR output and the 2n-bit full-length MDSP output, which is expressed as

$$\varepsilon = P - P_t$$

where P is the output of the complete multiplier in the MDSP and $P_t$ is the output of the fixed-width multiplier in the RPR. It is noted that $\beta$ is the summation of all partial products of the ICV. Statistical analysis of the truncated difference between the MDSP and the fixed-width RPR under a uniform input distribution is used to find the relationship between f(EC) and $\beta$. The statistical results show that the average truncation error in the fixed-width RPR multiplier is approximately distributed between $\beta$ and $\beta+1$. More precisely, when $\beta = 0$, the average truncation error is close to $\beta+1$; when $\beta > 0$, the average truncation error is very close to $\beta$. For the $\beta > 0$ case, $\beta$ is therefore selected as the compensation vector.

#### III. PROPOSED METHOD

#### A. Introduction of Wallace Multiplier

The Wallace multiplier is a form of parallel multiplier [5]. It is slightly faster and requires fewer gates. Different types of schemes are used in parallel multipliers. The Wallace scheme is one of the parallel multiplier schemes that essentially minimizes the number of adder stages required to sum the partial products. This is achieved by using full and half adders to reduce the number of rows in the bit matrix at each summation stage.
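The reduction idea can be sketched in Python as follows. This is a simplified column-wise model written for illustration, not the paper's gate-level design: within each column, every group of three bits goes through a full adder and a leftover pair through a half adder, and the name `wallace_multiply` is hypothetical.

```python
def wallace_multiply(x, y, n):
    """Multiply two unsigned n-bit integers by Wallace-style column reduction.

    Returns (product, stages), where `stages` counts the reduction layers
    needed before at most two bits remain in every column.
    """
    # cols[w] holds the bits of weight 2^w that still have to be summed.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    stages = 0
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, c in enumerate(cols):
            k = 0
            while len(c) - k >= 3:              # full adder: 3 bits -> sum + carry
                a, b, cin = c[k:k + 3]
                nxt[w].append(a ^ b ^ cin)
                nxt[w + 1].append((a & b) | (a & cin) | (b & cin))
                k += 3
            if len(c) - k == 2:                 # half adder: 2 bits -> sum + carry
                a, b = c[k:k + 2]
                nxt[w].append(a ^ b)
                nxt[w + 1].append(a & b)
            elif len(c) - k == 1:               # a single bit passes through
                nxt[w].append(c[k])
        cols = nxt
        stages += 1

    # Final carry-propagate addition of the (at most) two remaining rows.
    product = sum(b << w for w, c in enumerate(cols) for b in c)
    return product, stages


product, stages = wallace_multiply(2748, 1995, 12)
print(product == 2748 * 1995, stages)
```

All adders within a stage work in parallel, so the delay of the reduction grows with the number of stages rather than with the number of partial product rows.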
Even though the Wallace multiplication has a regular and less complex structure, the process is slowed by its serial multiplication steps. Further, the Wallace multiplier is less expensive than the Wallace tree multiplier. Hence, in this paper, the Wallace multiplier is designed and analyzed by considering different methods using full adders involving different logic styles.

1) Implementation of Wallace Multiplier: The algorithm of the Wallace multiplier is shown in Fig. 3.1. The partial product matrix is formed in the first stage, as illustrated in Fig. 3.2.

Fig. 3.1 Conventional Wallace tree multiplier

2) Wallace Tree Multiplier Using Ripple Carry Adder: A ripple carry adder (RCA) performs a multi-bit addition by chaining the carry-ins and carry-outs of several full adders. It is thus possible to create a logic circuit that adds multiple-bit numbers using several full adders. Each full adder takes as input a $C_{in}$, which is the $C_{out}$ of the previous adder. This kind of adder is called a ripple carry adder, since each carry bit "ripples" to the next full adder.

In the proposed Wallace multiplier architecture using the RCA, any three values with the same weight are given as inputs to a full adder. The partial products obtained after multiplication are taken at the first stage. The data are taken three wires at a time and added using adders, and the carry of each stage is added to the next two data in the same stage. At the final stage, the same ripple carry addition is performed, and the product terms p1 to p8 are obtained.

Fig. 3.2 Generation of product terms

#### B. Proposed Wallace 12 x 12 Multiplier

A reliable low-power multiplier design is obtained by adopting the Wallace tree multiplier architecture to build the reduced-precision replica redundancy block (RPR).
The Wallace architecture replaces the fixed-width RPR block in the design using reduced precision redundancy. This not only provides higher computation precision, lower power consumption, and lower area overhead in the RPR, but is also more area efficient and allows a lower operating supply voltage and lower power consumption in realizing the Wallace architecture. The main difference between the existing and proposed methods is that the ANT architecture is replaced with a Wallace tree architecture with an error compensation circuit, since the proposed design with the error compensation circuit has less hardware complexity than the ANT multiplier. The compensation method compensates the truncation error between the full-length MDSP multiplier and the Wallace tree multiplier. To further lower the compensation error, the impact of the truncated products with the second most significant bits on the error compensation is also considered. An error compensation circuit using a simple minor input correction vector compensates the remaining error. In order not to increase the critical path delay, the compensation circuit is located in the noncritical path of the Wallace multiplier. Compared with the fixed-width RPR design, the proposed Wallace tree multiplier not only performs with higher SNR but also with lower circuit area and lower power consumption.

Fig. 3.3. Implementation of 12x12 bit Wallace multiplier

1) Error Compensation Circuit: To realize the fixed-width RPR, one directly injected ICV $(\beta)$ is constructed to basically meet the statistical distribution, and one minor compensation vector MICV $(\alpha)$ to amend the cases of insufficient error compensation. The compensation vector ICV $(\beta)$ is realized by directly injecting the partial product terms $X_{n-1}Y_{n/2}$, $X_{n-2}Y_{(n/2)+1}$, $X_{n-3}Y_{(n/2)+2}$, ..., $X_{(n/2)+2}Y_{n-2}$. These directly injected compensation terms are labeled $C_1$, $C_2$, $C_3$, ..., $C_{(n/2)-1}$.
The other compensation vector, used to amend the cases of insufficient error compensation, is constructed with one conditionally controlled OR gate. One input of the OR gate is $X_{n/2}Y_{n-1}$, which realizes the function of the compensation vector $\beta$. The other input is conditionally controlled by the judgment formula that tests whether $\beta = 0$ and $\beta_l \neq 0$. As shown in the figure, the term $C_{m1}$ is used to judge whether $\beta = 0$. This judgment function is realized by one NOR gate, whose inputs are $X_{n-1}Y_{n/2}$, $X_{n-2}Y_{(n/2)+1}$, $X_{n-3}Y_{(n/2)+2}$, ..., $X_{(n/2)+2}Y_{n-2}$. The term $C_{m2}$ is used to judge whether $\beta_l \neq 0$. This judgment function is realized by one OR gate, whose output is 1 when at least one of its inputs is 1; its inputs are $X_{n-2}Y_{n/2}$, $X_{n-3}Y_{(n/2)+1}$, $X_{n-4}Y_{(n/2)+2}$, ..., $X_{(n/2)+1}Y_{n-2}$. If both of these judgments are true, a compensation term $C_m$ is generated by a two-input AND gate. Then, $C_m$ is injected together with $X_{n/2}Y_{n-1}$ into a two-input OR gate to correct the insufficient error compensation. Accordingly, in the case of $\beta = 0$ and $\beta_l \neq 0$, one additional carry-in signal $C_{n/2}$ is injected into the compensation vector to modify the compensation value to $\beta + 1$ instead of $\beta$. Moreover, the carry-in signal $C_{n/2}$ is injected at the bottom of the error compensation vector, which is the location farthest from the critical path. Therefore, not only is the error compensation precision of the Wallace multiplier enhanced, but the computation delay is also reduced. Since the critical supply voltage is dominated by the critical delay time of the RPR circuit, it is very important that the critical path of the RPR is not lengthened. Finally, in the proposed high-precision Wallace multiplier design, the number of adder cells can be halved compared with the conventional fixed-width RPR.

2) Multiplication Logic: Consider an example of 8-bit multiplication in which the 8-bit multiplicand is X7X6X5X4X3X2X1X0 and the multiplier is Y7Y6Y5Y4Y3Y2Y1Y0. The multiplication process is as follows.
The multiplication requires 64 AND gates. First, Y0 is multiplied with X7X6X5X4X3X2X1X0, giving X0Y0, X1Y0, X2Y0, X3Y0, X4Y0, X5Y0, X6Y0, and X7Y0. Next, Y1 is multiplied with X7X6X5X4X3X2X1X0, giving X0Y1, X1Y1, X2Y1, X3Y1, X4Y1, X5Y1, X6Y1, and X7Y1. All remaining multiplications take place similarly. Each step shifts the resulting row by one binary position. The AND outputs are each represented by one bit, labeled K0 to K63 in sequence. After the 64 AND operations are complete, an addition process follows. The addition can be done with a tree structure built from compressors, which reduces the latency and increases the speed. In this process, the sum output of an intermediate compressor is the input to the next compressor in the same column, and the carry generated by each adder is propagated to the adders of the next column. The result is 16 bits wide, represented by [P15....P0].

With the objective of improving the speed of the parallel multiplier, Wallace introduced a very important iterative realization of the parallel multiplier. The advantages of using the Wallace multiplier in place of the ANT multiplier are as follows. Compared with normal multiplication, the delay of Wallace multiplication is very low. The power consumption of the Wallace multiplier is low. The speed is very high, i.e., delay and power are inversely proportional to each other.

The benefit of the Wallace tree is that there are only $O(\log n)$ reduction layers, and each layer has $O(1)$ propagation delay. As making the partial products is $O(1)$ and the final addition is $O(\log n)$, the multiplication is only $O(\log n)$, not much slower than addition (however, much more expensive in gate count). Naively adding the partial products with regular adders would require $O(\log^2 n)$ time. These estimates consider only gate delays and do not deal with wire delays, which can also be very substantial.

The results of the design and simulation of the ANT multiplier and the Wallace multiplier are given below.

#### C. Design summary of ANT multiplier

| Parameter | Testing using ANT multiplier |
|-----------------------|------------------------------|
| No. of lookup tables | 361 out of 1536 (23%) |
| Total real-time delay | 21 sec |

#### D. Design summary of Wallace multiplier

| Parameter | Testing using Wallace multiplier |
|-----------------------|----------------------------------|
| No. of lookup tables | 349 out of 1536 (22%) |
| Total real-time delay | 13 sec |

#### IV. SIMULATION RESULTS

This section describes the performance of the proposed design with RTL schematics and output waveforms.

Fig. RTL schematic of existing method

Fig. 4.1 Output of existing method

Fig. 4.2 RTL waveform of proposed method

Fig. 4.3 Output of proposed method

#### V. CONCLUSION

This paper presents a reliable low-power multiplier design with fixed-width RPR using the Wallace multiplier design. The proposed 12-bit Wallace tree multiplier circuit also performs error reduction for signed numbers and reduces propagation delay. The resulting reduction in power-delay product may be very useful in processing units, specifically in digital signal processors.

#### REFERENCES

- [1] Wey I. C., Peng C. C., and Liao F. Y. (2015), 'Reliable low-power multiplier design using fixed-width replica redundancy block,' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 23, No. 1.
- [2] Chen J. N. and Hu J. H. (2013), 'Energy-efficient digital signal processing via voltage-overscaling-based residue number system,' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 21, No. 7, pp. 1322–1332.
- [3] Chen J. N., Hu J. H., and Li S. Y. (2012), 'Low power digital signal processing scheme via stochastic logic protection,' in Proc. IEEE Int. Symp. Circuits Syst., pp. 3077–3080.
- [4] Gupta V., Mohapatra D., Raghunathan A., and Roy K. (2013), 'Low-power digital signal processing using approximate adders,' IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., Vol. 32, No. 1, pp. 124–137.
- [5] R. and Shanbhag N. R. (1999), 'Energy-efficient signal processing via algorithmic noise-tolerance,' in Proc. IEEE Int. Symp. Low Power Electron. Des., pp. 30–35.
- [6] Petra N., Caro D. D., Garofalo V., Napoli N., and Strollo A. G. M. (2010), 'Truncated binary multipliers with variable correction and minimum mean square error,' IEEE Trans. Circuits Syst., Vol. 57, No. 6, pp. 1312–1325.
- [7] Cope B., Cheung P., Luk W., and Howes L. (2010), 'Performance comparison of graphics processors to reconfigurable logic: A case study,' IEEE Trans. Computers, Vol. 57, No. 4, pp. 433–446.
- [8] Shim B., Sridhara S., and Shanbhag N. R. (2004), 'Reliable low-power digital signal processing via reduced precision redundancy,' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 12, No. 5, pp. 497–510.
- [9] Shim B. and Shanbhag N. R. (2006), 'Energy-efficient soft-error tolerant digital signal processing,' IEEE Trans. Very Large Scale Integr. (VLSI) Syst., Vol. 14, No. 4, pp. 336–348.
- [10] Wey I. C. and Wang C. C. (2010), 'Low-error and area-efficient fixed-width multiplier by using minor input correction vector,' in Proc. IEEE Int. Conf. Electron. Inf. Eng., Vol. 1, Kyoto, Japan, pp. 118–122.