

# **A 16-nm FinFET 28.8-mW 800-MHz 8-Bit All-N-Transistor Logic Carry Look-Ahead Adder**

**Chua-Chin Wang<sup>1,2</sup> · Oliver Lexter July A. Jose<sup>[1](http://orcid.org/0000-0003-4565-9506)</sup> · Wen-Shou Yang<sup>1</sup> · Ralph Gerard B. Sangalang<sup>1,[3](http://orcid.org/0000-0002-4120-382X)</sup> · Lean Karlo S. Tolentino<sup>1,[4](http://orcid.org/0000-0002-8014-8229)</sup> · Tzung-Je Lee<sup>[1](http://orcid.org/0000-0001-6870-7406)</sup>**

Received: 18 January 2022 / Revised: 9 October 2022 / Accepted: 10 October 2022 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

### **Abstract**

Low-power, high-speed computation is very important for meeting the energy-efficiency demands of today's electronic devices. With ANT (All-N-transistor) logic, the speed constraint caused by PMOS transistors can be overcome through an auxiliary current path across NMOS transistors. This study presents an 800-MHz 28.8-mW 8-bit carry look-ahead adder (CLA) using ANT logic implemented on chip. FinFET technology is utilized to improve carrier mobility and increase device speed. The parasitic R-C elements of FinFET devices are considered in the delay-time analysis of the 8-bit CLA to improve the PDP (power-delay product). The proposed design is implemented in a 16-nm FinFET process with a core area of $206.403 \times 152.506 \,\mu\text{m}^2$. It has the lowest normalized PDP at a 60 pF load to date.

**Keywords** FinFET technology · High-speed · Low-power · PDP · CLA · Dual Path

# **1 Introduction**

In a growing number of portable devices, high-speed and low-power components are critical to meeting energy-efficiency demands [\[25\]](#page-20-0). As far as VLSI is concerned, the CMOS process is widely used because of its temperature stability and noise immunity compared with other technologies such as emitter-coupled logic (ECL) and transistor-transistor logic (TTL) [\[20](#page-20-1)]. CMOS technology has also conventionally been preferred because its low static current results in low power consumption [\[3,](#page-19-0) [7,](#page-19-1) [21](#page-20-2), [27](#page-20-3)].

Oliver Lexter July A. Jose, Wen-Shou Yang, Ralph Gerard B. Sangalang, Lean Karlo S. Tolentino and Tzung-Je Lee have contributed equally to this work.

Chua-Chin Wang ccwang@ee.nsysu.edu.tw


Despite these advantages, CMOS has a speed disadvantage, and several technologies have been reported to address it. The carbon nanotube field-effect transistor (CN-FET) [\[35](#page-21-0)] and the fin field-effect transistor (FinFET) [1, 26, 28] have recently emerged as viable options for high-speed applications because of better carrier mobility and smaller area. CN-FET performs better than CMOS because of its low capacitance $C_{gg}$ and its $C_{gg}$ to $I_{on}$ ratio. However, its incompatibility with the established industry-standard CMOS technology and its large $I_{\text{off}}$ make it unattractive for low-power applications. Another technology that enhances CMOS is the double-gate FinFET [\[28](#page-20-5)]. A 1-bit full adder using 10 transistors implemented in this FinFET technology was shown to enhance performance. The double-gate technique reduces the leakage current and the operating power, which in turn reduces the impact of process variation. However, the cost is high and the yield is low. Spintronic devices have promising complementary properties such as near-zero static power consumption and are easy to integrate with CMOS processes. Nevertheless, spin-CMOS-based approximate full adders have very low accuracy [\[16\]](#page-19-3).

Other techniques widely used to reduce power dissipation are pass-transistor logic [\[27](#page-20-3)] and XNOR/XOR logic [\[3,](#page-19-0) [7](#page-19-1), [21](#page-20-2)]. Nevertheless, the operation speed suffers because of the increased equivalent resistance formed by the series transistors. The 6 hybrid full adders reported by Naseri *et al.* were implemented using full-swing XOR/XNOR gates, which have equal input capacitances, no NOT gates on the critical path, and small output capacitances, resulting in higher speed and lower power [\[21](#page-20-2)]. They all utilized a 2-1 MUX that exploits transmission gates without short-circuit and static power dissipation. However, since some inputs are driven through the diffusions, they have difficulty driving those inputs when the load capacitance is large. Moreover, they are noise prone as well. Akhter *et al.* proposed a dynamic-logic-based full adder using a new topology [\[2\]](#page-19-4). This topology employs XNOR and XOR logic gates in the implementation of the full adder circuit, which improves the power dissipation and speed over XNOR/XOR logic alone.

Traditional Domino logic can also be used for the logic operations in full adder design, but it is limited to non-inverting operations [\[6](#page-19-5)]. This limitation can be resolved by complementary all-N-transistor (CANT) logic, which can realize both inverting and non-inverting logic operations [\[15\]](#page-19-6). However, because of the current path across the slow PMOS devices, the operating speed of CANT is limited. The all-N-logic (ANL) topology is also a good solution because its assistant driving current path increases the operation speed through NMOS transistors [\[13\]](#page-19-7). Even so, the speed of an ANL circuit is inversely proportional to the number of series transistors in the pull-down block.

To address the issues raised above, the proposed 8-bit All-N-transistor logic CLA (carry look-ahead adder) reduces the power consumption and enhances the operation speed, which is verified by a prototype fabricated in a 16-nm FinFET process.

<span id="page-2-0"></span>

# **2 FinFET Design Methodology**

To enhance the operating frequency of the adder, FinFET technology is used in this research. As described in [\[12\]](#page-19-8), a number of well-established principles are available to scientists and engineers for evaluating FinFET design trade-offs. The electrical and physical properties of the FinFET are associated with Berkeley Short-channel IGFET Model-Common Multi-Gate (BSIM-CMG) SPICE parameters such as GEOMOD and NFIN. Figure [1](#page-2-0) shows the rectangular geometry of the FinFET [\[23\]](#page-20-6). The drain current in the saturation region can be expressed as Eq. (1) [\[11,](#page-19-9) [12\]](#page-19-8).

<span id="page-2-1"></span>
$$
I_d = W_{\text{eff}} \cdot \mu_0 \cdot \frac{\varepsilon_{\text{ox}}}{t_{\text{ox}}} \left(\frac{2kT}{q} \right)^2 \left(\frac{q_{\text{is}} - q_{\text{id}}}{L} + \frac{1}{2} \cdot \frac{q_{\text{is}}^2 - q_{\text{id}}^2}{L - \Delta L}\right) \tag{1}
$$

where $q_{\rm is}$ and $q_{\rm id}$ are the normalized inversion sheet-charge densities calculated at the source and drain, respectively, and $\Delta L$ is the gap between $L$ and the channel pinch-off point. The effective width ($W_{\text{eff}}$), or the wire's circumference, is as follows.

$$
W_{\text{eff}} = N_F \cdot (2 \cdot H_F + W_F) \tag{2}
$$

where $N_F$ represents the number of fins, while $H_F$ and $W_F$ are the height and thickness of a fin, respectively.
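As a minimal sketch of Eq. (2), the effective width can be computed directly from the fin count and fin geometry. The function name and the fin dimensions below are illustrative assumptions, not values taken from the paper.

```python
def effective_width(n_fins: int, fin_height: float, fin_width: float) -> float:
    """Effective channel width per Eq. (2): W_eff = N_F * (2 * H_F + W_F)."""
    if n_fins < 1:
        raise ValueError("n_fins must be at least 1")
    # The gate wraps both sidewalls (2 * H_F) and the top (W_F) of every fin.
    return n_fins * (2.0 * fin_height + fin_width)


# Hypothetical fin geometry in nanometres (assumed for illustration only).
print(effective_width(n_fins=20, fin_height=40.0, fin_width=8.0))  # 1760.0
```

Because $W_{\text{eff}}$ scales linearly with $N_F$, drive strength in this model is adjusted in whole-fin steps rather than continuously.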

Figure [2a](#page-3-0), b shows the top view and side view, respectively, used for the derivation of the parasitic capacitances. The fringe ($C_{fr}$) and overlap ($C_{ov}$) capacitors affect the device characterization in short-channel devices, including the gate-to-drain ($C_{gd}$) and gate-to-source ($C_{gs}$) parasitic capacitances [\[17](#page-20-7)].

Referring to Fig. [3,](#page-3-1) the parasitic capacitance can be characterized by five different parasitic capacitors. Therefore, $C_{gd}$ as a function of $N_F$, $C_{fr}$, and $C_{\text{ov}}$ is defined in Eq. [\(3\)](#page-2-2),

<span id="page-2-2"></span>
$$
C_{\text{gd}} = N_F \cdot (C'_{\text{gd},\text{ov}} + C'_{\text{gd},\text{fr}}) \tag{3}
$$

where  $C'_{\text{gd,fr}}$  and  $C'_{\text{gd,ov}}$  represent the combined fringe and overlap capacitances, respectively.



<span id="page-3-0"></span>**Fig. 2** FinFET **a** top view, and **b** cross-sectional view

<span id="page-3-1"></span>



A large parasitic resistance is formed by the drain and source terminals on both ends of the narrow fins [12]. In digital applications, the equivalent switching resistance $R_n$ can be approximated by the reciprocal slope of the line drawn from $V_{ds} = V_{DD}$ down to $V_{ds} = 0$ on the current–voltage (I–V) curve [\[6\]](#page-19-5). It is given by Eq. [\(4\)](#page-3-2).

<span id="page-3-2"></span>
$$
R_n = \frac{V_{\text{DD}}}{W_{\text{eff}} \cdot \mu_0 \cdot \frac{\varepsilon_{\text{ox}}}{t_{\text{ox}}} \left(\frac{2kT}{q}\right)^2 \left(\frac{q_{\text{is}} - q_{\text{id}}}{L} + \frac{1}{2} \cdot \frac{q_{\text{is}}^2 - q_{\text{id}}^2}{L - \Delta L}\right)} \tag{4}
$$

or

$$
R_n' = \frac{L}{W_{\text{eff}}} \tag{5}
$$

where  $R'_n$  is the effective resistance.
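Since the denominator of Eq. (4) is the saturation drain current of Eq. (1), the equivalent switching resistance reduces to $R_n = V_{\text{DD}} / I_d$. The sketch below illustrates this with an assumed supply voltage and an assumed saturation current; both numbers and the function name are hypothetical.

```python
def switching_resistance(v_dd: float, i_dsat: float) -> float:
    """Equivalent switching resistance R_n = V_DD / I_d(sat), as in Eq. (4)."""
    if i_dsat <= 0.0:
        raise ValueError("saturation current must be positive")
    return v_dd / i_dsat


# Assumed values: 0.8 V supply and 800 uA saturation current (illustrative only).
r_n = switching_resistance(v_dd=0.8, i_dsat=800e-6)
print(f"R_n is roughly {r_n:.0f} ohm")
```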

Figure [4a](#page-4-0), b shows the setup used to determine the $I_d$ dc characteristic curve of the FinFET and the cell layout, respectively, where the number of fins (nFin) is 20 and $l = 60$ nm. $M_N$ has a cell size of 0.625 $\mu$m $\times$ 1.114 $\mu$m with a standard threshold voltage ($V_{th}$) of 375 mV, $V_{ds(sat)} = 198.7$ mV, $I_d = 864$ mA, and $I_g = 5.9$ pA, given that $V_{gs}$ and $V_{ds}$ are 774.7 mV and 749.4 mV, respectively. Figure [5a](#page-4-1), b illustrates the $I_d$ characteristic curve of the FinFET in relation to $V_{ds}$ and $V_{gs}$. $I_d$ has a value of 651 $\mu$A when $V_{ds}$ is 200 mV.

Table [1](#page-4-2) lists the device specifications of the standard-$V_{th}$ NMOS and standard-$V_{th}$ PMOS transistors in a 16-nm FinFET CMOS process. These include the $V_{\text{th}}$, $V_{DS(sat)}$, and area of the transistors when the number of fins, number of fingers, and total width are 1, 1, and 10 nm, respectively.

### **Birkhäuser**



<span id="page-4-0"></span>**Fig. 4** FinFET **a** setup for $I_d$ dc characteristic curve and **b** cell layout at nFin = 20 and l = 60 nm



<span id="page-4-1"></span>**Fig. 5** FinFET  $I_d$  characteristic curve vs **a**  $V_{ds}$  and **b**  $V_{gs}$ 

<span id="page-4-2"></span>**Table 1** Device specifications of transistors in a 16-nm FinFET CMOS process

|                     | Standard $V_{th}$ NMOS | Standard $V_{th}$ PMOS |
|---------------------|------------------------|------------------------|
| Length (nm)         | 16                     | 16                     |
| $V_{\text{th}}$ (V) | 0.4101                 | $-0.4083$              |
| $V_{DS(sat)}$ (V)   | 0.2240                 | 0.2783                 |
| Number of fins      | 1                      | 1                      |
| Number of fingers   | 1                      | 1                      |
| Total width (nm)    | 10                     | 10                     |
| Area (nm$^2$)       | $0.49 \times 0.626$    | $0.49 \times 0.626$    |

# **3 CLA Realized by All-N-transistor logic**

Figure [6](#page-5-0) shows the block diagram of the proposed 8-bit CLA. $A_0 \sim A_7$ and $B_0 \sim B_7$ are the inputs of the adder. Two sets of true single-phase clock D flip-flops (TSPC-DFFs) are used as the input registers. The outputs of the TSPC-DFFs are coupled to the 8-bit generate and propagate generator (8-bit G/P generator), which is synchronized by the system clock, clk. The simplified equations for $P_i$ (propagate) and $G_i$ (generate), $i = 0 \sim 7$, are shown in Eqs. [\(6\)](#page-5-1) and [\(7\)](#page-5-2), respectively [\[36\]](#page-21-1).

<span id="page-5-0"></span>**Fig. 6** Proposed 8-bit CLA block diagram

<span id="page-5-1"></span>
$$
P_i = A_i \oplus B_i \tag{6}
$$

and

<span id="page-5-2"></span>
$$
G_i = A_i \cdot B_i \tag{7}
$$

 $S_i$  (sum) and  $C_i$  (carry) are then functions of  $G_i$  and  $P_i$  as given in Eqs. [\(8\)](#page-5-3) and [\(9\)](#page-5-4), respectively [\[36](#page-21-1)].

<span id="page-5-3"></span>
$$
S_i = C_{i-1} \oplus P_i \tag{8}
$$

and

<span id="page-5-4"></span>
$$
C_i = G_i + G_{i-1}P_i + G_{i-2}P_{i-1}P_i + \dots + C_{in}P_0 P_1 \cdots P_{i-1}P_i \tag{9}
$$

Evidently, the critical path is the generation of $C_i$, $i = 0 \sim 7$. The generated $C_{i-1}$ and $P_i$ are the inputs of the 8-bit sum generator. Finally, $C_{out}$ is computed through $S_i$ and $C_7$. The final stage of the adder consists of output buffers that drive external loads, including pads and bondwires.
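The propagate, generate, carry, and sum relations above can be checked with a short bit-level model. This is a functional sketch only, not the transistor-level ANT implementation; it assumes that the carry feeding bit 0 is $C_{in}$ and that the carry-out is $C_7$. The function name `cla_add` is hypothetical.

```python
def cla_add(a: int, b: int, c_in: int = 0, width: int = 8):
    """Bit-level carry look-ahead addition; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate: P_i = A_i xor B_i
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  G_i = A_i and B_i

    carries = []
    for i in range(width):
        # Each carry is a flat sum of products of G, P and C_in,
        # so no carry waits on a previously computed carry.
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = c_in
        for k in range(i + 1):
            term &= p[k]
        carries.append(c | term)

    total = 0
    for i in range(width):
        carry_prev = c_in if i == 0 else carries[i - 1]
        total |= (carry_prev ^ p[i]) << i  # sum: S_i = C_{i-1} xor P_i
    return total, carries[-1]


print(cla_add(200, 100))  # (44, 1) because 300 = 256 + 44
```

The flat sum-of-products form is what makes carry generation the critical path: the widest product term grows with the bit position.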

#### **3.1 All-N-transistor (ANT) logic generic cell**

The ANT logic is used to meet the low power-delay product (PDP) criterion. Figure [7a](#page-7-0) shows the schematic diagram of a generic ANT logic cell. When the clock (clk) signal of the ANT cell is 0, the circuit enters the pre-charge phase, switching transistor P2 to the OFF state and charging $V_A$ to $V_{dd}$. This phase also turns OFF transistors N1 and N4.

With clk at logic 0, the output of the ANT cell is the same as in the previous state.

Changing the clk signal to 1, the ANT cell will go to the evaluation mode. In this mode, the output is based on the state inside the N-block. Thus, the operation can be expressed as Eq. [\(10\)](#page-6-0).

<span id="page-6-0"></span>
$$
Y = \overline{f(X_1, X_2, \ldots, X_n)} \tag{10}
$$

where $X_i$, $i = 1, \dots, n$, denotes the logic inputs. The ANT logic operates in one of four distinct cases during the evaluate phase in Fig. [7b](#page-7-0), depending on the previous state of the logic output $V_{Y,pre}$ and the logic operation of the N-block. The 4 cases are discussed as follows:

**Case 1:** The ANT logic operates in this case when $V_{Y,pre}$ equals $V_{dd}$ and the N-block is ON. When clk changes to 1, $V_B$ transitions to 0 V, which causes weak conduction in transistor N3 and the discharge of $V_A$ from $V_{dd}$ to $V_{dd} - |V_{thp}|$. Initially, $V_C$ is clamped to zero potential, also as a result of the pre-charge state, while the output $V_Y$ is pulled slightly lower by transistor N4.

Once $V_A$ is less than $V_{dd} - |V_{thp}|$, it is immediately pulled down by the N-block together with the loop formed by P3 and N3. The voltage at the output node is then drawn back by transistors P2 and N4. Hence, the delay in case 1, measured from the time $V_Y$ changes from $V_{dd}$ to $V_{thp}$, can be expressed by Eq. [\(11\)](#page-6-1) followed by Eq. [\(12\)](#page-6-1).

<span id="page-6-1"></span>
$$
\tau_{11} = k_1 \cdot R'_{n4} \cdot \frac{L}{W_{\text{eff}}} \cdot C_A \tag{11}
$$

$$
\tau_{12} = k_2 \cdot (R'_{p2} \parallel (R'_{n4} + R'_{p3})) \cdot \frac{L}{W_{\text{eff}}} \cdot C_Y \tag{12}
$$

where $k_1 = \frac{|V_{\text{thp}}|}{V_{\text{dd}}}$ and $k_2 = \frac{V_{\text{dd}} - |V_{\text{thp}}|}{V_{\text{dd}}}$ are the ratios of the durations of the two-step operation, and $C_A$ and $C_Y$ are the parasitic capacitances at nodes A and Y, respectively.
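As a worked illustration of the two weighting factors, the sketch below evaluates $k_1$ and $k_2$ for a 0.8 V supply and the standard-$V_{th}$ PMOS threshold listed in Table 1. Applying the Table 1 threshold to the transistors inside the ANT cell is an assumption made here for illustration; the function name is hypothetical.

```python
def two_step_weights(v_dd: float, v_thp: float):
    """Return (k1, k2), the weights of the two delay steps in the ANT cell."""
    k1 = abs(v_thp) / v_dd            # share of the swing from V_dd down to V_dd - |V_thp|
    k2 = (v_dd - abs(v_thp)) / v_dd   # share of the remaining swing
    return k1, k2


# Assumed operating point: V_dd = 0.8 V, V_thp = -0.4083 V.
k1, k2 = two_step_weights(v_dd=0.8, v_thp=-0.4083)
print(round(k1, 4), round(k2, 4))  # 0.5104 0.4896
```

By construction $k_1 + k_2 = 1$, so the two steps split the full voltage swing between them.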

When the clk signal is 0, P1 is ON, voltage at  $V_A$  is high, and P2 and N4 are short circuit that will give an output same as the previous state.

**Case 2:** Referring to Fig. [7,](#page-7-0) case 2 occurs when $V_{Y,pre} = 0$ and the N-block is ON. In the first step of this case, the N-block discharges $V_A$ from $V_{dd}$ to $V_{dd} - |V_{thp}|$. Once the voltage at node A drops below $V_{dd} - |V_{thp}|$, it is pulled down to gnd quickly through the loop formed by transistors P3 and N3, and the N-block. Thus, since $V_A$ drops below $V_{dd} - |V_{thp}|$, the output voltage $V_Y$ is then charged to $V_{dd}$ through transistors P2 and N4. The first-step delay in this case is given by the following equation for $\tau_{21}$.

$$
\tau_{21} = k_1 \cdot R'_{\text{N-block}} \cdot \frac{L}{W_{\text{eff}}} \cdot C_A \tag{13}
$$

For the second step, the delay is the same as in case 1, namely, $\tau_{22} = \tau_{12}$.



<span id="page-7-0"></span>**Fig. 7** ANT logic **a** schematic diagram and **b** illustrated waveforms

<span id="page-7-2"></span>

**Case 3:** When $V_{Y,pre} = V_{dd}$ and the N-block is off, $V_Y$ is discharged from $V_{dd}$ to 0 through transistor N4 because of the initial condition of $V_C$, which is equal to 0. In this case, the delay is given by Eq. [\(14\)](#page-7-1).

<span id="page-7-1"></span>
$$
\tau_3 = R'_{n4} \cdot \frac{L}{W_{\text{eff}}} \cdot C_Y \tag{14}
$$

If the input clk signal is 1, the N-block is turned on, and $V_{dd} - V_A > |V_{thp}|$, P3 will also go into the ON state through N3. $V_A$ discharges faster and the N-block will be charged to $V_{dd}$. The high-potential output will pass to P2 and N4.

**Case 4:** When clk input is set to 1, N-block logic is not conducting, and voltage  $V_Y = 0$ , the output will be maintained at logic 0. This is because node A discharges through the short circuit consisting of transistors N3 and N4. If the previous state is high, node A will start to discharge the output through transistors N2 and N4. There is no transition in this case.

The above 4 cases are summarized in Table [2.](#page-7-2)
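The four evaluate-phase cases can also be written as a small lookup, keyed on the previous output level and the N-block state. This is a behavioral summary of the case descriptions above, not a circuit simulation; the function name and the boolean encoding are assumptions.

```python
def ant_evaluate_case(y_prev_high: bool, n_block_on: bool):
    """Return (case number, output is high after evaluation) per Cases 1-4."""
    if n_block_on:
        # Cases 1 and 2: node A is pulled down and the output ends at V_dd.
        return (1 if y_prev_high else 2), True
    # Cases 3 and 4: the output is discharged to 0 or simply stays at 0.
    return (3 if y_prev_high else 4), False


for y_prev in (True, False):
    for n_on in (True, False):
        case, y_high = ant_evaluate_case(y_prev, n_on)
        print(f"V_Y,pre high={y_prev}, N-block ON={n_on} -> case {case}, Y high={y_high}")
```

Only Case 3 involves a falling output transition and only Case 2 a rising one, which matches the statement that Case 4 has no transition.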



<span id="page-8-0"></span>**Fig. 8** 1-bit TSPC DFF schematic [\[4\]](#page-19-10)

#### **3.2 True Single Phase Clock (TSPC) DFF**

To ensure the synchronization of input signals, a TSPC-DFF circuit is used as the input block of the CLA. A single-bit TSPC-DFF is shown in Fig. [8.](#page-8-0) When reset = 1 and clk = 1, the output Q is equal to the value of the input D. When clk changes to 0, Q retains the previous state of D. Once reset is set to zero, the output Q is always zero, regardless of the values of clk and D. All conditions of the single-bit TSPC-DFF apply to the 2 sets of 8-bit TSPC-DFFs that are used as the input registers of the adder circuit. The advantage of this architecture is that it can handle high-frequency clock signals for high-speed computation.
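The register behavior described above (active-low reset, Q following D while clk is high, Q held while clk is low) can be captured in a small behavioral model. This sketch mirrors the textual description only; it does not model the transistor-level TSPC circuit or exact edge timing, and the class name is hypothetical.

```python
class TspcDffModel:
    """Behavioral model of the 1-bit input register as described in the text."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, d: int, clk: int, reset: int) -> int:
        if reset == 0:
            self.q = 0   # reset low forces Q to 0 regardless of clk and D
        elif clk == 1:
            self.q = d   # with reset high and clk high, Q takes the value of D
        # with clk low, Q keeps its previous value
        return self.q


ff = TspcDffModel()
print(ff.step(d=1, clk=1, reset=1))  # 1
print(ff.step(d=0, clk=0, reset=1))  # 1 (held)
print(ff.step(d=1, clk=1, reset=0))  # 0 (reset)
```

An 8-bit input register would simply be eight such instances sharing the same clk and reset.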

Figure [9](#page-9-0) shows a 1-bit G/P generator consisting of AND and XOR blocks based on ANT logic [\[36\]](#page-21-1). Notably, in the AND block, $MN_{215}$ and $MN_{216}$ are connected in series and driven by the inputs $A_i$ and $B_i$, respectively. In the XOR block, the series transistors $MN_{221}$ and $MN_{222}$ are connected in parallel with another series pair composed of $MN_{223}$ and $MN_{224}$. These are also driven by the inputs $A_i$ and $B_i$ to perform the XOR logic operation. Figure [10](#page-10-0) shows an 8-bit G/P block diagram based on single-bit G/P generators.

Referring to Fig. [7a](#page-7-0) again, P1 is sized with a large width to accommodate a high current such that, when clk is low, P2's gate is pulled high, making it operate in cutoff (P2 is sized the same as P1 for symmetry). Since P2 is in cutoff, N4 is also in cutoff, so the output Y keeps its prior state.

Meanwhile, when clk is high, P1 is in cutoff while N1 and N3 are on. P3 is on, since its gate and node $V_A$ are grounded. The N-device transistor N3 and the footer transistor N1 are sized based on the inverter switching point, or threshold, with reference to P1. The drain current of a FinFET can be approximately modeled using the *n*th power law [\[9\]](#page-19-11), as shown in Eq. [\(15\)](#page-9-1),



<span id="page-9-0"></span>**Fig. 9** 1-bit generation and propagation (G/P) circuit

<span id="page-9-1"></span>
$$
I_D = \frac{W}{L} \cdot \beta \cdot (V_{\text{GS}} - V_{\text{th}})^n; \quad V_{\text{DS}} \geq V_{\text{Dsat}} = k(V_{\text{GS}} - V_{\text{th}})^m \tag{15}
$$

and the current for NMOS and PMOS for the inverter is given in Eq. [\(16\)](#page-9-2).

<span id="page-9-2"></span>
$$
I_{\text{D}(NMOS)} = I_{\text{D}(PMOS)} \tag{16}
$$

Substituting Eq. [\(15\)](#page-9-1) into Eq. [\(16\)](#page-9-2), where $V_{GS}$ of the NMOS is equal to the switching point voltage $V_{SP}$ and $V_{GS}$ of the PMOS is equal to $V_{dd}-V_{SP}$, the following equation is derived.

<span id="page-9-3"></span>
$$
\left(\frac{W}{L}\right)_N \cdot \beta_N \cdot (V_{\text{SP}} - V_{\text{thn}})^n = \left(\frac{W}{L}\right)_P \cdot \beta_P \cdot (V_{\text{dd}} - V_{\text{SP}} - V_{\text{thp}})^n
$$

$$
V_{\text{SP}} = \frac{\sqrt[n]{\frac{\left(\frac{W}{L}\right)_N \cdot \beta_N}{\left(\frac{W}{L}\right)_P \cdot \beta_P}} \cdot V_{\text{thn}} + (V_{\text{dd}} - V_{\text{thp}})}{1 + \sqrt[n]{\frac{\left(\frac{W}{L}\right)_N \cdot \beta_N}{\left(\frac{W}{L}\right)_P \cdot \beta_P}}} \tag{17}
$$

Moreover, since P1 is off when clk is high and node $V_A$ is grounded, P2 and N4 are on while N2 is off. The sizing of P2 and P3 is treated as the sizing of an inverter, again using Eq. [\(17\)](#page-9-3). N4 and P3 are sized such that the switching point voltage equals half of $V_{dd}$. The overall transistor aspect ratios are tabulated in Table [3.](#page-10-1)
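The switching-point sizing can be sketched numerically by equating the NMOS and PMOS *n*th-power-law currents and solving for $V_{SP}$. In the sketch below, `strength_ratio` stands for $(W/L)_N \beta_N / ((W/L)_P \beta_P)$ and `v_thp` is treated as a magnitude; the thresholds, the exponent, and the function name are illustrative assumptions rather than values from the paper.

```python
def switching_point(v_dd: float, v_thn: float, v_thp: float,
                    strength_ratio: float, n: float) -> float:
    """Inverter switching point from balancing the nth-power-law currents.

    strength_ratio = [(W/L)_N * beta_N] / [(W/L)_P * beta_P]
    v_thp is the magnitude of the PMOS threshold voltage.
    """
    r = strength_ratio ** (1.0 / n)  # n-th root of the drive-strength ratio
    return (r * v_thn + (v_dd - v_thp)) / (1.0 + r)


# Assumed values: symmetric 0.3 V thresholds, equal drive strength, n = 1.3.
v_sp = switching_point(v_dd=0.8, v_thn=0.3, v_thp=0.3, strength_ratio=1.0, n=1.3)
print(f"V_SP is about {v_sp:.3f} V")  # close to V_dd / 2 for a balanced inverter
```

With matched thresholds, a drive-strength ratio of 1 places the switching point at half the supply; a stronger NMOS (ratio above 1) pulls it lower.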




<span id="page-10-1"></span><span id="page-10-0"></span>**Fig. 10** 8-bit generation and propagation (G/P) circuit



<span id="page-10-3"></span>**Fig. 11** ANT-based AND equivalent circuit (by De Morgan's law)



#### **3.4 Carry Generation**

To generate the carries $C_0 \sim C_7$ needed by the adder, Eq. [\(9\)](#page-5-4) is expanded for each carry as shown in Eq. [\(18\)](#page-10-2).

<span id="page-10-2"></span>
$$
\begin{aligned}
C_0 &= G_0 + C_{in} P_0 \\
C_1 &= G_1 + G_0 P_1 + C_{in} P_0 P_1 \\
C_2 &= G_2 + G_1 P_2 + G_0 P_1 P_2 + C_{in} P_0 P_1 P_2 \\
&\;\;\vdots \\
C_7 &= G_7 + G_6 P_7 + G_5 P_6 P_7 + \dots + C_{in} P_0 P_1 P_2 P_3 P_4 P_5 P_6 P_7
\end{aligned} \tag{18}
$$

<span id="page-11-0"></span>**Fig. 12** 8-bit carry generation circuit

<span id="page-11-1"></span>**Fig. 13** 1-bit sum generator

<span id="page-11-2"></span>**Fig. 14** 8-bit sum generator

<span id="page-12-1"></span>

<span id="page-12-2"></span>**Fig. 16** 16-nm CMOS FinFET ANT CLA layout and die photo

Therefore, AND gates with multiple inputs are needed. A multiple-input AND gate built from series transistors is replaced with multiple parallel-connected NMOS devices by De Morgan's law. Referring to Fig. [11,](#page-10-3) the inverted inputs, $\overline{A}$ and $\overline{B}$, are sent to an OR gate, and the output of the OR gate is inverted again. An example is shown in Eq. [\(19\)](#page-12-0).

<span id="page-12-0"></span>
$$
A \cdot B = \overline{\overline{A} + \overline{B}} \tag{19}
$$
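De Morgan's law states that an AND of several inputs equals the inverted OR of the inverted inputs, which is what allows series NMOS stacks to be replaced by parallel devices. The sketch below checks this identity exhaustively for 2 to 9 inputs (9 being the widest product term in the carry equations); the function name is hypothetical.

```python
from itertools import product


def and_via_de_morgan(*inputs: bool) -> bool:
    """AND realized as NOT(OR of the inverted inputs)."""
    return not any(not x for x in inputs)


# Exhaustive check against the direct AND for every input combination.
for n_inputs in range(2, 10):
    for bits in product([False, True], repeat=n_inputs):
        assert and_via_de_morgan(*bits) == all(bits)
print("De Morgan equivalence holds for 2 to 9 inputs")
```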

Figure [12](#page-11-0) shows the overall 8-bit carry generation circuit which consists of 3 segments. They are detailed as follows:

**Segment I:** All AND logic cells presented in Eq. [\(18\)](#page-10-2) are grouped together in this segment. The gates are designed using De Morgan's law as shown in the top part of Fig. [12.](#page-11-0)

**Segment II:** This segment contains the ANT logic cells that use the N-blocks from Segment I, as shown in the middle part of Fig. [12.](#page-11-0)

**Segment III:** This segment contains the OR gates, which are grouped as derived from Eq. [\(18\)](#page-10-2). Referring to Eq. [\(18\)](#page-10-2), each carry depends on its bit position and is obtained as a function of $C_{in}$, $G_i$, and $P_i$, $i = 0 \sim 7$. The summation of terms in the equation can be realized using a multiple-input ANT-based OR gate.
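To get a feel for the gate fan-in implied by the expanded carry equations, the product terms of each carry can be enumerated symbolically. The sketch below lists the terms of $C_i$ as lists of signal names; the helper name and the string labels are assumptions for illustration and do not describe the actual netlist.

```python
def carry_terms(i: int):
    """Product terms of C_i, each given as a list of signal names."""
    terms = []
    for j in range(i, -1, -1):
        # G_j followed by every propagate signal between bit j+1 and bit i.
        terms.append([f"G{j}"] + [f"P{k}" for k in range(j + 1, i + 1)])
    # The last term carries C_in through all propagate signals up to bit i.
    terms.append(["Cin"] + [f"P{k}" for k in range(i + 1)])
    return terms


for term in carry_terms(2):
    print(" * ".join(term))
# G2
# G1 * P2
# G0 * P1 * P2
# Cin * P0 * P1 * P2

widest = max(len(t) for t in carry_terms(7))
print(len(carry_terms(7)), widest)  # 9 9
```

For $C_7$ this gives 9 product terms feeding the OR gate, with the widest AND term taking 9 inputs, which motivates the parallel-NMOS realization of the AND function.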



Oscilloscope (Keysight DSAV134): DC-13 GHz

<span id="page-13-0"></span>**Fig. 17** Measurement setup

#### **3.5 Sum Generator**

The 1-bit sum generator circuit can be realized using XOR logic based on Eq. [\(8\)](#page-5-3). Referring to Fig. [13,](#page-11-1) $MN_{231}$ and $MN_{232}$ are connected in series, and this pair is in parallel with $MN_{233}$ and $MN_{234}$. Each $S_i$ is then obtained by the XOR ANT logic driven by $C_{i-1}$ and $P_i$. Finally, the block diagram of an 8-bit sum generator based on single-bit sum generators is shown in Fig. [14.](#page-11-2)

#### **3.6 Output Buffer**

The output buffer of the CLA consists of 6 stages of tapered inverters, as shown in Fig. [15.](#page-12-1) Each stage is three times the size of the preceding stage. A PMOS is stacked on the last inverter, giving a cascode connection. According to the TSMC 16-nm FinFET process specifications, if the drain and source terminals of the PMOS are connected to the output PAD pin at the same time, a cascode connection is recommended. The additional PMOS also avoids the latch-up hazard at the output pad.
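The tapering described above can be illustrated by listing the relative stage sizes for 6 stages with a taper factor of 3. The unit size of the first stage is an assumption, and the function name is hypothetical; only the stage count and the factor of 3 come from the text.

```python
def tapered_sizes(stages: int = 6, taper: int = 3, first: float = 1.0):
    """Relative inverter sizes of a tapered buffer chain."""
    return [first * taper ** k for k in range(stages)]


print(tapered_sizes())  # [1.0, 3.0, 9.0, 27.0, 81.0, 243.0]
```

Under these assumptions the last inverter is 243 times the size of the first, which is what lets a minimum-size internal gate drive the pad and bondwire load.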

# **4 Implementation and Measurement**

Figure [16](#page-12-2) shows the die photo and layout of the chip, which was implemented using the TSMC 16 nm CMOS LOGIC FinFET Compact (Shrink) LL ELK Cu 1P13M 0.8/1.8V process. The chip uses 40-SB packaging, with an overall chip area of 618 $\times$ 618 $\mu$m<sup>2</sup> and a core area of 206.403 $\times$ 152.506 $\mu$m<sup>2</sup>. The size ratio and location of each sub-circuit are also shown in the figure.

For on-silicon measurement, PCB SMA connections are used for both input and output signals, along with a DIP switch for reset. An Agilent N6761A power supply provides the 0.8 V supply, and the data generator produces A0 $\sim$ A7, B0 $\sim$ B7, and the clock input of the CLA. The output waveform is observed using a Keysight DSAV134 high-frequency oscilloscope. The complete measurement setup for this investigation is shown in Fig. [17.](#page-13-0)

<span id="page-14-1"></span>

<span id="page-14-0"></span>**Fig. 18** S0 ∼ S3 at operating frequency of 800 MHz (Test 1)

Table [4](#page-14-1) shows the input pattern and expected output for the CLA at clk = 800 MHz. Figures [18](#page-14-0) and [19](#page-15-0) show the measured results for this input pattern (Test 1). Another round of measurement is given in Table [5.](#page-15-1) The corresponding measured waveforms are given in Figs. [20](#page-16-0) and [21](#page-17-0) (Test 2). Both measurement results demonstrate the functionality and performance of the proposed ANT-based CLA.

<span id="page-15-0"></span>**Fig. 19** S4 $\sim$ S7 at operating frequency of 800 MHz (Test 1)

<span id="page-15-1"></span>**Table 5** Input pattern and expected output of Test 2

Referring to Fig. [22,](#page-17-1) this work shows the second-lowest PDP per bit, while most previous works are clustered around values ten times higher. Notably, the lowest PDP, reported in [\[18](#page-20-8)], was obtained only by simulation, not by measurement. Recent CLA designs are compared in Table [6.](#page-18-0) Our CLA is the only work that was fabricated and tested on silicon. We also used the highest load capacitance, 60 pF. It has a maximum operating clock frequency of 800 MHz. This work also has the lowest simulated value of normalized power-delay product (PDP), which means it dissipates less energy per switching event than all prior works.

<span id="page-16-0"></span>**Fig. 20** S0 $\sim$ S3 at operating frequency of 800 MHz (Test 2)

# **5 Conclusion**

This study presents an 800-MHz 8-bit CLA using ANT logic, implemented in a 16-nm FinFET process with a 60 pF output load. The parasitic RC is derived from the FinFET geometry. The low power-delay product (PDP) criterion is met using ANT logic. Among the compared designs, it is the only CLA that was physically fabricated and tested, and it was tested with the highest load capacitance. This work has the highest energy efficiency, since it consumes the least energy per switching event among the compared studies. Future work may apply ANT logic to a BF16 floating-point adder. Since achieving computation accuracy in a BF16 floating-point adder consumes a lot of power, integrating this design into such an adder would benefit the computation because of its lowest normalized PDP.



<span id="page-17-0"></span>**Fig. 21** S4 ∼ S7 at operating frequency of 800 MHz (Test 2)



<span id="page-17-1"></span>**Fig. 22** Technology Roadmap of CLA designs in the past 2 decades

<span id="page-18-0"></span>

**Acknowledgements** This study was funded partially by the Ministry of Science and Technology, Taiwan, under Grant Nos. MOST 109-2218-E-110-007-, MOST 109-2221-E-230-007-, MOST 109-2224-E-110-001- and MOST 110-2218-E-110-008-. In addition, the authors would like to convey their heartfelt thanks to the Taiwan Semiconductor Research Institute (TSRI) of National Applied Research Laboratories (NARL), Taiwan, for chip fabrication and EDA tool assistance.

# **References**

- <span id="page-19-2"></span>1. M. Ahmadinejad, M.H. Moaiyeri, Energy- and quality-efficient approximate multipliers for neural network and image processing applications. IEEE Trans. Emerging Topics Comput. **10**(2), 1105–1116 (2022). <https://doi.org/10.1109/TETC.2021.3072666>
- <span id="page-19-4"></span>2. S. Akhter, S. Chaturvedi, S. Khan, A. Bhardwaj, An efficient CMOS dynamic logic-based full adder, in *Proceedings of 2020 6th International Conference on Signal Processing and Communication (ICSC)*, pp. 226–229 (2020). <https://doi.org/10.1109/ICSC48311.2020.9182729>
- <span id="page-19-0"></span>3. W. Al-Akel, K. Abugharbieh, A. Hasan, H.W. Marar, A power efficient 500MHz adder, in *Proceedings of 2019 SoutheastCon*, pp. 1–6 (2019). <https://doi.org/10.1109/SoutheastCon42311.2019.9020285>
- <span id="page-19-10"></span>4. S.G. Alie, T.A. Chandel, J.R. Dar, Power and delay optimized edge triggered flip-flop for low power microcontrollers. Int. J. Sci. Res. Publ. **4**(5) (2014)
- 5. S.S. Aphale, K. Fakir, S. Kodagali, S.S. Mande, Analysis of various adder circuits in deep submicron process, in *Proceedings of 2016 International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT)*, pp. 307–311 (2016). <https://doi.org/10.1109/ICACDOT.2016.7877599>
- <span id="page-19-5"></span>6. R.J. Baker, H.W. Li, D.E. Boyce, *CMOS–Circuit Design, Layout, and Simulation*, 3rd edn. (Wiley-Interscience, New York, 2013)
- <span id="page-19-1"></span>7. P. Bhattacharyya, B. Kundu, S. Ghosh, V. Kumar, A. Dandapat, Performance analysis of a low-power high-speed hybrid 1-bit full adder circuit. IEEE Trans. Very Large Scale Integr. Syst. **23**(10), 2001–2008 (2015). <https://doi.org/10.1109/TVLSI.2014.2357057>
- <span id="page-19-14"></span>8. S.-K. Chang, C.-L. Wey, A fast 64-bit hybrid adder design in 90nm CMOS process, in *Proceedings of 2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS)*, pp. 414–417 (2012). <https://doi.org/10.1109/MWSCAS.2012.6292045>
- <span id="page-19-11"></span>9. A. Datta, A. Goel, R.T. Cakici, H. Mahmoodi, D. Lekshmanan, K. Roy, Modeling and circuit synthesis for independently controlled double gate FinFET devices. IEEE Trans. Comput-Aided Des. Integr. Circuits Syst. **26**(11), 1957–1966 (2007). <https://doi.org/10.1109/TCAD.2007.896320>
- <span id="page-19-12"></span>10. H. Eriksson, P. Larsson-Edefors, T. Henriksson, C. Svensson, Full-custom vs. standard-cell design flow—an adder case study, in *Proceedings of 2003 Asia South and Pacific Design Automation Conference (ASP-DAC)*, pp. 507–510 (2003). <https://doi.org/10.1109/ASPDAC.2003.1195069>
- <span id="page-19-9"></span>11. N. Fasarakis, A. Tsormpatzoglou, D.H. Tassis, I. Pappas, K. Papathanasiou, M. Bucher, G. Ghibaudo, C.A. Dimitriadis, Compact model of drain current in short-channel triple-gate FinFETs. IEEE Trans. Electron Devices **59**(7), 1891–1898 (2012). <https://doi.org/10.1109/TED.2012.2195318>
- <span id="page-19-8"></span>12. S. Goodnick, A. Korkin, R. Nemanich, *Semiconductor Nanotechnology: Advances in Information and Energy Processing and Storage* (Springer, Switzerland, 2018). <https://doi.org/10.1007/978-3-319-91896-9>
- <span id="page-19-7"></span>13. R.X. Gu, M.I. Elmasry, An all-N-logic high-speed single-phase dynamic CMOS logic, in *Proceedings of 1994 IEEE International Symposium on Circuits and Systems (ISCAS)*, vol. 4, pp. 7–104 (1994). <https://doi.org/10.1109/ISCAS.1994.409183>
- <span id="page-19-13"></span>14. M.A. Hernandez, M.L. Aranda, A low-power bootstrapped CMOS full adder, in *Proceedings of 2005 2nd International Conference on Electrical and Electronics Engineering*, pp. 243–246 (2005). <https://doi.org/10.1109/ICEEE.2005.1529618>
- <span id="page-19-6"></span>15. C.-H. Hsu, G.-N. Sung, T.-Y. Yao, C.-Y. Juan, Y.-R. Lin, C.-C. Wang, Low-power 7.2 GHz complementary all-n-transistor logic using 90 nm CMOS technology, in *Proceedings of 2009 IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 389–392 (2009). <https://doi.org/10.1109/ISCAS.2009.5117767>
- <span id="page-19-3"></span>16. H. Jiang, S. Angizi, D. Fan, J. Han, L. Liu, Non-volatile approximate arithmetic circuits using scalable hybrid spin-CMOS majority gates. IEEE Trans. Circuits Syst. I: Regul. Pap. **68**(3), 1217–1230 (2021). <https://doi.org/10.1109/TCSI.2020.3044728>
- <span id="page-20-7"></span>17. J.-R. Kahng, J.-W. Moon, J.-H. Kim, C-V extraction method for gate fringe capacitance and gate to source-drain overlap length of LDD MOSFET, in *Proceedings of 2001 International Conference on Microelectronics Test Structures (ICMTS)*, pp. 59–63 (2001). <https://doi.org/10.1109/ICMTS.2001.928638>
- <span id="page-20-8"></span>18. M. Keerthana, T. Ravichandran, Implementation of low power 1-bit hybrid full adder using 22-nm CMOS technology, in *Proceedings of 2020 International Conference on Communication System (ICACCS)*, pp. 1215–1217 (2020). <https://doi.org/10.1109/ICACCS48705.2020.9074256>
- <span id="page-20-12"></span>19. M. Linares-Aranda, R. Báez, O. Gonzalez-Diaz, Hybrid adders for high-speed arithmetic circuits: A comparison, in *Proceedings of 2010 7th International Conference on Electrical Engineering Computing Science and Automatic Control (CCE)*, pp. 546–549 (2010). <https://doi.org/10.1109/ICEEE.2010.5608566>
- <span id="page-20-1"></span>20. Y. Liu, Advantages of CMOS technology in very large scale integrated circuits, in *Proceedings of 2021 2nd International Conference on Artificial Intelligence in Electronics Engineering (AIEE 2021)*, pp. 82–88. Association for Computing Machinery, New York, NY, USA (2021). <https://doi.org/10.1145/3460268.3460280>
- <span id="page-20-2"></span>21. H. Naseri, S. Timarchi, Low-power and fast full adder by exploring new XOR and XNOR gates. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. **26**(8), 1481–1493 (2018). <https://doi.org/10.1109/TVLSI.2018.2820999>
- 22. V.S.P. Nayak, N. Ramchander, R.S. Reddy, T.H.S.P. Redy, M.S. Reddy, Analysis and design of low-power reversible carry select adder using D-latch, in *Proceedings of 2016 IEEE International Conference on Recent Trends in Electronics, Information and Communication Technology (RTEICT)*, pp. 1917–1920 (2016). <https://doi.org/10.1109/RTEICT.2016.7808169>
- <span id="page-20-6"></span>23. R.S. Pal, S. Sharma, S. Dasgupta, Recent trend of FinFET devices and its challenges: A review, in *Proceedings of 2017 Conference on Emerging Devices and Smart System (ICEDSS)*, pp. 150–154 (2017). <https://doi.org/10.1109/ICEDSS.2017.8073675>
- <span id="page-20-14"></span>24. M.O.V. Pavan Kumar, M. Kiran, Design of optimal fast adder, in *Proceedings of 2013 International Conference on Advanced Computing and Communication System*, pp. 1–4 (2013). <https://doi.org/10.1109/ICACCS.2013.6938692>
- <span id="page-20-0"></span>25. M. Rafiee, M.B. Ghaznavi-Ghoushchi, An output node split CMOS logic for high-performance and large capacitive-load driving scenarios. Microelectron. J. **72**, 109–119 (2018)
- <span id="page-20-4"></span>26. A. Raghunandan, D.R. Shilpa, Design of high-speed hybrid full adders using FinFET 18nm technology, in *Proceedings of 2019 4th International Conference on Recent Trends Electronic, Information and Communication Technology (RTEICT)*, pp. 410–415 (2019). <https://doi.org/10.1109/RTEICT46194.2019.9016866>
- <span id="page-20-3"></span>27. G.K. Reddy, Low power-area pass transistor logic based ALU design using low power full adder design, in *Proceedings of 2015 IEEE 9th International Conference on Intelligent Systems and Control (ISCO)*, pp. 1–6 (2015). <https://doi.org/10.1109/ISCO.2015.7282289>
- <span id="page-20-5"></span>28. R. Saraswat, S. Akashe, S. Babu, Designing and simulation of full adder cell using FinFET technique, in *Proceedings of 2013 7th International Conference on Intelligent Systems and Control (ISCO)*, pp. 261–264 (2013). <https://doi.org/10.1109/ISCO.2013.6481159>
- <span id="page-20-11"></span>29. N. Taherinejad, A. Abrishamifar, A new high speed, low power adder; using hybrid analog-digital circuits, in *Proceedings of 2009 European Conference on Circuit Theory and Design*, pp. 623–626 (2009). <https://doi.org/10.1109/ECCTD.2009.5275072>
- <span id="page-20-13"></span>30. R. Tripathi, S. Mishra, S.G. Prakash, A novel 14-transistors low-power high-speed PPM adder, in *Proceedings of 2011 International Symposium on Electronic System Design*, pp. 124–128 (2011). <https://doi.org/10.1109/ISED.2011.19>
- <span id="page-20-10"></span>31. C.-K. Tung, Y.-C. Hung, S.-H. Shieh, G.-S. Huang, A low-power high-speed hybrid CMOS full adder for embedded system, in *Proceedings of 2007 IEEE Design and Diagnostics of Electronic Circuits and Systems*, pp. 1–4 (2007). <https://doi.org/10.1109/DDECS.2007.4295280>
- <span id="page-20-9"></span>32. C.-K. Tung, S.-H. Shieh, Y.-C. Hung, M.-C. Tsai, High-performance low-power full-swing full adder cores with output driving capability, in *Proceedings of 2006 IEEE Asia-Pacific Conference on Circuits and Systems (APCCAS)*, pp. 614–617 (2006). <https://doi.org/10.1109/APCCAS.2006.342063>
- 33. P. Valsalan, O. Shihi, CMOS-DRPTL adder topologies, in *Proceedings of 2018 International Conference on Current Trends towards Converging Technologies*, pp. 1–5 (2018). <https://doi.org/10.1109/ICCTCT.2018.8551109>
- <span id="page-21-2"></span>34. F. Vasefi, Z. Abid, 10-transistor 1-bit adders for n-bit parallel adders, in *Proceedings of 16th International Conference on Microelectronics (ICM 2004)*, pp. 174–177 (2004). <https://doi.org/10.1109/ICM.2004.1434237>
- <span id="page-21-0"></span>35. S. Vidhyadharan, S.S. Dan, An efficient ultra-low-power and superior performance design of ternary half adder using CNFET and gate-overlap TFET devices. IEEE Trans. Nanotechnol. **20**, 365–376 (2021). <https://doi.org/10.1109/TNANO.2020.3049087>
- <span id="page-21-1"></span>36. C.-C. Wang, K.-C. Tsai, VLSI design of a 1.0 GHz 0.6-μm 8-bit CLA using PLA-styled all-N-transistor logic, in *Proceedings of 1998 IEEE International Symposium on Circuits and Systems (ISCAS)*, vol. 2, pp. 236–2392 (1998). <https://doi.org/10.1109/ISCAS.1998.706885>
- 37. A.K. Yadav, B.P. Shrivatava, A.K. Dadoriya, Low power high speed 1-bit full adder circuit design at 45nm CMOS technology, in *Proceedings of 2017 International Conference on Recent Innovation in Signal Processing and Embedded Systems (RISE)*, pp. 427–432 (2017). <https://doi.org/10.1109/RISE.2017.8378203>


## **Authors and Affiliations**

**Chua-Chin Wang<sup>1,2</sup> · [Oliver Lexter July A. Jose](http://orcid.org/0000-0003-4565-9506)<sup>1</sup> · Wen-Shou Yang<sup>1</sup> · [Ralph Gerard B. Sangalang](http://orcid.org/0000-0002-4120-382X)<sup>1,3</sup> · [Lean Karlo S. Tolentino](http://orcid.org/0000-0002-8014-8229)<sup>1,4</sup> · [Tzung-Je Lee](http://orcid.org/0000-0001-6870-7406)<sup>1</sup>**

Oliver Lexter July A. Jose oljajose08@vlsi.ee.nsysu.edu.tw

Wen-Shou Yang yangbs1023@yahoo.com.tw

Ralph Gerard B. Sangalang ralphgerard.sangalang@g.batstate-u.edu.ph

Lean Karlo S. Tolentino leankarlo\_tolentino@tup.edu.ph

Tzung-Je Lee tjlee@ee.nsysu.edu.tw

- <sup>1</sup> Department of Electrical Engineering, National Sun Yat-Sen University, No. 70 Lian-Hai Road, Gushan District, Kaohsiung 80424, Taiwan
- <sup>2</sup> Institute of Undersea Technology, National Sun Yat-Sen University, No. 70 Lian-Hai Road, Gushan District, Kaohsiung 80424, Taiwan
- <sup>3</sup> Department of Electronics Engineering, Batangas State University, The National Engineering University, Alangilan, Batangas 4200, Philippines
- <sup>4</sup> Department of Electronics Engineering, Technological University of the Philippines, Ayala Boulevard, Ermita, Manila 1000, Philippines