### **EECE 481**

### High-Speed CMOS Gate Design (Logical Effort) Lecture 10

### Reza Molavi Dept. of ECE University of British Columbia reza@ece.ubc.ca

Slides courtesy: Dr. Res Saleh (UBC), Dr. D. Sengupta (AMD), Dr. B. Razavi (UCLA)

## **Optimal Path Delay Design - Review**

There are two unknowns, the number of gates and the size of each (two degrees of freedom)

$$C_{in} = C_{g}(W_{n} + W_{p}) = C_{g}(W_{n} + 2W_{n}) = C_{g}(3W_{n})$$
$$R_{eff} = R_{eqn}\left(\frac{L_{n}}{W_{n}}\right)$$
$$\tau_{inv} = R_{eff}C_{in} = R_{eqn}\left(\frac{L_{n}}{W_{n}}\right)C_{g}(3W_{n}) = 3R_{eqn}C_{g}L_{n}$$

This is the *intrinsic delay* of a gate (a gate-specific time constant); it is used in many places.



# Gate Sizing for Optimal Path delay – All inverters

total\_delay = 
$$\sum_{j=1}^{N} \tau_{inv} \left( \frac{C_{j+1}}{C_j} + \gamma_{inv} \right)$$

*(Figure: inverter chain with stages $j-1$, $j$, $j+1$, driven from $C_{in}$; $C_j$ is the input capacitance of stage $j$.)*

$$\text{total\_delay} = \sum_{j} \tau_{\text{inv}} \left( \frac{C_{g} W_{j+1}}{C_{g} W_{j}} + \gamma_{\text{inv}} \right) = \sum_{j} \tau_{\text{inv}} \left( \frac{W_{j+1}}{W_{j}} + \gamma_{\text{inv}} \right)$$

Let's consider two consecutive stages.

Minimize the delay (find the $W_j$ that makes $\partial D_j/\partial W_j = 0$):

$$D_{j} = \tau_{inv} \left( \frac{W_{j}}{W_{j-1}} + \gamma_{inv} \right) + \tau_{inv} \left( \frac{W_{j+1}}{W_{j}} + \gamma_{inv} \right)$$
$$\frac{\partial D_{j}}{\partial W_{j}} = \tau_{inv} \frac{1}{W_{j-1}} - \tau_{inv} \frac{W_{j+1}}{W_j^2} = 0$$
$$\therefore \frac{W_j}{W_{j-1}} = \frac{W_{j+1}}{W_j}$$

$$\therefore W_j = \sqrt{W_{j+1}W_{j-1}}$$

If the size of each gate is the geometric mean of the sizes of its two neighbours (the previous and the next gate), the delay is minimum!
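The geometric-mean result can be checked numerically. The following is a minimal sketch, not part of the original slides: the neighbour widths (1 and 16, arbitrary units), $\tau_{inv} = 1$ and $\gamma_{inv} = 0.5$ are assumed values chosen only for illustration, and `two_stage_delay` is a hypothetical helper that evaluates $D_j$ as defined above.

```python
import math


def two_stage_delay(w_prev, w_j, w_next, tau_inv=1.0, gamma_inv=0.5):
    """D_j: stage j-1 driving stage j, plus stage j driving stage j+1."""
    return (tau_inv * (w_j / w_prev + gamma_inv)
            + tau_inv * (w_next / w_j + gamma_inv))


# Assumed neighbour widths (arbitrary units), for illustration only.
w_prev, w_next = 1.0, 16.0

# Candidate from the derivation: geometric mean of the two neighbours.
w_opt = math.sqrt(w_prev * w_next)

# Brute-force scan of W_j between the two neighbours in steps of 0.01.
candidates = [w_prev + k * 0.01 for k in range(1, 1500)]
w_best = min(candidates, key=lambda w: two_stage_delay(w_prev, w, w_next))

print(f"geometric mean = {w_opt:.2f}, scanned minimum = {w_best:.2f}")
```

The scan should land on the same width as the closed-form expression, to within the 0.01 grid step.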

### Gate Sizing for Optimal Path delay - Review



Therefore, we can consider the size of gates a geometric sequence with factor *f* 

$$f^{N}C_{in} = C_{load}$$

$$\therefore N = \frac{\ln(C_{load}/C_{in})}{\ln f}$$

$$\text{total\_delay} = N \times \tau_{inv} \left(\frac{C_{j}}{C_{j-1}} + \gamma_{inv}\right)$$

$$\text{total\_delay} = \frac{\ln(C_{load}/C_{in})}{\ln f} \times \tau_{inv}(f + \gamma_{inv})$$

What *f* makes the total delay a minimum?

# **Optimum Fan-out for CMOS Gates**



The optimum value of *f* depends on  $\gamma$  (for  $\gamma = 0$ , it is *e*).
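Setting the derivative of $(f + \gamma)/\ln f$ with respect to $f$ to zero gives the condition $f(\ln f - 1) = \gamma$, which has no closed-form solution for $\gamma \neq 0$. The sketch below solves it by bisection. The function name `optimum_fanout` and the bracketing interval $[1.5, 10]$ are assumptions for this illustration; the interval is valid for $0 \le \gamma < 13$.

```python
import math


def optimum_fanout(gamma, lo=1.5, hi=10.0, tol=1e-9):
    """Solve f * (ln f - 1) = gamma for f by bisection.

    The left-hand side is increasing for f > 1, so a single root
    lies inside [lo, hi] for the gamma range assumed here.
    """
    def residual(f):
        return f * (math.log(f) - 1.0) - gamma

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


for gamma in (0.0, 0.5, 1.0):
    print(f"gamma = {gamma:.1f} -> optimum f = {optimum_fanout(gamma):.2f}")
```

For $\gamma = 0$ the root is $e$, matching the statement above; larger $\gamma$ pushes the optimum fan-out higher.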

The calculation is similar if we have a chain of NANDs and NORs:

$$\begin{aligned} \tau_{\text{nand}} &= R_{eqn}C_{\text{in}} = R_{eqn}\left(\frac{L_n}{W_n}\right) 4 W_n C_g = 4 R_{eqn}C_g L_n \\ \tau_{\text{nor}} &= R_{eqn}C_{\text{in}} = R_{eqn}\left(\frac{L_n}{W_n}\right) 5 W_n C_g = 5 R_{eqn}C_g L_n \end{aligned}$$
 total\_delay =  $\sum_j \tau_{\text{nand}}\left(\frac{C_{j+1}}{C_j} + \gamma_{\text{nand}}\right)$ 

# **Optimum Fan-out for CMOS Gates - Review**



$$\therefore \tau_{nand}\,FO_j = \tau_{inv}\,FO_{j+1}$$

The delay is minimized when the fan-out portion of the delay is equal for all gates.

# **Optimum Fan-out for CMOS Gates**

Example: Find the device sizes that optimize the delay through the indicated path for the circuit below.



# **Optimum Fan-out for CMOS Gates**

We must equalize the fanout portion of the delay. Therefore,

$$\therefore \tau_{\text{nand}} \left( \frac{C_{j+1}}{C_{\text{in}}} \right) = \tau_{\text{inv}} \left( \frac{C_{j+2}}{C_{j+1}} \right) = \tau_{\text{nor}} \left( \frac{C_{\text{load}}}{C_{j+2}} \right)$$

We take the product of these three components and then obtain the geometric mean:

$$\begin{aligned} \text{Fanout\_delay} &= \sqrt[3]{\tau_{\text{nand}} \left(\frac{C_{j+1}}{C_{\text{in}}}\right) \times \tau_{\text{inv}} \left(\frac{C_{j+2}}{C_{j+1}}\right) \times \tau_{\text{nor}} \left(\frac{C_{\text{load}}}{C_{j+2}}\right)} \\ &= \sqrt[3]{\tau_{\text{nand}} \times \tau_{\text{inv}} \times \tau_{\text{nor}} \left(\frac{C_{\text{load}}}{C_{\text{in}}}\right)} = \sqrt[3]{4 \times 3 \times 5 \left(\frac{200}{2}\right)} \times R_{eqn} C_g L_n \\ &= 18.2\, R_{eqn} C_g L_n \end{aligned}$$

Therefore, the input capacitance for each gate can be computed by setting the fanout delay to the above result:

$$\begin{aligned} \tau_{\text{nor}}\left(\frac{C_{\text{load}}}{C_{j+2}}\right) &= 5R_{eqn}C_gL_n\left(\frac{200 \text{ fF}}{C_{j+2}}\right) = 18.2\,R_{eqn}C_gL_n\\ \therefore C_{j+2} &= 55 \text{ fF}\\ \tau_{\text{inv}}\left(\frac{C_{j+2}}{C_{j+1}}\right) &= 3R_{eqn}C_gL_n\left(\frac{55 \text{ fF}}{C_{j+1}}\right) = 18.2\,R_{eqn}C_gL_n\\ \therefore C_{j+1} &= 9.1 \text{ fF}\\ \tau_{\text{nand}}\left(\frac{C_{j+1}}{C_{\text{in}}}\right) &= 4\,R_{eqn}C_gL_n\left(\frac{9.1 \text{ fF}}{C_{\text{in}}}\right) = 18.2\,R_{eqn}C_gL_n\\ \therefore C_{\text{in}} &= 2 \text{ fF}\end{aligned}$$

### Steps to solve

1. Make the fan-out portions equal = X
2. Multiply all of them =  $X^n$ 
   - This makes the internal fan-outs vanish; the only remaining terms are the input/output loads and the intrinsic delays. Take the nth root to find X.
3. Work backwards from the load to find the size of each gate and then of each device

- This technique gives you the sizes first, and then you calculate the delay
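The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: `size_path` is a hypothetical helper, and the intrinsic delays are passed as multiples of $R_{eqn}C_gL_n$ (4 for NAND, 3 for inverter, 5 for NOR, as derived earlier). The call at the bottom uses the numbers from the worked example ($C_{in} = 2$ fF, $C_{load} = 200$ fF).

```python
def size_path(taus, c_in, c_load):
    """Equalize the fan-out delay of a gate chain.

    taus   : intrinsic delay of each gate, in units of R_eqn*C_g*L_n,
             listed from the first gate to the last.
    c_in   : input capacitance of the first gate.
    c_load : load capacitance on the last gate (same unit as c_in).
    Returns (X, caps) where X is the equalized fan-out delay and
    caps[i] is the input capacitance of gate i.
    """
    n = len(taus)

    # Steps 1-2: the product of all fan-out delays is X**n; the internal
    # capacitances cancel, leaving only the taus and c_load / c_in.
    product = c_load / c_in
    for tau in taus:
        product *= tau
    x = product ** (1.0 / n)

    # Step 3: work backwards from the load, using tau * C_next / C_here = X.
    caps = []
    c_next = c_load
    for tau in reversed(taus):
        c_here = tau * c_next / x
        caps.append(c_here)
        c_next = c_here
    caps.reverse()
    return x, caps


x, caps = size_path([4, 3, 5], c_in=2.0, c_load=200.0)
print(f"X = {x:.1f} (units of R_eqn*C_g*L_n)")
print("input capacitances (fF):", [round(c, 1) for c in caps])
```

The first entry of `caps` comes back equal to the given $C_{in}$, which is a useful sanity check on the backward pass.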

# Logical Effort Method (normalized delay)

One way to simplify the delay calculation is to normalize the total delay w.r.t.  $\tau_{inv}$ 

$$\tau_{inv} = 3R_{eqn}C_gL_n = 3 \times 12.5\text{ k}\Omega \times 2\text{ fF}/\mu\text{m} \times 0.1\,\mu\text{m} = 7.5 \text{ ps}$$

$$D = (LE_{\text{nand}}FO_1 + P_{\text{nand}}) + (LE_{\text{inv}}FO_2 + P_{\text{inv}}) + (LE_{\text{nor}}FO_3 + P_{\text{nor}})$$

We need to learn how to calculate *LE* and *P* for different gates.

**Ex.** If the normalized path delay is $D = 13$, the total delay is 13 \* 7.5 ps = 97.5 ps
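As a small sketch of the normalization, the snippet below recomputes $\tau_{inv}$ from the process numbers quoted above and converts a normalized delay to picoseconds. The function names are hypothetical. The units are chosen so that kΩ × fF gives ps directly.

```python
def tau_inv_ps(r_eqn_kohm=12.5, c_g_ff_per_um=2.0, l_um=0.1):
    """tau_inv = 3 * R_eqn * C_g * L_n, in picoseconds (kOhm * fF = ps)."""
    return 3 * r_eqn_kohm * c_g_ff_per_um * l_um


def to_picoseconds(normalized_delay, tau_ps):
    """Convert a delay expressed in units of tau_inv to picoseconds."""
    return normalized_delay * tau_ps


tau = tau_inv_ps()
print(f"tau_inv = {tau:.1f} ps")
print(f"D = 13 -> {to_picoseconds(13, tau):.1f} ps")
```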

\* All we need to do (for delay optimization) is to make the stage effort SE (= LE × FO) equal for all cascaded gates in a chain

# Logical Effort – Calculating LE

To calculate LE for any gate, we first size its devices so that its on-resistance (for both the pull-up and pull-down paths) equals that of the inverter, and then take the ratio of the gate's input capacitance to the inverter's input capacitance.

For the NAND gate:

$$LE = \frac{(C_{in})_{nand}}{(C_{in})_{inv}} = \frac{2+2}{3} = \frac{4}{3}$$

For the NOR gate:

$$LE = \frac{(C_{in})_{nor}}{(C_{in})_{inv}} = \frac{4+1}{3} = \frac{5}{3}$$



### Logical Effort – Calculating P

For Inverter:  

$$P = LE_{inv} \times \gamma_{inv} = LE \times \frac{C_{self}}{C_{in}} = LE \times \frac{C_{eff} 3W}{C_g 3W} = LE \times \frac{C_{eff}}{C_g}$$

$$P_{inv} = (1) \times \frac{1 \text{ fF}/\mu\text{m}}{2 \text{ fF}/\mu\text{m}} = \frac{1}{2}$$

For NAND:  

$$P_{\text{nand}} = LE_{\text{nand}} \times \gamma_{\text{nand}} = LE \times \frac{C_{\text{self}}}{C_{\text{in}}}$$

$$= LE \times \frac{C_{\text{eff}}(2W + 2W + 2W)}{C_{\text{g}}(2W + 2W)} = LE \times \frac{C_{\text{eff}}}{C_{\text{g}}} \times \frac{3}{2} = \frac{4}{3} \times \frac{1}{2} \times \frac{3}{2} = 1$$

For NOR 
$$P_{nor} = LE_{nor} \times \gamma_{nor} = LE \times \frac{C_{\text{self}}}{C_{in}}$$

$$= LE \times \frac{C_{\text{eff}}(W + 4W + 4W)}{C_{g}(W + 4W)} = LE \times \frac{C_{\text{eff}}}{C_{g}} \times \frac{9}{5} = \frac{5}{3} \times \frac{1}{2} \times \frac{9}{5} = \frac{3}{2}$$
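The LE and P values above all follow from two width totals per gate: the width seen at one input and the width attached to the output node. The sketch below reproduces them with exact fractions. The helper names and the `GATE_WIDTHS` table are assumptions for this illustration; the widths themselves (in units of $W$) and the ratio $C_{eff}/C_g = 1/2$ are the ones used on these slides.

```python
from fractions import Fraction

INV_INPUT_WIDTH = 3              # inverter input: W_n + W_p = W + 2W
CEFF_OVER_CG = Fraction(1, 2)    # (1 fF/um) / (2 fF/um), as on the slide

# (width seen at one input, width attached to the output node), in units of W
GATE_WIDTHS = {
    "inv":   (1 + 2, 1 + 2),
    "nand2": (2 + 2, 2 + 2 + 2),
    "nor2":  (1 + 4, 1 + 4 + 4),
}


def logical_effort(gate):
    """LE = gate input capacitance / inverter input capacitance."""
    input_width, _ = GATE_WIDTHS[gate]
    return Fraction(input_width, INV_INPUT_WIDTH)


def parasitic(gate):
    """P = LE * gamma = LE * C_self / C_in."""
    input_width, output_width = GATE_WIDTHS[gate]
    gamma = CEFF_OVER_CG * Fraction(output_width, input_width)
    return logical_effort(gate) * gamma


for gate in GATE_WIDTHS:
    print(f"{gate}: LE = {logical_effort(gate)}, P = {parasitic(gate)}")
```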

# Logical Effort – Interpreting LE and P

Stage Delay = LE \* EE (FO) + P

Let's plot Stage delay vs. FO for different gates



The graph indicates that, for a fixed FO (EE), the inverter shows the lowest delay, followed by the NAND and finally the NOR. (Why is the NOR the slowest?)

EE or Electrical effort is the Fan-out of the gate

In other words, when we optimize the delay by making SE equal for all gates, the NOR will drive the lowest fan-out (load).

\*<u>Intuitively speaking</u>, the weakest member (gate) of a team (chain) should do the least amount of work (fan-out) to deliver the fastest result (unfortunately, no ethics in the CMOS world!)
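Both observations can be reproduced numerically from the LE and P values computed earlier in these notes (inverter 1 and 1/2, NAND 4/3 and 1, NOR 5/3 and 3/2). The sketch below is illustrative only: the function names are hypothetical, and the fan-out of 4 and stage effort of 4 are assumed sample values.

```python
from fractions import Fraction

# (LE, P) per gate, taken from the LE and P calculations in these notes.
GATES = {
    "inv":   (Fraction(1),    Fraction(1, 2)),
    "nand2": (Fraction(4, 3), Fraction(1)),
    "nor2":  (Fraction(5, 3), Fraction(3, 2)),
}


def stage_delay(gate, fanout):
    """Normalized stage delay = LE * FO + P."""
    le, p = GATES[gate]
    return le * fanout + p


def fanout_for_stage_effort(gate, stage_effort):
    """Fan-out a gate drives when its stage effort LE * FO is fixed."""
    le, _ = GATES[gate]
    return stage_effort / le


for gate in GATES:
    d = stage_delay(gate, 4)
    fo = fanout_for_stage_effort(gate, 4)
    print(f"{gate}: delay at FO=4 is {float(d):.2f}, FO at SE=4 is {float(fo):.2f}")
```

At equal fan-out the delay ordering is inverter, NAND, NOR; at equal stage effort the NOR ends up with the smallest fan-out.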

# Logical Effort – Example

#### Path Optimization Using Logical Effort

#### Problem:

Repeat Example 6.10 using logical effort techniques. However, before specifying the sizes, compute the optimal delay.



### (Board Notes)

# Logical Effort – Example 2

In Lecture 8 we learnt that a one-stage 8-input CMOS AND is not a good idea, and we presented two alternative implementations. Now we can analyze them quantitatively and find the better of the two in terms of optimum delay.





(Board Notes)

# Branching Effort

Sometimes a signal drives more than one path. If the paths are identical, we can account for their effect by multiplying the path effort by the number of identical paths (also called the branching effort).



total\_path\_effort = 
$$\prod (LE \times BE \times FO) = \prod (LE \times BE) \times \frac{C_{\text{load}}}{C_{\text{in}}}$$

If the other paths are not identical (*sideload*), you should write out the total delay and optimize the gate sizes by minimizing the delay locally: between every two stages, derive the optimum gate size from  $\partial D/\partial(\text{size}) = 0$. Review Example **6.16**.

# Example with Branching effort

#### Branching Effort Example

#### Problem:

Select gate sizes y and z to minimize delay in the highlighted path:



Note that in the backward size calculations we have to account for the effect of the branching effort on the size of each previous stage.

#### Solution:

- Logical Effort:  $LE_p = (4/3)^3$
- Electrical Effort:  $FO_p = C_{out}/C_{in} = 4.5$
- Branching Effort:  $BE_p = (2)(3) = 6$
- Path Effort:  $PE = (LE_p)(BE_p)(FO_p) = 64$
- Optimal Stage Effort:  $SE^* = (PE)^{1/3} = 4$
- Delay:  $D = (N)(SE^*) + \text{Parasitics}$
- Delay: $D = (3)(4) + (3)(1) = 15$
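The arithmetic of this solution can be reproduced with a short sketch. `path_delay` is a hypothetical helper, not something from the slides; its inputs are exactly the numbers listed in the solution (three stages with $LE = 4/3$ and $P = 1$, branching factors 2 and 3, path fan-out 4.5). It stops at the delay and does not compute the sizes $y$ and $z$, since those depend on circuit values shown only in the figure.

```python
from fractions import Fraction


def path_delay(logical_efforts, branching, fo_path, parasitics):
    """Return (path effort, optimal stage effort, normalized delay).

    logical_efforts : LE of each stage along the path
    branching       : branching effort at each branch point
    fo_path         : C_out / C_in for the whole path
    parasitics      : P of each stage along the path
    """
    path_effort = Fraction(fo_path)
    for le in logical_efforts:
        path_effort *= le
    for be in branching:
        path_effort *= be

    n = len(logical_efforts)
    stage_effort = float(path_effort) ** (1.0 / n)   # SE* = PE^(1/N)
    delay = n * stage_effort + float(sum(parasitics))
    return path_effort, stage_effort, delay


pe, se, d = path_delay(
    logical_efforts=[Fraction(4, 3)] * 3,
    branching=[2, 3],
    fo_path=Fraction(9, 2),
    parasitics=[1, 1, 1],
)
print(f"PE = {pe}, SE* = {se:.2f}, D = {d:.2f}")
```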