#### Topics

- Low Power Techniques

Based on Penn State CSE477 Lecture Notes ©2002 M.J. Irwin and adapted from *Digital Integrated Circuits* ©2002 J. Rabaey

ECE 249 VLSI Design and Simulation Spring 2005 Lecture 20

#### Review: Energy & Power Equations

$$E = C_L V_{DD}^2 P_{0 \rightarrow 1} + t_{sc} V_{DD} I_{peak} P_{0 \rightarrow 1} + \frac{V_{DD} I_{leakage}}{f_{clock}}$$

$$f_{0 \rightarrow 1} = P_{0 \rightarrow 1} \cdot f_{clock}$$

$$P = C_L V_{DD}^2 f_{0 \rightarrow 1} + t_{sc} V_{DD} I_{peak} f_{0 \rightarrow 1} + V_{DD} I_{leakage}$$

Here $E$ is the energy per clock cycle, so that $P = E \cdot f_{clock}$. The three terms are:

- Dynamic power (~90% today and decreasing relatively)
- Short-circuit power (~8% today and decreasing absolutely)
- Leakage power (~2% today and increasing)
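As a quick numeric check of these equations, the sketch below evaluates the three power terms for one made-up operating point. Treating $E$ as the energy per clock cycle, the leakage term has to be divided by $f_{clock}$ (leakage current flows for the whole period), and then $E \cdot f_{clock}$ reproduces $P$. All parameter values are hypothetical, chosen only for easy arithmetic; they are not meant to reproduce the percentage split listed above.

```python
def power_components(c_load, vdd, f_clock, p01, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts.

    P = C_L*V_DD^2*f01 + t_sc*V_DD*I_peak*f01 + V_DD*I_leakage, with f01 = P01*f_clock.
    """
    f01 = p01 * f_clock                      # rate of 0 -> 1 output transitions
    dynamic = c_load * vdd**2 * f01
    short_circuit = t_sc * vdd * i_peak * f01
    leakage = vdd * i_leak
    return dynamic, short_circuit, leakage


def energy_per_cycle(c_load, vdd, f_clock, p01, t_sc, i_peak, i_leak):
    """Energy drawn in one clock period, so that E * f_clock == P."""
    dyn = c_load * vdd**2 * p01
    sc = t_sc * vdd * i_peak * p01
    leak = vdd * i_leak / f_clock            # leakage flows for the entire period
    return dyn + sc + leak


# Hypothetical operating point (illustrative values only).
params = dict(c_load=1e-9, vdd=1.0, f_clock=1e9, p01=0.1,
              t_sc=100e-12, i_peak=0.1, i_leak=0.01)
dyn, sc, leak = power_components(**params)
print(f"dynamic={dyn:.3f} W, short-circuit={sc:.4f} W, leakage={leak:.3f} W")
print(f"E * f_clock = {energy_per_cycle(**params) * params['f_clock']:.3f} W")
```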

© John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut


#### Power and Energy Design Space

|         | Constant<br>Throughput/Latency                                          |                                                                       | Variable<br>Throughput/Latency                     |
|---------|-------------------------------------------------------------------------|-----------------------------------------------------------------------|----------------------------------------------------|
| Energy  | Design Time                                                             | Non-active Modules                                                    | Run Time                                           |
| Active  | Logic Design<br>Reduced V<sub>dd</sub><br>Sizing<br>Multi-V<sub>dd</sub> | Clock Gating                                                          | DFS, DVS<br>(Dynamic Freq, Voltage Scaling)        |
| Leakage | + Multi-V<sub>T</sub>                                                   | Sleep Transistors<br>Multi-V<sub>dd</sub><br>Variable V<sub>T</sub>   | + Variable V<sub>T</sub>                           |


#### **Bus Multiplexing**

- Buses are a significant source of power dissipation due to high switching activities and large capacitive loading
  - 15% of total power in Alpha 21064
  - 30% of total power in Intel 80386
- Share long data buses with time multiplexing (S<sub>1</sub> uses even cycles, S<sub>2</sub> odd cycles)
- But what if the data samples are correlated (e.g., sign bits)?
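The number of bit transitions on a bus is a reasonable proxy for its $C_L V_{DD}^2$ switching energy. A minimal sketch, assuming every bus line has the same capacitance and using made-up 8-bit sample values: `bus_toggles` counts the Hamming distance between consecutive bus words, and `time_multiplex` interleaves two streams as described above (S<sub>1</sub> on even cycles, S<sub>2</sub> on odd cycles). Both function names are illustrative, not from the lecture.

```python
def bus_toggles(words: list[int]) -> int:
    """Total bus-line transitions when `words` are driven on consecutive cycles."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))


def time_multiplex(s1: list[int], s2: list[int]) -> list[int]:
    """Share one bus: S1 drives the even cycles, S2 drives the odd cycles."""
    shared = []
    for a, b in zip(s1, s2):
        shared.extend([a, b])
    return shared


# Two slowly varying (positively correlated) 8-bit streams; values are illustrative.
s1 = [0x00, 0x01, 0x02, 0x03]
s2 = [0xF0, 0xF1, 0xF2, 0xF3]

dedicated = bus_toggles(s1) + bus_toggles(s2)
shared = bus_toggles(time_multiplex(s1, s2))
print("dedicated buses:", dedicated, "shared bus:", shared)
```

For these two streams the dedicated buses see 8 transitions in total while the shared bus sees 32, because interleaving destroys the word-to-word similarity within each stream. The shared bus also has half as many lines, so a real comparison has to weigh toggles per line against total wire capacitance.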


## **Correlated Data Streams**



- For a shared (multiplexed) bus, the advantages of data correlation are lost (the bus carries samples from two uncorrelated data streams)
  - Bus sharing should not be used for positively correlated data streams
  - Bus sharing may prove advantageous for negatively correlated data streams (where successive samples switch sign bits), since interleaving replaces the sign-bit toggling on every sample with more random switching
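To illustrate the negatively correlated case, the sketch below applies the same toggle counting to two hypothetical streams whose sign flips on every sample, encoded as 8-bit two's complement. On a dedicated bus nearly every line toggles each cycle; after interleaving, half of the consecutive bus words share a sign, so the high-order lines switch less. The helper names and sample values are illustrative assumptions.

```python
def bus_toggles(words: list[int]) -> int:
    """Total bus-line transitions (Hamming distance between successive words)."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))


def to_bus(values: list[int], width: int = 8) -> list[int]:
    """Two's-complement encoding of signed samples on a `width`-bit bus."""
    mask = (1 << width) - 1
    return [v & mask for v in values]


def time_multiplex(s1: list[int], s2: list[int]) -> list[int]:
    """S1 on even cycles, S2 on odd cycles."""
    shared = []
    for a, b in zip(s1, s2):
        shared.extend([a, b])
    return shared


# Hypothetical negatively correlated streams: the sign flips on every sample.
s1 = to_bus([5, -5, 5, -5])
s2 = to_bus([-3, 3, -3, 3])

dedicated = bus_toggles(s1) + bus_toggles(s2)
shared = bus_toggles(time_multiplex(s1, s2))
print("dedicated buses:", dedicated, "shared bus:", shared)
```

Here the dedicated buses total 42 transitions versus 26 for the shared bus. The streams were hand-picked to make the effect visible; with real data the benefit depends on the statistics of both streams.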


# **Glitch Reduction by Pipelining**

- Glitches depend on the logic depth of the circuit: gates deeper in the logic network are more prone to glitching
  - the arrival times of the gate inputs are more spread out due to delay imbalances
  - they are usually affected by switching on more primary inputs
- Reduce logic depth by adding pipeline registers
  - additional energy used by the clock and pipeline registers
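The effect of logic depth on glitching can be seen in a toy unit-delay simulation of a linear XOR chain (a parity chain, chosen here only because it glitches readily; it is not taken from the lecture). All inputs switch from 0 to 1 at the same instant, and every transition at every gate output is counted, glitches included. `pipelined_transitions` assumes pipeline registers cut the chain into equal stages whose inputs all switch together at the clock edge, and it ignores the clock and register energy noted above.

```python
def xor_chain_transitions(depth: int) -> int:
    """Unit-delay simulation of a linear XOR chain with `depth` gates.

    Gate 0 computes x[0] ^ x[1]; gate k computes y[k-1] ^ x[k+1].
    All depth+1 inputs switch 0 -> 1 at t = 0, starting from the settled
    all-zero state. Returns the total number of gate-output transitions.
    """
    x = [1] * (depth + 1)        # new input vector applied at t = 0
    y = [0] * depth              # outputs settled for the old all-zero inputs
    transitions = 0
    for _ in range(depth + 1):   # enough unit-delay steps for the chain to settle
        # Each gate sees its inputs from the previous time step (unit delay).
        new_y = [x[0] ^ x[1]] + [y[k - 1] ^ x[k + 1] for k in range(1, depth)]
        transitions += sum(a != b for a, b in zip(y, new_y))
        y = new_y
    return transitions


def pipelined_transitions(depth: int, stages: int) -> int:
    """Assumption: registers split the chain into equal stages, and every stage's
    inputs (including its pipeline register) switch together at a clock edge,
    so each stage behaves like an independent, shorter chain."""
    assert depth % stages == 0
    return stages * xor_chain_transitions(depth // stages)


for stages in (1, 2, 4):
    print(f"{stages} stage(s): {pipelined_transitions(8, stages)} transitions")
```

For a depth-8 chain this gives 28 transitions unpipelined, 12 with two stages, and 4 with four stages, while only 4 gate outputs actually need to change value. In this model the transition count grows with the square of the logic depth, which is why cutting the depth pays off even after register energy is added back.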




# **Clock Gating**

- Most popular method for power reduction of clock signals and functional units
- A gating OR gate can replace a buffer in the clock distribution tree

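A minimal sketch of OR-based clock gating, sampled at two points per clock cycle: when the unit is idle, `disable` is high, the OR gate holds the gated clock high, and the unit's registers see no rising edge. The energy model charges $C V_{DD}^2$ per 0-to-1 transition of the gated clock net, as in the dynamic-power term. It assumes `disable` only changes while the clock is high (so the gate output is glitch-free); the 10 pF load, 1.2 V supply, and 20% activity are made-up values.

```python
def or_gated_clock(active: list[bool]) -> list[int]:
    """Sampled output of an OR clock gate, two samples per cycle (low, high phase).

    The gate computes clk OR disable, with disable = not active. disable is
    assumed to change only while clk is high, so the output has no glitches.
    """
    wave = []
    for is_active in active:
        disable = 0 if is_active else 1
        wave += [0 | disable, 1 | disable]   # clk is 0 in the low phase, 1 in the high phase
    return wave


def rising_edges(wave: list[int]) -> int:
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


def clock_energy(edges: int, c_clk: float, vdd: float) -> float:
    """Supply energy of a clock net: C * V_DD^2 per 0 -> 1 transition."""
    return edges * c_clk * vdd**2


# Hypothetical unit: busy in 2 of 10 cycles, 10 pF of gated clock load, 1.2 V supply.
activity = [True, True] + [False] * 8
ungated = rising_edges(or_gated_clock([True] * len(activity)))
gated = rising_edges(or_gated_clock(activity))
ratio = clock_energy(gated, 10e-12, 1.2) / clock_energy(ungated, 10e-12, 1.2)
print(ungated, gated, ratio)
```

With the unit busy in 2 of 10 cycles, the gated clock delivers 2 rising edges instead of 10, so the clock-net energy for that unit drops to one fifth (the energy of the gating logic itself is not modeled).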

## Clock Gating in a Pipelined Datapath

- For idle units (e.g., floating-point units in the Exec stage, or the WB stage for instructions with no write-back operation)
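One way to picture per-stage clock gating: each instruction carries enable bits derived at decode time, and a stage's clock is gated whenever the instruction occupying it does not need that stage. The sketch below assumes an ideal pipeline with no stalls and uses a hypothetical `Instr` record with `uses_fpu` and `writes_back` fields; it only tallies how often each gated clock could be turned off.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical decoded-instruction record; field names are illustrative."""
    op: str
    uses_fpu: bool      # needs the floating-point unit in the Exec stage
    writes_back: bool   # performs a register write in the WB stage


def gating_summary(program: list[Instr]) -> dict[str, float]:
    """Fraction of cycles in which each gated clock can be off, assuming one
    instruction occupies each stage per cycle (no stalls or bubbles)."""
    n = len(program)
    fpu_idle = sum(not i.uses_fpu for i in program)
    wb_idle = sum(not i.writes_back for i in program)
    return {"exec_fpu_gated": fpu_idle / n, "wb_gated": wb_idle / n}


program = [
    Instr("add", uses_fpu=False, writes_back=True),
    Instr("fadd", uses_fpu=True, writes_back=True),
    Instr("sw", uses_fpu=False, writes_back=False),
    Instr("beq", uses_fpu=False, writes_back=False),
]
print(gating_summary(program))
```

For this four-instruction mix, the FP unit's clock could be gated in 75% of Exec-stage cycles and the WB clock in 50% of cycles.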




# Review: Dynamic Power as a Function of $V_{DD}$

- Decreasing the V<sub>DD</sub> decreases dynamic energy consumption (quadratically)
- But it increases gate delay (decreases performance)

Determine the critical path(s) at design time and use a high V<sub>DD</sub> for the transistors on those paths for speed. Use a lower V<sub>DD</sub> on the other logic to reduce dynamic energy consumption.

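The dual-supply trade-off can be estimated with the quadratic energy dependence from the energy equation and the alpha-power-law delay model of Sakurai and Newton (a standard model swapped in here; the lecture does not specify one). The $V_T = 0.3$ V, $\alpha = 1.3$, supply pair of 1.2 V and 0.9 V, and 60% non-critical capacitance are all assumed values, and the level converters a real dual-V<sub>DD</sub> design needs are ignored.

```python
def relative_delay(vdd: float, vt: float = 0.3, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay, up to a constant: t_d ~ V_DD / (V_DD - V_T)^alpha.
    vt and alpha are assumed values, not from the lecture."""
    return vdd / (vdd - vt) ** alpha


def dual_vdd_energy_ratio(noncritical_frac: float, vdd_high: float, vdd_low: float) -> float:
    """Dynamic energy relative to running all logic at vdd_high, when the fraction
    `noncritical_frac` of switched capacitance moves to vdd_low (E ~ C * V_DD^2).
    Level-converter overhead is ignored."""
    return (1 - noncritical_frac) + noncritical_frac * (vdd_low / vdd_high) ** 2


vdd_h, vdd_l = 1.2, 0.9    # hypothetical supply pair
print("energy ratio:", dual_vdd_energy_ratio(0.6, vdd_h, vdd_l))
print("slowdown of low-V_DD gates:", relative_delay(vdd_l) / relative_delay(vdd_h))
```

With these numbers, dynamic energy falls to about 74% of the single-supply value, while gates moved to 0.9 V become roughly 27% slower, which is acceptable only because they are off the critical path.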

# **Dynamic Frequency and Voltage Scaling**

- Intel's SpeedStep
  - Hardware that steps down the clock frequency (dynamic frequency scaling – DFS) when the user unplugs from AC power
    - PLL from 650MHz  $\rightarrow$  500MHz
  - CPU stalls during SpeedStep adjustment
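A short sketch of why frequency scaling alone is mainly a power technique rather than an energy technique: with V<sub>DD</sub> held fixed, dynamic power scales linearly with $f$, but a task with a fixed cycle count takes proportionally longer, so its dynamic energy is unchanged. The effective capacitance, 1.35 V supply, and task length below are hypothetical; only the 650 MHz and 500 MHz frequencies come from the slide.

```python
def dynamic_power(c_eff: float, vdd: float, f_hz: float) -> float:
    """P_dyn = C_eff * V_DD^2 * f, with C_eff = C_L * P(0->1) folded together."""
    return c_eff * vdd**2 * f_hz


def task_energy(c_eff: float, vdd: float, f_hz: float, cycles: int) -> float:
    """Dynamic energy for a task of `cycles` clock cycles: P * (cycles / f)."""
    return dynamic_power(c_eff, vdd, f_hz) * cycles / f_hz


# Hypothetical values; V_DD is held fixed because this is frequency scaling only.
C_EFF, VDD, CYCLES = 1e-9, 1.35, 1_000_000_000
p_hi = dynamic_power(C_EFF, VDD, 650e6)
p_lo = dynamic_power(C_EFF, VDD, 500e6)
e_hi = task_energy(C_EFF, VDD, 650e6, CYCLES)
e_lo = task_energy(C_EFF, VDD, 500e6, CYCLES)
print(f"power ratio {p_lo / p_hi:.3f}, energy-per-task ratio {e_lo / e_hi:.3f}")
```

Stepping from 650 MHz to 500 MHz cuts dynamic power to about 77%, while the dynamic energy per task stays the same; leakage keeps flowing over the longer run time, so leakage energy per task actually grows. This is the motivation for pairing frequency scaling with voltage scaling.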

# **Dynamic Frequency and Voltage Scaling**

- Transmeta LongRun
  - Hardware that applies both DFS and DVS (dynamic supply voltage scaling)
    - 32 levels of  $V_{DD}$  from 1.1V to 1.6V
    - PLL from 200MHz  $\rightarrow$  700MHz in increments of 33MHz
  - Triggered when CPU load change is detected by software
    - heavier load  $\rightarrow$  ramp up V<sub>DD</sub>; once it is stable, speed up the clock
    - lighter load  $\rightarrow$  slow down the clock; once the PLL locks onto the new rate, ramp down $V_{\text{DD}}$
  - CPU stalls only during PLL relock (< 20 µs)
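This ordering guarantees that the clock never runs faster than the present supply voltage can support. A minimal sketch of the sequencing, assuming the 32 V<sub>DD</sub> levels between 1.1 V and 1.6 V are evenly spaced (the slide does not say how they are spaced); the action names such as `ramp_vdd` and `set_pll` are hypothetical placeholders, not Transmeta's interface.

```python
# Assumption: the 32 V_DD levels are evenly spaced from 1.1 V to 1.6 V.
V_MIN, V_MAX, N_LEVELS = 1.1, 1.6, 32
V_LEVELS = [V_MIN + i * (V_MAX - V_MIN) / (N_LEVELS - 1) for i in range(N_LEVELS)]


def quantize_vdd(v: float) -> float:
    """Return the lowest available V_DD level that is at least v (clamped to V_MAX)."""
    for level in V_LEVELS:
        if level >= v - 1e-12:          # small tolerance for float rounding
            return level
    return V_LEVELS[-1]


def dvfs_plan(f_now: float, f_target: float, v_target: float) -> list:
    """Ordered (action, value) steps for one DVFS change; action names are hypothetical.

    Speed-up: ramp V_DD first and raise the clock only once V_DD is stable.
    Slow-down: lower the clock first and ramp V_DD down only once the PLL has locked.
    """
    v = quantize_vdd(v_target)
    if f_target > f_now:
        return [("ramp_vdd", v), ("wait_vdd_stable", None), ("set_pll", f_target)]
    return [("set_pll", f_target), ("wait_pll_lock", None), ("ramp_vdd", v)]


print(dvfs_plan(300e6, 600e6, 1.5))   # heavier load
print(dvfs_plan(600e6, 300e6, 1.2))   # lighter load
```

For a speed-up, the plan raises V<sub>DD</sub> to the first level at or above the requested voltage before touching the PLL; for a slow-down, the PLL change comes first.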

# **Dynamic Thermal Management (DTM)**



- **Trigger mechanism:** When do we enable DTM techniques?
- **Initiation mechanism:** How do we enable the technique?
- **Response mechanism:** What technique do we enable?


# **DTM Trigger Mechanisms**



- Mechanism: How to deduce temperature?
  - Direct approach: on-chip temperature sensors
    - Based on the differential voltage change across two diodes of different sizes
    - May require more than one sensor
    - Hysteresis and delay are problems
- Policy: When to begin responding?
  - Trigger level set too high means higher packaging costs
  - Trigger level set too low means frequent triggering and loss in performance
- Choose the trigger level to exploit the difference between average-case and worst-case power
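The trigger-level trade-off can be made concrete with a simple threshold trigger over a sampled temperature trace. This sketch ignores the sensor hysteresis and delay mentioned above, and the trace values and trigger levels are hypothetical.

```python
def trigger_stats(trace_c: list[float], trigger_c: float) -> tuple[int, int]:
    """Simple threshold trigger: DTM is engaged whenever the sensed temperature
    exceeds trigger_c. Returns (number of engagements, samples spent engaged)."""
    engaged = [t > trigger_c for t in trace_c]
    # An engagement starts wherever the trigger goes from off to on.
    events = sum(1 for prev, cur in zip([False] + engaged, engaged) if cur and not prev)
    return events, sum(engaged)


trace = [70, 75, 82, 85, 79, 83, 88, 90, 84, 78]   # hypothetical sensor samples, deg C
for trig in (80, 86, 95):
    events, samples = trigger_stats(trace, trig)
    print(f"trigger {trig} C: {events} engagement(s), {samples} of {len(trace)} samples engaged")
```

On this trace, a trigger at 80 °C engages twice and stays engaged for 6 of 10 samples (frequent triggering, lost performance); at 86 °C it engages once for 2 samples; at 95 °C it never engages, which means the package must be designed to handle the full 90 °C peak.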

# **DTM Initiation and Response Mechanisms**



- Operating system or microarchitectural control?
  - Hardware support can reduce performance penalty by 20-30%
- Initiation of policy incurs some delay
  - When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead
  - Increasing policy delay reduces overhead; smarter initiation techniques would help as well
- Thermal window (100K+ cycles)
  - Larger thermal windows "smooth" short thermal spikes

## **DTM Activation and Deactivation Cycle**



- Initiation delay: OS interrupt/handler
- Response delay: invocation time (e.g., adjust clock)
- Policy delay: number of cycles engaged
- Shutoff delay: disabling time (e.g., re-adjust clock)
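These four delays determine how much of each DTM episode is spent switching the response on and off. A back-of-the-envelope sketch with hypothetical delay values (in clock cycles) shows why a longer policy delay amortizes the fixed initiation, response, and shutoff costs.

```python
def dtm_overhead_fraction(initiation: int, response: int, policy: int, shutoff: int) -> float:
    """Fraction of one DTM activation/deactivation cycle spent in transition
    overhead (initiation + response + shutoff) rather than in the engaged
    policy period. All delays are in clock cycles."""
    overhead = initiation + response + shutoff
    return overhead / (overhead + policy)


# Hypothetical delays; only the policy delay is varied.
for policy in (100_000, 1_000_000):
    frac = dtm_overhead_fraction(250, 10_000, policy, 10_000)
    print(f"policy delay {policy:>9,} cycles: {frac:.1%} of the episode is overhead")
```

With 20,250 cycles of combined overhead, a 100K-cycle policy delay spends about 17% of the episode in transitions; a 1M-cycle policy delay cuts this to about 2%.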


## **DTM Savings Benefits**

*[Figure: temperature plot, annotated "Designed for cooling capacity without DTM"]*



## Speculated Power of a 15mm $\mu$P



# Review: Leakage as a Function of Design Time $V_{\mathsf{T}}$

- Reducing the V<sub>T</sub> increases the subthreshold leakage current (exponentially)
- But reducing V<sub>T</sub> decreases gate delay (increases performance)



Determine the critical path(s) at design time and use low-V<sub>T</sub> devices on those paths for speed. Use high-V<sub>T</sub> devices on the other logic for leakage control.
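Subthreshold leakage varies roughly as $10^{-V_T/S}$, where $S$ is the subthreshold swing. Assuming S = 100 mV/decade, equal device widths, and hypothetical thresholds of 0.2 V and 0.4 V, the sketch estimates how much leakage a dual-V<sub>T</sub> assignment saves when only 10% of devices lie on critical paths.

```python
def relative_subthreshold_leakage(vt: float, s_mv_per_decade: float = 100.0) -> float:
    """Subthreshold leakage up to a constant: ~10^(-V_T / S).
    S = 100 mV/decade is an assumed value."""
    return 10 ** (-vt * 1000 / s_mv_per_decade)


def dual_vt_leakage_ratio(low_vt_frac: float, vt_low: float, vt_high: float,
                          s_mv_per_decade: float = 100.0) -> float:
    """Leakage relative to an all-low-V_T design when only `low_vt_frac` of the
    (assumed equal-width) devices use low V_T and the rest use high V_T."""
    low = relative_subthreshold_leakage(vt_low, s_mv_per_decade)
    high = relative_subthreshold_leakage(vt_high, s_mv_per_decade)
    return low_vt_frac + (1 - low_vt_frac) * high / low


per_device = relative_subthreshold_leakage(0.2) / relative_subthreshold_leakage(0.4)
print("low-V_T / high-V_T leakage per device:", per_device)
print("dual-V_T leakage vs all-low-V_T:", dual_vt_leakage_ratio(0.1, 0.2, 0.4))
```

Under these assumptions each high-V<sub>T</sub> device leaks 100 times less, and the dual-V<sub>T</sub> design leaks about 11% as much as an all-low-V<sub>T</sub> design.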

## Review: Variable $V_T$ (ABB) at Run Time

 $V_T = V_{T0} + \gamma \left( \sqrt{\left| -2\phi_F + V_{SB} \right|} - \sqrt{\left| -2\phi_F \right|} \right)$ 

where

- $V_{T0}$ is the threshold voltage at $V_{SB} = 0$
- $V_{SB}$ is the source-bulk (substrate) voltage
- $\gamma$ is the body-effect coefficient
- $\phi_F$ is the Fermi potential (negative for an n-channel device, so $-2\phi_F > 0$)

For an n-channel device, the substrate is normally tied to ground.

- A negative substrate bias (reverse body bias, so $V_{SB} > 0$) causes V<sub>T</sub> to increase from 0.45 V to 0.85 V

Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
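The body-effect equation can be inverted to find the reverse bias needed for a target threshold. A minimal sketch, assuming $\gamma = 0.4\ \mathrm{V}^{1/2}$ and $-2\phi_F = 0.6$ V (illustrative NMOS values, not given in the lecture) together with the 0.45 V and 0.85 V thresholds quoted above; the function names are hypothetical.

```python
import math


def body_effect_vt(v_sb: float, vt0: float = 0.45, gamma: float = 0.4,
                   minus_two_phi_f: float = 0.6) -> float:
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|)) for NMOS.
    gamma and minus_two_phi_f (= -2*phi_F, positive for NMOS) are assumed values."""
    return vt0 + gamma * (math.sqrt(abs(minus_two_phi_f + v_sb))
                          - math.sqrt(abs(minus_two_phi_f)))


def v_sb_for_vt(vt_target: float, vt0: float = 0.45, gamma: float = 0.4,
                minus_two_phi_f: float = 0.6) -> float:
    """Invert the body-effect equation for V_SB, assuming reverse bias (vt_target >= vt0)."""
    root = (vt_target - vt0) / gamma + math.sqrt(minus_two_phi_f)
    return root**2 - minus_two_phi_f


v_sb = v_sb_for_vt(0.85)
print(f"V_SB needed: {v_sb:.2f} V, check V_T = {body_effect_vt(v_sb):.2f} V")
```

With these assumed parameters, raising V<sub>T</sub> from 0.45 V to 0.85 V takes a source-bulk voltage of about 2.55 V; with the source grounded, that means driving the substrate to roughly -2.55 V.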




### Next class

- Testing and Verification
- Exam April 12th
- No lab tomorrow
  - Work on final project