Low-Power Embedded Processor
Project Proposal

Introduction:

An embedded processor is a processor that has been “embedded” into a device and can be programmed to interact with different pieces of hardware. In terms of performance, an embedded processor can outperform a microcontroller but falls short of a general-purpose microprocessor. Low-power embedded processors are used in a wide variety of applications, including cars, phones, digital cameras, and printers. They are widely used because they are small: they take up little die area and are cheap to fabricate. Embedded processors are also already verified, which eliminates the engineering man-hours otherwise spent tracking down hardware flaws. Another advantage is that they run software, which makes it possible to accommodate changing specifications as system requirements change.

Low-power processors are key to realizing portable electronic devices, in which power consumption is an important factor. Low power consumption helps to reduce heat dissipation, lengthen battery life, and increase device reliability.

In this project we will implement a 16-bit RISC-type embedded processor that supports a pre-defined instruction set. The processor will follow a RISC architecture because it allows for a simpler implementation of our design. Various power-saving techniques, such as reducing supply voltage, clock gating, and full custom design, will be employed in the processor architecture.
Ways of reducing power consumption:

There are many ways to reduce the power consumption of a processor. Some of the methods that we will employ in our design are listed below.

Reduced supply voltage: The power consumption ($P$) of a CMOS-based processor is related to the supply voltage ($V$), the switching frequency ($f$), and the CMOS gate capacitance ($C$). The relationship can be described as:

$$P \propto C \cdot f \cdot V^2$$

This relationship shows that power consumption can be reduced by lowering the supply voltage. However, the achievable switching frequency is, to a first approximation, directly proportional to the supply voltage as well; that is:

$$V \propto f$$

Therefore, while lowering the supply voltage reduces power consumption, it also lowers the switching frequency and thus the processor speed. Speed must be traded off in order to reduce power usage.

Full custom design: Full custom design will help to reduce the number of logic gates needed to implement the required functionality of the processor. Since each logic gate requires power to operate, a lower gate count means less switching and thus lower power consumption.

Clock gating: Clock gating is a method in which certain parts of the processor are prevented from receiving the clock signal. If a part of the processor is not needed for a given operation, the clock signal to that part can be stopped. Since switching requires power, and no switching takes place in the absence of the clock signal, gating the clock lowers power consumption.
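To make the trade-off concrete, the proportionality above can be turned into a small calculation. The sketch below is only an illustration: the function name `relative_dynamic_power` and the `active_fraction` parameter (a crude stand-in for clock gating, where only part of the switched capacitance still receives the clock) are assumptions introduced here, not part of the proposed design.

```python
def relative_dynamic_power(v_scale, f_scale=None, active_fraction=1.0):
    """Dynamic power relative to a baseline design, using P ∝ C · f · V^2.

    v_scale         -- new supply voltage divided by the baseline voltage
    f_scale         -- new clock frequency divided by the baseline frequency;
                       if omitted, frequency is assumed to scale with voltage
    active_fraction -- fraction of the switched capacitance that is still
                       clocked (hypothetical model of clock gating)
    """
    if f_scale is None:
        f_scale = v_scale  # first-order assumption: V ∝ f
    return active_fraction * f_scale * v_scale ** 2


if __name__ == "__main__":
    # Halving the voltage (and therefore the frequency) in this model:
    print(relative_dynamic_power(0.5))                       # 0.125
    # Halving the voltage while holding the frequency constant:
    print(relative_dynamic_power(0.5, f_scale=1.0))          # 0.25
    # Gating the clock to half of the logic at nominal voltage:
    print(relative_dynamic_power(1.0, active_fraction=0.5))  # 0.5
```

Under these assumptions, halving the supply voltage cuts power to one eighth of the baseline, at the cost of running at half the clock rate.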
Block Diagram:

The block diagram for the processor is shown in Diagram 1 below.
Diagram 1: Processor Block Diagram
Program Counter (PC): The program counter holds the address of the next instruction to be fetched from the instruction memory. After each fetch cycle, the PC is incremented by 2 to point to the next instruction; if the previous instruction was a branch that is taken, the PC instead holds the address of the branch target.

Instruction Memory: As the name indicates, the instruction memory holds the instructions that the processor will execute. Since the address bus is 16 bits wide, the size of the instruction memory can be at most $2^{16}$ bytes, or 64 kilobytes (KB).

Control Unit: The control unit is responsible for decoding the opcode and generating the necessary control signals. The control signals generated by this unit go to the ALU, multiplier, data memory, register file, and branch decide unit. These signals determine which module(s) are used for any given instruction.

Register file: This register file contains all sixteen general-purpose registers supported by the processor. Each register is 16 bits wide. The unit supports two concurrent reads and one write to the registers in each clock cycle.

Sign extension unit: This unit takes an 8-bit input and sign-extends the value to 16 bits. It is needed for instructions that operate on immediate values: since the immediate values specified in instructions are 8 bits wide, they must be sign-extended to 16 bits before going to the next stage.

ALU Control: This unit provides the signal that specifies which ALU operation is to be performed on the present set of data.

Arithmetic Logic Unit (ALU): The ALU is responsible for performing all the arithmetic and logical operations on data. Some of these operations require two operands, while others operate on only one. The operations include add, subtract, compare, and, or, not, xor, logical shift, and arithmetic shift. The output of the ALU goes either to the data memory (when the output is an address) or through a multiplexer back to the register file.

Multiplier: This unit is responsible for performing the multiplication operation. Its inputs are two 16-bit numbers and its output is a 32-bit number. The output of the multiplier goes back to the register file through a multiplexer.

Flag Register: This register holds all the flag bits: Z (Zero), V (Overflow), C (Carry-out), and N (Negative).

Branch Decide Unit: This unit is responsible for deciding whether a branch is taken. It uses the Zero flag and the Branch signal from the control unit to make this decision. The output of this unit is a one-bit value, which is ‘1’ when the branch is taken and ‘0’ otherwise.

Data Memory: This unit is similar to the instruction memory, but it holds data instead of instructions. Like the instruction memory, the data memory can be at most 64 KB.
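A few of the blocks described above can be modelled in software to check the intended behaviour before any VHDL is written. The following is a minimal behavioural sketch, not the project's implementation. All function names are hypothetical, and two details are assumptions that the description does not spell out: the branch decide unit is modelled as a branch-if-zero (Branch AND Zero), and the PC is assumed to wrap around at 16 bits.

```python
MASK16 = 0xFFFF
MEM_BYTES = 1 << 16  # 16-bit address bus: 65,536 byte addresses (64 KB)


def sign_extend8(imm8):
    """Sign-extend an 8-bit immediate to a 16-bit two's-complement value."""
    imm8 &= 0xFF
    return (imm8 | 0xFF00) if (imm8 & 0x80) else imm8


def alu_add(a, b):
    """16-bit add; returns (result, flags) with the Z, N, C, and V bits."""
    a &= MASK16
    b &= MASK16
    full = a + b
    result = full & MASK16
    flags = {
        "Z": int(result == 0),  # result is zero
        "N": result >> 15,      # sign bit of the result
        "C": full >> 16,        # carry out of bit 15
        # Signed overflow: both operands share a sign bit that differs
        # from the sign bit of the result.
        "V": ((~(a ^ b) & (a ^ result)) >> 15) & 1,
    }
    return result, flags


def branch_taken(zero_flag, branch_signal):
    """Branch decide unit, ASSUMED here to implement branch-if-zero."""
    return int(bool(branch_signal) and bool(zero_flag))


def next_pc(pc, taken, target):
    """PC update: branch target when taken, otherwise PC + 2.

    Wrap-around at 16 bits is an assumption of this sketch.
    """
    return (target if taken else pc + 2) & MASK16


if __name__ == "__main__":
    print(hex(sign_extend8(0x80)))        # 0xff80
    print(alu_add(0x7FFF, 0x0001))        # signed overflow sets V and N
    print(hex(next_pc(0x0010, 0, 0x0100)))  # 0x12
```

The increment of 2 in `next_pc` reflects the byte-addressed memory holding 16-bit (two-byte) instructions.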
Instruction Table:

The instruction set that the processor will support is shown in Table 1 below.
Table 1: Instruction Table

FPGA Design Flow:

While the goal of this project is to create a full custom VLSI design, the final fabrication will not be completed because of the high time and cost constraints associated with fabricating a VLSI chip. Instead, the processor will be implemented on an FPGA chip. Diagram 2 shows a standard design flow for an FPGA design:
Diagram 2: FPGA Design Flow
Schematic Entry: The design is entered into a synthesis design system using a hardware description language. The language we will be using is VHDL.

Synthesis: A netlist is generated from the VHDL code using a logic synthesis tool.

Place and Route: The place process decides the best location of the cells in a block based on the logic and the desired performance. The route process makes the connections between the cells and the blocks.

Configuration: This is done by loading the configuration data into the internal memory.

Encoding Chip: This is the final step, in which the FPGA chip is programmed with the desired functionality.
VLSI Design Flow:

Diagram 3 shows the VLSI design flow that we will follow for this project. The steps of the design phase are described below the diagram.

Diagram 3: VLSI Design Flow
Synthesis: The synthesis stage creates a high-level description of the circuit structure. This description is written in a hardware description language, VHDL (Very High Speed Integrated Circuit Hardware Description Language).

Place and Route: These two phases together make up the layout phase. The main goal of the placement phase is to ease the routing of the synthesized design; placement algorithms are used to minimize the total expected length of the interconnects required by our design. The goal of the routing phase is to route every interconnect completely while minimizing routing delays, which arise from parasitic effects in the interconnect and routing resources.

Verification: After the layout phase, final verification is based on the parameters obtained with layout extraction programs. The final chip is tested with a combination of the functional patterns obtained during simulation and those obtained from the test pattern generator.

Fabrication: Once the design has been verified by simulation and the chip layout has been determined, the data can be sent to a fabrication center for fabrication and post-fabrication testing. In our project, this phase of the VLSI design will not be completed because of the cost associated with fabrication; instead, an FPGA implementation will be realized.
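The placement objective mentioned under Place and Route, minimizing the total expected interconnect length, can be illustrated with a simple cost estimate. The sketch below uses half-perimeter wirelength, a common estimate chosen here purely for illustration; the proposal does not say which metric or algorithm the layout tools use, and the cell names and coordinates are made up.

```python
def half_perimeter_wirelength(net, placement):
    """Estimate the wire length of one net as half the perimeter of the
    bounding box that encloses all cells connected by that net."""
    xs = [placement[cell][0] for cell in net]
    ys = [placement[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, placement):
    """Sum the estimate over every net; a placer tries to minimize this."""
    return sum(half_perimeter_wirelength(net, placement) for net in nets)


if __name__ == "__main__":
    # Hypothetical design: four cells connected by two nets.
    nets = [("a", "b"), ("b", "c", "d")]
    spread_out = {"a": (0, 0), "b": (3, 0), "c": (3, 4), "d": (0, 4)}
    compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
    print(total_wirelength(nets, spread_out))  # 10
    print(total_wirelength(nets, compact))     # 3
```

The compact placement scores lower, which is the sense in which a good placement eases the subsequent routing step.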
Timeline:

Below is the timeline that we will follow for this project.
Note: This document was last modified on Nov. 10, 2003