<!-- jsauermann -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
 "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>html/Cpu_Core</TITLE>
<META NAME="generator" CONTENT="HTML::TextToHTML v2.46">
<LINK REL="stylesheet" TYPE="text/css" HREF="lecture.css">
</HEAD>
<BODY>
<P><table class="ttop"><th class="tpre"><a href="03_Pipelining.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="05_Opcode_Fetch.html">Next Lesson</a></th></table>
<hr>

<H1><A NAME="section_1">4 THE CPU CORE</A></H1>

<P>In this lesson we will discuss the core of the CPU. These days, the same
kind of CPU can come in different flavors that differ in the clock
frequency that they support, bus sizes, the size of internal caches
and memories, and the capabilities of the I/O ports they provide.
We call the common part of these different CPUs the <STRONG>CPU core</STRONG>.
The CPU core is primarily characterized by the instruction set that it
provides. One could also say that the CPU core is the implementation
of a given instruction set.

<P>The details of the instruction set will only be visible at the next lower
level of the design. At the current level different CPUs (with
different instruction sets) will still look the same because they
all use the same structure. Only some control signals will be different
for different CPUs.

<P>We will use the so-called <STRONG>Harvard architecture</STRONG> because it is a better fit
for FPGAs with internal memory modules. Harvard architecture means that
the program memory and the data memory of the CPU are separate. This
gives us more flexibility, and some instructions (for example <STRONG>CALL</STRONG>,
which involves storing the current program counter in
memory while changing the program counter and fetching the next
instruction) can be executed in parallel.

<P>Different CPU cores differ in the instruction set that
they support. The types of CPU instructions (like arithmetic
instructions, move instructions, branch instructions, etc.) are
essentially the same for all CPUs. The differences are in details
like the encoding of the instructions, operand sizes, the number of
addressable registers, and the like.

<P>Since all CPUs within the same base architecture (Harvard vs.
von Neumann) are rather similar apart from such details, the same
structure can be used even for different instruction sets. This
is because the same cycle is repeated again and again for the
different instructions of a program. This cycle consists of 3
phases:

<UL>
<LI>Opcode fetch
<LI>Opcode decoding
<LI>Execution
</UL>
<P><STRONG>Opcode fetch</STRONG> means that for a given value of the program counter
<STRONG>PC</STRONG>, the instruction (opcode) stored at location PC is read from the
program memory and that the PC is advanced to the next instruction.
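
<P>A hedged sketch of this phase (the entity, signal, and memory names here
are invented for illustration only; the real fetch stage is developed in a
later lesson):

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Sketch of an opcode fetch phase with a tiny 16-word program memory
-- (all names hypothetical): on every clock edge, read the opcode at
-- the current PC and advance the PC to the next instruction.
entity fetch_sketch is
    port ( I_CLK : in  std_logic;
           I_CLR : in  std_logic;
           Q_OPC : out std_logic_vector(15 downto 0);
           Q_PC  : out std_logic_vector( 3 downto 0));
end fetch_sketch;

architecture Behavioral of fetch_sketch is
    type t_prog is array(0 to 15) of std_logic_vector(15 downto 0);
    signal PROG_MEM : t_prog := (others => (others => '0'));
    signal L_PC     : unsigned(3 downto 0) := (others => '0');
begin
    process(I_CLK)
    begin
        if rising_edge(I_CLK) then
            if (I_CLR = '1') then
                L_PC  <= (others => '0');             -- restart at address 0
            else
                Q_OPC <= PROG_MEM(to_integer(L_PC));  -- read opcode at PC
                L_PC  <= L_PC + 1;                    -- advance PC
            end if;
        end if;
    end process;
    Q_PC <= std_logic_vector(L_PC);
end Behavioral;
```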

<P><STRONG>Opcode decoding</STRONG> computes a number of control signals that will
be needed in the execution phase.

<P><STRONG>Execution</STRONG> then executes the opcode, which means that a small number
of registers or memory locations are read and/or written.

<P>In theory these 3 phases could be implemented in a combinational way
(a static program memory, an opcode decoder at the output of the program
memory, and an execution module at the output of the opcode decoder).
We will see later, however, that each phase has considerable complexity
and we therefore use a 3-stage pipeline instead.
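
<P>As a minimal sketch (not the actual design; all entity and signal names
here are invented for illustration), a 3-stage pipeline amounts to three
register stages that advance on every clock edge, so that three
instructions are in flight at any time:

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Minimal 3-stage pipeline sketch (hypothetical names): each stage
-- latches its input on the rising clock edge, so the result of an
-- instruction appears three clock cycles after its fetch.
entity pipe_sketch is
    port ( I_CLK : in  std_logic;
           I_OPC : in  std_logic_vector(15 downto 0);   -- from program memory
           Q_RES : out std_logic_vector(15 downto 0));
end pipe_sketch;

architecture Behavioral of pipe_sketch is
    signal L_FETCHED : std_logic_vector(15 downto 0);   -- fetch  -> decode
    signal L_DECODED : std_logic_vector(15 downto 0);   -- decode -> execute
begin
    process(I_CLK)
    begin
        if rising_edge(I_CLK) then
            L_FETCHED <= I_OPC;        -- stage 1: opcode fetch
            L_DECODED <= L_FETCHED;    -- stage 2: opcode decoding
            Q_RES     <= L_DECODED;    -- stage 3: execution
        end if;
    end process;
end Behavioral;
```

In the real core each stage does substantial work between these registers;
the sketch only shows the pipelining itself.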

<P>In the following figure we see how a sequence of three opcodes ADD, MOV,
and JMP is executed in the pipeline.

<P><br>

<P><img src="cpu_core_1.png">

<P><br>

<P>From the discussion above we can already predict the big picture of
the CPU core. It consists of a pipeline with 3 stages: opcode fetch,
opcode decoder, and execution (which is called data path in the design,
because the operations required by the execution more or less imply
the structure of the data paths in the execution stage):

<P><br>

<P><img src="cpu_core_2.png">

<P><br>

<P>The pipeline consists of the <STRONG>opc_fetch</STRONG> stage that drives the <STRONG>PC</STRONG>, <STRONG>OPC</STRONG>, and
<STRONG>T0</STRONG> signals to the opcode decoder stage.
The <STRONG>opc_deco</STRONG> stage decodes the <STRONG>OPC</STRONG> signal and generates a number of
control signals towards the execution stage. The execution stage then
executes the decoded instruction.

<P>The control signals towards the execution stage can be divided into 3 groups:

<OL>
<LI>Select signals (<STRONG>ALU_OP</STRONG>, <STRONG>AMOD</STRONG>, <STRONG>BIT</STRONG>, <STRONG>DDDDD</STRONG>, <STRONG>IMM</STRONG>, <STRONG>OPC</STRONG>, <STRONG>PMS</STRONG>,
<STRONG>RD_M</STRONG>, <STRONG>RRRRR</STRONG>, and <STRONG>RSEL</STRONG>). These signals control details (like register
numbers) of the instruction being executed.
<LI>Branch and timing signals (<STRONG>PC</STRONG>, <STRONG>PC_OP</STRONG>, <STRONG>WAIT</STRONG>, and <STRONG>SKIP</STRONG> in the reverse
direction). These signals control changes in the normal execution
flow.
<LI>Write enable signals (<STRONG>WE_01</STRONG>, <STRONG>WE_D</STRONG>, <STRONG>WE_F</STRONG>, <STRONG>WE_M</STRONG>, and <STRONG>WE_XYZS</STRONG>).
These signals define if and when registers and memory locations are
updated.
</OL>
<P>We now come to the VHDL code for the CPU core. The entity declaration
must match the instantiation in the top-level design. Therefore:

<P><br>

<pre class="vhdl">

 33 entity cpu_core is
 34     port (  I_CLK       : in  std_logic;
 35             I_CLR       : in  std_logic;
 36             I_INTVEC    : in  std_logic_vector( 5 downto 0);
 37             I_DIN       : in  std_logic_vector( 7 downto 0);
 38
 39             Q_OPC       : out std_logic_vector(15 downto 0);
 40             Q_PC        : out std_logic_vector(15 downto 0);
 41             Q_DOUT      : out std_logic_vector( 7 downto 0);
 42             Q_ADR_IO    : out std_logic_vector( 7 downto 0);
 43             Q_RD_IO     : out std_logic;
 44             Q_WE_IO     : out std_logic);
<pre class="filename">
src/cpu_core.vhd
</pre></pre>
<P>

<P><br>
<P>The declaration and instantiation of <STRONG>opc_fetch</STRONG>, <STRONG>opc_deco</STRONG>, and <STRONG>dpath</STRONG>
simply reflect what is shown in the previous figure.

<P>The multiplexer driving <STRONG>DIN</STRONG> selects between data from the I/O input and
data from the program memory. This is controlled by the signal <STRONG>PMS</STRONG> (<STRONG>program
memory select</STRONG>):

<P><br>

<pre class="vhdl">

240     L_DIN &lt;= F_PM_DOUT when (D_PMS = '1') else I_DIN(7 downto 0);
<pre class="filename">
src/cpu_core.vhd
</pre></pre>
<P>

<P><br>
<P>The interrupt vector input <STRONG>INTVEC</STRONG> is <STRONG>and</STRONG>'ed with the global interrupt
enable bit in the status register (which is contained in the data path):

<P><br>

<pre class="vhdl">

241     L_INTVEC_5 &lt;= I_INTVEC(5) and R_INT_ENA;
<pre class="filename">
src/cpu_core.vhd
</pre></pre>
<P>

<P><br>

<P>This concludes the discussion of the CPU core and we will proceed with
the different stages of the pipeline. Rather than following the natural
order (opcode fetch, opcode decoder, execution), however, we will describe
the opcode decoder last. The reason is that the opcode decoder is a
consequence of the design of the execution stage. Once the execution stage
is understood, the opcode decoder will become obvious (though still complex).

<P><hr><BR>
<table class="ttop"><th class="tpre"><a href="03_Pipelining.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="05_Opcode_Fetch.html">Next Lesson</a></th></table>
</BODY>
</HTML>