============================================
=== ARM4U Documentation ===
= By Jonathan Masur, 2014 ==
= Made in spring 2014 for OpenCores release ==
============================================

****************
** Introduction **
****************

ARM4U is a "softcore" processor that was created as part of a university project in the Processor Architecture Laboratory at Ecole Polytechnique Fédérale de Lausanne ( http://lap.epfl.ch ).

One year after the completion of the project, we decided to release the processor on OpenCores ( http://www.opencores.org ) for free under the GPL licence, in order to make the source code and documentation available to the general public. It comes as-is with ABSOLUTELY NO WARRANTY.

The ARM4U processor clones early ARM processors in functionality: it implements almost the full ARMv3 instruction set and can be targeted by the GCC toolchain. Anyone is free to use and distribute it. However, if someone ever makes a cool use of this processor, I would of course be very happy to know about it.
This documentation doesn't cover the ARM architecture itself. For most information about the inner workings of the processor (instruction set, etc.), please consult the documentation of the ARM processors. This documentation instead covers how to use the softcore and how it differs from a genuine ARM.

**************************************
** Internal workings of the processor **
**************************************

The processor works with a classical 5-stage RISC pipeline (Fetch, Decode, Execute, Memory, Writeback).
Since a drawing is worth a thousand words, schematics of the processor are included. PLEASE CONSULT THE SCHEMATICS TO UNDERSTAND THE INNER WORKINGS OF THE PROCESSOR.
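
As a rough illustration of the pipeline described above, the following C sketch computes which stage an instruction occupies at a given cycle in an idealized 5-stage pipeline. It assumes that one instruction enters Fetch per cycle and that nothing ever stalls (no cache miss, no branch, no memory latency), which the real core does not guarantee. The names pipeline_stage, ideal_cycles and print_pipeline are made up for this example and do not come from the ARM4U sources.

```c
#include <stdio.h>

/* Stage names of the classical 5-stage RISC pipeline, in order. */
enum stage { FETCH, DECODE, EXECUTE, MEMORY, WRITEBACK, NUM_STAGES };

static const char *const stage_names[NUM_STAGES] = {
    "Fetch", "Decode", "Execute", "Memory", "Writeback"
};

/*
 * Stage occupied by instruction number `instr` (0-based) during cycle
 * `cycle` (0-based), or -1 if it is not in the pipeline at that time.
 * Assumption: one instruction enters Fetch per cycle, nothing stalls.
 */
int pipeline_stage(int instr, int cycle)
{
    int s = cycle - instr;
    return (s >= 0 && s < NUM_STAGES) ? s : -1;
}

/* Cycles needed to run n instructions through the idealized pipeline. */
int ideal_cycles(int n)
{
    return n > 0 ? n + NUM_STAGES - 1 : 0;
}

/* Print one row per instruction and one column per cycle. */
void print_pipeline(int n)
{
    for (int i = 0; i < n; i++) {
        printf("instr %d:", i);
        for (int c = 0; c < ideal_cycles(n); c++) {
            int s = pipeline_stage(i, c);
            printf(" %-9s", s < 0 ? "-" : stage_names[s]);
        }
        printf("\n");
    }
}
```

Calling print_pipeline(3) prints a small pipeline diagram in which each instruction is shifted one cycle to the right of the previous one. The schematics remain the reference for what the hardware actually does.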

The processor was not built for extreme performance, nor for extreme minimization of FPGA resources. Instead, it was built with three goals: simplicity, pedagogy, and a fully working and usable result.

The CPU communicates with the external world (memory, I/O, etc.) through the Altera Avalon bus. The CPU can be used as a QSys component, just like the NIOS II processor supplied by Altera. However, it should be relatively straightforward to adapt it to another bus. We managed to synthesize a 50 MHz version on a Cyclone IV FPGA. The resource usage was only slightly larger than that of a NIOS II/s (standard), but the frequency was lower. However, ARM instructions are denser and more efficient overall, so we can expect comparable performance from both CPUs. No benchmarks were run to prove that.

The instruction cache makes it possible to fetch instructions while data is being read from or written to memory, and (hopefully) to fetch a new instruction each cycle even if the memory has read/write latency (e.g. DRAM).
There is no cache coherency: self-modifying code will not work unless some additional circuitry is added.

*************************************
** Differences with an authentic ARM **
*************************************

The ARM4U behaves identically to an ARM implementing the ARMv3 instruction set (ARM6 generation), except for the following differences:

- Abort mode and the abort interrupt don't exist
- There is no support for coprocessors or the related instructions
- There is no 26-bit (ARMv2) compatibility mode
- The 'msr' instruction always affects all status flags (you can't limit it to a subset of the flags while leaving the others unaffected)
- When an interrupt occurs, the status flags take hard-coded values. For the condition flags this shouldn't be a problem; the only major difference is that the 'F' flag is cleared when an IRQ triggers. In other words, FIQs are enabled whenever an IRQ happens
- R15 (PC+8) can be used as an input for every instruction, and will always produce correct results, even when doing so is forbidden on an authentic ARM
- The 'mul' and 'mla' instructions can be used with any operands and will always produce correct results, even when doing so is forbidden on an authentic ARM
- The 'mlas' instruction affects the overflow and carry flags based on the addition operation
- The 'swp' and 'swpb' (swap) instructions are absent
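
Some of the differences listed above can be written down as a small behavioural model in C. This is only a sketch of how we read the list, not code derived from the VHDL sources. The function names are invented for the example, the flag bit positions are the standard ARM ones, and treating 'msr' as replacing the whole status word is an assumption about what "all status flags" means.

```c
#include <stdint.h>

/* Condition flag positions in the ARM status register. */
#define PSR_N (1u << 31)  /* negative */
#define PSR_Z (1u << 30)  /* zero     */
#define PSR_C (1u << 29)  /* carry    */
#define PSR_V (1u << 28)  /* overflow */
#define PSR_FLAGS (PSR_N | PSR_Z | PSR_C | PSR_V)

/* Value read from R15 by the instruction located at `instr_addr`. */
uint32_t read_r15(uint32_t instr_addr)
{
    return instr_addr + 8;
}

/* Genuine ARM, flags-only form of 'msr': only N, Z, C, V are replaced. */
uint32_t msr_flags_only(uint32_t psr, uint32_t value)
{
    return (psr & ~PSR_FLAGS) | (value & PSR_FLAGS);
}

/* ARM4U (assumed model): no partial form, the whole word is replaced. */
uint32_t msr_arm4u(uint32_t psr, uint32_t value)
{
    (void)psr;  /* the previous contents are not preserved */
    return value;
}

/*
 * Flags after 'mlas' in this model: N and Z come from the result, C and V
 * come from the final 32-bit addition (Rm * Rs) + Rn.
 */
uint32_t mlas_flags_arm4u(uint32_t rm, uint32_t rs, uint32_t rn)
{
    uint32_t product = rm * rs;       /* low 32 bits of the product */
    uint32_t result  = product + rn;
    uint32_t flags   = 0;

    if (result & 0x80000000u) flags |= PSR_N;
    if (result == 0)          flags |= PSR_Z;
    if (result < product)     flags |= PSR_C;  /* unsigned carry out */
    /* signed overflow: both addends share a sign that the result lacks */
    if (~(product ^ rn) & (product ^ result) & 0x80000000u) flags |= PSR_V;
    return flags;
}
```

The practical consequence of the 'msr' difference is that code which only wants to change the condition flags should read the status register first, modify the bits it needs, and write the complete value back.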

**************
** Interrupts **
**************

The following interrupts are supported:

- Reset
- IRQ
- FIQ ("fast IRQ")
- Software interrupt ('swi' instruction)
- Undefined instruction trap (any instruction not implemented)

The vectors, register bank switching, and PSW and PC saving work exactly the same as on an authentic ARM. Other kinds of interrupts (namely, "abort") aren't supported.
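
For convenience, the vector addresses and entry modes of the exceptions listed above can be summarized as follows. The numeric values are the standard ones from the ARM architecture and were not read from the ARM4U sources; they are given on the assumption that the core follows the ARM convention here, as the text states. The enum and function names are invented for this sketch.

```c
#include <stdint.h>

/* Exceptions implemented by the ARM4U, according to the list above. */
enum exception { EXC_RESET, EXC_UNDEFINED, EXC_SWI, EXC_IRQ, EXC_FIQ };

/*
 * Standard ARM vector address of each supported exception.
 * 0x0C and 0x10 are the abort vectors on a genuine ARM and are
 * unused here, since aborts are not supported.
 */
uint32_t vector_address(enum exception e)
{
    switch (e) {
    case EXC_RESET:     return 0x00;
    case EXC_UNDEFINED: return 0x04;
    case EXC_SWI:       return 0x08;
    case EXC_IRQ:       return 0x18;
    case EXC_FIQ:       return 0x1C;
    }
    return 0x00;  /* not reached for valid arguments */
}

/* Standard ARM processor mode (status bits 4:0) entered on each exception. */
uint32_t entry_mode(enum exception e)
{
    switch (e) {
    case EXC_RESET:
    case EXC_SWI:       return 0x13;  /* supervisor */
    case EXC_UNDEFINED: return 0x1B;  /* undefined  */
    case EXC_IRQ:       return 0x12;  /* IRQ        */
    case EXC_FIQ:       return 0x11;  /* FIQ        */
    }
    return 0x13;  /* not reached for valid arguments */
}
```

Since FIQ has the last vector, a FIQ handler can be placed directly at 0x1C without a branch, as on a genuine ARM.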

*******************
** Compiling notes **
*******************

1) With GCC
VERY IMPORTANT: Always use the command line options --fix-v4bx and -march=armv3 when compiling code for the ARM4U with GCC!

When compiling C code, use -Xassembler --fix-v4bx instead of plain --fix-v4bx, so that the GCC driver passes the option on to the assembler.

According to our tests and experience, the differences between this processor and a genuine ARMv3 implementation are normally too subtle to make compiled C code fail, but the CPU comes with *absolutely no warranty*.

2) With another compiler/assembler
Consult your compiler's documentation and make sure that no "new" instructions from instruction sets more recent than ARMv3 are ever used. It is also possible to emulate them with the undefined instruction trap.

3) A note about endianness:
This CPU has been made "little endian", in the sense that individual byte accesses to memory are made in that order on the bus. That would be trivial to change by modifying the "memory.vhd" file, lines 77-80.

However, because of conversion issues between 32-bit .hex files and 8-bit .hex files inside the Altera Quartus program, we had to use the -EB option as well, in order to make the generated binary code appear in big endian in the hex file. The processor itself is not big endian. As far as we know, the -EB option in GCC has only 2 effects:

1) The generated file (either binary, hex, or object file) is written in the corresponding byte order
2) A bit in the object file's header is set, which prevents linking big and little endian object files together
The -EB option doesn't affect the compiled code itself in any way, as far as we know.
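
To make the byte-order discussion concrete, the following C sketch shows how the same 32-bit instruction word is laid out as individual bytes in little endian and in big endian order. It only illustrates the two layouts; it is not the conversion performed by Quartus or by the toolchain, and the function names are invented for the example.

```c
#include <stdint.h>

/* Store a 32-bit word least significant byte first (little endian). */
void store_le32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w);
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)(w >> 16);
    out[3] = (uint8_t)(w >> 24);
}

/* Store a 32-bit word most significant byte first (big endian). */
void store_be32(uint8_t out[4], uint32_t w)
{
    out[0] = (uint8_t)(w >> 24);
    out[1] = (uint8_t)(w >> 16);
    out[2] = (uint8_t)(w >> 8);
    out[3] = (uint8_t)(w);
}

/* Reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}
```

For example, the instruction word 0xE1A00000 ('mov r0, r0') is stored as the bytes 00 00 A0 E1 in little endian order and as E1 A0 00 00 in big endian order.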

****************
** Test program **
****************

A test program using all ARM instructions is included as an example; it was used to debug the processor and prove its correct operation.

Unfortunately, the processor doesn't come with any debugger, so FPGA usage is a bit painful: the whole hardware has to be re-downloaded for each change in the program, and the only way to debug a program is to use output LEDs or something similar.
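
A minimal sketch of the LED debugging approach mentioned above is shown below. It assumes that the LEDs are driven by a memory-mapped output register on the Avalon bus; the register address depends entirely on your own system, so it is passed in as a pointer here. The function names and the checkpoint codes are invented for this example.

```c
#include <stdint.h>

/* Write an 8-bit progress code to the (assumed) LED output register. */
static inline void debug_leds(volatile uint32_t *led_reg, uint8_t code)
{
    *led_reg = code;
}

/*
 * Run a small computation and report progress on the LEDs.
 * Returns 0 on success, -1 if the self-check fails.
 */
int run_with_checkpoints(volatile uint32_t *led_reg)
{
    uint32_t sum = 0;

    debug_leds(led_reg, 0x01);      /* checkpoint 1: program started */

    for (uint32_t i = 1; i <= 10; i++)
        sum += i;

    if (sum != 55) {
        debug_leds(led_reg, 0xFF);  /* error pattern */
        return -1;
    }

    debug_leds(led_reg, 0x02);      /* checkpoint 2: computation OK */
    return 0;
}
```

On the target, led_reg would be the address assigned to the LED peripheral in your system, cast to volatile uint32_t *. The last code left on the LEDs then tells you how far the program got before it stopped or crashed.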

***********
** Contact **
***********

Contact me at jmasur [at] bluewin [dot] ch if needed.