Archive

Assembly Language - Part 1



Chris Johnson

Paul has had a number of requests over the past few months for some articles on assembly language, and I have agreed to try to satisfy at least some of this demand. It seems that, as Acorn computers have become more sophisticated, the amount of programming information supplied as part of the standard documentation has decreased, such that even the Basic manual is an optional extra.

General information on the use of assembly language is extremely sparse indeed. The only sources available to most users are the occasional articles in the Acorn related magazines, such as Archive, Risc User, etc. For example, in Volumes 1 and 2 of Archive, there was a series on assembler programming by Alan Glover.

I shall be targeting this series very much at the beginner. However, it must be said at the outset that assembly language is not for everyone, and the early learning curve may appear to be quite daunting. As in many occupations, perseverance pays dividends, and once over the early barrier, things will, I hope, become much easier. One prerequisite for any learner is some familiarity with a high level language. For almost everyone, this high level language is Basic. Not only is the explanation of concepts made easier by a comparison with a high level language such as Basic, but Basic on Acorn systems has its own built-in assembler. I shall be basing all my examples on the use of the built-in assembler.

This first part is concerned primarily with the CPU and how it operates. In later parts, we shall look at (amongst other things):

Depending on the degree of feedback I get, this could turn into a long series of articles!

When would assembler be used?

I suppose we should first establish whether assembler is going to be useful once we have learnt the basics. What sort of applications does it have? It is no secret that Impression is written almost entirely in assembler. However, it is unlikely that major applications would now be written in this way, with most software houses preferring C/C++ to cut development and debugging time.

Where assembler would be used is, for example, in the development of operating systems (RISC OS, of course); for the support software of hardware add-ons, such as hard disc controllers, ethernet cards, sound cards, etc; for relocatable modules, which are really extensions to the operating system; and for the speed-critical parts of a program that is otherwise written in a high level language.

As an example of the latter, the famous ChangeFSI is a program which has a front end written in Basic, with the image processing routines written in assembler, which are called as required. Assembler would also be used by the demo writers, showing off their latest wizardry in rendering 3-D surfaces and scrolling in all four directions at the same time, while playing multitrack music.

Machine code and the assembler

When a CPU is designed, the designer builds into the chip certain functionality that allows the CPU to carry out a number of simple operations. Associated with each of these operations is a coded instruction to tell the CPU to carry out that specific operation. Since the CPU is a digital device, and works with binary data, this coded instruction is simply a binary number.

A modern CPU may have a set of instructions comprising many hundreds or thousands of possible operations, and it is very unlikely that even a seasoned programmer could carry these numbers around in their head. To have to look each one up in a manual would be rather inefficient to say the least. Thus the CPU designer also produces a set of much more easily remembered symbols or mnemonics, to make the job of programming machine code very much easier.

This is where the assembler does its job. The function of the assembler is to translate the assembly language program into the machine code instructions that are actually executed by the CPU.
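To make the translation concrete, here is a minimal sketch in Python (used purely for illustration, not as a real assembler) of how one ARM mnemonic, MOV R0, #1, which places the value 1 in register R0, becomes a single 32-bit number. The function name encode_mov_immediate is my own invention, and the sketch assumes a small constant (0 to 255) and ignores all the other instruction forms a real assembler must handle.

```python
def encode_mov_immediate(rd, value, cond=0b1110):
    """Return the 32-bit ARM encoding of MOV Rd, #value (hypothetical helper).

    Only unrotated 8-bit constants are handled; cond defaults to 'always'.
    """
    if not 0 <= rd <= 15:
        raise ValueError("register number must be 0 to 15")
    if not 0 <= value <= 255:
        raise ValueError("this sketch only handles constants 0 to 255")
    return (
        (cond << 28)        # condition field, 1110 = always execute
        | (1 << 25)         # operand 2 is an immediate constant
        | (0b1101 << 21)    # opcode for MOV
        | (rd << 12)        # destination register
        | value             # the 8-bit constant itself
    )


print(hex(encode_mov_immediate(0, 1)))  # prints 0xe3a00001
```

The mnemonic is clearly easier to remember than the number, which is the whole point of having an assembler.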

Assemblers, compilers and interpreters

In one sense, an assembler and a compiler are doing the same job, i.e. they both take instructions written in a language that means something to us humans, and translate such instructions into machine code instructions. However, there are important differences. Firstly, assembler is written for a specific CPU. For example, the instruction

MOV R0, #1

would only be understood by an assembler for the ARM CPU. (Anticipating a little, this instruction simply places the value 1 in register R0).

The Basic instruction

PRINT "Hello world"

would be understood by any Basic compiler, whatever hardware it is running on. This means that, while code written in a high level compiled language such as C, Fortran, or Basic may need little or no conversion to run on different hardware, code written in assembler must always be completely rewritten to move on to a different type of CPU.

Secondly, an assembler instruction almost always translates into a single machine instruction, whereas a simple statement like the Basic instruction above may end up as tens or even hundreds of machine instructions once the compiler has done its job. This has consequences for the amount of effort required to write a program in assembler. A thirty or forty line Basic program may take only a few minutes to write. If this program typically becomes several hundred machine instructions, then if we write the same program in assembler, the assembler source is going to consist of several hundred lines of code as well, which obviously would take much longer to write.

So where do interpreters fit into the scheme of things? An assembler or a compiler will take, as input, a program file written in the appropriate language, translate this program into machine instructions, and produce a file of these instructions that can be executed at some later time. Basic, as supplied with all Acorn machines, is implemented as an interpreted language (although there are also Basic compilers available for RISC OS).

In this case, there is a program running (the interpreter) which reads the Basic program and continuously translates it, on the fly, into the appropriate machine instructions, which are executed as they are translated. There is obviously a significant time overhead to carry out this translation, so programs run under an interpreter generally execute much more slowly than their precompiled equivalents.

In addition, whereas the compiler carries out the translation once only, to produce the final machine code program, the interpreted language has to carry out the translation every time the program is run. An advantage of using an interpreted language is that, with no compilation step, it is much faster to make changes, and immediately test the program.

The ARM processor

Since assembler is written for a specific processor, we should consider the ARM CPU as it appears to the assembler programmer. For the present purposes, I shall restrict myself to the first ARM chip, the ARM2. Since then, we have had the ARM3, ARM6, ARM7 and, very shortly, the StrongARM. All these chips are, in principle, backwards compatible with the original ARM2 chip, so programs written for an ARM2 should work on all the later chips.

In any computer system, the CPU must communicate with the rest of the system. A very simple schematic of such a system is shown below.

In electronic jargon, a group of 'wires' carrying related signals is known as a bus. The data bus is used to transfer data into and out of the CPU. The address bus is used to tell the other components (memory, input/output devices, etc) where the particular item of data resides, or is to be put. Finally, there is also a control bus, which manages system operation, e.g. synchronises the flow of data, tells the other components whether the CPU is reading or writing data, and so on.

The ARM's data bus is 32 bits wide (i.e. has 32 wires), hence the ARM is said to be a 32-bit processor. The wider the data bus, the more information can be transferred in one operation. When the ARM2 was introduced, most competing CPUs were 8 or 16-bit devices, which was why it came out so well in performance comparisons. However, a number of current high performance CPUs now have a 64-bit data bus. The data carried on the data bus may be just that, data, but for much of the time, it is the actual machine instructions that are being transferred from memory to the CPU. The data bus is two way, in that data can flow into or out of the CPU.

The address bus in the ARM2 is 26 bits wide. The wider the address bus, the more memory the computer is capable of using in one block (it has always been possible to use banks of memory, and switch from one to another). Thus the ARM2 is able to address 64Mb (2^26 bytes) of memory as one block. When the first ARM based machines were introduced, 4Mb RAM was thought to be more than sufficient, so there was a large address range in hand. However, technology advances apace, and prices fall. 64Mb RAM can now be put into a RiscPC for less than £400, and the RiscPC can, theoretically, have up to 256Mb RAM fitted. The later ARM processors have the full 32-bit address lines to allow even more memory to be addressed. Only the CPU is allowed to place addresses on the address bus, so it may be thought of as a one way bus.
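The arithmetic behind these figures is simply a power of two: each extra address line doubles the number of locations that can be selected. A small sketch in Python (the function name addressable_bytes is mine, chosen only for illustration) shows the sums for the 26-bit and 32-bit cases, assuming one byte per address.

```python
MEGABYTE = 1024 * 1024


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a bus of the given width."""
    return 2 ** address_bits


for bits in (26, 32):
    size = addressable_bytes(bits)
    print(bits, "address bits:", size, "bytes =", size // MEGABYTE, "Mb")
```

The 26-bit bus gives 64Mb, while the full 32 bits give 4096Mb.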

Inside the CPU

Wherever possible, we shall treat the CPU as a black box, but there are some general things we must be aware of before we can attempt to program in assembler. Inside all the ARM CPUs are certain sub-units, together with a lot of controlling logic which makes the entire CPU function as a coherent whole.

The ALU (arithmetic-logic unit) is a unit that takes two 32-bit numbers as input, and produces a 32-bit result. The instruction decode unit tells the ALU where to find the data, which operation to perform on the data, and where to put the result. The ALU carries out such operations as addition, subtraction, and comparison, and also logical operations such as AND and EOR.

The barrel shifter operates on data by moving the bits to the left or to the right, and takes two inputs, the 32-bit data to be shifted, and a value to specify by how many bits the data is to be shifted, and produces a 32-bit result. There are several options for how the barrel shifter does its stuff, such as which way the data is shifted, and whether the bits which fall out one end reappear at the other. One very important property of the barrel shifter is that it always takes only one clock cycle to do its work whatever the operation, and by however many bits the data is shifted. We shall see in a later part that being able to call upon the barrel shifter during the execution of almost any instruction provides a very powerful facility.
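The different shifting options can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not of the hardware: the function names follow the usual ARM shift mnemonics (LSL, LSR, ASR, ROR), and I assume shift amounts of 0 to 31 only, leaving aside the special cases a real barrel shifter also supports.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left: bits falling off the top are lost."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right: zeros are fed in at the top."""
    return (value & MASK32) >> amount


def asr(value, amount):
    """Arithmetic shift right: the sign bit (bit 31) is copied in at the top."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret the 32-bit pattern as a negative number
    return (value >> amount) & MASK32


def ror(value, amount):
    """Rotate right: bits falling off the bottom reappear at the top."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


print(hex(lsl(1, 4)))            # 0x10
print(hex(asr(0x80000000, 4)))   # 0xf8000000
print(hex(ror(0x12345678, 8)))   # 0x78123456
```

Note that a left shift by n bits is a multiplication by 2^n, which hints at why a free shift on almost any instruction is so useful.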

Registers

From the programmer's viewpoint, probably the most important part of the CPU is the register bank, since this is what is seen when programming in assembler. A register is a word of storage, like a memory location, and is 32 bits long on the current range of ARM processors. Since the register is on the CPU chip it can always be accessed at the full clock speed of the device, unlike the normal DRAM memory, which may be accessed much more slowly. (This is why the later versions of the ARM CPU, in common with other well known CPUs, have additional on-chip fast memory cache).

Almost all operations on the ARM chip involve the use of registers. Indeed, the only operations involving memory are load and store. There are 16 registers visible to the user at any one time. We shall refer to these registers as R0 to R15. Only one of these registers has a predetermined function. This is R15, which is used as the program counter and status register. The other registers are undedicated, general purpose registers, although R14 is used by the processor to hold the return address when branching to a subroutine. By general purpose, we mean that if an instruction requires a register as an operand, then any register may be specified.

The program counter and status register

R15 is used as the program counter and status register. The arrangement of this register is shown below.

We have already established that the address bus is 26 bits wide. However, all ARM instructions are a single 32-bit word, or 4 bytes. Instructions are defined to always reside in memory on word boundaries. Thus the two lowest bits of an instruction address must always be zero, and there is no need to store them. Since only bits 2 - 25 are needed for the program counter, and the ARM always sets bits 0 and 1 of the address line to zero when fetching an instruction, bits 0 and 1 of the program counter can be used for something else. These bits are actually used as flags to indicate which mode the ARM processor is executing in. The possible modes are as follows:

S1  S0  Mode
0   0   User
0   1   FIQ (fast interrupt)
1   0   IRQ (interrupt)
1   1   SVC (supervisor)

We shall be dealing almost exclusively with user mode. The other modes are used mainly by the operating system or hardware add-ons. For example, supervisor mode is entered when the ARM is reset, and interrupts are generated during hardware input/output (e.g. keyboard, disc drives, serial port, parallel port).

Bits 26 and 27 are used to temporarily disable fast and normal interrupts, respectively. Interrupts are used when certain operations need to be attended to rapidly although, even here, there is a pecking order, with fast interrupts being more 'urgent' than normal interrupts.

To take an analogy, we may be sitting in our armchair reading a good book and listening to Pink Floyd, or whatever, when the phone rings. We first decide whether we wish to answer the phone - after all, it may only be a kitchen unit/double glazing salesperson! We decide we should answer, so suspend our current task (reading), and answer the phone. We have no sooner said hello to Auntie when the door bell rings. We ask Auntie to hang on while we answer the door. After allowing Johnnie from next door to tramp over the flower beds to get his football, we go back to continue our conversation with Auntie on the phone. We finally get back to our book when Auntie has finished.

In this scenario we have allowed a second interrupt to be serviced during the first interrupt. However, it may be that we wish to be especially nice to Auntie, because she might buy us a StrongARM RiscPC for Christmas. When we discover it is Auntie on the phone we set the appropriate flag (bit 26 or 27), so that the operating system keeps further interrupts on ice until we have finished on the phone. Interrupt handling routines are always written in assembler, since interrupts have to be serviced as quickly as possible.

This leaves us with a further four bits, 28 - 31. These are used as flags to signal the outcome of an arithmetic operation as follows.

bit 31 N Negative result
bit 30 Z Zero result
bit 29 C Carry flag
bit 28 V Overflowed result

Any of the register-to-register data operations can, when necessary, be used to set these four status bits, and the execution of any instruction can be made conditional upon the state of one or more of these flags. By default, instructions execute regardless of the state of these flags.
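Putting the whole of R15 together, the layout described above can be summarised by a short Python sketch that pulls a 32-bit value apart into its fields. The function name decode_r15 and the dictionary keys are my own labels, and the sample value is made up purely to exercise the code.

```python
MODE_NAMES = {0b00: "User", 0b01: "FIQ", 0b10: "IRQ", 0b11: "SVC"}


def decode_r15(r15):
    """Split a combined program counter/status register value into its fields."""
    return {
        "mode": MODE_NAMES[r15 & 0b11],         # bits 0-1: processor mode
        "pc": r15 & 0x03FFFFFC,                 # bits 2-25: word-aligned address
        "fiq_disabled": bool(r15 & (1 << 26)),  # bit 26: fast interrupts off
        "irq_disabled": bool(r15 & (1 << 27)),  # bit 27: normal interrupts off
        "V": bool(r15 & (1 << 28)),             # overflow
        "C": bool(r15 & (1 << 29)),             # carry
        "Z": bool(r15 & (1 << 30)),             # zero
        "N": bool(r15 & (1 << 31)),             # negative
    }


# Made-up example: Z and C set, address &8000, supervisor mode
fields = decode_r15(0x60008003)
print(fields["mode"], hex(fields["pc"]), fields["Z"], fields["C"])
```

Masking with 0x03FFFFFC keeps bits 2 to 25 in place, so the result is already a byte address with its two lowest bits zero.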

We would make extensive use of these flag settings when constructing code doing the equivalent of, for example, FOR...NEXT or REPEAT...UNTIL loops, or IF...THEN...ELSE.
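Anticipating later parts a little, the following Python sketch models how a comparison of two 32-bit values would set the four flags, and how a handful of the ARM condition codes (EQ, NE, GE, LT and AL for 'always') are then decided from them. The function names compare_flags and condition_holds are hypothetical, and only a few of the available conditions are shown.

```python
MASK32 = 0xFFFFFFFF


def compare_flags(a, b):
    """Flags produced by working out a - b on 32-bit values, as a compare would."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": bool(result & 0x80000000),  # bit 31 of the result
        "Z": result == 0,
        "C": a >= b,  # on the ARM, carry set after a subtraction means no borrow
        # Overflow: the operands differ in sign and the result's sign differs from a's
        "V": bool((a ^ b) & (a ^ result) & 0x80000000),
    }


def condition_holds(cond, flags):
    """Evaluate a few of the ARM condition codes against a set of flags."""
    tests = {
        "EQ": flags["Z"],                # equal
        "NE": not flags["Z"],            # not equal
        "GE": flags["N"] == flags["V"],  # signed greater than or equal
        "LT": flags["N"] != flags["V"],  # signed less than
        "AL": True,                      # always (the default)
    }
    return tests[cond]


# A loop counter that has not yet reached its limit of 10
flags = compare_flags(3, 10)
print(condition_holds("LT", flags))  # True, so the loop would go round again
```

A FOR...NEXT loop then amounts to a compare followed by a branch that is conditional on one of these tests.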

How the CPU executes a program

Most CPUs work in essentially the same way. They first have to fetch an instruction from the memory location pointed to by the program counter. Having fetched the instruction, the decoding circuits have to work out what the instruction is. Finally, the instruction is executed.

In early CPUs, these phases were carried out in sequence, and so the three 'units' were active for only one third of the time. The ARM speeds up this process by using a technique known as pipelining, in which the three steps are carried out concurrently. While one instruction is being executed, the next instruction is being decoded, and the one after that is being fetched. Looked at simplistically, this gives a three-fold increase in CPU speed at a stroke.
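The overlap is easiest to see as a timetable. The Python sketch below is an idealised model in which every stage takes exactly one clock cycle and there are no branches; the function name pipeline_timeline is my own, and the instructions are just labels.

```python
def pipeline_timeline(instructions):
    """List the (fetch, decode, execute) occupants for each clock cycle.

    Idealised model: one cycle per stage, no branches. None marks an idle stage.
    """
    n = len(instructions)

    def at(index):
        return instructions[index] if 0 <= index < n else None

    total_cycles = n + 2 if n else 0
    return [(at(c), at(c - 1), at(c - 2)) for c in range(total_cycles)]


program = ["A", "B", "C", "D"]
for cycle, stages in enumerate(pipeline_timeline(program), start=1):
    print(cycle, stages)
```

Four instructions complete in six cycles rather than the twelve needed if fetch, decode and execute were carried out strictly one after another, and the advantage approaches three-fold as the program gets longer.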

Pipelining is not something unique to ARM processors, and many modern CPUs use such techniques, but it does have one obvious drawback. As long as instructions are being executed sequentially, things run smoothly. What if the current instruction causes the processor to have to go to a different part of memory for its next instruction? This would occur if we jump to a subroutine, or are executing a repeating loop, for example. The two instructions waiting in the pipeline are now no longer needed. The pipeline must therefore be flushed, and we have to start filling the pipeline again. Thus the program execution is stalled for a few clock cycles. The time penalty is usually small, unless we insist on writing 'spaghetti' code in the really time-critical part of a routine.
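The cost of flushing can be estimated with a very rough model, sketched here in Python. I assume one cycle per instruction and a flat penalty of two refill cycles for every branch that is actually taken; real ARM timings depend on other factors, such as the type of memory access, so treat the numbers as illustrative only. The function name cycles_needed is hypothetical.

```python
PIPELINE_DEPTH = 3


def cycles_needed(executed, taken_branches=0):
    """Idealised cycle count for a three-stage pipeline.

    executed:       number of instructions that reach the execute stage
    taken_branches: how many of them redirect the program counter
    Each taken branch discards the two instructions behind it, so two
    cycles are spent refilling the pipeline.
    """
    if executed == 0:
        return 0
    fill = PIPELINE_DEPTH - 1
    return fill + executed + fill * taken_branches


# A four-instruction loop run 100 times: the branch back is taken 99 times
print(cycles_needed(400, taken_branches=99))  # 600
print(cycles_needed(400))                     # 402 if no branches were needed
```

In this model, a short loop spends a noticeable fraction of its time refilling the pipeline, which is why tight loops with many jumps are worth avoiding in time-critical code.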

Books

During any learning process, it is helpful to look at more than one view of the topic. This is where we have a problem. The current books that I am aware of on this topic can be counted on the thumbs of one hand.

If you have access to the Programmer's Reference Manual for RISC OS 3, then there are about 40 pages on assembler in Volume 4. The section is fairly cryptic, and selective in coverage. The PRM is extremely useful, however, since in assembler, the operating system is accessed very often, in the form of SWI calls (similar to the command SYS in Basic), and it is essential to have this information at your fingertips.

Paul stocks The ARM RISC Chip: A Programmer's Guide by Alex van Someren and Carol Atack (Addison-Wesley). I have never seen this book but, from comments I have picked up on UseNet, etc, it covers the later ARM chips (ARM6 core in particular), with in-depth analysis of the chip. It would appear to be aimed rather more at embedded system developers than at the amateur dabbler. There was a short review of this book in Archive 8.11 p42. The reviewer states that the book is not, nor is it meant to be, a beginner's guide.

There are a couple of old books you may pick up cheaply at a second hand sale. These are ARM Assembly Language Programming by Peter Cockerell, and Archimedes Assembly Language published by Dabs Press. Both these books are very old, predating RISC OS. Although they do cover the instruction set, and the ARM2 chip and so on, you will get no help with programming under RISC OS. I have had a copy of the former for many years, and have certainly found it useful in its coverage of the basics.

What next?

In the next part, we shall look at the Basic assembler, how to use it and how to invoke our machine code programs from Basic.

Feedback

In order for this series to be a success, I believe it is essential for there to be some real feedback. I should like readers to comment on areas such as whether I am going too slowly/too quickly, whether I am assuming too much/too little in prior knowledge, and so on. It would also be helpful to know where readers see a use for assembler, and what areas they would like expanded.

As part of this series, I should like to set up a hints and tips section, or maybe a 'code snippets' section, where useful subroutines or algorithms could be made available. This can only work successfully with active cooperation from readers. I am therefore asking the more experienced amongst us to have a look in all those libraries of code on their hard disc, and dig out anything that might be useful. If such a routine could also form the basis of a section on the use of particular types of instruction then so much the better. Due acknowledgement would always be made, of course.

