Several important processors seem to have been designed with little consideration for the poor compiler, which is charged with exploiting the capabilities of the processor to reach its peak performance. I’m thinking, for example, of Intel’s IA-64 processor. This paper addresses the problem, discussing both hardware processor design and compiling techniques. The processor in question is the ARM family, one of the most ubiquitous microprocessors, used in embedded products such as the iPod, the PlayStation, mobile phones, camcorders, and pocket personal computers (PCs).
Given that “more than 98 percent of all microprocessors are used in embedded products,” it is clearly important to improve their performance as much as possible. Compared with processors used in general-purpose computers, the main constraints are energy and memory consumption. However, savings in these areas should not be attained at the expense of speed.
The ARM family uses a 32-bit instruction set, but in order to save memory and energy, it also offers a 16-bit instruction set, aptly named “Thumb.” As the authors demonstrate, using Thumb code results in a code size reduction of about 30 percent, but also in a three-fold increase in the number of instructions to execute. Thus, the code is slower, and the energy savings are much lower than expected.
To correct this, the authors have designed an enhancement to the Thumb instruction set called augmenting extensions (AX). AX instructions are handled in the decode stage of the processor, and thus do not use a cycle in the pipeline. Each one is coalesced with the Thumb instruction that follows it, yielding a single ARM instruction. This reduces the number of Thumb instructions that the compiler must generate and the processor must execute (an ARM instruction does more work than a Thumb one). The result is a gain in speed, energy consumption, and memory usage.
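To make the coalescing idea concrete, here is a minimal Python sketch of a decode stage that folds a prefix into the instruction that follows it. It is a toy model of my own, not the authors’ hardware or encoding: the `Instr` type, the `decode` function, the prefix mnemonic, and the restriction to one prefix per instruction are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    kind: str  # "AX" for an augmenting prefix, "THUMB" for an ordinary 16-bit instruction
    text: str  # human-readable mnemonic, used only for display


def decode(stream: List[Instr]) -> List[str]:
    """Toy decode stage: fold each AX prefix into the Thumb instruction after it.

    Returns the instructions that would be issued to the rest of the pipeline.
    An AX prefix never appears in the result on its own.
    """
    issued: List[str] = []
    pending: Optional[Instr] = None  # prefix waiting for its partner (assumption: at most one)
    for ins in stream:
        if ins.kind == "AX":
            if pending is not None:
                raise ValueError("toy model allows one AX prefix per instruction")
            pending = ins
            continue
        if ins.kind != "THUMB":
            raise ValueError(f"unknown instruction kind: {ins.kind}")
        if pending is None:
            issued.append(f"THUMB[{ins.text}]")
        else:
            # The pair is issued as one ARM-like instruction.
            issued.append(f"ARM[{ins.text} | {pending.text}]")
            pending = None
    if pending is not None:
        raise ValueError("AX prefix with no following instruction")
    return issued


# Hypothetical program: the prefix asks for a shifted third operand,
# something a plain 16-bit Thumb add cannot express by itself.
program = [
    Instr("THUMB", "mov r2, #1"),
    Instr("AX", "shift third operand lsl #2"),
    Instr("THUMB", "add r0, r1, r2"),
]
issued = decode(program)
for line in issued:
    print(line)
```

In this sketch three 16-bit items are fetched, but only two instructions are issued. Plain Thumb code would need a separate shift instruction, which would occupy its own pipeline slot.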
The bulk of the paper is devoted to explaining the hardware modifications required, as well as the compiling techniques needed to generate the code. For example, in some cases, the instructions in the two branches of an if-then-else construct must be generated in pairs, one for the true part and one for the false part, which is uncommon.
Despite a few typographical errors, the paper is well written and pleasant to read. The presented results are convincing. Whether the ideas will actually be used remains to be seen.