Solving a Quora programming homework question in ARM Assembly

It makes me sad when students ask online communities to ‘please write me a program that does the following’. Not only is it flat-out dishonest to ask someone else to do your homework for you, it also throws away the chance to learn: the learning is in the process of solving the problem and writing the code yourself. If someone else writes the code, you’ve missed that opportunity. There’s little to be learned if someone else does the hard work and hands you the final result.

Most communities police these types of questions pretty well. Reviewers on Stack Overflow, for example, are quick to respond to them, helping developers restructure a generic request for help into a specific question about the specific problem they need help with. The guides usually referred to when suggesting how to restructure these questions are actually very good advice, and good reminders for all of us on how to ask good questions.

On other community sites, the community response goes in a different direction. This question asking for help on how to write a program drew some of the funniest and most bizarre approaches to solving the asker’s problem, in all sorts of obscure languages from Brainfuck to Whitespace, with plenty of other weirdness in between.

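Not to be left out, but a little late to the party, I realized I hadn’t done any ARM assembly for a while. So here’s my solution in ARM assembly, which I developed on my Raspberry Pi. It prints three lines of ‘Smile!’: three times on the first line, twice on the second and once on the third.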

[code]
.text
.global main

main:
MOV R4, #3 @ init outer line counter = 3

_outerloop:
MOV R3, R4 @ init word loop counter with current value of outer counter

_wordloop:
MOV R7, #4 @ syscall 4: write
MOV R0, #1 @ fd 1: stdout
MOV R2, #6 @ length of string
LDR R1, =output
SWI 0 @ the kernel only overwrites R0 (the return value), so R3/R4 survive

SUB R3, R3, #1 @ decrement word loop counter
CMP R3, #0
BNE _wordloop

@ print newline
MOV R7, #4 @ syscall 4: write
MOV R0, #1 @ fd 1: stdout
MOV R2, #1 @ length of string
LDR R1, =eol
SWI 0

SUB R4, R4, #1 @ decrement outer counter
CMP R4, #0
BNE _outerloop

_exit:
MOV R0, #0 @ exit status 0 (exit takes its argument in R0, not R1)
MOV R7, #1 @ syscall 1: exit
SWI 0

.data
output:
.asciz "Smile!"
eol:
.asciz "\n"
[/code]
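For comparison, here is a rough C sketch of the same nested-loop structure. `smile_pattern` and `print_smiles` are hypothetical names of my own, not from the original solution. The outer counter plays the role of R4 and the inner counter plays R3. `print_smiles` calls the POSIX `write()` function, which is the call that syscall 4 makes in the assembly version.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: build the same text as the assembly version.
   'lines' rows; the first row repeats "Smile!" 'lines' times and each
   following row has one fewer. 'out' must hold at least
   lines*(lines+1)/2 * 6 + lines + 1 bytes. Unlike the assembly's
   do-while style loops, these loops test before each pass, so 0 is safe. */
void smile_pattern(int lines, char *out)
{
    out[0] = '\0';
    for (int outer = lines; outer > 0; outer--) {   /* like R4 */
        for (int word = outer; word > 0; word--) {  /* like R3 */
            strcat(out, "Smile!");
        }
        strcat(out, "\n");
    }
}

/* Hypothetical helper: write the pattern to stdout with write(2).
   Returns 0 on success, -1 on a bad size or a short/failed write. */
int print_smiles(int lines)
{
    char buf[256];
    if (lines < 0 || (size_t)lines * (lines + 1) / 2 * 6 + lines + 1 > sizeof buf)
        return -1;
    smile_pattern(lines, buf);
    size_t len = strlen(buf);
    return write(STDOUT_FILENO, buf, len) == (ssize_t)len ? 0 : -1;
}
```

Calling `print_smiles(3)` prints the same three lines as the assembly program.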

Raspberry Pi Raspbian cross compiler toolchains on 64bit Linux

A while back I set up Eclipse C++ on Ubuntu to cross compile some ARM assembly (see here). The last time I set up the Raspberry Pi tools on Ubuntu I was using a 32-bit install. More recently I installed a 64-bit version of Kubuntu, so I was retracing my steps to get set up again.

It might be obvious if you’re more familiar with gcc and cross compiler toolchains, but the Raspberry Pi tools project contains both 32-bit and 64-bit versions of the tools. Trying to use the 32-bit versions on 64-bit Linux does not work, at least not without the 32-bit compatibility libraries installed. Instead of a useful error, trying to execute any of the 32-bit binaries from a shell gives a rather unhelpful ‘No such file or directory’ error. The binary itself does exist. What’s missing is the 32-bit dynamic loader that the binary asks the kernel to run it with.
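If you’re not sure which flavour a particular tool binary is, the `file` command will tell you. The same check can be done by reading the ELF header directly: byte 4 (EI_CLASS) is 1 for 32-bit binaries and 2 for 64-bit ones. Below is a minimal C sketch of that check. `elf_class_bits` is a hypothetical helper name, and the sketch only looks at the first five bytes of the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 32 or 64 for a 32-bit or 64-bit ELF file,
   or -1 if the file can't be read or isn't ELF. The ELF header starts
   with the magic bytes 0x7f 'E' 'L' 'F'; byte 4 (EI_CLASS) is
   1 (ELFCLASS32) or 2 (ELFCLASS64). */
int elf_class_bits(const char *path)
{
    unsigned char ident[5];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    /* "\x7f" "ELF" is split so the hex escape doesn't swallow the 'E' */
    if (n != sizeof ident || memcmp(ident, "\x7f" "ELF", 4) != 0)
        return -1;
    switch (ident[4]) {
    case 1:
        return 32;
    case 2:
        return 64;
    default:
        return -1;
    }
}
```

Pass it the full path to one of the tools in a `bin` directory. Note that `fopen` does not expand `~`, so spell out the home directory.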

Referring back to my original Eclipse C++ setup instructions, if you’re running Eclipse on 32-bit Linux then point to the tools here:

~/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin

Otherwise, on 64-bit Linux, point to the 64-bit version here:

~/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/bin

Closer look at the registers on an ARM CPU

In my dabbling with ARM assembly so far (my most recent achievement was completing my simple sorting algorithm, last update here), I had picked up that there are sixteen 32-bit registers, R0 through R15, for holding either values or addresses. Taking a closer look, I realized some of these have specific purposes and/or uses by convention, which are set out in the ARM Procedure Call Standard (AAPCS).

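R0: first function argument, and the register that holds a function’s return value

R1-R3: function arguments 2 to 4 (any further arguments are passed on the stack)

R4-R11: general purpose, but callee-saved: a function that uses them must restore their original values before it returns

R12: IP – intra-procedure-call scratch register, which any function call may overwrite

R13: SP – this is the stack pointer

R14: LR – Link Register – holds the return address set when you call a function with BL; you return by branching back to it with BX LR

R15: PC – the Program Counter – address of the current instruction being processed (in ARM state, reading PC gives the current instruction’s address plus 8, because of the pipeline)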
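A small C sketch makes the argument-passing conventions concrete. `sum5` is a hypothetical example function. The comments describe where an AAPCS-conforming compiler, such as gcc for Raspbian, places each value. One way to confirm this is to compile the function with `gcc -O1 -S` on the Pi and read the generated `.s` file.

```c
/* Hypothetical example: where each value lives under the AAPCS on
   32-bit ARM. The first four 32-bit integer arguments are passed in
   R0-R3, further arguments go on the stack, and a 32-bit result is
   returned in R0. The callee returns with BX LR. */
int sum5(int a,  /* R0 */
         int b,  /* R1 */
         int c,  /* R2 */
         int d,  /* R3 */
         int e)  /* on the stack, at [SP] when the function is entered */
{
    return a + b + c + d + e;  /* result goes back to the caller in R0 */
}
```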

More info in the great summary on ARM Assembly here, and also in the ARM11 tech ref here.

Implementing simple sort algorithms in ARM Assembly (part 3)

I finished the first rough version of my simple sort algorithm in ARM Assembly (see part 1 and part 2 of my updates). Here it is so far (prior to some cleanup and optimization):

[code]
/*
R0 address of format string passed to printf to output %d
R4 address of numbers to sort
R5 current number to be compared
R6 offset index for outer loop through numbers
R7 offset index for inner loop
R8 current smallest identified value
R9 current offset index of next uncompared value (not used yet)
R10, R11 temporary copies used by swapSmallest
*/
.text
.global main
main:
PUSH {R4-R12, LR} @save callee-saved regs used below (10 regs keeps the stack 8-byte aligned)
MOV R6, #0 @outerloop offset to numbers to be sorted
MOV R7, #0 @innerloop offset to number to be sorted
MOV R9, #0 @init value for index to next uncompared value
outerLoop:
LDR R8, =0x7FFFFFFF @reset to largest signed int for next loop comparison
MOV R7, R6 @copy outerloop offset to next starting offset for the innerloop
innerLoop:
LDR R0, =output @load addr of output string
LDR R4, =nums @load addr of nums to compare to R4
LDR R5, [R4, R7] @load current num to R5 from R4 with offset R7
MOV R1, R5 @move num for output
BL printf @printf may overwrite R0-R3, R12 and LR, but not R4-R11
CMP R5, R8 @is current < smallest so far
BLLT swapSmallest @if true, call swapSmallest (BL sets LR so its BX LR returns here)
continue:
CMP R7, #16 @last entry is at offset 16 (5 entries, 4 bytes each)
ADD R7, R7, #4 @inc offset by 4 bytes (ADD without S leaves the CMP flags intact)
BLT innerLoop
continueOuterLoop:
CMP R6, #16 @check if we’ve looped through all values
ADD R6, R6, #4
BLT outerLoop @if not, branch back to start of outer loop
resetLoopOffsets:
MOV R7, #0 @reset loop counter
writeFinalSortedList: @TODO: this is a near copy of the inner loop, refactor this to a function
LDR R0, =writeSorted @load addr of output string
LDR R4, =nums @load addr of nums
LDR R5, [R4, R7] @load current num to R5 from R4 with offset R7
MOV R1, R5 @move num for output
BL printf
CMP R7, #16 @last entry is at offset 16 (5 entries, 4 bytes each)
ADD R7, R7, #4 @inc offset by 4 bytes
BLT writeFinalSortedList
doExit:
MOV R0, #0 @return 0 from main
POP {R4-R12, PC} @restore regs and return to the C runtime, which flushes stdout on exit
swapSmallest:
MOV R8, R5 @keep copy of smallest in the current loop
LDR R10, [R4, R6] @tmp copy first position to R10
LDR R11, [R4, R7] @tmp copy value in position currently being compared
STR R10, [R4, R7] @swap first position value to current position being compared
STR R11, [R4, R6] @swap the current smallest value into the current first position
BX LR @return to the instruction after BLLT
.data
nums:
.word 5,2,7,1,8
output:
.asciz "%d\n"
writeSorted:
.asciz "%d\n"
[/code]

If you want to grab a copy, the complete source is on GitHub here.
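To check the logic independently of the assembly, here is a C sketch of the same approach. It is a selection-sort variant that, for each position, scans the rest of the array and swaps in each new smallest value as soon as it finds one. `simple_sort` is a hypothetical name. It starts each scan with `INT_MAX` as the ‘smallest so far’ value, so inputs of any `int` size sort correctly.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical C version of the assembly's algorithm: i plays the role
   of R6 (outer offset), j of R7 (inner offset), smallest of R8. Whenever
   nums[j] beats the smallest value seen so far, it is swapped into
   position i, just like swapSmallest. The result is ascending order. */
void simple_sort(int *nums, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int smallest = INT_MAX;              /* largest default, like R8 */
        for (size_t j = i; j < count; j++) {
            if (nums[j] < smallest) {        /* CMP R5, R8 then call the swap */
                smallest = nums[j];
                int tmp = nums[i];           /* swapSmallest's R10 */
                nums[i] = nums[j];
                nums[j] = tmp;
            }
        }
    }
}
```

Sorting `{5, 2, 7, 1, 8}`, the same values as `nums`, leaves the array as `{1, 2, 5, 7, 8}`.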

To get this far I learned plenty about the ARM architecture. It has evolved over time into many different versions, and different ARM-based CPUs implement different architecture versions. To make things more complicated, the naming scheme is a bit confusing: core families and architecture versions have similar-looking names, so an ARM11 core implements the ARMv6 architecture, not an ‘ARMv11’.

The ARM CPU in the (original) Raspberry Pi is part of a Broadcom BCM2835 System on a Chip (SoC), which includes an ARM1176JZF-S (ARM reference manual here). This is an ARM11 core, based on the ARMv6 architecture.

Interesting points about the ARMv6 instructions (not a comprehensive summary, just some rough notes to refer back to later):

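  • The majority of instructions are structured ‘instruction destination, source’, but STR (Store Register) looks reversed: ‘instruction source, destination’. This is because LDR and STR both list the register first and the memory address (in brackets) second. For STR, the data simply flows from the register into memory.
  • LDR (Load Register) can take a label as its source, which loads the value stored at that label. It can also take a label prefixed with ‘=’, which loads the address of the label itself; the assembler places that address in a nearby literal pool. The same LDR Rd, =constant form also loads any 32-bit constant.
  • LDR can load the value pointed to by an address held in another register, using [Rn], and that address can be combined with an offset: [Rn, Rm] for an offset held in a register, or [Rn, #imm] for an immediate offset. Offsets are in bytes, which is why the sort code steps through the .word array 4 bytes at a time.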
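The register-offset addressing from the last bullet can be modelled in a few lines of C. This is only a toy model to show the address arithmetic, not how the hardware works internally. `model_ldr` and `model_str` are hypothetical helpers, a byte buffer stands in for memory, and an ‘address’ is an offset into that buffer. Both compute the effective address Rn + Rm in bytes. `memcpy` is used to avoid alignment and aliasing problems, and it uses the host’s byte order, which is little-endian on both the Pi and typical x86 machines.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of LDR Rt, [Rn, Rm]: read the 32-bit word at byte
   address rn + rm from 'mem' and return it (the value that lands in Rt). */
uint32_t model_ldr(const uint8_t *mem, uint32_t rn, uint32_t rm)
{
    uint32_t rt;
    memcpy(&rt, mem + rn + rm, sizeof rt);
    return rt;
}

/* Toy model of STR Rt, [Rn, Rm]: write the value of Rt to the 32-bit
   word at byte address rn + rm. The register still comes first in the
   argument list, mirroring the operand order of the instruction. */
void model_str(uint8_t *mem, uint32_t rt, uint32_t rn, uint32_t rm)
{
    memcpy(mem + rn + rm, &rt, sizeof rt);
}
```

With the five `.word` values from the sort example in the buffer, `model_ldr(mem, 0, 8)` returns 7, the third entry. This is why the loops step R6 and R7 by 4 rather than by 1.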

I’ll probably spend some time seeing if I can clean up the code some more, but I’m happy with it so far.