An introduction to assembly on Apple Silicon Macs.


In this repository, I will code along with the book Programming with 64-Bit ARM Assembly Language, adjusting all sample code for Apple's ARM64 line of computers. While Apple's marketing material seems to avoid a name for the platform and talks only about the M1 processor, the developer documentation uses the term "Apple Silicon". I will use this term in the following.

The original source code can be found here.


While I pretty much assume that people who made it here meet most if not all required prerequisites, it doesn't hurt to list them.

  • You need Xcode 12.2 or later, and to make things easier, the command line tools should be installed. This ensures that the tools are found in their default locations. If you are not sure that the tools are installed, check Preferences → Locations in Xcode or run
    xcode-select --install
  • All application samples also require macOS Big Sur, iOS 14 or their respective watchOS or tvOS equivalents. Especially for the latter three systems it is not a necessity per se (neither is Xcode 12.2), but it makes things a lot simpler.

  • Finally, while all samples can be adjusted to work on the iPhone and all of Apple's other ARM64 devices, for best results you should have access to an Apple Silicon Mac, formerly known as the MWMNSA, the Machine We Must Not Speak About.


I would like to thank @claui, @jannau, @jrosengarden, @m-schmidt, @saagarjha, and @zhuowei! They helped me when I hit a wall, or asked questions that let me improve the content.

Changes To The Book

With the exception of the existing iOS samples, the book is based on the Linux operating system. Apple's operating systems (macOS, iOS, watchOS and tvOS) are actually just flavors of the Darwin operating system, so they share a set of common core components.

Linux and Darwin, which were both inspired by AT&T Unix System V, are significantly different at the level we are looking at. For the listings in the book, this mostly concerns system calls (i.e. when we want the kernel to do something for us), and the way Darwin accesses memory.

This file is organized so that you can read the book, and read about the differences for Apple Silicon side by side. The headlines in this document follow those in the book.

Chapter 1

Computers and Numbers

The default Calculator.app on macOS has a "Programmer Mode", too. You enable it with View → Programmer (⌘3).

CPU Registers

Apple has made certain platform specific choices for the registers:

  • Apple reserves X18 for its own use. Do not use this register.
  • The frame pointer register (FP, X29) must always address a valid frame record.

About the GCC Assembler

The book uses Linux GNU tools, such as the GNU as assembler. While there is an as command on macOS, it will invoke the integrated LLVM Clang assembler by default. And even if there is the -Q option to use the GNU-based assembler, this was only ever an option for x86_64, and it is already deprecated as of this writing.
% as -Q -arch arm64
/usr/bin/as: can't specifiy -Q with -arch arm64
Thus, the GNU assembler syntax is not an option, and the code will have to be adjusted for the Clang assembler syntax.

Likewise, while there is a gcc command on macOS, this simply calls the Clang C compiler. For transparency, all calls to gcc will be replaced with clang.
% gcc --version
Configured with: --prefix=/Applications/ --with-gxx-include-dir=/Applications/
Apple clang version 12.0.0 (clang-1200.0.32.27)
Target: arm64-apple-darwin20.1.0
Thread model: posix
InstalledDir: /Applications/

Hello World, Listing 1-1

If you are reading this, I assume you already knew that the macOS Terminal can be found in Applications → Utilities → Terminal.app. But if you didn't, I feel honored to tell you and I wish you lots of fun on this journey! Don't be afraid to ask questions.

To make "Hello World" run on Apple Silicon, first the changes from page 78 (Chapter 3) have to be applied to account for the differences between Darwin and the Linux kernel. To silence the warning, I insert .align 4 (or .p2align 2), because Darwin likes things to be aligned on even boundaries. The book mentions this in Aligning Data in Chapter 5, page 114.

To make the linker work, a little more is needed; most of it should look familiar to Mac/iOS developers. These changes need to be applied to the makefile and to the build file. The complete call to the linker looks like this:
ld -o HelloWorld HelloWorld.o \
    -lSystem \
    -syslibroot `xcrun -sdk macosx --show-sdk-path` \
    -e _start \
    -arch arm64

We know the -o switch, let's examine the others:
  • -lSystem tells the linker to link our executable with libSystem.dylib. We do that to add the LC_MAIN load command to the executable. Generally, Darwin does not support statically linked executables. It is possible, if not especially elegant, to create executables without using libSystem.dylib. I will go deeper into that topic when time permits. For people who read Mac OS X Internals I will just add that this replaced LC_UNIXTHREAD as of MacOS X 10.7.
  • -syslibroot: In order to find libSystem.dylib, it is mandatory to tell our linker where to find it. It seems this was not necessary on macOS 10.15 because "New in macOS Big Sur 11 beta, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem.". We use xcrun -sdk macosx --show-sdk-path to dynamically use the currently active version of Xcode.
  • -e _start: Darwin expects an entrypoint _main. In order to keep the sample both as close as possible to the book, and to allow its use within the C-Sample from Chapter 3, I opted to keep _start and tell the linker that this is the entry point we want to use.
  • -arch arm64: for good measure, let's throw in the option to cross-compile this from an Intel Mac. You can leave this off when running on Apple Silicon.

Chapter 2

The changes from Chapter 1 (makefile, alignment, system calls) have to be applied.

Register and Shift

The Clang assembler does not understand

MOV X1, X2, LSL #1
, instead
LSL X1, X2, #1
(etc) is used. After all, both are just aliases for the instruction
ORR X1, XZR, X2, LSL #1

Register and Extension

Clang requires the source register to be 32-Bit. This makes sense because with these extensions, the upper 32 Bit of a 64-Bit register will never be touched:

ADD X2, X1, W0, SXTB
The GNU Assembler seems to ignore this and allows you to specify a 64-Bit source register.
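To see why only the lower part of the source register matters, here is a minimal Python sketch that models ADD X2, X1, W0, SXTB. The helper names sxtb and add_sxtb are made up for this illustration; they are not part of any toolchain.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def sxtb(value: int) -> int:
    """Sign-extend the lowest byte of value, as the SXTB extension does."""
    byte = value & 0xFF  # everything above bit 7 is ignored
    return byte - 0x100 if byte & 0x80 else byte


def add_sxtb(x1: int, w0: int) -> int:
    """Model ADD X2, X1, W0, SXTB and return the 64-bit result for X2."""
    return (x1 + sxtb(w0)) & MASK64


# 0x7F is a positive byte, 0x1234FF ends in 0xFF, which is -1 as a signed byte.
print(hex(add_sxtb(10, 0x7F)))
print(hex(add_sxtb(10, 0x1234FF)))
```

The second call shows that the bits above the lowest byte (0x1234 here) have no influence on the result, which is why a 32-Bit source register is sufficient.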

Chapter 3

Beginning GDB

On macOS, gdb has been replaced with the LLDB Debugger lldb of the LLVM project. The syntax is not always the same as for gdb, so I will note the differences here.

To start debugging our movexamps program, enter the command

lldb movexamps

This yields the abbreviated output:

(lldb) target create "movexamps"
Current executable set to 'movexamps' (arm64).

Commands like run or list work just the same, and there is a nice GDB to LLDB command map.

To disassemble our program, a slightly different syntax is used for lldb:

disassemble --name start

Note that because we are linking a dynamic executable, the listing will be long and include other start functions. Our code will be listed under the line movexamps`start:

Likewise, lldb wants the breakpoint name without the underscore:

b start

To get the registers on lldb, we use register read (or re r). Without arguments, this command will print all registers, or you can specify just the registers you would like to see, like

re r SP X0 X1

We can see all the breakpoints with breakpoint list (or br l). We can delete a breakpoint with breakpoint delete (or br de) specifying the breakpoint number to delete.

lldb has even more powerful mechanisms to display memory. The main command is memory read (or m read). For starters, these are the parameters used by the book:

memory read -fx -c4 -s4 $address

where
  • -f is the display format
  • -s is the size of the data
  • -c is the count
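As a rough illustration of what these three parameters mean, the following Python sketch decodes a byte buffer the way -fx -c4 -s4 asks for: four items, four bytes each, shown as hex. The function name read_words and the sample buffer are assumptions for this example, and the output layout is simplified compared to what lldb actually prints.

```python
import struct


def read_words(memory: bytes, count: int = 4, size: int = 4) -> list:
    """Decode `count` little-endian items of `size` bytes each as hex strings."""
    codes = {1: "B", 2: "H", 4: "I", 8: "Q"}  # struct codes per item size
    values = struct.unpack_from("<" + codes[size] * count, memory)
    return [f"0x{v:0{size * 2}x}" for v in values]


# Sixteen bytes 0x00..0x0f stand in for the memory at $address.
sample = bytes(range(16))
print(" ".join(read_words(sample)))
```

Because ARM64 on Darwin is little-endian, the first byte in memory ends up as the lowest byte of each displayed word.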

Listing 3-1

As an exercise, I have added code to find the default Xcode toolchain on macOS. In the book they are using this to later switch from a Linux to an Android toolchain. This process is much different for macOS and iOS: It does not usually involve a different toolchain, but instead a different Software Development Kit (SDK). You can see this in Listing 1-1 where -syslibroot is set.

That said, while it is possible to build an iOS executable with the command line, it is not a trivial process. So for building apps I will stick to Xcode.

Listing 3-7

As Chapter 10 focuses on building an app that will run on iOS, I have chosen to simply create a Command Line Tool here which is now using the same HelloWorld.s file.

Chapter 4

Besides the common changes, we face a new issue which is described in the book in Chapter 5: Darwin does not like LDR X1, =symbol; it will produce the error ld: Absolute addressing not allowed in arm64 code. If we use ADR X1, symbol, as suggested in Chapter 3 of the book, our data has to be in the read-only .text section. In this sample however, we want writable data.

The Apple Documentation tells us that on Darwin:

All large or possibly nonlocal data is accessed indirectly through a global offset table (GOT) entry. The GOT entry is accessed directly using RIP-relative addressing.

And by default, on Darwin all data contained in the .data section, where data is writeable, is "possibly nonlocal".

The full answer can be found here:


The ADRP instruction loads the address of the 4KB page anywhere in the +/-4GB (33 bits) range of the current instruction (which takes 21 high bits of the offset). This is denoted by the @PAGE operator. Then, we can either use LDR or STR to read or write any address inside that page or ADD to calculate the final address using the remaining 12 bits of the offset (denoted by @PAGEOFF).

So this:

    LDR X1, =outstr // address of output string

becomes this:

    ADRP    X1, outstr@PAGE // address of output string 4k page
    ADD X1, X1, outstr@PAGEOFF // offset to outstr within the page
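The split into a page address and an offset within the page can be sketched in a few lines of Python. This is only a model of the arithmetic, not of the instruction encoding; the function names and the two example addresses are invented for illustration.

```python
PAGE_SIZE = 4096  # ADRP works on 4 KB pages


def adrp(pc: int, target: int) -> int:
    """Model ADRP: return the base address of the page that contains target."""
    pc_page = pc & ~(PAGE_SIZE - 1)
    target_page = target & ~(PAGE_SIZE - 1)
    delta_pages = (target_page - pc_page) // PAGE_SIZE  # signed 21-bit immediate
    if not -(1 << 20) <= delta_pages < (1 << 20):
        raise ValueError("target is outside the +/-4GB range of ADRP")
    return pc_page + delta_pages * PAGE_SIZE


def pageoff(target: int) -> int:
    """Model the page-offset part: the low 12 bits of the address."""
    return target & (PAGE_SIZE - 1)


pc = 0x100003F80      # hypothetical address of the ADRP instruction
outstr = 0x100008010  # hypothetical address of the output string

x1 = adrp(pc, outstr)     # ADRP X1, <page of outstr>
x1 = x1 + pageoff(outstr)  # ADD  X1, X1, <offset of outstr in its page>
print(hex(x1))
```

The 21-bit page delta and the 12-bit page offset together cover the 33 bits mentioned above.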


I was asked how to read the command line, and I gladly answered the question.

Sample code can be found in Chapter 4 in the file


Chapter 5

The important differences in memory addressing for Darwin were already addressed above.

Listing 5-1


Directive keywords must be in lowercase for the LLVM assembler. (See bottom of this file)

Listing 5-10

Changes like in Chapter 4.

Chapter 6

As we learned in Chapter 5, all assembler directives must be in lowercase.

Chapter 7

asm/unistd.h does not exist in the Apple SDKs; instead, sys/syscall.h can be used.

It is also important to notice that while the calls and definitions look similar, Linux and Darwin are not the same: AT_FDCWD is -100 on Linux, but must be -2 on Darwin.

Chapter 8

This chapter is specifically for the Raspberry Pi 4, so there is nothing to do here.

Chapter 9

For transparency reasons, I replaced gcc with clang.


Listing 9-1

Apart from the usual changes, Apple diverges from the ARM64 standard ABI (i.e. the convention how functions are called) for variadic functions. Variadic functions are functions which take a variable number of arguments, and printf is one of them. Where Linux will accept arguments passed in the registers, we must pass them on the stack for Darwin.
str     X1, [SP, #-32]! // Move the stack pointer four doublewords (32 bytes) down and push X1 onto the stack
str     X2, [SP, #8]    // Push X2 to one doubleword above the current stack pointer
str     X3, [SP, #16]   // Push X3 to two doublewords above the current stack pointer
adrp    X0, ptfStr@PAGE // printf format str
add     X0, X0, ptfStr@PAGEOFF  // add offset for format str
bl      _printf // call printf
add     SP, SP, #32 // Clean up stack

So first, we are growing the stack downwards 32 bytes to make room for three 64-Bit values. We are creating space for a fourth value for padding because, as pointed out on page 137 in the book, ARM hardware requires the stack pointer to always be 16-byte aligned.

In the same command, X1 is stored at the new location of the stack pointer.

Now, we fill the rest of the space that was just created by storing X2 in a location eight bytes above, and X3 16 bytes above the stack pointer. Note that the str commands for X2 and X3 do not move SP.

We could fill the stack in different ways; what is important is that the printf function expects the parameters as doubleword values in order, upwards from the current stack pointer. So in the case of this listing, it expects the parameter for the first format specifier to be at the location of SP, the parameter for the second at one doubleword above this, and finally the parameter for the third two doublewords, 16 bytes, above the current stack pointer.

What we have effectively done is allocate memory on the stack. As we, the caller, "own" that memory we need to release it after the function branch, in this case simply by shrinking the stack (upwards) by the 32 bytes we allocated. The instruction add SP, SP, #32 will do that.
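The sizes and offsets used above follow from simple arithmetic, which this small Python sketch reproduces. It assumes, as in the listing, that every variadic argument takes one 8-byte doubleword and that the stack pointer stays 16-byte aligned; the function name variadic_frame is made up for this example.

```python
def variadic_frame(num_args: int, slot_size: int = 8, alignment: int = 16):
    """Return (bytes to reserve, offsets of each argument relative to SP)."""
    raw = num_args * slot_size
    # Round up to the next multiple of the alignment to keep SP 16-byte aligned.
    frame = (raw + alignment - 1) // alignment * alignment
    offsets = [i * slot_size for i in range(num_args)]
    return frame, offsets


frame, offsets = variadic_frame(3)
print(frame, offsets)  # 32 [0, 8, 16]
```

For three arguments, 24 bytes are needed and 32 are reserved, which is the padding doubleword mentioned earlier; the offsets 0, 8 and 16 match the three str instructions.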

Listing 9-5

The function name was prefixed with an underscore (_), as this is necessary for C on Darwin to find it.

Listing 9-6

No change was required.

Listing 9-7

Instead of a shared ELF library, a dynamic Mach-O library is created. Further information can be found here: Creating Dynamic Libraries

Listing 9-8

In inline-assembly, which we are using here, the forward label must be declared as a local label by prefixing it with L. While this was not necessary in pure assembly, like in Chapter 5, the LLVM C front end will automatically add the directive .subsections_via_symbols to the code:

Funny Darwin hack: This flag tells the linker that no global symbols contain code that falls through to other global symbols (e.g. the obvious implementation of multiple entry points). If this doesn't occur, the linker can safely perform dead code stripping. Since LLVM never generates code that does this, it is always safe to set. (From llvm source code)

While we are using the LLVM toolchain, in assembly — including inline-assembly — all safety checks are off so we must take extra precautions and specifically declare the forward label local.

Also, the size of one variable had to be changed from int to long to make the compiler completely happy and remove all warnings.

Listing 9-9

While the Python file only needed a minimal change, calling the code is a little more challenging: On Apple Silicon Macs, python is a Mach-O universal binary with two architectures: x86_64 and arm64e. Notably absent is the arm64 architecture we were building for up to this point. This makes our dylib unusable with python.

arm64e is the Armv8 architecture, which Apple has been using since the A12 chip. If you want to address devices prior to the A12, you must stick to arm64. The first Macs to use ARM64 run on the M1 CPU based on the A14 architecture, thus Apple decided to take advantage of the new features.

So, what to do? We could compile everything as arm64e, but that would make the library useless on devices like the iPhone X or older, and we would like to support them, too.

Above, you read something about a universal binary. For a very long time, the Mach-O executable format has had support for several processor architectures in a single file. This includes, but is not limited to, Motorola 68k (on NeXT computers), PowerPC, Intel x86, as well as ARM code, each with their 32 and 64 bit variants where applicable. In this case, I am building a universal dynamic library which includes both arm64 and arm64e code. More information can be found here.

Chapter 10

No changes in the core code were required, but instead of just an iOS app I created a SwiftUI app that will work on macOS, iOS, watchOS (Series 4 and later), and tvOS.

Chapter 11

At this point, the changes should be self-explanatory: the usual makefile adjustments, .align 4, and address mode changes.

Chapter 12

Like in Chapter 11, all the changes have been introduced already. Nothing new here.

Chapter 13

Once again, the Clang assembler wants a slightly different syntax: Where gcc accepts

MUL V6.4H, V0.4H, V3.4H[0]

the Clang assembler expects

MUL.4H V6, V0, V3[0]

All other changes to the code should be trivial at this point.

Chapter 14

No unusual changes here.

Chapter 15

Copying a Page of Memory

Some place to start reading ARM64 code in the Darwin Kernel can be found in bcopy.s. There is a lot more in that directory and the repository in general.

Code Created by GCC

No changes were required. The "tiny" code model is not supported for Mach-O executables:

% clang -O3 -mcmodel=tiny -o upper upper.c
fatal error: error in backend: tiny code model is only supported on ELF

Chapter 16

All that can be said is that clang automatically enables position-independent executables, and the option to disable them does not work. Therefore, the exploit shown in this chapter's sample file cannot be reproduced.

Additional references

One More Thing…

"The C language is case-sensitive. Compilers are case-sensitive. The Unix command line, ufs, and nfs file systems are case-sensitive. I'm case-sensitive too, especially about product names. The IDE is called Xcode. Big X, little c. Not XCode or xCode or X-Code. Remember that now." — Chris Espinosa
