+++
title = "Systems Architecture"
+++
This section is a short review of System Architecture topics that you’ll need for System Programming.
## Assembly
What is assembly? Assembly is the closest you will get to machine code without writing 1s and 0s. Every computer has an architecture, and each architecture has an associated assembly language. Each assembly instruction maps 1:1 to a set of 1s and 0s that tell the computer exactly what to do. For example, the following instruction in the widely used x86 assembly language adds one to the byte at memory address 0x20 (Wikibooks #ref-wiki:xxx) – you can also look in (Guide #ref-guide2011intel) Section 2A under the add instruction, though it is more verbose.
```asm
add BYTE PTR [0x20], 1
```
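If you want to see this mapping yourself, you can ask your compiler to emit the assembly for a trivial C function. This is a minimal sketch of our own, and the exact output depends on your compiler, flags, and architecture.

```c
// increment.c - a one-line function whose assembly is easy to read
void increment(int *x) {
    *x += 1;
}
```

Compiling with `gcc -O2 -S increment.c` on an x86-64 machine produces something like `addl $1, (%rdi)` – the same kind of add-to-memory instruction shown above.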
Why do we mention this? Although you will be doing most of this class in C, this is what your code is ultimately translated into, and that has serious implications for race conditions and atomic operations.

## Atomic Operations
An operation is atomic if no other processor can interrupt it. Take, for example, the assembly code above that adds one to a memory location. On the actual hardware, the operation may take several steps on the circuit: the processor may start by fetching the value from RAM, store it in the cache or a register, and finally write the result back (Schweizer, Besta, and Hoefler #ref-schweizer2015evaluating) – see the description of fetch-and-add, though your micro-architecture may vary. Depending on optimizations, the processor may also keep the value in a cache or register local to that core – try dumping the -O2 optimized assembly of incrementing a variable. The problem arises when two processors try to do this at the same time. Both could copy the value of the memory address, add one, and store the same result back, so the value ends up incremented only once. That is why modern systems have a special set of instructions called atomic operations. If an instruction is atomic, the hardware ensures that only one processor or thread performs any intermediate step at a time. On x86 this is done with the lock prefix (Guide #ref-guide2011intel, 1120).
```asm
lock add BYTE PTR [0x20], 1
```
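To see this from C rather than assembly, here is a minimal sketch using C11's `<stdatomic.h>` and pthreads; the file name and iteration counts are ours for illustration. Two threads increment a plain `int` and an atomic one: the plain counter can lose updates, while `atomic_fetch_add` – which compiles down to a `lock`-prefixed add on x86 – cannot.

```c
// atomic_demo.c - a sketch of a lost-update race vs. an atomic fix
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define ITERATIONS 1000000

static int plain_counter = 0;
static atomic_int atomic_counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        plain_counter++;                      // read-modify-write: racy
        atomic_fetch_add(&atomic_counter, 1); // a locked add on x86
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    // Expect atomic == 2000000; plain is often less.
    printf("plain: %d, atomic: %d\n", plain_counter, atomic_counter);
    return 0;
}
```

Compile with `gcc -std=c11 -pthread atomic_demo.c`. The plain counter usually prints less than 2000000 because the two threads' read-modify-write steps interleave.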
Why don't we do this for everything? It makes instructions slower! If every time a processor did something it had to make sure no other core or processor was doing anything with the same data, it would be much slower. Most of the time, we call out the exceptions explicitly – meaning, we will tell you when an operation like this is being used. Otherwise, you can assume instructions are unlocked.

## Caching
Ah yes, caching: one of computer science's greatest problems. The caching we refer to here is processor caching. If a particular address is already in the cache when reading or writing, the processor performs the operation (such as an add) on the cache and updates the actual memory later, because updating memory is slow (Intel #ref-intel2015improving Section 3.4). If it isn't, the processor requests a chunk of memory from the memory chip and stores it in the cache, kicking out the least recently used entry – this depends on the caching policy, but Intel's does use this. This is done because the L3 processor cache is roughly three times faster to reach than main memory (Levinthal #ref-levinthal2009performance, 22), though exact speeds vary with clock speed and architecture. Naturally, this leads to problems, because there are now two different copies of the same value – in the cited paper this is called an unshared line. This isn't a class about caching, but you should know how it can impact your code. A short but incomplete list:
- Race conditions! If a value is stored in two different processor caches, that value should only be accessed by a single thread.
- Speed. With a cache, your program may mysteriously look faster. Just assume that reads and writes that either happened recently or are next to each other in memory are fast (the sketch after this list shows the effect).
- Side effects. Every read or write affects the cache state. While most of the time this neither helps nor hurts, it is important to know. Check the Intel programmer guide on the lock prefix for more information.
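To make the speed point concrete, here is a minimal sketch (our example, not from the cited papers) that walks the same array twice: once in row-major order, touching adjacent addresses, and once in column-major order, taking a large stride across memory on every access.

```c
// cache_demo.c - a rough sketch of cache locality (illustrative only)
#include <stdio.h>
#include <time.h>

#define N 2048
static int grid[N][N];

int main(void) {
    // Fill the array so the loops below have real work to do.
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = i + j;

    long sum = 0;

    // Row-major: consecutive accesses are adjacent in memory.
    clock_t start = clock();
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += grid[i][j];
    double row_seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    // Column-major: each access strides N * sizeof(int) bytes.
    start = clock();
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += grid[i][j];
    double col_seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("row-major: %fs, column-major: %fs (sum=%ld)\n",
           row_seconds, col_seconds, sum);
    return 0;
}
```

Compile with a low optimization level (e.g. `gcc -O1 cache_demo.c`) so the compiler doesn't optimize the loops away; on most machines the row-major walk is severalfold faster.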
## Interrupts
Interrupts are an important part of systems programming. An interrupt is, internally, an electrical signal delivered to the processor when something happens – this is a hardware interrupt (“Chapter 3. Hardware Interrupts,” #ref-redhat_hardware_int). The hardware then decides whether this is something it should handle (e.g. handling keyboard or mouse input for older keyboards and mice) or pass to the operating system. The operating system then decides whether this is something it should handle (e.g. paging in a memory table from disk) or something the application should handle (e.g. a SEGFAULT). If the operating system decides that this is something the process or program should take care of, it sends a software fault, and that fault is propagated to the application. The application then decides whether it is an error (SEGFAULT) or not (SIGPIPE, for example) and reports to the user. Applications can send signals to the kernel and to the hardware as well. This is an oversimplification, because certain hardware faults can't be ignored or masked away, but this class isn't about teaching you to build an operating system.
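As a minimal sketch of that last step – the application deciding what to do with a signal – here is a program that installs its own handler for SIGINT, the signal the kernel delivers when you press Ctrl-C. The details are ours for illustration.

```c
// signal_demo.c - a sketch of an application handling a signal itself
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void handler(int sig) {
    (void)sig;
    // write() is async-signal-safe inside a handler; printf() is not.
    const char msg[] = "caught SIGINT, cleaning up\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(0);
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGINT, &sa, NULL); // register our choice with the kernel

    for (;;) {
        puts("working... press Ctrl-C");
        sleep(1);
    }
}
```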
An important application of this is how system calls are served! There is a well-established set of registers that the arguments go in, defined by the kernel, as well as a kernel-defined system call “number”. The program then triggers an interrupt, which the kernel catches in order to serve the system call (Garg #ref-garg_2006).
Operating system developers and instruction set designers alike didn't like the overhead of raising a full interrupt on every system call. Modern systems instead use SYSENTER and SYSEXIT, which provide a cleaner way of transferring control safely to the kernel and safely back. What “safely” means is obviously out of the scope of this class, but the mechanism persists.
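On Linux you can watch this convention from C using the `syscall(2)` wrapper, which places the kernel-defined call number and arguments where the kernel expects them and performs the trap for you. A minimal sketch:

```c
// syscall_demo.c - invoking a system call directly (Linux-specific)
#include <sys/syscall.h> // SYS_write: the kernel-defined call number
#include <unistd.h>

int main(void) {
    const char msg[] = "hello from a raw system call\n";
    // Equivalent to write(STDOUT_FILENO, msg, len), without the
    // convenience wrapper that libc normally provides.
    syscall(SYS_write, STDOUT_FILENO, msg, sizeof(msg) - 1);
    return 0;
}
```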
## Optional: Hyperthreading
Hyperthreading is a relatively recent technology and is in no way, shape, or form multithreading. Hyperthreading allows one physical core to appear as many virtual cores to the operating system (Guide #ref-guide2011intel, P.51). The operating system can then schedule processes on these virtual cores, and one physical core executes them. The core interleaves processes or threads: while it is waiting for one memory access to complete, it may perform a few instructions of another thread. The overall result is more instructions executed in a shorter time, which potentially means you need fewer physical cores to power smaller devices.
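You can see the virtual cores the operating system was presented with from C. This is a small sketch using a `sysconf` query that is widely supported on Linux and other Unix-likes, though not strictly required by POSIX.

```c
// cores_demo.c - report the logical (virtual) cores the OS sees
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // On a hyperthreaded machine this count includes virtual cores,
    // so it typically exceeds the number of physical cores.
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online logical cores: %ld\n", logical);
    return 0;
}
```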
There be dragons here, though. With hyperthreading, you must be wary of optimizations. A famous hyperthreading bug caused programs to crash if at least two processes were scheduled on a physical core, using specific registers, in a tight loop. The problem itself is better explained through an architecture lens, but it was found in the wild by systems programmers working on OCaml's mainline (Leroy #ref-leroy_2017).