|
|
|
|
|
|
|
|
|
|
+++
|
|
|
|
|
|
|
|
title = "Mutex"
|
|
|
|
|
|
|
|
+++
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Mutex
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Synchronization coordinates various tasks so that they all finish in the correct state. In C, we have a series of mechanisms to control which threads are allowed to perform an action at a given time. Most of the time, threads can make progress without having to communicate, but every so often two or more threads may want to access a critical section. A critical section is a section of code that can only be executed by one thread at a time if the program is to function correctly. If two threads (or processes) were to execute code inside the critical section at the same time, the program may no longer behave correctly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As we said in the previous chapter, race conditions happen when two or more threads touch the same piece of memory at the same time. If a memory location is only accessible by one thread, for example the automatic variable `i` below, then there is no possibility of a race condition and no critical section associated with `i`. However, the `sum` variable is a global variable accessed by two threads. It is possible that two threads will attempt to increment the variable at the same time.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
#include <stdio.h>
#include <pthread.h>

int sum = 0; //shared

void *countgold(void *param) {
    int i; //local to each thread
    for (i = 0; i < 10000000; i++) {
        sum += 1;
    }
    return NULL;
}

int main() {
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, countgold, NULL);
    pthread_create(&tid2, NULL, countgold, NULL);

    //Wait for both threads to finish:
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);

    printf("ARRRRG sum is %d\n", sum);
    return 0;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
A typical output of the above code is `ARRRRG sum is <some number less than expected>` because there is a race condition. The code allows two threads to read and write `sum` at the same time. For example, both threads copy the current value of `sum` into a register on the CPU core running each thread (let's pick 123). Both threads add one to their own copy. Both threads write back the value (124). If the threads had accessed `sum` at different times, the count would have been 125. A few of the possible orderings are shown below.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Permissible Pattern:

Thread 1 | Thread 2
--- | ---
Load Addr, Add 1 (sum = 1 locally) | …
Store (sum = 1 globally) | …
… | Load Addr, Add 1 (sum = 2 locally)
… | Store (sum = 2 globally)

Good Thread Access Pattern
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Partial Overlap:

Thread 1 | Thread 2
--- | ---
Load Addr, Add 1 (sum = 1 locally) | …
Store (sum = 1 globally) | Load Addr, Add 1 (sum = 1 locally)
… | Store (sum = 1 globally)

Bad Thread Access Pattern
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Full Overlap:

Thread 1 | Thread 2
--- | ---
Load Addr, Add 1 (sum = 1 locally) | Load Addr, Add 1 (sum = 1 locally)
Store (sum = 1 globally) | Store (sum = 1 globally)

Horrible Thread Access Pattern
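The tables treat `sum += 1` as separate load, add, and store steps. A minimal C sketch of that decomposition (the `register_copy` variable is purely illustrative; a real compiler uses a machine register):

```c
// 'sum += 1' is not one indivisible step; it behaves roughly like:
int register_copy = sum;              // Load Addr
register_copy = register_copy + 1;    // Add 1 (locally, per thread)
sum = register_copy;                  // Store (globally)
```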
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We would like the first pattern, where the increments are mutually exclusive. This leads us to our first synchronization primitive, a mutex.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Mutex
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To ensure that only one thread at a time can access a global variable, use a mutex – short for mutual exclusion. If one thread is currently inside a critical section, we would like another thread to wait until the first thread is complete. A mutex isn't a primitive in the truest sense, though it is one of the smallest building blocks that has a useful threading API. A mutex also isn't a data structure. It is an abstract data type.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Let’s think about a duck satisfying the mutex API. If someone has the duck, then they are allowed to access a shared resource! We call it the mutex duck. Everyone else has to waddle around and wait. Once someone lets go of the duck, they have to stop interacting with the resource, and the next grabber can interact with the shared resource. Now you know the origins of the duck.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
There are many ways to implement a mutex, and we’ll give a few in this chapter. For now, let’s use the black box that the pthread library gives us. Here is how we declare and use a mutex.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; // global variable
pthread_mutex_lock(&m);   // start of Critical Section
// Critical section
pthread_mutex_unlock(&m); // end of Critical Section
```
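The pthread mutex calls return 0 on success and an error number on failure. Here is a minimal sketch of defensive usage (the function name `locked_work` and the exit-on-error policy are illustrative choices, not part of the API):

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void locked_work(void) {
    int err = pthread_mutex_lock(&m);
    if (err != 0) {                    // e.g. EINVAL for an invalid mutex
        fprintf(stderr, "lock failed: %d\n", err);
        exit(1);
    }
    // ... critical section ...
    err = pthread_mutex_unlock(&m);
    if (err != 0) {
        fprintf(stderr, "unlock failed: %d\n", err);
        exit(1);
    }
}
```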
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Mutex Lifetime
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
There are a few ways of initializing a mutex. A program can use the macro `PTHREAD_MUTEX_INITIALIZER` only for global (‘static’) variables. `m = PTHREAD_MUTEX_INITIALIZER` is functionally equivalent to the more general purpose `pthread_mutex_init(&m, NULL)`. The init version includes options to trade performance for additional error-checking and advanced sharing options; it also makes sure that the mutex is correctly initialized as soon as the call returns, whereas a globally initialized mutex is set up on the first lock. A program can also call the init function at runtime for a mutex located on the heap.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
pthread_mutex_t *lock = malloc(sizeof(pthread_mutex_t));
pthread_mutex_init(lock, NULL);
//later
pthread_mutex_destroy(lock);
free(lock);
```
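As a sketch of the error-checking option mentioned above: `pthread_mutex_init` also accepts an attributes object, and the `PTHREAD_MUTEX_ERRORCHECK` type trades a little speed for turning misuse (relocking, or unlocking a mutex the thread doesn't hold) into error codes instead of undefined behavior. The function name `init_checked_mutex` is just for illustration.

```c
#include <pthread.h>

pthread_mutex_t m;

void init_checked_mutex(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // Relocking returns EDEADLK; unlocking from a non-owner returns EPERM.
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr); // the attribute object is no longer needed
}
```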
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Once we are finished with the mutex, we should also call `pthread_mutex_destroy(&m)`. Note that a program can only destroy an unlocked mutex; destroying a locked mutex is undefined behavior. A program doesn’t need to destroy a mutex created with the global initializer. Here are some things to keep in mind about `init` and `destroy`:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Calling init or destroy on the same mutex from multiple threads at the same time is undefined behavior
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2. Destroying a locked mutex has undefined behavior
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3. Keep to the pattern of one and only one thread initializing a mutex.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
4. Copying the bytes of the mutex to a new memory location and then using the copy is not supported. To reference a mutex, a program must have a pointer to that memory address (see the sketch below).
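A minimal sketch of point 4 (the `job_t` struct and `worker` function are invented for illustration): each thread receives a pointer to the one real mutex rather than a copy of it.

```c
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

typedef struct {
    pthread_mutex_t *lock; // pointer to the one real mutex
    int *counter;          // the shared data it protects
} job_t;

void *worker(void *arg) {
    job_t *job = (job_t *) arg;    // every thread sees the same addresses
    pthread_mutex_lock(job->lock);
    (*job->counter)++;
    pthread_mutex_unlock(job->lock);
    return NULL;
}

int main(void) {
    job_t job = { &m, &counter };  // holds pointers, never a copy of the mutex
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &job);
    pthread_create(&t2, NULL, worker, &job);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```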
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Mutex Usages
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
How does one use a mutex? Here is a complete example in the spirit of the earlier piece of code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
#include <stdio.h>
#include <pthread.h>

// Create a mutex that is ready to be locked!
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int sum = 0;

void *countgold(void *param) {
    int i;

    //Same thread that locks the mutex must unlock it
    //Critical section is 'sum += 1'
    //However locking and unlocking ten million times
    //has significant overhead

    pthread_mutex_lock(&m);

    // Other threads that call lock will have to wait until we call unlock

    for (i = 0; i < 10000000; i++) {
        sum += 1;
    }
    pthread_mutex_unlock(&m);
    return NULL;
}

int main() {
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, countgold, NULL);
    pthread_create(&tid2, NULL, countgold, NULL);

    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);

    printf("ARRRRG sum is %d\n", sum);
    return 0;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In the code above, the thread acquires the lock before entering the counting house. The critical section is only `sum += 1`, so the following version is also correct.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
    for (i = 0; i < 10000000; i++) {
        pthread_mutex_lock(&m);
        sum += 1;
        pthread_mutex_unlock(&m);
    }
    return NULL;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This version runs slower because we lock and unlock the mutex ten million times, which is expensive – at least compared with incrementing a variable. In this simple example, we didn’t really need threads – we could have just added up twice! A faster multi-threaded approach is to add up ten million in an automatic (local) variable and only add it to the shared total once the calculation loop has finished:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
    int local = 0;
    for (i = 0; i < 10000000; i++) {
        local += 1;
    }

    pthread_mutex_lock(&m);
    sum += local;
    pthread_mutex_unlock(&m);

    return NULL;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If you know the Gaussian sum, you can avoid race conditions altogether, but this is for illustration.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Starting with the gotchas. Firstly, C mutexes do not lock variables. A mutex is a simple data structure. It works with code, not data. If a mutex is locked, other threads continue to run; it’s only when a thread attempts to lock a mutex that is already locked that the thread has to wait. As soon as the original thread unlocks the mutex, the second (waiting) thread will acquire the lock and be able to continue. The following code creates mutexes that do effectively nothing, because the two threads lock different ones.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
int a;
pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER,
                m2 = PTHREAD_MUTEX_INITIALIZER;

// later
// Thread 1
pthread_mutex_lock(&m1);
a++;
pthread_mutex_unlock(&m1);

// Thread 2
pthread_mutex_lock(&m2);
a++;
pthread_mutex_unlock(&m2);
```
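For contrast, a minimal corrected sketch: both threads must lock the same mutex for the increment of `a` to be protected.

```c
#include <pthread.h>

int a;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; // one mutex shared by every thread

// Every thread that touches 'a' calls this:
void increment_a(void) {
    pthread_mutex_lock(&m);   // a second caller waits here until unlock
    a++;
    pthread_mutex_unlock(&m);
}
```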
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Here are some other gotchas, in no particular order:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. Don’t cross the streams! If using threads, don’t fork in the middle of your program. This means any time after your mutexes have been initialized.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
2. The thread that locks a mutex is the only thread that can unlock it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
3. Each program can have multiple mutex locks. A thread-safe design might include a lock with each data structure, one lock per heap, or one lock per set of data structures. If a program has only one lock, then there may be significant contention for the lock. If two threads were updating two different counters, it isn’t necessary to use the same lock.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
4. Locks are only tools. They don’t spot critical sections!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
5. There will always be a small amount of overhead of calling `pthread_mutex_lock` and `pthread_mutex_unlock`. However, this is the price to pay for correctly functioning programs!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
6. Not unlocking a mutex due to an early return during an error condition (see the sketch after this list)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
7. Resource leak (not calling `pthread_mutex_destroy`)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
8. Using an uninitialized mutex or using a mutex that has already been destroyed
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
9. Locking a mutex twice on a thread without unlocking first
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
10. Deadlock
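To illustrate gotcha 6, here is a minimal sketch (the function `process_item`, the global `items`, and the error condition are invented for the example): an early return on an error path leaves the mutex locked forever, so route every path through the unlock.

```c
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int items = 0;

// Buggy: the early return leaves 'm' locked forever.
int process_item_buggy(int value) {
    pthread_mutex_lock(&m);
    if (value < 0) {
        return -1;            // Oops: never unlocks!
    }
    items += value;
    pthread_mutex_unlock(&m);
    return 0;
}

// Better: every path falls through the single unlock.
int process_item(int value) {
    int result = 0;
    pthread_mutex_lock(&m);
    if (value < 0) {
        result = -1;          // record the error ...
    } else {
        items += value;
    }
    pthread_mutex_unlock(&m); // ... but always unlock
    return result;
}
```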
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Mutex Implementation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
So we have this cool data structure. How do we implement it? A naive, incorrect implementation is shown below. The `unlock` function simply marks the mutex as unlocked and returns. The `lock` function first checks to see if the lock is already locked. If it is currently locked, it will keep checking until another thread has unlocked the mutex. For the time being, we’ll ignore the problem of other threads being able to unlock a lock they don’t own and focus on the mutual exclusion aspect.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
// Version 1 (Incorrect!)

void lock(mutex_t *m) {
    while (m->locked) { /* Locked? Never-mind - loop and check again! */ }
    m->locked = 1;
}

void unlock(mutex_t *m) {
    m->locked = 0;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Version 1 uses busy-waiting, unnecessarily wasting CPU resources. However, there is a more serious problem: we have a race condition! If two threads call `lock` concurrently, it is possible that both threads read `m->locked` as zero. Thus both threads believe they have exclusive access to the lock, and both threads continue.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We might attempt to reduce the CPU overhead a little by yielding inside the loop – `pthread_yield()` (non-standard; the portable call is `sched_yield()`) suggests to the operating system that the thread give up the CPU for a short while, so the CPU may be assigned to threads that are waiting to run. A sketch of that variant is below, but it still leaves the race condition, so we need a better implementation. We will talk about this later in the critical section part of this chapter. For now, we will talk about semaphores.
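Here is that yield variant (a sketch, reusing the hypothetical `mutex_t` from Version 1; it is still incorrect for the same reason):

```c
#include <sched.h>

// Still broken: two threads can both observe locked == 0 and proceed.
void lock_with_yield(mutex_t *m) {
    while (m->locked) {
        sched_yield();   // give up the CPU instead of spinning hot
    }
    m->locked = 1;
}
```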
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Advanced: Implementing a Mutex with hardware
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
We can use C11 atomics to do this correctly! A complete solution is detailed below. This is a spinlock mutex; futex-based implementations can be found online, for example at https://locklessinc.com/articles/mutex_cv_futex/.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
First the data structure and initialization code.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
#include <stdatomic.h>
#include <pthread.h>
#include <sched.h>   // for sched_yield(), used by the lock function below

typedef struct mutex_ {
    // We need some variable to see if the lock is locked
    atomic_int_least8_t lock;
    // A mutex needs to keep track of its owner so
    // another thread can't unlock it
    pthread_t owner;
} mutex;

#define UNLOCKED 0
#define LOCKED 1
#define UNASSIGNED_OWNER 0

int mutex_init(mutex *mtx) {
    // Some simple error checking
    if (!mtx) {
        return 0;
    }
    // Not thread-safe; the user has to take care of this
    atomic_init(&mtx->lock, UNLOCKED);
    mtx->owner = UNASSIGNED_OWNER;
    return 1;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This is the initialization code, nothing fancy here. We set the state of the mutex to unlocked and set the owner to unassigned.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
int mutex_lock(mutex *mtx) {
    int_least8_t zero = UNLOCKED;
    while (!atomic_compare_exchange_weak_explicit(&mtx->lock,
                                                  &zero,
                                                  LOCKED,
                                                  memory_order_seq_cst,
                                                  memory_order_seq_cst)) {
        zero = UNLOCKED;
        sched_yield(); // Use system calls for scheduling speed
    }
    // We have the lock now
    mtx->owner = pthread_self();
    return 1;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
What does this code do? It initializes a variable that we will keep as the unlocked state. Compare-and-swap (https://en.wikipedia.org/wiki/Compare-and-swap) is an instruction supported by most modern architectures (on x86 it’s `lock cmpxchg`). The pseudocode for this operation looks like this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
int atomic_compare_exchange_pseudo(int *addr1, int *addr2, int val) {
    if (*addr1 == *addr2) {
        *addr1 = val;
        return 1;
    } else {
        *addr2 = *addr1;
        return 0;
    }
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Except it is all done atomically, meaning in one uninterruptible operation. What does the weak part mean? Atomic compare-exchange comes in two versions, a strong one and a weak one. The strong version fails only when the values actually differ, while the weak version may fail spuriously, that is, report failure even though the expected value matched. These are the same spurious failures that you’ll see with condition variables later. We are using weak because weak can be faster, and we are in a loop! That means we are okay if it fails a little more often, because we will keep spinning around anyway.
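A tiny self-contained demonstration of the real C11 call (a sketch; the prints are only there to show what happens to `expected` on failure):

```c
#include <stdatomic.h>
#include <stdio.h>

int main(void) {
    atomic_int value = 0;
    int expected = 0;
    // Succeeds: value was 0, so it becomes 42 and the call returns true.
    if (atomic_compare_exchange_strong(&value, &expected, 42)) {
        printf("first exchange succeeded, value = %d\n", atomic_load(&value));
    }
    expected = 0;
    // Fails: value is now 42, so 'expected' is overwritten with 42.
    if (!atomic_compare_exchange_strong(&value, &expected, 7)) {
        printf("second exchange failed, expected now holds %d\n", expected);
    }
    return 0;
}
```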
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Inside the while loop, we have failed to grab the lock! We reset `zero` to unlocked and yield the CPU for a little while. When we are scheduled again, we try to grab the lock again. Once we successfully swap, we are in the critical section! We set the mutex’s owner to the current thread for the unlock method and return successfully.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
How does this guarantee mutual exclusion? When working with atomics, we can’t always be sure! But in this simple example we can, because only one thread can successfully observe the lock in the UNLOCKED (0) state and swap it to LOCKED (1); that thread is the winner. How do we implement unlock?
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```c
// 'unlikely' is a common GCC/Clang branch-prediction hint, for example:
// #define unlikely(x) __builtin_expect(!!(x), 0)
int mutex_unlock(mutex *mtx) {
    if (unlikely(pthread_self() != mtx->owner)) {
        return 0; // Can't unlock a mutex if the thread isn't the owner
    }
    int_least8_t one = 1;
    // Critical section ends after this atomic
    mtx->owner = UNASSIGNED_OWNER;
    if (!atomic_compare_exchange_strong_explicit(&mtx->lock,
                                                 &one,
                                                 UNLOCKED,
                                                 memory_order_seq_cst,
                                                 memory_order_seq_cst)) {
        // The mutex was never locked in the first place
        return 0;
    }
    return 1;
}
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To satisfy the API, a thread can’t unlock the mutex unless the thread is the one that owns it. Then we unassign the mutex owner, because the critical section is over after the atomic. We want a strong exchange here because we are not retrying in a loop, so a spurious failure would wrongly report an error. We expect the mutex to be locked, and we swap it to unlocked. If the swap was successful, we unlocked the mutex. If the swap wasn’t, the mutex was already UNLOCKED, meaning the caller tried to unlock a mutex that was never locked in the first place, so we report failure.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
What is this memory order business? We were talking about memory fences earlier, and here they are! We won’t go into detail because it is outside the scope of this course, but it is in the scope of https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync. We use sequential consistency (`memory_order_seq_cst`) to make sure no loads or stores are reordered before or after the atomic operation. A program can use the weaker orderings to create dependency chains for more efficient code.
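As a hedged illustration of those weaker orderings (a sketch only, reusing the `mutex` struct and constants defined above): a spinlock conventionally needs acquire ordering when taking the lock and release ordering when giving it up, which is usually cheaper than full sequential consistency on weakly ordered hardware. Note this sketch omits the owner check from the unlock shown earlier.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <sched.h>

// Assumes the mutex struct, LOCKED/UNLOCKED, and UNASSIGNED_OWNER from above.
void mutex_lock_acqrel(mutex *mtx) {
    int_least8_t zero = UNLOCKED;
    while (!atomic_compare_exchange_weak_explicit(
            &mtx->lock, &zero, LOCKED,
            memory_order_acquire,    // success: the critical section stays after the lock
            memory_order_relaxed)) { // failure: we only retry, no ordering needed
        zero = UNLOCKED;
        sched_yield();
    }
    mtx->owner = pthread_self();
}

void mutex_unlock_acqrel(mutex *mtx) {
    mtx->owner = UNASSIGNED_OWNER;
    // release: writes made inside the critical section become visible
    // to the next thread that acquires the lock
    atomic_store_explicit(&mtx->lock, UNLOCKED, memory_order_release);
}
```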
|
|
|
|
|
|
|
|
|