Data race
When multiple goroutines access the same data concurrently and at least one of the accesses is a non-atomic write, there is a potential for a data race. This can lead to inconsistent reads and compromised data integrity.
Here is a code example of a data race:
package main

import (
	"fmt"
	"sync"
)

var counter int
var wg sync.WaitGroup

func incr() {
	for i := 0; i < 10000; i++ {
		counter++
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go incr()
	go incr()
	wg.Wait()
	fmt.Println("counter is: ", counter)
}
The final value of counter is unlikely to be 20,000 because counter++ is not an atomic operation. Multiple goroutines reading and writing the same memory location at the same time can lose updates, producing an inaccurate counter value.
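To see why, it can help to picture counter++ as three separate steps that the scheduler may interleave across goroutines. The snippet below is a simplified sketch of that read-modify-write sequence, not the actual compiled instructions:

package main

import "fmt"

var counter int

// incr spells out counter++ as its underlying read-modify-write steps.
// If two goroutines both finish step 1 before either reaches step 3,
// they read the same value and one increment is lost.
func incr() {
	tmp := counter // 1. read the current value
	tmp = tmp + 1  // 2. increment the local copy
	counter = tmp  // 3. write it back
}

func main() {
	incr()
	fmt.Println(counter) // 1
}

Running the original example with the race detector enabled (go run -race) reports this access pattern as a data race.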
Mutex vs. Atomic
Mutex
A mutex is a synchronization primitive provided by the sync package to ensure that only one goroutine is accessing a critical section of code at a time, preventing race conditions. “Mutex” stands for “mutual exclusion.”
package main

import (
	"fmt"
	"sync"
)

var counter int
var wg sync.WaitGroup
var mu sync.Mutex

func incr() {
	for i := 0; i < 10000; i++ {
		mu.Lock()
		counter++
		mu.Unlock()
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go incr()
	go incr()
	wg.Wait()
	fmt.Println("counter is: ", counter)
}
Atomic
Alternatively, we can use atomic.Int32, provided by the sync/atomic package, which represents an int32 variable that can be safely accessed by multiple goroutines without mutex locks, as long as all accesses go through its methods. atomic is a package that provides low-level atomic memory primitives useful for implementing synchronization algorithms.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var counter atomic.Int32
var wg sync.WaitGroup

func incr() {
	for i := 0; i < 10000; i++ {
		counter.Add(1)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go incr()
	go incr()
	wg.Wait()
	fmt.Println("counter is: ", counter.Load())
}
Conclusion and benchmarks: unsafe vs. mutex vs. atomic
When handling simple data types, the atomic package typically offers better performance than sync.Mutex. This advantage stems from the absence of locking overhead and the elimination of context switching. However, for more complex synchronization requirements, particularly when a larger block of code needs protection, sync.Mutex is advisable. It gives a more robust way to protect an entire critical section.
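The benchmark code itself is not included in this post; the sketch below only illustrates how such a comparison could be set up in a _test.go file, with two goroutines performing 10,000 increments per iteration to mirror the examples above. The benchmark names and measurement setup here are assumptions, not the code that produced the numbers below.

package main

import (
	"sync"
	"sync/atomic"
	"testing"
)

// runTwo runs fn in two goroutines, each performing 10,000 calls,
// mirroring the incr examples above.
func runTwo(fn func()) {
	var wg sync.WaitGroup
	wg.Add(2)
	for g := 0; g < 2; g++ {
		go func() {
			defer wg.Done()
			for i := 0; i < 10000; i++ {
				fn()
			}
		}()
	}
	wg.Wait()
}

func BenchmarkUnsafe(b *testing.B) {
	var counter int
	for i := 0; i < b.N; i++ {
		runTwo(func() { counter++ }) // racy; included only for comparison
	}
}

func BenchmarkMutex(b *testing.B) {
	var counter int
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		runTwo(func() { mu.Lock(); counter++; mu.Unlock() })
	}
}

func BenchmarkAtomic(b *testing.B) {
	var counter atomic.Int32
	for i := 0; i < b.N; i++ {
		runTwo(func() { counter.Add(1) })
	}
}

Running go test -bench=. -benchmem produces output in the format shown below.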
These are the benchmark results comparing the performance of the unsafe implementation with the safe implementations using sync.Mutex and atomic.
Benchmark-12    47584     26811 ns/op     0 B/op    0 allocs/op    # unsafe
Benchmark-12     4317    272698 ns/op    10 B/op    0 allocs/op    # mutex
Benchmark-12     6824    220096 ns/op     8 B/op    0 allocs/op    # atomic
Concurrency in maps
The built-in map in Go is not thread-safe when accessed concurrently by multiple goroutines.
Here is a code example that crashes at runtime ("fatal error: concurrent map writes") due to unsafe concurrent access to the map:
package main

import (
	"fmt"
	"sync"
)

var m = map[int]int{}
var wg sync.WaitGroup

func get(k int) (v int) {
	return m[k]
}

func set(k int) {
	m[k]++
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(1)
	go batchSet(1)
	wg.Wait()
	fmt.Println(m)
}
sync.Mutex and sync.RWMutex implementation
We can make the built-in map in Go thread-safe by integrating it with sync.Mutex. Implementing locks on both the get and set operations ensures that only one goroutine has read and write access to the map at any given moment.
package main

import (
	"fmt"
	"sync"
)

var m = map[int]int{}
var wg sync.WaitGroup
var mu sync.Mutex

func get(k int) (v int) {
	mu.Lock()
	v = m[k]
	mu.Unlock()
	return v
}

func set(k int) {
	mu.Lock()
	m[k]++
	mu.Unlock()
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}
In read-heavy scenarios, utilizing sync.RWMutex can further optimize performance. This permits multiple goroutines to read concurrently, but when a write operation is initiated by one goroutine, it blocks all other goroutines from both reading and writing.
package main

import (
	"fmt"
	"sync"
)

var m = map[int]int{}
var wg sync.WaitGroup
var mu sync.RWMutex

func get(k int) (v int) {
	mu.RLock()
	v, _ = m[k]
	mu.RUnlock()
	return v
}

func set(k int) {
	mu.Lock()
	m[k]++
	mu.Unlock()
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}
Built-in sync.Map
In read-heavy scenarios, a more conventional approach is to employ the built-in sync.Map.
sync.Map uses a read map and a dirty map to separate read and write operations.
- The read map is designed for lock-free read operations.
- The dirty map handles write operations, utilizing a mutex to ensure thread safety.
- When the number of misses in the read map exceeds a certain threshold, the dirty map is promoted to become the new read map.
This approach trades additional memory usage (space) for improved access times (time) and is generally not well-suited for environments with a write-heavy workload.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var m sync.Map
var wg sync.WaitGroup

func get(k int) (v any) {
	v, ok := m.Load(k)
	if !ok {
		return 0
	}
	return v.(*atomic.Int32).Load()
}

func set(k int) {
	v, ok := m.Load(k)
	if !ok {
		// LoadOrStore initializes the counter exactly once, even if
		// several goroutines reach this branch at the same time.
		v, _ = m.LoadOrStore(k, new(atomic.Int32))
	}
	v.(*atomic.Int32).Add(1)
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}
atomic and unsafe implementation
Apart from using sync.Mutex and sync.Map, implementing a thread-safe map at a lower level can be achieved with the atomic and unsafe packages.
In the implementation described:
- The map is stored as an unsafe.Pointer.
- atomic.LoadPointer is used to atomically read the map.
- During write operations, the implementation checks for the existence of the key in the map. If the key is absent, it is lazily initialized, and the map is updated using a copy-on-write strategy. This involves creating a new copy of the entire map for the update.
This method reduces locking overhead compared to sync.Map, potentially offering superior performance in read-heavy workloads. However, it involves lower-level memory manipulation, which requires careful handling to prevent potential data races.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"unsafe"
)

var m unsafe.Pointer // holds a *map[int]*atomic.Int32
var wg sync.WaitGroup
var mu sync.Mutex

func get(k int) (v any) {
	// Atomically load the current map snapshot; reads need no lock because
	// the snapshot is never mutated in place and values are atomic counters.
	mMap := *(*map[int]*atomic.Int32)(atomic.LoadPointer(&m))
	v, ok := mMap[k]
	if !ok {
		return 0
	}
	return v.(*atomic.Int32).Load()
}

func set(k int) {
	mMap := *(*map[int]*atomic.Int32)(atomic.LoadPointer(&m))
	v, ok := mMap[k]
	if !ok {
		mu.Lock()
		// double-checked locking: re-read the map in case another
		// goroutine already created the key while we waited for the lock
		mMap := *(*map[int]*atomic.Int32)(atomic.LoadPointer(&m))
		v, ok = mMap[k]
		if !ok {
			// copy-on-write: build a new map containing the new key
			// and atomically swap it in
			newMap := map[int]*atomic.Int32{}
			for key, value := range mMap {
				newMap[key] = value
			}
			newMap[k] = new(atomic.Int32)
			newMap[k].Add(1)
			atomic.StorePointer(&m, unsafe.Pointer(&newMap))
		} else {
			v.Add(1)
		}
		mu.Unlock()
	} else {
		v.Add(1)
	}
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}

func init() {
	// Store an empty map before main (and any goroutines) run.
	mMap := map[int]*atomic.Int32{}
	atomic.StorePointer(&m, unsafe.Pointer(&mMap))
}
Conclusion and benchmarks: sync.Mutex vs. sync.Map vs. atomic and unsafe
sync.Mutex is a coarser-grained implementation that locks the entire map for both reads and writes. sync.RWMutex offers improved read performance, but at the cost of reduced write performance.
sync.Map is a more conventional choice and outperforms sync.Mutex in read-heavy workloads by trading increased memory usage (space) for faster access times (time).
The atomic and unsafe implementation has the best potential performance in read-heavy workloads, but it has a more complex design and is harder to implement.
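The benchmark code for these results is not shown in the post either; a sketch along the following lines, placed next to one of the map implementations above so that it reuses that implementation's get and set functions, could produce per-operation numbers in the format shown below. The parallel setup here is an assumption.

package main

import "testing"

// BenchmarkGet measures concurrent reads of a single key, reusing the
// get function from the chosen map implementation above.
func BenchmarkGet(b *testing.B) {
	set(0) // make sure the key exists before measuring reads
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			get(0)
		}
	})
}

// BenchmarkSet measures concurrent increments of a single key.
func BenchmarkSet(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			set(0)
		}
	})
}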
# mutex
BenchmarkGet-12    100000000    11.93 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12     88270341    13.74 ns/op    0 B/op    0 allocs/op

# rwmutex
BenchmarkGet-12    100000000    11.25 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12     54296922    22.98 ns/op    0 B/op    0 allocs/op

# syncmap
BenchmarkGet-12    199754134    5.704 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12     87067563    12.99 ns/op    0 B/op    0 allocs/op

# atomic
BenchmarkGet-12   1000000000    1.071 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12    190676976    6.525 ns/op    0 B/op    0 allocs/op
I hope you find this post helpful. If you would like to connect with me or show your support, you can follow me on X or buy me a coffee!