Data race
When multiple goroutines access the same data concurrently and at least one of the accesses is a non-atomic write, there is a potential for a data race. This can lead to inconsistent reads and compromised data integrity.
Here is a code example of a data race:
package main

import (
	"fmt"
	"sync"
)

var counter int
var wg sync.WaitGroup

func incr() {
	for i := 0; i < 10000; i++ {
		counter++
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go incr()
	go incr()
	wg.Wait()
	fmt.Println("counter is: ", counter)
}
The final value of counter is unlikely to be 20,000 because counter++ is not an atomic operation. Multiple goroutines reading and writing the same memory location at the same time can lose updates, producing an inaccurate counter value.
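To see why, it can help to picture counter++ as three separate steps that the scheduler may interleave across goroutines. The snippet below is a simplified sketch of that read-modify-write sequence, not the actual compiled instructions:

package main

import "fmt"

var counter int

// incr spells out counter++ as its underlying read-modify-write steps.
// If two goroutines both finish step 1 before either reaches step 3,
// they read the same value and one increment is lost.
func incr() {
	tmp := counter // 1. read the current value
	tmp = tmp + 1  // 2. increment the local copy
	counter = tmp  // 3. write it back
}

func main() {
	incr()
	fmt.Println(counter) // 1
}

Running the original example with the race detector enabled (go run -race) reports this access pattern as a data race.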
Mutex vs. Atomic
Mutex
A mutex is a synchronization primitive provided by the sync package to ensure that only one goroutine is accessing a critical section of code at a time, preventing race conditions. “Mutex” stands for “mutual exclusion.”
package main

import (
	"fmt"
	"sync"
)

var counter int
var wg sync.WaitGroup
var mu sync.Mutex

func incr() {
	for i := 0; i < 10000; i++ {
		mu.Lock()
		counter++
		mu.Unlock()
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go incr()
	go incr()
	wg.Wait()
	fmt.Println("counter is: ", counter)
}
Atomic
Alternatively, we can use atomic.Int32, provided by the sync/atomic package, which represents an int32 variable that can be safely accessed by multiple goroutines without mutex locks, as long as all accesses go through its methods. atomic is a package that provides low-level atomic memory primitives useful for implementing synchronization algorithms.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var counter atomic.Int32
var wg sync.WaitGroup

func incr() {
	for i := 0; i < 10000; i++ {
		counter.Add(1)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go incr()
	go incr()
	wg.Wait()
	fmt.Println("counter is: ", counter.Load())
}
Conclusion and benchmarks: unsafe vs. mutex vs. atomic
When handling simple data types, the atomic package typically offers better performance than sync.Mutex. This advantage stems from the absence of locking overhead and the elimination of context switching. However, for more complex synchronization requirements, particularly when a larger block of code needs protection, sync.Mutex is advisable. It gives a more robust way to protect an entire critical section.
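The benchmark code itself is not included in this post; the sketch below only illustrates how such a comparison could be set up in a _test.go file, with two goroutines performing 10,000 increments per iteration to mirror the examples above. The benchmark names and measurement setup here are assumptions, not the code that produced the numbers below.

package main

import (
	"sync"
	"sync/atomic"
	"testing"
)

// runTwo runs fn in two goroutines, each performing 10,000 calls,
// mirroring the incr examples above.
func runTwo(fn func()) {
	var wg sync.WaitGroup
	wg.Add(2)
	for g := 0; g < 2; g++ {
		go func() {
			defer wg.Done()
			for i := 0; i < 10000; i++ {
				fn()
			}
		}()
	}
	wg.Wait()
}

func BenchmarkUnsafe(b *testing.B) {
	var counter int
	for i := 0; i < b.N; i++ {
		runTwo(func() { counter++ }) // racy; included only for comparison
	}
}

func BenchmarkMutex(b *testing.B) {
	var counter int
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		runTwo(func() { mu.Lock(); counter++; mu.Unlock() })
	}
}

func BenchmarkAtomic(b *testing.B) {
	var counter atomic.Int32
	for i := 0; i < b.N; i++ {
		runTwo(func() { counter.Add(1) })
	}
}

Running go test -bench=. -benchmem produces output in the format shown below.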
These are the benchmark results comparing the performance of the unsafe implementation with the safe implementations using sync.Mutex and atomic.
Benchmark-12    47584     26811 ns/op     0 B/op    0 allocs/op    # unsafe
Benchmark-12     4317    272698 ns/op    10 B/op    0 allocs/op    # mutex
Benchmark-12     6824    220096 ns/op     8 B/op    0 allocs/op    # atomic
Concurrency in maps
The built-in map in Go is not thread-safe when accessed concurrently by multiple goroutines.
Here is a code example that crashes at runtime ("fatal error: concurrent map writes") due to unsafe concurrent access to the map:
package main

import (
	"fmt"
	"sync"
)

var m = map[int]int{}
var wg sync.WaitGroup

func get(k int) (v int) {
	return m[k]
}

func set(k int) {
	m[k]++
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(1)
	go batchSet(1)
	wg.Wait()
	fmt.Println(m)
}
sync.Mutex and sync.RWMutex implementation
We can make the built-in map in Go thread-safe by integrating it with sync.Mutex. Implementing locks on both the get and set operations ensures that only one goroutine has read and write access to the map at any given moment.
package main

import (
	"fmt"
	"sync"
)

var m = map[int]int{}
var wg sync.WaitGroup
var mu sync.Mutex

func get(k int) (v int) {
	mu.Lock()
	v = m[k]
	mu.Unlock()
	return v
}

func set(k int) {
	mu.Lock()
	m[k]++
	mu.Unlock()
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}
In read-heavy scenarios, utilizing sync.RWMutex can further optimize performance. This permits multiple goroutines to read concurrently, but when a write operation is initiated by one goroutine, it blocks all other goroutines from both reading and writing.
package main

import (
	"fmt"
	"sync"
)

var m = map[int]int{}
var wg sync.WaitGroup
var mu sync.RWMutex

func get(k int) (v int) {
	mu.RLock()
	v, _ = m[k]
	mu.RUnlock()
	return v
}

func set(k int) {
	mu.Lock()
	m[k]++
	mu.Unlock()
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}
Built-in sync.Map
In read-heavy scenarios, a more conventional approach is to employ the built-in sync.Map.
sync.Map uses a read map and a dirty map to separate read and write operations.
- The read map is designed for lock-free read operations.
- The dirty map handles write operations, utilizing a mutex to ensure thread safety.
- When the number of misses in the read map exceeds a certain threshold, the dirty map is promoted to become the new read map.
This approach trades additional memory usage (space) for improved access times (time) and is generally not well-suited for environments with a write-heavy workload.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var m sync.Map
var wg sync.WaitGroup

func get(k int) (v any) {
	v, ok := m.Load(k)
	if !ok {
		return 0
	}
	return v.(*atomic.Int32).Load()
}

func set(k int) {
	v, ok := m.Load(k)
	if !ok {
		// LoadOrStore initializes the counter exactly once, even if
		// several goroutines reach this branch at the same time.
		v, _ = m.LoadOrStore(k, new(atomic.Int32))
	}
	v.(*atomic.Int32).Add(1)
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}
atomic and unsafe implementation
Apart from using sync.Mutex and sync.Map, implementing a thread-safe map at a lower level can be achieved with the atomic and unsafe packages.
In the implementation described:
- The map is stored as an unsafe.Pointer.
- atomic.LoadPointer is used to atomically read the map.
- During write operations, the implementation checks for the existence of the key in the map. If the key is absent, it is lazily initialized, and the map is updated using a copy-on-write strategy. This involves creating a new copy of the entire map for the update.
This method reduces locking overhead compared to sync.Map, potentially offering superior performance in read-heavy workloads. However, it involves lower-level memory manipulation, which requires careful handling to prevent potential data races.
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"unsafe"
)

var m unsafe.Pointer // holds a *map[int]*atomic.Int32
var wg sync.WaitGroup
var mu sync.Mutex

func get(k int) (v any) {
	// Atomically load the current map snapshot; reads need no lock because
	// the snapshot is never mutated in place and values are atomic counters.
	mMap := *(*map[int]*atomic.Int32)(atomic.LoadPointer(&m))
	v, ok := mMap[k]
	if !ok {
		return 0
	}
	return v.(*atomic.Int32).Load()
}

func set(k int) {
	mMap := *(*map[int]*atomic.Int32)(atomic.LoadPointer(&m))
	v, ok := mMap[k]
	if !ok {
		mu.Lock()
		// double-checked locking: re-read the map in case another
		// goroutine already created the key while we waited for the lock
		mMap := *(*map[int]*atomic.Int32)(atomic.LoadPointer(&m))
		v, ok = mMap[k]
		if !ok {
			// copy-on-write: build a new map containing the new key
			// and atomically swap it in
			newMap := map[int]*atomic.Int32{}
			for key, value := range mMap {
				newMap[key] = value
			}
			newMap[k] = new(atomic.Int32)
			newMap[k].Add(1)
			atomic.StorePointer(&m, unsafe.Pointer(&newMap))
		} else {
			v.Add(1)
		}
		mu.Unlock()
	} else {
		v.Add(1)
	}
}

func batchSet(k int) {
	for i := 0; i < 10000; i++ {
		set(k)
	}
	wg.Done()
}

func batchGet(k int) {
	for i := 0; i < 10000; i++ {
		get(k)
	}
	wg.Done()
}

func main() {
	wg.Add(2)
	go batchSet(0)
	go batchSet(0)
	wg.Wait()
	fmt.Println(get(0))
}

func init() {
	// Store an empty map before main (and any goroutines) run.
	mMap := map[int]*atomic.Int32{}
	atomic.StorePointer(&m, unsafe.Pointer(&mMap))
}
Conclusion and benchmarks: sync.Mutex vs. sync.Map vs. atomic and unsafe
sync.Mutex is a coarser-grained implementation that locks the entire map for both reads and writes. sync.RWMutex offers improved read performance, but at the cost of reduced write performance.
sync.Map is a more conventional choice and outperforms sync.Mutex in read-heavy workloads by trading increased memory usage (space) for faster access times (time).
The atomic and unsafe implementation has the best potential performance in read-heavy workloads, but it has a more complex design and is harder to implement.
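The benchmark code for these results is not shown in the post either; a sketch along the following lines, placed next to one of the map implementations above so that it reuses that implementation's get and set functions, could produce per-operation numbers in the format shown below. The parallel setup here is an assumption.

package main

import "testing"

// BenchmarkGet measures concurrent reads of a single key, reusing the
// get function from the chosen map implementation above.
func BenchmarkGet(b *testing.B) {
	set(0) // make sure the key exists before measuring reads
	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			get(0)
		}
	})
}

// BenchmarkSet measures concurrent increments of a single key.
func BenchmarkSet(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			set(0)
		}
	})
}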
# mutex
BenchmarkGet-12    100000000    11.93 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12     88270341    13.74 ns/op    0 B/op    0 allocs/op

# rwmutex
BenchmarkGet-12    100000000    11.25 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12     54296922    22.98 ns/op    0 B/op    0 allocs/op

# syncmap
BenchmarkGet-12    199754134    5.704 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12     87067563    12.99 ns/op    0 B/op    0 allocs/op

# atomic
BenchmarkGet-12   1000000000    1.071 ns/op    0 B/op    0 allocs/op
BenchmarkSet-12    190676976    6.525 ns/op    0 B/op    0 allocs/op
I hope you find this post helpful. If you would like to connect with me or show your support, you can follow me on X or buy me a coffee!