-
Notifications
You must be signed in to change notification settings - Fork 469
UCS: Introduce lightweight rwlock #10355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
65f025c
to
89415df
Compare
src/ucs/type/rwlock.h
Outdated
ucs_cpu_relax(); | ||
} | ||
|
||
x = __atomic_fetch_add(&lock->l, UCS_RWLOCK_READ, __ATOMIC_ACQUIRE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can use the atomic operations defined in atomic.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the we replace deprecated __sync*
fuctions from atomic.h
with new __atomic
variants?
test/gtest/ucs/test_type.cc
Outdated
int m = std::thread::hardware_concurrency(); | ||
std::vector<int> threads = {1, 2, 4, m}; | ||
std::vector<int> writers_per_256 = {1, 25, 128, 250}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- i think we also want to measure overhead of read lock+unlock regardless of concurrency since it is the reason we added lightweight rwlock
- can you pls post example output in the PR description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- you mean write percent 0?
- didn't i do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yes ( isee you added it)
- yes, pls update with percent 0
src/ucs/type/rwlock.h
Outdated
ucs_cpu_relax(); | ||
} | ||
|
||
x = __atomic_fetch_add(&lock->l, UCS_RWLOCK_READ, __ATOMIC_ACQUIRE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe the we replace deprecated __sync*
fuctions from atomic.h
with new __atomic
variants?
test/gtest/ucs/test_type.cc
Outdated
|
||
UCS_TEST_F(test_rwlock, perf) { | ||
ucs_rwlock_t lock = UCS_RWLOCK_STATIC_INITIALIZER; | ||
measure( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good performance test, but it must change some state to guarantee the lock correctness. This is the whole purpose of litmus test. For instance, these functions may perform some simple math calculations and we check the invariant:
// Invariant: counter2 is 2 times bigger than counter1
int counter1 = 1;
int counter2 = 2;
measure(
[&]() {
ucs_rwlock_read_lock(&lock);
UCS_ASSERT_EQ(counter1 * 2, counter2);
ucs_rwlock_read_unlock(&lock);
},
[&]() {
ucs_rwlock_write_lock(&lock);
counter1 += counter1;
counter2 += counter2;
if (counter2 > 100000) {
counter1 = 1;
counter2 = 2;
}
ucs_rwlock_write_unlock(&lock);
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is a good idea. This test doesn't guarantee lock correctness, it will very easily give a false negative result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, testing correctness of MT algorithms is hard topic. You're right about false negatives results, that's ok. That's actually a nature of litmus tests: when they pass, it does not guarantee that your algorithm is 100% correct, but when they fail - it's obviously broken. Personally I catch tons of MT issues with the help of litmus tests.
If you propose some other way of testing correctness - let's discuss that. What I'm proposing is well established industry practise, and I'm sure it's better to have it than no testing at all.
Moreover, the example that I provided is just scratching the surface. We should also consider adding tests with nested locks, try-locks, etc
src/ucs/type/rwlock.h
Outdated
int x; | ||
|
||
while (1) { | ||
while (lock->l & UCS_RWLOCK_MASK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we read atomic without mem order guarantees, normally it should be something
__atomic_load_n(&lock->l, __ATOMIC_RELAXED)
Same in other read cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? AFAIU relaxed mem order means - no mem order guarantees
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my point here is to use the uniform API to intercept loads/stores of this variable, so it does not need to be volatile. And then we can specify an appropriate mem order, whether it's relaxed or acquire.
Btw I also see performance improvement after replacing volatile with __atomic_load
s
src/ucs/type/rwlock.h
Outdated
|
||
static inline void ucs_rwlock_write_unlock(ucs_rwlock_t *lock) | ||
{ | ||
__atomic_fetch_sub(&lock->l, UCS_RWLOCK_WRITE, __ATOMIC_RELAXED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it is usually __ATOMIC_RELEASE
to be used on all release paths to contain and be sure that what happened under lock is visible, any reason for not doing so in this PR?
} | ||
}; | ||
|
||
UCS_TEST_F(test_rwlock, lock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a way to run this/some tests with maximal optimizations (maybe inline compiler pragma, ..) ? I suspect we end-up running it without optimisations which might affect correctness-related tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run this test and others with all optimizations?
src/ucs/type/rwlock.h
Outdated
|
||
|
||
/** | ||
* Read-write lock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spinlock
src/ucs/type/rwlock.h
Outdated
#include <errno.h> | ||
|
||
/** | ||
* The ucs_rwlock_t type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: ucs_rw_spinlock_t
and for all apis?
src/ucs/type/rwlock.h
Outdated
} | ||
|
||
|
||
static inline void ucs_rwlock_read_unlock(ucs_rwlock_t *lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add assertions/checks to detect underflow using returned value, only on debug builds (and also overflow on the other path)
src/ucs/type/rwlock.h
Outdated
|
||
static inline void ucs_rwlock_write_unlock(ucs_rwlock_t *lock) | ||
{ | ||
__atomic_fetch_sub(&lock->l, UCS_RWLOCK_WRITE, __ATOMIC_RELAXED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
detect underflow on debug builds?
src/ucs/type/rwlock.h
Outdated
* Read-write lock. | ||
*/ | ||
typedef struct { | ||
volatile int l; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use unsigned int
to have defined behavior on overflow/underflow (and detect it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't expect overflow by design so signed int is more suitable i this case IMO
src/ucs/type/rwlock.h
Outdated
} | ||
|
||
|
||
static inline void ucs_rwlock_write_lock(ucs_rwlock_t *lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe annotate for coverity with something like /* coverity[lock] */
, to help it with sanity checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for lock paths
d41b883
to
2c3a0e8
Compare
static UCS_F_ALWAYS_INLINE void ucs_cpu_relax() | ||
{ | ||
#ifdef __SSE2__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: do something else if SSE2 not defined, like "":::"memory" ?
src/ucs/arch/atomic.h
Outdated
if (flags & UCS_ATOMIC_FENCE_LOCK) { | ||
return __ATOMIC_ACQUIRE; | ||
} | ||
|
||
if (flags & UCS_ATOMIC_FENCE_UNLOCK) { | ||
return __ATOMIC_RELEASE; | ||
} | ||
|
||
return __ATOMIC_RELAXED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are UCS_ATOMIC_xx defined as flags if there is no meaning to combine more than one of them?
also, how does it relate to UCS_DEFINE_ATOMIC_xx macros? should we remove/extend them?
src/ucs/arch/atomic.h
Outdated
static UCS_F_ALWAYS_INLINE int ucs_atomic_get(int *ptr, unsigned flags) | ||
{ | ||
return __atomic_load_n(ptr, ucs_atomic_memorder(flags)); | ||
} | ||
|
||
|
||
static UCS_F_ALWAYS_INLINE int | ||
ucs_atomic_fadd(int *ptr, int val, unsigned flags) | ||
{ | ||
return __atomic_fetch_add(ptr, val, ucs_atomic_memorder(flags)); | ||
} | ||
|
||
|
||
static UCS_F_ALWAYS_INLINE void | ||
ucs_atomic_sub(int *ptr, int val, unsigned flags) | ||
{ | ||
__atomic_fetch_sub(ptr, val, ucs_atomic_memorder(flags)); | ||
} | ||
|
||
|
||
static UCS_F_ALWAYS_INLINE void ucs_atomic_or(int *ptr, int val, unsigned flags) | ||
{ | ||
__atomic_fetch_or(ptr, val, ucs_atomic_memorder(flags)); | ||
} | ||
|
||
|
||
static UCS_F_ALWAYS_INLINE int | ||
ucs_atomic_cswap(int *ptr, int old_val, int new_val, unsigned flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we define it as macros to allow any type (not just int)?
src/ucs/arch/atomic.h
Outdated
flags & UCS_ATOMIC_WEAK, | ||
ucs_atomic_memorder(flags), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why in case of success we want weak mem order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using it in trylock so it may use faster instruction
src/ucs/type/rwlock.h
Outdated
*/ | ||
typedef struct { | ||
volatile int l; | ||
} ucs_rwlock_t; | ||
int state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe uint32_t ? since it's flags value and not numeric value
src/ucs/type/rwlock.h
Outdated
(__atomic_compare_exchange_n(&lock->l, &x, x + UCS_RWLOCK_WRITE, 1, | ||
__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))) { | ||
return 0; | ||
ucs_atomic_cswap(&lock->state, x, x + UCS_RWLOCK_WRITE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use
"!(x & UCS_RWLOCK_WRITE)" instead of "x < UCS_RWLOCK_WRITE"
"x | UCS_RWLOCK_WRITE" instead of "x + UCS_RWLOCK_WRITE"
since this is bit value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe keep wait bit included in the check too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe "!(x & ~(UCS_RWLOCK_WRITE-1))"? "!(x & UCS_RWLOCK_WRITE)" is not the same
src/ucs/type/rwlock.h
Outdated
|
||
static UCS_F_ALWAYS_INLINE void ucs_rw_spinlock_cleanup(ucs_rw_spinlock_t *lock) | ||
{ | ||
ucs_assert(lock->state == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertv
|
||
int write_taken = 0; | ||
bool write_taken = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use true/false for bool instead of 1/0
test/gtest/ucs/test_type.cc
Outdated
int m = std::thread::hardware_concurrency(); | ||
std::vector<int> threads = {1, 2, 4, m}; | ||
std::vector<int> writers_per_256 = {1, 25, 128, 250}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- yes ( isee you added it)
- yes, pls update with percent 0
src/ucs/arch/atomic.h
Outdated
@@ -138,4 +139,59 @@ UCS_DEFINE_ATOMIC_BOOL_CSWAP(16, w); | |||
UCS_DEFINE_ATOMIC_BOOL_CSWAP(32, l); | |||
UCS_DEFINE_ATOMIC_BOOL_CSWAP(64, q); | |||
|
|||
|
|||
#define UCS_ATOMIC_WEAK UCS_BIT(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to encapsulate these values, then maybe it should be an enum?
And also we miss few other values:
__ATOMIC_CONSUME - probably we don't need this
Data dependency only for both barrier and synchronization with another thread.
__ATOMIC_ACQ_REL - I'm not sure about this one
Full barrier in both directions and synchronizes with acquire loads and release stores in another thread.
__ATOMIC_SEQ_CST - the strongest mode, we definitely need it
Full barrier in both directions and synchronizes with acquire loads and release stores in all threads.
src/ucs/arch/atomic.h
Outdated
#define UCS_ATOMIC_FENCE_UNLOCK UCS_BIT(2) | ||
|
||
|
||
static UCS_F_ALWAYS_INLINE int ucs_atomic_memorder(unsigned flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can simply use the actual flag (__ATOMIC_ACQUIRE
etc) directly
src/ucs/type/rwlock.h
Outdated
static UCS_F_ALWAYS_INLINE void | ||
ucs_rw_spinlock_read_unlock(ucs_rw_spinlock_t *lock) | ||
{ | ||
ucs_assert(lock->state >= UCS_RWLOCK_READ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assertv with details?
src/ucs/type/rwlock.h
Outdated
@@ -0,0 +1,137 @@ | |||
/* | |||
* Copyright (c) NVIDIA CORPORATION & AFFILIATES, 2024. ALL RIGHTS RESERVED. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copyrights would need 2025
src/ucs/type/rwlock.h
Outdated
(__atomic_compare_exchange_n(&lock->l, &x, x + UCS_RWLOCK_WRITE, 1, | ||
__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))) { | ||
return 0; | ||
ucs_atomic_cswap(&lock->state, x, x + UCS_RWLOCK_WRITE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe keep wait bit included in the check too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, is the test code run with optim too?
} | ||
}; | ||
|
||
UCS_TEST_F(test_rwlock, lock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run this test and others with all optimizations?
@Artemy-Mellanox please squash |
93c059d
to
8bbe776
Compare
Uh oh!
There was an error while loading. Please reload this page.