Diffstat (limited to 'SOURCES/futex2.patch')
-rw-r--r-- | SOURCES/futex2.patch | 1063 |
1 files changed, 571 insertions, 492 deletions
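Both new syscalls in the patch below start by validating `flags`. Any bit outside `FUTEX2_MASK` is rejected. Because there is no default size, the size bits must select `FUTEX_32`.

The userspace sketch below reproduces those two checks so the mask arithmetic can be tried in isolation. It makes these assumptions:

- The constants are copied from the `include/uapi/linux/futex.h` hunks.
- `FUTEX_SHARED_FLAG` is included in the mask, as the shared-futex patch does.
- `FUTEX_CLOCK_REALTIME` keeps its existing futex(2) value of 256.
- `sketch_check_futex2_flags()` is a hypothetical name, not something the patch defines.

```c
#include <assert.h>
#include <errno.h>

/* Values copied from the include/uapi/linux/futex.h hunks in the patch. */
#define FUTEX_32		2
#define FUTEX_SIZE_MASK		0x3
#define FUTEX_SHARED_FLAG	8
/* Pre-existing futex(2) flag, reused by futex2 to select CLOCK_REALTIME. */
#define FUTEX_CLOCK_REALTIME	256

/* Mask as it looks once the shared-futex patch is applied. */
#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME | FUTEX_SHARED_FLAG)

/*
 * Hypothetical userspace model of the flag checks at the top of
 * sys_futex_wait() and sys_futex_wake(): reject unknown bits first,
 * then require a 32-bit futex.
 */
static int sketch_check_futex2_flags(unsigned int flags)
{
	unsigned int size = flags & FUTEX_SIZE_MASK;

	if (flags & ~FUTEX2_MASK)
		return -EINVAL;	/* a bit outside FUTEX2_MASK is set */
	if (size != FUTEX_32)
		return -EINVAL;	/* no default size; only FUTEX_32 is accepted */
	return 0;
}
```

For example, `FUTEX_32 | FUTEX_SHARED_FLAG` passes. A bare `0` is refused, because the size field is mandatory.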
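The commit message in the patch below asks for `timo` to be a 64-bit absolute timeout on every platform. It is measured against the monotonic clock unless the realtime flag is set. The prose calls that flag FUTEX_REALTIME_CLOCK, while `FUTEX2_MASK` in the code uses `FUTEX_CLOCK_REALTIME`. A caller holding a relative delay therefore has to turn it into a deadline first.

This sketch does that with `clock_gettime(CLOCK_MONOTONIC)`. `struct sketch_timespec64` is an assumed stand-in that copies the field layout of the kernel's `struct __kernel_timespec`. The helper names are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/*
 * Assumed stand-in with the same fields as struct __kernel_timespec:
 * 64-bit seconds on every ABI, so it stays valid past 2038.
 */
struct sketch_timespec64 {
	int64_t tv_sec;
	long long tv_nsec;	/* normalized: 0 <= tv_nsec < 1e9 */
};

/* Add a non-negative relative delay (in ns) to an absolute time. */
static struct sketch_timespec64 sketch_add_ns(struct sketch_timespec64 base,
					      int64_t rel_ns)
{
	struct sketch_timespec64 t = base;

	assert(rel_ns >= 0 && base.tv_nsec >= 0 &&
	       base.tv_nsec < SKETCH_NSEC_PER_SEC);
	t.tv_sec += rel_ns / SKETCH_NSEC_PER_SEC;
	t.tv_nsec += rel_ns % SKETCH_NSEC_PER_SEC;
	if (t.tv_nsec >= SKETCH_NSEC_PER_SEC) {	/* carry into seconds */
		t.tv_sec += 1;
		t.tv_nsec -= SKETCH_NSEC_PER_SEC;
	}
	return t;
}

/*
 * Absolute CLOCK_MONOTONIC deadline rel_ms from now (the futex2 default
 * clock). Returns 0 on success, -1 if the clock cannot be read.
 */
static int sketch_monotonic_deadline_ms(int64_t rel_ms,
					struct sketch_timespec64 *out)
{
	struct timespec now;
	struct sketch_timespec64 base;

	if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
		return -1;
	base.tv_sec = (int64_t)now.tv_sec;
	base.tv_nsec = (long long)now.tv_nsec;
	*out = sketch_add_ns(base, rel_ms * 1000000LL);
	return 0;
}
```

Because the deadline is absolute, a wait that is interrupted and restarted can reuse the same value without stretching the total timeout.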
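For a private futex, `futex_get_bucket()` in the patch below does the following:

- It builds the key from the page-aligned address, the offset inside that page, and `current->mm`.
- It rejects addresses that are not 4-byte aligned with `-EINVAL`.
- It hashes the whole key with `jhash2()`.

Separately, `futex2_init()` sizes the table as a power of two and checks this with `BUG_ON(!is_power_of_2(...))`.

The sketch models that path in userspace. Several pieces are assumptions:

- FNV-1a stands in for `jhash2()`.
- Pages are taken to be 4 KiB.
- `mm_id` is an arbitrary number playing the role of `current->mm`.
- The bucket is picked by masking the hash with `hashsize - 1`. A power-of-two table allows this, but the line is not shown in this diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Mirrors the private-futex fields of struct futex_key in the patch. */
struct sketch_futex_key {
	uint64_t pointer;	/* page-aligned user address */
	unsigned long index;	/* stands in for current->mm */
	uint32_t offset;	/* offset of uaddr within its page */
};

static int sketch_private_key(uintptr_t address, unsigned long mm_id,
			      struct sketch_futex_key *key)
{
	if (address % sizeof(uint32_t) != 0)
		return -EINVAL;	/* same rule as IS_ALIGNED(address, sizeof(u32)) */

	/* Zero everything, padding included: the hash reads raw bytes. */
	memset(key, 0, sizeof(*key));
	key->offset = (uint32_t)(address % SKETCH_PAGE_SIZE);
	key->pointer = (uint64_t)(address - key->offset);
	key->index = mm_id;
	return 0;
}

/* FNV-1a, used here only as a simple stand-in for the kernel's jhash2(). */
static uint32_t sketch_hash(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

static size_t sketch_bucket_index(const struct sketch_futex_key *key,
				  size_t hashsize)
{
	/* Assumed masking scheme: only valid for a power-of-two table. */
	assert(hashsize != 0 && (hashsize & (hashsize - 1)) == 0);
	return sketch_hash(key, sizeof(*key)) & (hashsize - 1);
}
```

The hash reads the key as raw bytes, padding included, so the key has to be zeroed before it is filled. That is plausibly why the updated `sys_futex_wait()` adds a `memset()` of `waiter.key`. The change was needed once the zero-initialized `struct futex_single_waiter` was replaced by an uninitialized local struct.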
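`futex_mark_wake()` in the patch below only holds a pointer to one `struct futex_waiter`. Yet it must set `hint` and wake `task` in the owning `struct futex_waiter_head`. `futex_get_parent()` recovers that owner by subtracting the header size and `index * sizeof(struct futex_waiter)` from the waiter's address. This only works if the waiters sit contiguously right after the header, which is why `sys_futex_wait()` wraps its single waiter in a `__packed` struct.

The sketch reproduces the arithmetic with simplified, hypothetical structs. It uses `offsetof(..., objects)` where the patch uses `sizeof` of the header. In standard C, the size of a struct ending in a flexible array member may exceed that member's offset because of trailing padding. The two values coincide for the patch's structures on common ABIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct futex_waiter{,_head}. */
struct sketch_waiter {
	void *uaddr;
	unsigned int val;
	unsigned int index;	/* position of this waiter in objects[] */
};

struct sketch_waiter_head {
	void *task;		/* stands in for struct task_struct * */
	bool hint;		/* set by the waker */
	struct sketch_waiter objects[];	/* C99 flexible array member */
};

/* Walk back from objects[index] to the start of the enclosing head. */
static struct sketch_waiter_head *sketch_get_parent(struct sketch_waiter *w)
{
	uintptr_t parent = (uintptr_t)w
		- offsetof(struct sketch_waiter_head, objects)
		- (uintptr_t)w->index * sizeof(struct sketch_waiter);

	return (struct sketch_waiter_head *)parent;
}

/* Allocate a zeroed head with nr numbered waiters; NULL on failure. */
static struct sketch_waiter_head *sketch_alloc_head(unsigned int nr)
{
	struct sketch_waiter_head *h =
		calloc(1, sizeof(*h) + nr * sizeof(struct sketch_waiter));

	if (!h)
		return NULL;
	for (unsigned int i = 0; i < nr; i++)
		h->objects[i].index = i;
	return h;
}
```

In the kernel, the waker uses the recovered pointer to set `parent->hint = true` and queue `parent->task` on its wake queue.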
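When `futex_enqueue()` in the patch below has to back out after queuing the first `i` waiters, it calls `futex_dequeue_multiple()`. That function removes every waiter still on its bucket list and reports whether a waker reached any of them first. It returns -1 if none were woken, otherwise the index of the last waiter that was already removed.

The model below replaces the bucket lists and spinlocks with a hypothetical `bool` array so that the return convention can be checked directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: queued[i] is true while waiter i is still on its
 * bucket list; a waker clears it (list_del_init() in futex_mark_wake()).
 * In the kernel each check runs under that waiter's bucket spinlock.
 */
static int sketch_dequeue_multiple(bool *queued, size_t nr)
{
	int ret = -1;

	for (size_t i = 0; i < nr; i++) {
		if (queued[i])
			queued[i] = false;	/* still waiting: dequeue it */
		else
			ret = (int)i;		/* already woken: remember index */
	}
	return ret;
}
```

In the updated `futex_enqueue()` hunk, the `*awakened >= 0` check now runs before `__get_user()` re-reads the futex word. As a result, a wakeup that already happened is reported even if that re-read would fault.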
diff --git a/SOURCES/futex2.patch b/SOURCES/futex2.patch index 3604062..482b6c1 100644 --- a/SOURCES/futex2.patch +++ b/SOURCES/futex2.patch @@ -1,7 +1,7 @@ -From a64bf661d4fc6dbfde640bf002eae2e22884a419 Mon Sep 17 00:00:00 2001 +From e434311562a21e6fc917edeadac7b25732c6ea60 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:00 -0300 -Subject: [PATCH 01/13] futex2: Implement wait and wake functions +Subject: [PATCH] futex2: Implement wait and wake functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -38,47 +38,45 @@ be used by N waiters, thus making easier to implement vectorized wait. (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no default size. - By default, the timeout uses a monotonic clock, but can be used as a - realtime one by using the FUTEX_REALTIME_CLOCK flag. + By default, the timeout uses a monotonic clock, but can be used as a realtime + one by using the FUTEX_REALTIME_CLOCK flag. - By default, futexes are of the private type, that means that this user - address will be accessed by threads that shares the same memory region. - This allows for some internal optimizations, so they are faster. - However, if the address needs to be shared with different processes - (like using `mmap()` or `shm()`), they need to be defined as shared and - the flag FUTEX_SHARED_FLAG is used to set that. + By default, futexes are of the private type, that means that this user address + will be accessed by threads that shares the same memory region. This allows for + some internal optimizations, so they are faster. However, if the address needs + to be shared with different processes (like using `mmap()` or `shm()`), they + need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. 
- By default, the operation has no NUMA-awareness, meaning that the user - can't choose the memory node where the kernel side futex data will be - stored. The user can choose the node where it wants to operate by - setting the FUTEX_NUMA_FLAG and using the following structure (where X - can be 8, 16, or 32): + By default, the operation has no NUMA-awareness, meaning that the user can't + choose the memory node where the kernel side futex data will be stored. The + user can choose the node where it wants to operate by setting the + FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or + 32): struct futexX_numa { __uX value; __sX hint; }; - This structure should be passed at the `void *uaddr` of futex - functions. The address of the structure will be used to be waited/waken - on, and the `value` will be compared to `val` as usual. The `hint` - member is used to defined which node the futex will use. When waiting, - the futex will be registered on a kernel-side table stored on that - node; when waking, the futex will be searched for on that given table. - That means that there's no redundancy between tables, and the wrong - `hint` value will led to undesired behavior. Userspace is responsible - for dealing with node migrations issues that may occur. `hint` can - range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use - the same node the current process is using. - - When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be - stored on a global table on some node, defined at compilation time. + This structure should be passed at the `void *uaddr` of futex functions. The + address of the structure will be used to be waited/waken on, and the + `value` will be compared to `val` as usual. The `hint` member is used to + defined which node the futex will use. When waiting, the futex will be + registered on a kernel-side table stored on that node; when waking, the futex + will be searched for on that given table. 
That means that there's no redundancy + between tables, and the wrong `hint` value will led to undesired behavior. + Userspace is responsible for dealing with node migrations issues that may + occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or + -1, to use the same node the current process is using. + + When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a + global table on some node, defined at compilation time. ** The `timo` argument -As per the Y2038 work done in the kernel, new interfaces shouldn't add -timeout options known to be buggy. Given that, `timo` should be a 64bit -timeout at all platforms, using an absolute timeout value. +As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout +options known to be buggy. Given that, `timo` should be a 64bit timeout at +all platforms, using an absolute timeout value. Signed-off-by: André Almeida <andrealmeid@collabora.com> --- @@ -92,7 +90,7 @@ This patch series introduces the futex2 syscalls. * What happened to the current futex()? For some years now, developers have been trying to add new features to -futex, but maintainers have been reluctant to accept them, given the +futex, but maintainers have been reluctant to accept then, given the multiplexed interface full of legacy features and tricky to do big changes. Some problems that people tried to address with patchsets are: NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2]. @@ -122,8 +120,7 @@ current futex. console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the - consumer side is essential for good performance of those running - over Wine. + consumer side is essential for good performance of those running over Wine. 
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/ [1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/ @@ -178,7 +175,7 @@ about it. This patchset can be also found at my git tree: -https://gitlab.collabora.com/tonyk/linux/-/tree/futex2 +https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev - Patch 1: Implements wait/wake, and the basics foundations of futex2 @@ -283,8 +280,6 @@ Along with that, the following work was done: Thanks, André - -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- MAINTAINERS | 2 +- arch/arm/tools/syscall.tbl | 2 + @@ -294,21 +289,21 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> arch/x86/entry/syscalls/syscall_64.tbl | 2 + include/linux/syscalls.h | 7 + include/uapi/asm-generic/unistd.h | 8 +- - include/uapi/linux/futex.h | 56 ++ + include/uapi/linux/futex.h | 5 + init/Kconfig | 7 + kernel/Makefile | 1 + - kernel/futex2.c | 625 ++++++++++++++++++ + kernel/futex2.c | 619 ++++++++++++++++++ kernel/sys_ni.c | 4 + tools/include/uapi/asm-generic/unistd.h | 8 +- .../arch/x86/entry/syscalls/syscall_64.tbl | 2 + - 15 files changed, 728 insertions(+), 4 deletions(-) + 15 files changed, 671 insertions(+), 4 deletions(-) create mode 100644 kernel/futex2.c diff --git a/MAINTAINERS b/MAINTAINERS -index bfc1b86e3..86ed91b72 100644 +index d92f85ca831d..01aceb92aa40 100644 --- a/MAINTAINERS +++ b/MAINTAINERS -@@ -7332,7 +7332,7 @@ F: Documentation/locking/*futex* +@@ -7370,7 +7370,7 @@ F: Documentation/locking/*futex* F: include/asm-generic/futex.h F: include/linux/futex.h F: include/uapi/linux/futex.h @@ -318,78 +313,78 @@ index bfc1b86e3..86ed91b72 100644 F: tools/testing/selftests/futex/ diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl -index 20e1170e2..4eef220cd 100644 +index dcc1191291a2..2bf93c69e00a 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl -@@ -455,3 +455,5 @@ - 439 common faccessat2 sys_faccessat2 +@@ -456,3 +456,5 @@ 440 common 
process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 -+442 common futex_wait sys_futex_wait -+443 common futex_wake sys_futex_wake + 442 common mount_setattr sys_mount_setattr ++443 common futex_wait sys_futex_wait ++444 common futex_wake sys_futex_wake diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h -index 86a9d7b3e..d1f7d35f9 100644 +index 949788f5ba40..64ebdc1ec581 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) --#define __NR_compat_syscalls 442 -+#define __NR_compat_syscalls 444 +-#define __NR_compat_syscalls 443 ++#define __NR_compat_syscalls 445 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h -index cccfbbefb..2db1529b2 100644 +index 3d874f624056..15c2cd5f1c95 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h -@@ -891,6 +891,10 @@ __SYSCALL(__NR_faccessat2, sys_faccessat2) - __SYSCALL(__NR_process_madvise, sys_process_madvise) - #define __NR_epoll_pwait2 441 +@@ -893,6 +893,10 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise) __SYSCALL(__NR_epoll_pwait2, compat_sys_epoll_pwait2) -+#define __NR_futex_wait 442 + #define __NR_mount_setattr 442 + __SYSCALL(__NR_mount_setattr, sys_mount_setattr) ++#define __NR_futex_wait 443 +__SYSCALL(__NR_futex_wait, sys_futex_wait) -+#define __NR_futex_wake 443 ++#define __NR_futex_wake 444 +__SYSCALL(__NR_futex_wake, sys_futex_wake) /* * Please add new compat syscalls above this comment and update diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl -index 874aeacde..ece90c8d9 100644 +index a1c9f496fca6..17d22509d780 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl -@@ -446,3 +446,5 @@ - 439 i386 
faccessat2 sys_faccessat2 +@@ -447,3 +447,5 @@ 440 i386 process_madvise sys_process_madvise 441 i386 epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 -+442 i386 futex_wait sys_futex_wait -+443 i386 futex_wake sys_futex_wake + 442 i386 mount_setattr sys_mount_setattr ++443 i386 futex_wait sys_futex_wait ++444 i386 futex_wake sys_futex_wake diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl -index 78672124d..72fb65ef9 100644 +index 7bf01cbe582f..3336b5cd5bdb 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl -@@ -363,6 +363,8 @@ - 439 common faccessat2 sys_faccessat2 +@@ -364,6 +364,8 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 -+442 common futex_wait sys_futex_wait -+443 common futex_wake sys_futex_wake + 442 common mount_setattr sys_mount_setattr ++443 common futex_wait sys_futex_wait ++444 common futex_wake sys_futex_wake # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h -index 7688bc983..bf146c2b0 100644 +index 2839dc9a7c01..352f69a2b94c 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h -@@ -618,6 +618,13 @@ asmlinkage long sys_get_robust_list(int pid, +@@ -619,6 +619,13 @@ asmlinkage long sys_get_robust_list(int pid, asmlinkage long sys_set_robust_list(struct robust_list_head __user *head, size_t len); +/* kernel/futex2.c */ +asmlinkage long sys_futex_wait(void __user *uaddr, unsigned int val, + unsigned int flags, -+ struct __kernel_timespec __user __user *timo); ++ struct __kernel_timespec __user *timo); +asmlinkage long sys_futex_wake(void __user *uaddr, unsigned int nr_wake, + unsigned int flags); + @@ -397,97 +392,46 @@ index 7688bc983..bf146c2b0 100644 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp, struct __kernel_timespec __user *rmtp); diff --git a/include/uapi/asm-generic/unistd.h 
b/include/uapi/asm-generic/unistd.h -index 728752917..57e19200f 100644 +index ce58cff99b66..738315f148fa 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h -@@ -862,8 +862,14 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise) - #define __NR_epoll_pwait2 441 - __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2) +@@ -864,8 +864,14 @@ __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2) + #define __NR_mount_setattr 442 + __SYSCALL(__NR_mount_setattr, sys_mount_setattr) -+#define __NR_futex_wait 442 ++#define __NR_futex_wait 443 +__SYSCALL(__NR_futex_wait, sys_futex_wait) + -+#define __NR_futex_wake 443 ++#define __NR_futex_wake 444 +__SYSCALL(__NR_futex_wake, sys_futex_wake) + #undef __NR_syscalls --#define __NR_syscalls 442 -+#define __NR_syscalls 444 +-#define __NR_syscalls 443 ++#define __NR_syscalls 445 /* * 32 bit systems traditionally used different diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h -index a89eb0acc..9fbdaaf4f 100644 +index a89eb0accd5e..8d30f4b6d094 100644 --- a/include/uapi/linux/futex.h +++ b/include/uapi/linux/futex.h -@@ -41,6 +41,62 @@ +@@ -41,6 +41,11 @@ #define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \ FUTEX_PRIVATE_FLAG) +/* Size argument to futex2 syscall */ -+#define FUTEX_8 0 -+#define FUTEX_16 1 +#define FUTEX_32 2 + +#define FUTEX_SIZE_MASK 0x3 + -+#define FUTEX_SHARED_FLAG 8 -+ -+#define FUTEX_NUMA_FLAG 16 -+ -+/** -+ * struct futexXX_numa - struct for NUMA-aware futex operation -+ * @value: futex value -+ * @hint: node id to operate -+ */ -+ -+struct futex8_numa { -+ __u8 value; -+ __s8 hint; -+}; -+ -+struct futex16_numa { -+ __u16 value; -+ __s16 hint; -+}; -+ -+struct futex32_numa { -+ __u32 value; -+ __s32 hint; -+}; -+ -+#define FUTEX_WAITV_MAX 128 -+ -+/** -+ * struct futex_waitv - A waiter for vectorized wait -+ * @uaddr: User address to wait on -+ * @val: Expected value at uaddr -+ * @flags: Flags for this waiter 
-+ */ -+struct futex_waitv { -+ void *uaddr; -+ unsigned int val; -+ unsigned int flags; -+}; -+ -+/** -+ * struct futex_requeue - Define an address and its flags for requeue operation -+ * @uaddr: User address of one of the requeue arguments -+ * @flags: Flags for this address -+ */ -+struct futex_requeue { -+ void *uaddr; -+ unsigned int flags; -+}; -+ /* * Support for robust futexes: the kernel cleans up held futexes at * thread exit time. diff --git a/init/Kconfig b/init/Kconfig -index 29ad68325..c3e62e1b1 100644 +index 22946fe5ded9..0dce39965bfb 100644 --- a/init/Kconfig +++ b/init/Kconfig -@@ -1531,6 +1531,13 @@ config FUTEX +@@ -1538,6 +1538,13 @@ config FUTEX support for "fast userspace mutexes". The resulting kernel may not run glibc-based applications correctly. @@ -502,7 +446,7 @@ index 29ad68325..c3e62e1b1 100644 bool depends on FUTEX && RT_MUTEXES diff --git a/kernel/Makefile b/kernel/Makefile -index aa7368c7e..afbe15e51 100644 +index 320f1f3941b7..b6407f92c9af 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -57,6 +57,7 @@ obj-$(CONFIG_PROFILING) += profile.o @@ -515,10 +459,10 @@ index aa7368c7e..afbe15e51 100644 ifneq ($(CONFIG_SMP),y) diff --git a/kernel/futex2.c b/kernel/futex2.c new file mode 100644 -index 000000000..802578ad6 +index 000000000000..d6a2efbfa488 --- /dev/null +++ b/kernel/futex2.c -@@ -0,0 +1,625 @@ +@@ -0,0 +1,619 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * futex2 system call interface by André Almeida <andrealmeid@collabora.com> @@ -565,7 +509,7 @@ index 000000000..802578ad6 + * @index: Index of waiter in futexv list + */ +struct futex_waiter { -+ uintptr_t uaddr; ++ void __user *uaddr; + struct futex_key key; + struct list_head list; + unsigned int val; @@ -575,12 +519,12 @@ index 000000000..802578ad6 +}; + +/** -+ * struct futexv_head - List of futexes to be waited ++ * struct futex_waiter_head - List of futexes to be waited + * @task: Task to be awaken + * @hint: Was someone on this list awakened? 
+ * @objects: List of futexes + */ -+struct futexv_head { ++struct futex_waiter_head { + struct task_struct *task; + bool hint; + struct futex_waiter objects[0]; @@ -598,28 +542,12 @@ index 000000000..802578ad6 + struct list_head list; +}; + -+/** -+ * struct futex_single_waiter - Wrapper for a futexv_head of one element -+ * @futexv: Single futexv element -+ * @waiter: Single waiter element -+ */ -+struct futex_single_waiter { -+ struct futexv_head futexv; -+ struct futex_waiter waiter; -+} __packed; + +/* Mask for futex2 flag operations */ -+#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_SHARED_FLAG | \ -+ FUTEX_CLOCK_REALTIME) ++#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME) + -+/* Mask for sys_futex_waitv flag */ -+#define FUTEXV_MASK (FUTEX_CLOCK_REALTIME) -+ -+/* Mask for each futex in futex_waitv list */ -+#define FUTEXV_WAITER_MASK (FUTEX_SIZE_MASK | FUTEX_SHARED_FLAG) -+ -+struct futex_bucket *futex_table; -+unsigned int futex2_hashsize; ++static struct futex_bucket *futex_table; ++static unsigned int futex2_hashsize; + +/* + * Reflects a new waiter being added to the waitqueue. 
@@ -681,7 +609,7 @@ index 000000000..802578ad6 + /* Checking if uaddr is valid and accessible */ + if (unlikely(!IS_ALIGNED(address, sizeof(u32)))) + return ERR_PTR(-EINVAL); -+ if (unlikely(!access_ok(address, sizeof(u32)))) ++ if (unlikely(!access_ok(uaddr, sizeof(u32)))) + return ERR_PTR(-EFAULT); + + key->offset = address % PAGE_SIZE; @@ -763,14 +691,14 @@ index 000000000..802578ad6 + * * -1 - If no futex was woken during the removal + * * 0>= - At least one futex was found woken, index of the last one + */ -+static int futex_dequeue_multiple(struct futexv_head *futexv, unsigned int nr) ++static int futex_dequeue_multiple(struct futex_waiter_head *futexv, unsigned int nr) +{ + int i, ret = -1; + + for (i = 0; i < nr; i++) { + spin_lock(&futexv->objects[i].bucket->lock); -+ if (!list_empty_careful(&futexv->objects[i].list)) { -+ list_del_init_careful(&futexv->objects[i].list); ++ if (!list_empty(&futexv->objects[i].list)) { ++ list_del_init(&futexv->objects[i].list); + bucket_dec_waiters(futexv->objects[i].bucket); + } else { + ret = i; @@ -820,18 +748,19 @@ index 000000000..802578ad6 + * * 0 - Everything is enqueued and we are ready to sleep + * * 0< - Something went wrong, nothing is enqueued, return error code + */ -+static int futex_enqueue(struct futexv_head *futexv, unsigned int nr_futexes, ++static int futex_enqueue(struct futex_waiter_head *futexv, unsigned int nr_futexes, + int *awakened) +{ + int i, ret; -+ u32 uval, *uaddr, val; ++ u32 uval, val; ++ u32 __user *uaddr; + struct futex_bucket *bucket; + +retry: + set_current_state(TASK_INTERRUPTIBLE); + + for (i = 0; i < nr_futexes; i++) { -+ uaddr = (u32 * __user)futexv->objects[i].uaddr; ++ uaddr = (u32 __user *)futexv->objects[i].uaddr; + val = (u32)futexv->objects[i].val; + + bucket = futexv->objects[i].bucket; @@ -848,12 +777,12 @@ index 000000000..802578ad6 + __set_current_state(TASK_RUNNING); + *awakened = futex_dequeue_multiple(futexv, i); + -+ if (__get_user(uval, uaddr)) -+ return -EFAULT; -+ + 
if (*awakened >= 0) + return 1; + ++ if (__get_user(uval, uaddr)) ++ return -EFAULT; ++ + goto retry; + } + @@ -887,7 +816,7 @@ index 000000000..802578ad6 + * * 0 >= - Hint of which futex woke us + * * 0 < - Error code + */ -+static int __futex_wait(struct futexv_head *futexv, unsigned int nr_futexes, ++static int __futex_wait(struct futex_waiter_head *futexv, unsigned int nr_futexes, + struct hrtimer_sleeper *timeout) +{ + int ret; @@ -924,7 +853,7 @@ index 000000000..802578ad6 + * thread, return -ERESTARTSYS. + * + * * If there's no signal pending, it was a spurious wake -+ * (scheduler gave us a change to do some work, even if we ++ * (scheduler gave us a chance to do some work, even if we + * don't want to). We need to remove ourselves from the + * bucket and add again, to prevent losing wakeups in the + * meantime. @@ -957,7 +886,7 @@ index 000000000..802578ad6 + * * 0 >= - Hint of which futex woke us + * * 0 < - Error code + */ -+static int futex_set_timer_and_wait(struct futexv_head *futexv, ++static int futex_set_timer_and_wait(struct futex_waiter_head *futexv, + unsigned int nr_futexes, + struct __kernel_timespec __user *timo, + unsigned int flags) @@ -996,9 +925,14 @@ index 000000000..802578ad6 + unsigned int, flags, struct __kernel_timespec __user *, timo) +{ + unsigned int size = flags & FUTEX_SIZE_MASK; -+ struct futex_single_waiter wait_single = {0}; + struct futex_waiter *waiter; -+ struct futexv_head *futexv; ++ struct futex_waiter_head *futexv; ++ ++ /* Wrapper for a futexv_head of one element */ ++ struct { ++ struct futex_waiter_head futexv; ++ struct futex_waiter waiter; ++ } __packed wait_single; + + if (flags & ~FUTEX2_MASK) + return -EINVAL; @@ -1013,7 +947,8 @@ index 000000000..802578ad6 + waiter = &wait_single.waiter; + waiter->index = 0; + waiter->val = val; -+ waiter->uaddr = (uintptr_t)uaddr; ++ waiter->uaddr = uaddr; ++ memset(&wait_single.waiter.key, 0, sizeof(struct futex_key)); + + INIT_LIST_HEAD(&waiter->list); + @@ -1032,13 +967,13 
@@ index 000000000..802578ad6 + * + * Return: A pointer to its futexv struct + */ -+static inline struct futexv_head *futex_get_parent(uintptr_t waiter, -+ unsigned int index) ++static inline struct futex_waiter_head *futex_get_parent(uintptr_t waiter, ++ unsigned int index) +{ -+ uintptr_t parent = waiter - sizeof(struct futexv_head) ++ uintptr_t parent = waiter - sizeof(struct futex_waiter_head) + - (uintptr_t)(index * sizeof(struct futex_waiter)); + -+ return (struct futexv_head *)parent; ++ return (struct futex_waiter_head *)parent; +} + +/** @@ -1052,13 +987,14 @@ index 000000000..802578ad6 + struct wake_q_head *wake_q) +{ + struct task_struct *task; -+ struct futexv_head *parent = futex_get_parent((uintptr_t)waiter, -+ waiter->index); ++ struct futex_waiter_head *parent = futex_get_parent((uintptr_t)waiter, ++ waiter->index); + ++ lockdep_assert_held(&bucket->lock); + parent->hint = true; + task = parent->task; + get_task_struct(task); -+ list_del_init_careful(&waiter->list); ++ list_del_init(&waiter->list); + wake_q_add_safe(wake_q, task); + bucket_dec_waiters(bucket); +} @@ -1135,6 +1071,8 @@ index 000000000..802578ad6 + futex2_hashsize, futex2_hashsize); + futex2_hashsize = 1UL << futex_shift; + ++ BUG_ON(!is_power_of_2(futex2_hashsize)); ++ + for (i = 0; i < futex2_hashsize; i++) { + INIT_LIST_HEAD(&futex_table[i].list); + spin_lock_init(&futex_table[i].lock); @@ -1145,7 +1083,7 @@ index 000000000..802578ad6 +} +core_initcall(futex2_init); diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c -index 19aa80689..27ef83ca8 100644 +index 19aa806890d5..27ef83ca8a9d 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -150,6 +150,10 @@ COND_SYSCALL_COMPAT(set_robust_list); @@ -1160,22 +1098,22 @@ index 19aa80689..27ef83ca8 100644 /* kernel/itimer.c */ diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h -index 728752917..57e19200f 100644 +index ce58cff99b66..738315f148fa 100644 --- a/tools/include/uapi/asm-generic/unistd.h 
+++ b/tools/include/uapi/asm-generic/unistd.h -@@ -862,8 +862,14 @@ __SYSCALL(__NR_process_madvise, sys_process_madvise) - #define __NR_epoll_pwait2 441 - __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2) +@@ -864,8 +864,14 @@ __SC_COMP(__NR_epoll_pwait2, sys_epoll_pwait2, compat_sys_epoll_pwait2) + #define __NR_mount_setattr 442 + __SYSCALL(__NR_mount_setattr, sys_mount_setattr) -+#define __NR_futex_wait 442 ++#define __NR_futex_wait 443 +__SYSCALL(__NR_futex_wait, sys_futex_wait) + -+#define __NR_futex_wake 443 ++#define __NR_futex_wake 444 +__SYSCALL(__NR_futex_wake, sys_futex_wake) + #undef __NR_syscalls --#define __NR_syscalls 442 -+#define __NR_syscalls 444 +-#define __NR_syscalls 443 ++#define __NR_syscalls 445 /* * 32 bit systems traditionally used different @@ -1183,23 +1121,22 @@ diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch index 78672124d..15d2b89b6 100644 --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl -@@ -363,6 +363,8 @@ - 439 common faccessat2 sys_faccessat2 +@@ -364,6 +364,8 @@ 440 common process_madvise sys_process_madvise 441 common epoll_pwait2 sys_epoll_pwait2 -+442 common futex_wait sys_futex_wait -+443 common futex_wake sys_futex_wake + 442 common mount_setattr sys_mount_setattr ++443 common futex_wait sys_futex_wait ++444 common futex_wake sys_futex_wake # # Due to a historical design error, certain syscalls are numbered differently -- -2.30.2 +GitLab - -From ea4e3d7ee8dc965fbe3cabd753b88ada23cecb39 Mon Sep 17 00:00:00 2001 +From 5e0a8c1ff68778e373871b741a54554e89f0b8fe Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:01 -0300 -Subject: [PATCH 02/13] futex2: Add support for shared futexes +Subject: [PATCH] futex2: Add support for shared futexes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -1208,12 +1145,11 
@@ Add support for shared futexes for cross-process resources. This design relies on the same approach done in old futex to create an unique id for file-backed shared memory, by using a counter at struct inode. -There are two types of futexes: private and shared ones. The private are -futexes meant to be used by threads that shares the same memory space, -are easier to be uniquely identified an thus can have some performance -optimization. The elements for identifying one are: the start address of -the page where the address is, the address offset within the page and -the current->mm pointer. +There are two types of futexes: private and shared ones. The private are futexes +meant to be used by threads that shares the same memory space, are easier to be +uniquely identified an thus can have some performance optimization. The elements +for identifying one are: the start address of the page where the address is, +the address offset within the page and the current->mm pointer. Now, for uniquely identifying shared futex: @@ -1230,23 +1166,22 @@ Now, for uniquely identifying shared futex: page->index, an UUID for the struct inode and the offset within the page. -Note that members of futex_key doesn't have any particular meaning after -they are part of the struct - they are just bytes to identify a futex. -Given that, we don't need to use a particular name or type that matches -the original data, we only need to care about the bitsize of each -component and make both private and shared data fit in the same memory -space. +Note that members of futex_key doesn't have any particular meaning after they +are part of the struct - they are just bytes to identify a futex. Given that, +we don't need to use a particular name or type that matches the original data, +we only need to care about the bitsize of each component and make both private +and shared data fit in the same memory space. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- - fs/inode.c | 1 + - include/linux/fs.h | 1 + - kernel/futex2.c | 220 +++++++++++++++++++++++++++++++++++++++++++-- - 3 files changed, 217 insertions(+), 5 deletions(-) + fs/inode.c | 1 + + include/linux/fs.h | 1 + + include/uapi/linux/futex.h | 2 + + kernel/futex2.c | 222 ++++++++++++++++++++++++++++++++++++- + 4 files changed, 220 insertions(+), 6 deletions(-) diff --git a/fs/inode.c b/fs/inode.c -index 6442d97d9..886fe11cc 100644 +index a047ab306f9a..c5e1dd13fd40 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -139,6 +139,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode) @@ -1258,10 +1193,10 @@ index 6442d97d9..886fe11cc 100644 inode->i_op = &empty_iops; inode->i_fop = &no_open_fops; diff --git a/include/linux/fs.h b/include/linux/fs.h -index fd47deea7..516bda982 100644 +index ec8f3ddf4a6a..33683ff94cb3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h -@@ -681,6 +681,7 @@ struct inode { +@@ -683,6 +683,7 @@ struct inode { }; atomic64_t i_version; atomic64_t i_sequence; /* see futex */ @@ -1269,8 +1204,21 @@ index fd47deea7..516bda982 100644 atomic_t i_count; atomic_t i_dio_count; atomic_t i_writecount; +diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h +index 8d30f4b6d094..70ea66fffb1c 100644 +--- a/include/uapi/linux/futex.h ++++ b/include/uapi/linux/futex.h +@@ -46,6 +46,8 @@ + + #define FUTEX_SIZE_MASK 0x3 + ++#define FUTEX_SHARED_FLAG 8 ++ + /* + * Support for robust futexes: the kernel cleans up held futexes at + * thread exit time. 
diff --git a/kernel/futex2.c b/kernel/futex2.c -index 802578ad6..27767b2d0 100644 +index d6a2efbfa488..69866f98f287 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -14,8 +14,10 @@ @@ -1295,19 +1243,21 @@ index 802578ad6..27767b2d0 100644 * @offset: Address offset of uaddr in a page */ struct futex_key { -@@ -97,6 +99,11 @@ struct futex_single_waiter { - /* Mask for each futex in futex_waitv list */ - #define FUTEXV_WAITER_MASK (FUTEX_SIZE_MASK | FUTEX_SHARED_FLAG) +@@ -79,7 +81,12 @@ struct futex_bucket { + + /* Mask for futex2 flag operations */ +-#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME) ++#define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME | FUTEX_SHARED_FLAG) ++ +#define is_object_shared ((futexv->objects[i].flags & FUTEX_SHARED_FLAG) ? true : false) + +#define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */ +#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */ -+ - struct futex_bucket *futex_table; - unsigned int futex2_hashsize; -@@ -143,16 +150,200 @@ static inline int bucket_get_waiters(struct futex_bucket *bucket) + static struct futex_bucket *futex_table; + static unsigned int futex2_hashsize; +@@ -127,16 +134,200 @@ static inline int bucket_get_waiters(struct futex_bucket *bucket) #endif } @@ -1509,7 +1459,7 @@ index 802578ad6..27767b2d0 100644 { uintptr_t address = (uintptr_t)uaddr; u32 hash_key; -@@ -168,6 +359,9 @@ static struct futex_bucket *futex_get_bucket(void __user *uaddr, +@@ -152,6 +343,9 @@ static struct futex_bucket *futex_get_bucket(void __user *uaddr, key->pointer = (u64)address; key->index = (unsigned long)current->mm; @@ -1519,21 +1469,21 @@ index 802578ad6..27767b2d0 100644 /* Generate hash key for this futex using uaddr and current->mm */ hash_key = jhash2((u32 *)key, sizeof(*key) / sizeof(u32), 0); -@@ -303,6 +497,7 @@ static int futex_enqueue(struct futexv_head *futexv, unsigned int nr_futexes, - int *awakened) - { +@@ -289,6 +483,7 @@ static int 
futex_enqueue(struct futex_waiter_head *futexv, unsigned int nr_futex int i, ret; + u32 uval, val; + u32 __user *uaddr; + bool retry = false; - u32 uval, *uaddr, val; struct futex_bucket *bucket; -@@ -313,6 +508,18 @@ static int futex_enqueue(struct futexv_head *futexv, unsigned int nr_futexes, - uaddr = (u32 * __user)futexv->objects[i].uaddr; + retry: +@@ -298,6 +493,18 @@ static int futex_enqueue(struct futex_waiter_head *futexv, unsigned int nr_futex + uaddr = (u32 __user *)futexv->objects[i].uaddr; val = (u32)futexv->objects[i].val; + if (is_object_shared && retry) { + struct futex_bucket *tmp = -+ futex_get_bucket((void *)uaddr, ++ futex_get_bucket((void __user *)uaddr, + &futexv->objects[i].key, true); + if (IS_ERR(tmp)) { + __set_current_state(TASK_RUNNING); @@ -1546,23 +1496,23 @@ index 802578ad6..27767b2d0 100644 bucket = futexv->objects[i].bucket; bucket_inc_waiters(bucket); -@@ -333,6 +540,7 @@ static int futex_enqueue(struct futexv_head *futexv, unsigned int nr_futexes, - if (*awakened >= 0) - return 1; +@@ -318,6 +525,7 @@ static int futex_enqueue(struct futex_waiter_head *futexv, unsigned int nr_futex + if (__get_user(uval, uaddr)) + return -EFAULT; + retry = true; goto retry; } -@@ -474,6 +682,7 @@ static int futex_set_timer_and_wait(struct futexv_head *futexv, +@@ -459,6 +667,7 @@ static int futex_set_timer_and_wait(struct futex_waiter_head *futexv, SYSCALL_DEFINE4(futex_wait, void __user *, uaddr, unsigned int, val, unsigned int, flags, struct __kernel_timespec __user *, timo) { + bool shared = (flags & FUTEX_SHARED_FLAG) ? 
true : false; unsigned int size = flags & FUTEX_SIZE_MASK; - struct futex_single_waiter wait_single = {0}; struct futex_waiter *waiter; -@@ -497,7 +706,7 @@ SYSCALL_DEFINE4(futex_wait, void __user *, uaddr, unsigned int, val, + struct futex_waiter_head *futexv; +@@ -488,7 +697,7 @@ SYSCALL_DEFINE4(futex_wait, void __user *, uaddr, unsigned int, val, INIT_LIST_HEAD(&waiter->list); /* Get an unlocked hash bucket */ @@ -1571,7 +1521,7 @@ index 802578ad6..27767b2d0 100644 if (IS_ERR(waiter->bucket)) return PTR_ERR(waiter->bucket); -@@ -562,6 +771,7 @@ static inline bool futex_match(struct futex_key key1, struct futex_key key2) +@@ -554,6 +763,7 @@ static inline bool futex_match(struct futex_key key1, struct futex_key key2) SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, unsigned int, flags) { @@ -1579,7 +1529,7 @@ index 802578ad6..27767b2d0 100644 unsigned int size = flags & FUTEX_SIZE_MASK; struct futex_waiter waiter, *aux, *tmp; struct futex_bucket *bucket; -@@ -574,7 +784,7 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, +@@ -566,7 +776,7 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, if (size != FUTEX_32) return -EINVAL; @@ -1589,13 +1539,12 @@ index 802578ad6..27767b2d0 100644 return PTR_ERR(bucket); -- -2.30.2 - +GitLab -From bdfdc48ad40d314933c7872f4818172e76bcd350 Mon Sep 17 00:00:00 2001 +From 9ac7e0f5c6a4f0e56d0e974e21ec4fa09d899be1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:00 -0300 -Subject: [PATCH 03/13] futex2: Implement vectorized wait +Subject: [PATCH] futex2: Implement vectorized wait MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -1620,71 +1569,57 @@ should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. 
-Returns the array index of one of the awakened futexes. There’s no given +Returns the array index of one of the awakened futexes. There’s no given information of how many were awakened, or any particular attribute of it -(if it’s the first awakened, if it is of the smaller index...). +(if it’s the first awakened, if it is of the smaller index...). Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- arch/arm/tools/syscall.tbl | 1 + - arch/arm64/include/asm/unistd.h | 2 +- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + include/linux/compat.h | 11 ++ include/linux/syscalls.h | 4 + include/uapi/asm-generic/unistd.h | 5 +- - kernel/futex2.c | 171 ++++++++++++++++++ + include/uapi/linux/futex.h | 14 ++ + kernel/futex2.c | 177 ++++++++++++++++++ kernel/sys_ni.c | 1 + tools/include/uapi/asm-generic/unistd.h | 5 +- .../arch/x86/entry/syscalls/syscall_64.tbl | 1 + - 11 files changed, 200 insertions(+), 3 deletions(-) + 11 files changed, 219 insertions(+), 2 deletions(-) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl -index 4eef220cd..6d0f6626a 100644 +index 2bf93c69e00a..f9b55f2ea444 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl -@@ -457,3 +457,4 @@ - 441 common epoll_pwait2 sys_epoll_pwait2 - 442 common futex_wait sys_futex_wait - 443 common futex_wake sys_futex_wake -+444 common futex_waitv sys_futex_waitv -diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h -index d1f7d35f9..64ebdc1ec 100644 ---- a/arch/arm64/include/asm/unistd.h -+++ b/arch/arm64/include/asm/unistd.h -@@ -38,7 +38,7 @@ - #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) - #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) - --#define __NR_compat_syscalls 444 -+#define __NR_compat_syscalls 445 - #endif - - #define __ARCH_WANT_SYS_CLONE +@@ -458,3 +458,4 @@ + 442 common mount_setattr sys_mount_setattr + 443 
common futex_wait sys_futex_wait + 444 common futex_wake sys_futex_wake ++445 common futex_waitv sys_futex_waitv diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl -index ece90c8d9..fe242fa0b 100644 +index 17d22509d780..4bc546c841b0 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl -@@ -448,3 +448,4 @@ - 441 i386 epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2 - 442 i386 futex_wait sys_futex_wait - 443 i386 futex_wake sys_futex_wake -+444 i386 futex_waitv sys_futex_waitv compat_sys_futex_waitv +@@ -449,3 +449,4 @@ + 442 i386 mount_setattr sys_mount_setattr + 443 i386 futex_wait sys_futex_wait + 444 i386 futex_wake sys_futex_wake ++445 i386 futex_waitv sys_futex_waitv compat_sys_futex_waitv diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl -index 72fb65ef9..9d0f07e05 100644 +index 3336b5cd5bdb..a715e88e3d6d 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl -@@ -365,6 +365,7 @@ - 441 common epoll_pwait2 sys_epoll_pwait2 - 442 common futex_wait sys_futex_wait - 443 common futex_wake sys_futex_wake -+444 common futex_waitv sys_futex_waitv +@@ -366,6 +366,7 @@ + 442 common mount_setattr sys_mount_setattr + 443 common futex_wait sys_futex_wait + 444 common futex_wake sys_futex_wake ++445 common futex_waitv sys_futex_waitv # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/compat.h b/include/linux/compat.h -index 6e65be753..041d18174 100644 +index 6e65be753603..041d18174350 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -365,6 +365,12 @@ struct compat_robust_list_head { @@ -1713,19 +1648,19 @@ index 6e65be753..041d18174 100644 asmlinkage long compat_sys_getitimer(int which, struct old_itimerval32 __user *it); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h -index bf146c2b0..7da1ceb36 100644 +index 
352f69a2b94c..48e96fe7d8f6 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h -@@ -68,6 +68,7 @@ union bpf_attr; - struct io_uring_params; +@@ -69,6 +69,7 @@ struct io_uring_params; struct clone_args; struct open_how; + struct mount_attr; +struct futex_waitv; #include <linux/types.h> #include <linux/aio_abi.h> -@@ -624,6 +625,9 @@ asmlinkage long sys_futex_wait(void __user *uaddr, unsigned int val, - struct __kernel_timespec __user __user *timo); +@@ -625,6 +626,9 @@ asmlinkage long sys_futex_wait(void __user *uaddr, unsigned int val, + struct __kernel_timespec __user *timo); asmlinkage long sys_futex_wake(void __user *uaddr, unsigned int nr_wake, unsigned int flags); +asmlinkage long sys_futex_waitv(struct futex_waitv __user *waiters, @@ -1735,27 +1670,65 @@ index bf146c2b0..7da1ceb36 100644 /* kernel/hrtimer.c */ asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h -index 57e19200f..090da8e12 100644 +index 738315f148fa..2a6adca37fe9 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h -@@ -868,8 +868,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) - #define __NR_futex_wake 443 +@@ -870,8 +870,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) + #define __NR_futex_wake 444 __SYSCALL(__NR_futex_wake, sys_futex_wake) -+#define __NR_futex_waitv 444 ++#define __NR_futex_waitv 445 +__SC_COMP(__NR_futex_waitv, sys_futex_waitv, compat_sys_futex_waitv) + #undef __NR_syscalls --#define __NR_syscalls 444 -+#define __NR_syscalls 445 +-#define __NR_syscalls 445 ++#define __NR_syscalls 446 /* * 32 bit systems traditionally used different +diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h +index 70ea66fffb1c..3216aee015d2 100644 +--- a/include/uapi/linux/futex.h ++++ b/include/uapi/linux/futex.h +@@ -48,6 +48,20 @@ + + #define FUTEX_SHARED_FLAG 8 + ++#define FUTEX_WAITV_MAX 128 ++ ++/** ++ * struct futex_waitv - 
A waiter for vectorized wait ++ * @uaddr: User address to wait on ++ * @val: Expected value at uaddr ++ * @flags: Flags for this waiter ++ */ ++struct futex_waitv { ++ void __user *uaddr; ++ unsigned int val; ++ unsigned int flags; ++}; ++ + /* + * Support for robust futexes: the kernel cleans up held futexes at + * thread exit time. diff --git a/kernel/futex2.c b/kernel/futex2.c -index 27767b2d0..f3c2379ab 100644 +index 69866f98f287..3290e51695eb 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c -@@ -713,6 +713,177 @@ SYSCALL_DEFINE4(futex_wait, void __user *, uaddr, unsigned int, val, +@@ -83,6 +83,12 @@ struct futex_bucket { + /* Mask for futex2 flag operations */ + #define FUTEX2_MASK (FUTEX_SIZE_MASK | FUTEX_CLOCK_REALTIME | FUTEX_SHARED_FLAG) + ++/* Mask for sys_futex_waitv flag */ ++#define FUTEXV_MASK (FUTEX_CLOCK_REALTIME) ++ ++/* Mask for each futex in futex_waitv list */ ++#define FUTEXV_WAITER_MASK (FUTEX_SIZE_MASK | FUTEX_SHARED_FLAG) ++ + #define is_object_shared ((futexv->objects[i].flags & FUTEX_SHARED_FLAG) ? 
true : false) + + #define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode */ +@@ -704,6 +710,177 @@ SYSCALL_DEFINE4(futex_wait, void __user *, uaddr, unsigned int, val, return futex_set_timer_and_wait(futexv, 1, timo, flags); } @@ -1768,7 +1741,7 @@ index 27767b2d0..f3c2379ab 100644 + * + * Return: Error code on failure, pointer to a prepared futexv otherwise + */ -+static int compat_futex_parse_waitv(struct futexv_head *futexv, ++static int compat_futex_parse_waitv(struct futex_waiter_head *futexv, + struct compat_futex_waitv __user *uwaitv, + unsigned int nr_futexes) +{ @@ -1786,7 +1759,7 @@ index 27767b2d0..f3c2379ab 100644 + + futexv->objects[i].key.pointer = 0; + futexv->objects[i].flags = waitv.flags; -+ futexv->objects[i].uaddr = (uintptr_t)compat_ptr(waitv.uaddr); ++ futexv->objects[i].uaddr = compat_ptr(waitv.uaddr); + futexv->objects[i].val = waitv.val; + futexv->objects[i].index = i; + @@ -1809,7 +1782,7 @@ index 27767b2d0..f3c2379ab 100644 + unsigned int, nr_futexes, unsigned int, flags, + struct __kernel_timespec __user *, timo) +{ -+ struct futexv_head *futexv; ++ struct futex_waiter_head *futexv; + int ret; + + if (flags & ~FUTEXV_MASK) @@ -1845,7 +1818,7 @@ index 27767b2d0..f3c2379ab 100644 + * + * Return: Error code on failure, pointer to a prepared futexv otherwise + */ -+static int futex_parse_waitv(struct futexv_head *futexv, ++static int futex_parse_waitv(struct futex_waiter_head *futexv, + struct futex_waitv __user *uwaitv, + unsigned int nr_futexes) +{ @@ -1863,7 +1836,7 @@ index 27767b2d0..f3c2379ab 100644 + + futexv->objects[i].key.pointer = 0; + futexv->objects[i].flags = waitv.flags; -+ futexv->objects[i].uaddr = (uintptr_t)waitv.uaddr; ++ futexv->objects[i].uaddr = waitv.uaddr; + futexv->objects[i].val = waitv.val; + futexv->objects[i].index = i; + @@ -1904,7 +1877,7 @@ index 27767b2d0..f3c2379ab 100644 + unsigned int, nr_futexes, unsigned int, flags, + struct __kernel_timespec __user *, timo) +{ -+ struct futexv_head 
*futexv; ++ struct futex_waiter_head *futexv; + int ret; + + if (flags & ~FUTEXV_MASK) @@ -1934,7 +1907,7 @@ index 27767b2d0..f3c2379ab 100644 * futex_get_parent - For a given futex in a futexv list, get a pointer to the futexv * @waiter: Address of futex in the list diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c -index 27ef83ca8..977890c58 100644 +index 27ef83ca8a9d..977890c58ab5 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -153,6 +153,7 @@ COND_SYSCALL_COMPAT(get_robust_list); @@ -1946,19 +1919,19 @@ index 27ef83ca8..977890c58 100644 /* kernel/hrtimer.c */ diff --git a/tools/include/uapi/asm-generic/unistd.h b/tools/include/uapi/asm-generic/unistd.h -index 57e19200f..23febe59e 100644 +index 738315f148fa..b1ab5e14d0c3 100644 --- a/tools/include/uapi/asm-generic/unistd.h +++ b/tools/include/uapi/asm-generic/unistd.h -@@ -868,8 +868,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) - #define __NR_futex_wake 443 +@@ -870,8 +870,11 @@ __SYSCALL(__NR_futex_wait, sys_futex_wait) + #define __NR_futex_wake 444 __SYSCALL(__NR_futex_wake, sys_futex_wake) -+#define __NR_futex_waitv 444 -+__SYSCALL(__NR_futex_wait, sys_futex_wait) ++#define __NR_epoll_pwait2 445 ++__SC_COMP(__NR_futex_waitv, sys_futex_waitv, compat_sys_futex_waitv) + #undef __NR_syscalls --#define __NR_syscalls 444 -+#define __NR_syscalls 445 +-#define __NR_syscalls 445 ++#define __NR_syscalls 446 /* * 32 bit systems traditionally used different @@ -1966,27 +1939,26 @@ diff --git a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl b/tools/perf/arch index 15d2b89b6..820c1e4b1 100644 --- a/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl +++ b/tools/perf/arch/x86/entry/syscalls/syscall_64.tbl -@@ -365,6 +365,7 @@ - 441 common epoll_pwait2 sys_epoll_pwait2 - 442 common futex_wait sys_futex_wait - 443 common futex_wake sys_futex_wake -+444 common futex_waitv sys_futex_waitv +@@ -366,6 +366,7 @@ + 442 common mount_setattr sys_mount_setattr + 443 common futex_wait sys_futex_wait + 444 common futex_wake 
sys_futex_wake ++445 common futex_waitv sys_futex_waitv # # Due to a historical design error, certain syscalls are numbered differently -- -2.30.2 - +GitLab -From e1198b0e26063ba40993154176b8232f646c3c4b Mon Sep 17 00:00:00 2001 +From f22443c43d18f487392cb2a3d2a6557ff6081ce0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:01 -0300 -Subject: [PATCH 04/13] futex2: Implement requeue operation +Subject: [PATCH] futex2: Implement requeue operation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit -Implement requeue interface similarly to FUTEX_CMP_REQUEUE operation. +Implement requeue interface similary to FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch: futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2, @@ -2039,8 +2011,6 @@ the current performance: futex_requeue) just like wait/wake(). In that way, we could avoid the copy_from_user(). 
- -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- @@ -2049,21 +2019,22 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> include/linux/compat.h | 12 ++ include/linux/syscalls.h | 5 + include/uapi/asm-generic/unistd.h | 5 +- + include/uapi/linux/futex.h | 10 ++ kernel/futex2.c | 215 +++++++++++++++++++++++++ kernel/sys_ni.c | 1 + - 9 files changed, 241 insertions(+), 2 deletions(-) + 10 files changed, 251 insertions(+), 2 deletions(-) diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl -index 6d0f6626a..9aa108802 100644 +index f9b55f2ea444..24a700535747 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl -@@ -458,3 +458,4 @@ - 442 common futex_wait sys_futex_wait - 443 common futex_wake sys_futex_wake - 444 common futex_waitv sys_futex_waitv -+445 common futex_requeue sys_futex_requeue +@@ -459,3 +459,4 @@ + 443 common futex_wait sys_futex_wait + 444 common futex_wake sys_futex_wake + 445 common futex_waitv sys_futex_waitv ++446 common futex_requeue sys_futex_requeue diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h -index 64ebdc1ec..d1cc2849d 100644 +index 64ebdc1ec581..d1cc2849dc00 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ @@ -2076,28 +2047,28 @@ index 64ebdc1ec..d1cc2849d 100644 #define __ARCH_WANT_SYS_CLONE diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl -index fe242fa0b..0cd1df235 100644 +index 4bc546c841b0..4d0111f44d79 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl -@@ -449,3 +449,4 @@ - 442 i386 futex_wait sys_futex_wait - 443 i386 futex_wake sys_futex_wake - 444 i386 futex_waitv sys_futex_waitv compat_sys_futex_waitv -+445 i386 futex_requeue sys_futex_requeue compat_sys_futex_requeue +@@ -450,3 +450,4 @@ + 443 i386 futex_wait sys_futex_wait + 444 i386 
futex_wake sys_futex_wake + 445 i386 futex_waitv sys_futex_waitv compat_sys_futex_waitv ++446 i386 futex_requeue sys_futex_requeue compat_sys_futex_requeue diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl -index 9d0f07e05..abbfddcdb 100644 +index a715e88e3d6d..61c0b47365e3 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl -@@ -366,6 +366,7 @@ - 442 common futex_wait sys_futex_wait - 443 common futex_wake sys_futex_wake - 444 common futex_waitv sys_futex_waitv -+445 common futex_requeue sys_futex_requeue +@@ -367,6 +367,7 @@ + 443 common futex_wait sys_futex_wait + 444 common futex_wake sys_futex_wake + 445 common futex_waitv sys_futex_waitv ++446 common futex_requeue sys_futex_requeue # # Due to a historical design error, certain syscalls are numbered differently diff --git a/include/linux/compat.h b/include/linux/compat.h -index 041d18174..d4c1b402b 100644 +index 041d18174350..d4c1b402b962 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -371,6 +371,11 @@ struct compat_futex_waitv { @@ -2127,18 +2098,18 @@ index 041d18174..d4c1b402b 100644 asmlinkage long compat_sys_getitimer(int which, struct old_itimerval32 __user *it); diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h -index 7da1ceb36..06823bc7e 100644 +index 48e96fe7d8f6..b0675f236066 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h -@@ -69,6 +69,7 @@ struct io_uring_params; - struct clone_args; +@@ -70,6 +70,7 @@ struct clone_args; struct open_how; + struct mount_attr; struct futex_waitv; +struct futex_requeue; #include <linux/types.h> #include <linux/aio_abi.h> -@@ -628,6 +629,10 @@ asmlinkage long sys_futex_wake(void __user *uaddr, unsigned int nr_wake, +@@ -629,6 +630,10 @@ asmlinkage long sys_futex_wake(void __user *uaddr, unsigned int nr_wake, asmlinkage long sys_futex_waitv(struct futex_waitv __user *waiters, unsigned int nr_futexes, unsigned int flags, struct 
__kernel_timespec __user *timo); @@ -2150,27 +2121,48 @@ index 7da1ceb36..06823bc7e 100644 /* kernel/hrtimer.c */ asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h -index 090da8e12..095c10a83 100644 +index 2a6adca37fe9..2778da551846 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h -@@ -871,8 +871,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) - #define __NR_futex_waitv 444 +@@ -873,8 +873,11 @@ __SYSCALL(__NR_futex_wake, sys_futex_wake) + #define __NR_futex_waitv 445 __SC_COMP(__NR_futex_waitv, sys_futex_waitv, compat_sys_futex_waitv) -+#define __NR_futex_requeue 445 ++#define __NR_futex_requeue 446 +__SC_COMP(__NR_futex_requeue, sys_futex_requeue, compat_sys_futex_requeue) + #undef __NR_syscalls --#define __NR_syscalls 445 -+#define __NR_syscalls 446 +-#define __NR_syscalls 446 ++#define __NR_syscalls 447 /* * 32 bit systems traditionally used different +diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h +index 3216aee015d2..c15bfddcf1e2 100644 +--- a/include/uapi/linux/futex.h ++++ b/include/uapi/linux/futex.h +@@ -62,6 +62,16 @@ struct futex_waitv { + unsigned int flags; + }; + ++/** ++ * struct futex_requeue - Define an address and its flags for requeue operation ++ * @uaddr: User address of one of the requeue arguments ++ * @flags: Flags for this address ++ */ ++struct futex_requeue { ++ void __user *uaddr; ++ unsigned int flags; ++}; ++ + /* + * Support for robust futexes: the kernel cleans up held futexes at + * thread exit time. 
diff --git a/kernel/futex2.c b/kernel/futex2.c -index f3c2379ab..bad8c183c 100644 +index 3290e51695eb..75c961b309bb 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c -@@ -977,6 +977,221 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, +@@ -975,6 +975,221 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, return ret; } @@ -2232,7 +2224,7 @@ index f3c2379ab..bad8c183c 100644 + + if (unlikely(ret)) { + futex_double_unlock(b1, b2); -+ if (__get_user(uval, (u32 * __user)rq1.uaddr)) ++ if (__get_user(uval, (u32 __user *)rq1.uaddr)) + return -EFAULT; + + bucket_dec_waiters(b2); @@ -2393,7 +2385,7 @@ index f3c2379ab..bad8c183c 100644 { int i; diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c -index 977890c58..1750dfc41 100644 +index 977890c58ab5..1750dfc416d8 100644 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -154,6 +154,7 @@ COND_SYSCALL_COMPAT(get_robust_list); @@ -2405,13 +2397,12 @@ index 977890c58..1750dfc41 100644 /* kernel/hrtimer.c */ -- -2.30.2 - +GitLab -From 9ef45e80251029ad164b538b20f0d68a9b75865c Mon Sep 17 00:00:00 2001 +From a439256740d175a9a5f9da6f81af1ad070541151 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Thu, 11 Feb 2021 10:47:23 -0300 -Subject: [PATCH 05/13] futex2: Add compatibility entry point for x86_x32 ABI +Subject: [PATCH] futex2: Add compatibility entry point for x86_x32 ABI MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -2421,13 +2412,12 @@ paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- kernel/futex2.c | 42 +++++++++++++++++++++++++++++++++++------- 1 file changed, 35 insertions(+), 7 deletions(-) diff --git a/kernel/futex2.c b/kernel/futex2.c -index bad8c183c..8a8b45f98 100644 +index 75c961b309bb..61b81b401e58 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c @@ -23,6 +23,10 @@ @@ -2441,7 +2431,7 @@ index bad8c183c..8a8b45f98 100644 /** * struct futex_key - Components to build unique key for a futex * @pointer: Pointer to current->mm or inode's UUID for file backed futexes -@@ -875,7 +879,16 @@ SYSCALL_DEFINE4(futex_waitv, struct futex_waitv __user *, waiters, +@@ -872,7 +876,16 @@ SYSCALL_DEFINE4(futex_waitv, struct futex_waitv __user *, waiters, futexv->hint = false; futexv->task = current; @@ -2459,7 +2449,7 @@ index bad8c183c..8a8b45f98 100644 if (!ret) ret = futex_set_timer_and_wait(futexv, nr_futexes, timo, flags); -@@ -1181,13 +1194,28 @@ SYSCALL_DEFINE6(futex_requeue, struct futex_requeue __user *, uaddr1, +@@ -1179,13 +1192,28 @@ SYSCALL_DEFINE6(futex_requeue, struct futex_requeue __user *, uaddr1, if (flags) return -EINVAL; @@ -2495,13 +2485,12 @@ index bad8c183c..8a8b45f98 100644 return __futex_requeue(rq1, rq2, nr_wake, nr_requeue, cmpval, shared1, shared2); } -- -2.30.2 - +GitLab -From 80944da5db0f1e00d0bf174d85f74ae4df2444aa Mon Sep 17 00:00:00 2001 +From 6ae4534eedef59fa3bd48e4cc03961813e10f643 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Tue, 9 Feb 2021 13:59:00 -0300 -Subject: [PATCH 06/13] docs: locking: futex2: Add documentation +Subject: [PATCH] docs: locking: futex2: Add documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -2510,7 +2499,6 @@ Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- Documentation/locking/futex2.rst | 198 +++++++++++++++++++++++++++++++ Documentation/locking/index.rst | 1 + @@ -2519,7 +2507,7 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> diff --git a/Documentation/locking/futex2.rst b/Documentation/locking/futex2.rst new file mode 100644 -index 000000000..edd47c22f +index 000000000000..3ab49f0e741c --- /dev/null +++ b/Documentation/locking/futex2.rst @@ -0,0 +1,198 @@ @@ -2531,10 +2519,10 @@ index 000000000..edd47c22f + +:Author: André Almeida <andrealmeid@collabora.com> + -+futex, or fast user mutex, is a set of syscalls to allow the userspace to create ++futex, or fast user mutex, is a set of syscalls to allow userspace to create +performant synchronization mechanisms, such as mutexes, semaphores and +conditional variables in userspace. C standard libraries, like glibc, uses it -+as means to implements more high level interfaces like pthreads. ++as a means to implement more high level interfaces like pthreads. + +The interface +============= @@ -2561,7 +2549,7 @@ index 000000000..edd47c22f +one by using the FUTEX_REALTIME_CLOCK flag. + +By default, futexes are of the private type, that means that this user address -+will be accessed by threads that shares the same memory region. This allows for ++will be accessed by threads that share the same memory region. This allows for +some internal optimizations, so they are faster. However, if the address needs +to be shared with different processes (like using ``mmap()`` or ``shm()``), they +need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. @@ -2580,22 +2568,22 @@ index 000000000..edd47c22f +This structure should be passed at the ``void *uaddr`` of futex functions. The +address of the structure will be used to be waited on/waken on, and the +``value`` will be compared to ``val`` as usual. 
The ``hint`` member is used to -+defined which node the futex will use. When waiting, the futex will be ++define which node the futex will use. When waiting, the futex will be +registered on a kernel-side table stored on that node; when waking, the futex +will be searched for on that given table. That means that there's no redundancy -+between tables, and the wrong ``hint`` value will led to undesired behavior. ++between tables, and the wrong ``hint`` value will lead to undesired behavior. +Userspace is responsible for dealing with node migrations issues that may -+occur. ``hint`` can range from [0, MAX_NUMA_NODES], for specifying a node, or ++occur. ``hint`` can range from [0, MAX_NUMA_NODES), for specifying a node, or +-1, to use the same node the current process is using. + +When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a -+global table on some node, defined at compilation time. ++global table on allocated on the first node. + +The ``timo`` argument +--------------------- + +As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout -+options known to be buggy. Given that, ``timo`` should be a 64bit timeout at ++options known to be buggy. Given that, ``timo`` should be a 64-bit timeout at +all platforms, using an absolute timeout value. + +Implementation @@ -2614,7 +2602,7 @@ index 000000000..edd47c22f +of futexes to wait for (using struct futexv_head). For futex_wait() calls, this +list will have a single object. + -+We have a hash table, were waiters register themselves before sleeping. Then, ++We have a hash table, where waiters register themselves before sleeping. Then +the wake function checks this table looking for waiters at uaddr. The hash +bucket to be used is determined by a struct futex_key, that stores information +to uniquely identify an address from a given process. Given the huge address @@ -2627,7 +2615,7 @@ index 000000000..edd47c22f +futexes. 
The check (``*uaddr == val``) can fail for two reasons: + +- The values are different, and we return -EAGAIN. However, if while -+ dequeueing we found that some futex were awakened, we prioritize this ++ dequeueing we found that some futexes were awakened, we prioritize this + and return success. + +- When trying to access the user address, we do so with page faults @@ -2636,20 +2624,20 @@ index 000000000..edd47c22f + fault, or an invalid address. We release the lock, dequeue everyone + (because it's illegal to sleep while there are futexes enqueued, we + could lose wakeups) and try again with page fault enabled. If we -+ succeeded, this means that the address is valid, but we need to do ++ succeed, this means that the address is valid, but we need to do + all the work again. For serialization reasons, we need to have the + spin lock when getting the user value. Additionally, for shared + futexes, we also need to recalculate the hash, since the underlying + mapping mechanisms could have changed when dealing with page fault. + If, even with page fault enabled, we can't access the address, it + means it's an invalid user address, and we return -EFAULT. For this -+ case, we prioritize the error, even if some futex were awaken. ++ case, we prioritize the error, even if some futexes were awaken. + +If the check is OK, they are enqueued on a linked list in our bucket, and +proceed to the next one. If all waiters succeed, we put the thread to sleep +until a futex_wake() call, timeout expires or we get a signal. After waking up, -+we dequeue everyone, and check if some futex was awaken. This dequeue is done by -+iteratively walking at each element of struct futex_head list. ++we dequeue everyone, and check if some futex was awakened. This dequeue is done ++by iteratively walking at each element of struct futex_head list. + +All enqueuing/dequeuing operations requires to hold the bucket lock, to avoid +racing while modifying the list. 
@@ -2660,9 +2648,9 @@ index 000000000..edd47c22f +We get the bucket that's storing the waiters at uaddr, and wake the required +number of waiters, checking for hash collision. + -+There's an optimization that makes futex_wake() not taking the bucket lock if -+there's no one to be wake on that bucket. It checks an atomic counter that each -+bucket has, if it says 0, than the syscall exits. In order to this work, the ++There's an optimization that makes futex_wake() not take the bucket lock if ++there's no one to be woken on that bucket. It checks an atomic counter that each ++bucket has, if it says 0, then the syscall exits. In order for this to work, the +waiter thread increases it before taking the lock, so the wake thread will +correctly see that there's someone waiting and will continue the path to take +the bucket lock. To get the correct serialization, the waiter issues a memory @@ -2673,10 +2661,10 @@ index 000000000..edd47c22f +--------- + +The requeue path first checks for each struct futex_requeue and their flags. -+Then, it will compare the excepted value with the one at uaddr1::uaddr. ++Then, it will compare the expected value with the one at uaddr1::uaddr. +Following the same serialization explained at Waking_, we increase the atomic +counter for the bucket of uaddr2 before taking the lock. We need to have both -+buckets locks at same time so we don't race with others futexes operations. To ++buckets locks at same time so we don't race with other futex operation. To +ensure the locks are taken in the same order for all threads (and thus avoiding +deadlocks), every requeue operation takes the "smaller" bucket first, when +comparing both addresses. @@ -2684,33 +2672,33 @@ index 000000000..edd47c22f +If the compare with user value succeeds, we proceed by waking ``nr_wake`` +futexes, and then requeuing ``nr_requeue`` from bucket of uaddr1 to the uaddr2. +This consists in a simple list deletion/addition and replacing the old futex key -+for the new one. 
++with the new one. + +Futex keys +---------- + +There are two types of futexes: private and shared ones. The private are futexes -+meant to be used by threads that shares the same memory space, are easier to be -+uniquely identified an thus can have some performance optimization. The elements -+for identifying one are: the start address of the page where the address is, -+the address offset within the page and the current->mm pointer. ++meant to be used by threads that share the same memory space, are easier to be ++uniquely identified and thus can have some performance optimization. The ++elements for identifying one are: the start address of the page where the ++address is, the address offset within the page and the current->mm pointer. + -+Now, for uniquely identifying shared futex: ++Now, for uniquely identifying a shared futex: + +- If the page containing the user address is an anonymous page, we can + just use the same data used for private futexes (the start address of + the page, the address offset within the page and the current->mm -+ pointer) that will be enough for uniquely identifying such futex. We ++ pointer); that will be enough for uniquely identifying such futex. We + also set one bit at the key to differentiate if a private futex is -+ used on the same address (mixing shared and private calls do not ++ used on the same address (mixing shared and private calls does not + work). + +- If the page is file-backed, current->mm maybe isn't the same one for + every user of this futex, so we need to use other data: the -+ page->index, an UUID for the struct inode and the offset within the ++ page->index, a UUID for the struct inode and the offset within the + page. + -+Note that members of futex_key doesn't have any particular meaning after they ++Note that members of futex_key don't have any particular meaning after they +are part of the struct - they are just bytes to identify a futex. 
Given that, +we don't need to use a particular name or type that matches the original data, +we only need to care about the bitsize of each component and make both private @@ -2722,7 +2710,7 @@ index 000000000..edd47c22f +.. kernel-doc:: kernel/futex2.c + :no-identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst -index 7003bd5ae..9bf03c7fa 100644 +index 7003bd5aeff4..9bf03c7fa1ec 100644 --- a/Documentation/locking/index.rst +++ b/Documentation/locking/index.rst @@ -24,6 +24,7 @@ locking @@ -2734,13 +2722,12 @@ index 7003bd5ae..9bf03c7fa 100644 .. only:: subproject and html -- -2.30.2 +GitLab - -From 807830198558476757c3e1b77fcfad2129fe29fa Mon Sep 17 00:00:00 2001 +From 2e88815c729ee716e3ce15c0b182d9a0d7958079 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:01 -0300 -Subject: [PATCH 07/13] selftests: futex2: Add wake/wait test +Subject: [PATCH] selftests: futex2: Add wake/wait test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -2756,7 +2743,6 @@ temporary workaround that implements the required types and calls the appropriated syscalls, since futex2 doesn't supports 32 bit sized time. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 6 +- @@ -2768,7 +2754,7 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> create mode 100644 tools/testing/selftests/futex/include/futex2test.h diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore -index 0efcd494d..d61f1df94 100644 +index 0efcd494daab..d61f1df94360 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -6,3 +6,4 @@ futex_wait_private_mapped_file @@ -2777,7 +2763,7 @@ index 0efcd494d..d61f1df94 100644 futex_wait_wouldblock +futex2_wait diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile -index 23207829e..9b334f190 100644 +index 23207829ec75..9b334f190759 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -1,10 +1,11 @@ @@ -2805,7 +2791,7 @@ index 23207829e..9b334f190 100644 diff --git a/tools/testing/selftests/futex/functional/futex2_wait.c b/tools/testing/selftests/futex/functional/futex2_wait.c new file mode 100644 -index 000000000..4b5416585 +index 000000000000..4b5416585c79 --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_wait.c @@ -0,0 +1,209 @@ @@ -3019,7 +3005,7 @@ index 000000000..4b5416585 + return ret; +} diff --git a/tools/testing/selftests/futex/functional/run.sh b/tools/testing/selftests/futex/functional/run.sh -index 1acb6ace1..3730159c8 100755 +index 1acb6ace1680..3730159c865a 100755 --- a/tools/testing/selftests/futex/functional/run.sh +++ b/tools/testing/selftests/futex/functional/run.sh @@ -73,3 +73,6 @@ echo @@ -3031,7 +3017,7 @@ index 1acb6ace1..3730159c8 100755 +./futex2_wait $COLOR diff --git 
a/tools/testing/selftests/futex/include/futex2test.h b/tools/testing/selftests/futex/include/futex2test.h new file mode 100644 -index 000000000..e724d56b9 +index 000000000000..e724d56b917e --- /dev/null +++ b/tools/testing/selftests/futex/include/futex2test.h @@ -0,0 +1,79 @@ @@ -3115,13 +3101,12 @@ index 000000000..e724d56b9 + return syscall(__NR_futex_wake, uaddr, nr, flags); +} -- -2.30.2 - +GitLab -From 382ed2cfcea3ed7e77d07e3e12b3769a081001ea Mon Sep 17 00:00:00 2001 +From 4564c36175c0c0cf31cac0fd1e94c4813b6c7244 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:01 -0300 -Subject: [PATCH 08/13] selftests: futex2: Add timeout test +Subject: [PATCH] selftests: futex2: Add timeout test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -3131,13 +3116,12 @@ futex2. futex2 accepts only absolute 64bit timers, but supports both monotonic and realtime clocks. Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- .../futex/functional/futex_wait_timeout.c | 58 ++++++++++++++++--- 1 file changed, 49 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_timeout.c b/tools/testing/selftests/futex/functional/futex_wait_timeout.c -index ee55e6d38..b4dffe9e3 100644 +index ee55e6d389a3..b4dffe9e3b44 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c +++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c @@ -11,6 +11,7 @@ @@ -3236,13 +3220,12 @@ index ee55e6d38..b4dffe9e3 100644 return ret; } -- -2.30.2 +GitLab - -From 27d37b4e24805d9dc5478c296ee680a8a4db8a6e Mon Sep 17 00:00:00 2001 +From 3e2b4ab146c93d51e083f410b34be3ccb2bbee5b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:01 -0300 -Subject: [PATCH 09/13] selftests: futex2: Add wouldblock test +Subject: 
[PATCH] selftests: futex2: Add wouldblock test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -3251,13 +3234,12 @@ Adapt existing futex wait wouldblock file to test the same mechanism for futex2. Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- .../futex/functional/futex_wait_wouldblock.c | 33 ++++++++++++++++--- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c -index 0ae390ff8..ed3660090 100644 +index 0ae390ff8164..ed3660090907 100644 --- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c +++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c @@ -12,6 +12,7 @@ @@ -3331,13 +3313,12 @@ index 0ae390ff8..ed3660090 100644 return ret; } -- -2.30.2 - +GitLab -From 2b2f4e71b3bb09c0d45f9eae4c1986155d3a1235 Mon Sep 17 00:00:00 2001 +From 671e79d935493cdc5aa05a7e3a96547adbb3c200 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:02 -0300 -Subject: [PATCH 10/13] selftests: futex2: Add waitv test +Subject: [PATCH] selftests: futex2: Add waitv test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -3347,7 +3328,6 @@ shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- @@ -3358,7 +3338,7 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore -index d61f1df94..d0b8f637b 100644 +index d61f1df94360..d0b8f637b786 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -7,3 +7,4 @@ futex_wait_timeout @@ -3367,7 +3347,7 @@ index d61f1df94..d0b8f637b 100644 futex2_wait +futex2_waitv diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile -index 9b334f190..09c08ccde 100644 +index 9b334f190759..09c08ccdeaf2 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -16,7 +16,8 @@ TEST_GEN_FILES := \ @@ -3382,7 +3362,7 @@ index 9b334f190..09c08ccde 100644 diff --git a/tools/testing/selftests/futex/functional/futex2_waitv.c b/tools/testing/selftests/futex/functional/futex2_waitv.c new file mode 100644 -index 000000000..2f81d296d +index 000000000000..2f81d296d95d --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_waitv.c @@ -0,0 +1,157 @@ @@ -3544,7 +3524,7 @@ index 000000000..2f81d296d + return ret; +} diff --git a/tools/testing/selftests/futex/functional/run.sh b/tools/testing/selftests/futex/functional/run.sh -index 3730159c8..18b3883d7 100755 +index 3730159c865a..18b3883d7236 100755 --- a/tools/testing/selftests/futex/functional/run.sh +++ b/tools/testing/selftests/futex/functional/run.sh @@ -76,3 +76,6 @@ echo @@ -3555,7 +3535,7 @@ index 3730159c8..18b3883d7 100755 +echo +./futex2_waitv $COLOR diff --git 
a/tools/testing/selftests/futex/include/futex2test.h b/tools/testing/selftests/futex/include/futex2test.h -index e724d56b9..31979afc4 100644 +index e724d56b917e..31979afc486f 100644 --- a/tools/testing/selftests/futex/include/futex2test.h +++ b/tools/testing/selftests/futex/include/futex2test.h @@ -28,6 +28,19 @@ @@ -3596,13 +3576,12 @@ index e724d56b9..31979afc4 100644 + return syscall(__NR_futex_waitv, waiters, nr_waiters, flags, timo); +} -- -2.30.2 +GitLab - -From 18a89fdf17baa9595b09bb98cc545ecba4ce93fb Mon Sep 17 00:00:00 2001 +From c6f8ed51da9411d893ea5efe6ba3cba4624cfda5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:02 -0300 -Subject: [PATCH 11/13] selftests: futex2: Add requeue test +Subject: [PATCH] selftests: futex2: Add requeue test MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -3613,7 +3592,6 @@ requeue, and we check return values to see if the operation woke/requeued the expected number of waiters. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- .../selftests/futex/functional/.gitignore | 1 + .../selftests/futex/functional/Makefile | 3 +- @@ -3623,7 +3601,7 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore -index d0b8f637b..af7557e82 100644 +index d0b8f637b786..af7557e821da 100644 --- a/tools/testing/selftests/futex/functional/.gitignore +++ b/tools/testing/selftests/futex/functional/.gitignore @@ -8,3 +8,4 @@ futex_wait_uninitialized_heap @@ -3632,7 +3610,7 @@ index d0b8f637b..af7557e82 100644 futex2_waitv +futex2_requeue diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile -index 09c08ccde..3ccb9ea58 100644 +index 09c08ccdeaf2..3ccb9ea58ddd 100644 --- a/tools/testing/selftests/futex/functional/Makefile +++ b/tools/testing/selftests/futex/functional/Makefile @@ -17,7 +17,8 @@ TEST_GEN_FILES := \ @@ -3647,7 +3625,7 @@ index 09c08ccde..3ccb9ea58 100644 diff --git a/tools/testing/selftests/futex/functional/futex2_requeue.c b/tools/testing/selftests/futex/functional/futex2_requeue.c new file mode 100644 -index 000000000..1bc3704dc +index 000000000000..1bc3704dc8c2 --- /dev/null +++ b/tools/testing/selftests/futex/functional/futex2_requeue.c @@ -0,0 +1,164 @@ @@ -3816,7 +3794,7 @@ index 000000000..1bc3704dc + return ret; +} diff --git a/tools/testing/selftests/futex/include/futex2test.h b/tools/testing/selftests/futex/include/futex2test.h -index 31979afc4..e2635006b 100644 +index 31979afc486f..e2635006b1a9 100644 --- a/tools/testing/selftests/futex/include/futex2test.h +++ b/tools/testing/selftests/futex/include/futex2test.h @@ -103,3 +103,19 @@ static inline int futex2_waitv(volatile struct futex_waitv *waiters, unsigned lo @@ -3840,13 
+3818,12 @@ index 31979afc4..e2635006b 100644 + return syscall(__NR_futex_requeue, uaddr1, uaddr2, nr_wake, nr_requeue, cmpval, flags); +} -- -2.30.2 - +GitLab -From 799e24f7b39e114107b36c4cc4ece4825a9fa6a0 Mon Sep 17 00:00:00 2001 +From 7a9cacbb174d26d4adfcca3efa5747bf6ae05fa2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:02 -0300 -Subject: [PATCH 12/13] perf bench: Add futex2 benchmark tests +Subject: [PATCH] perf bench: Add futex2 benchmark tests MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -3857,7 +3834,6 @@ measure the performance of implementation, but also as stress testing for the kernel infrastructure. Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- tools/arch/x86/include/asm/unistd_64.h | 12 ++++++ tools/perf/bench/bench.h | 4 ++ @@ -3870,7 +3846,7 @@ Signed-off-by: Jan200101 <sentrycraft123@gmail.com> 8 files changed, 206 insertions(+), 34 deletions(-) diff --git a/tools/arch/x86/include/asm/unistd_64.h b/tools/arch/x86/include/asm/unistd_64.h -index 4205ed415..cf5ad4ea1 100644 +index 4205ed4158bf..b65c51e8d675 100644 --- a/tools/arch/x86/include/asm/unistd_64.h +++ b/tools/arch/x86/include/asm/unistd_64.h @@ -17,3 +17,15 @@ @@ -3879,18 +3855,18 @@ index 4205ed415..cf5ad4ea1 100644 #endif + +#ifndef __NR_futex_wait -+# define __NR_futex_wait 442 ++# define __NR_futex_wait 443 +#endif + +#ifndef __NR_futex_wake -+# define __NR_futex_wake 443 ++# define __NR_futex_wake 444 +#endif + +#ifndef __NR_futex_requeue -+# define __NR_futex_requeue 445 ++# define __NR_futex_requeue 446 +#endif diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h -index eac36afab..12346844b 100644 +index eac36afab2b3..12346844b354 100644 --- a/tools/perf/bench/bench.h +++ b/tools/perf/bench/bench.h @@ -38,9 +38,13 @@ int bench_mem_memcpy(int argc, const char **argv); @@ -3908,10 
+3884,10 @@ index eac36afab..12346844b 100644 int bench_futex_lock_pi(int argc, const char **argv); int bench_epoll_wait(int argc, const char **argv); diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c -index 915bf3da7..6e62e7708 100644 +index b65373ce5c4f..1068749af40c 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c -@@ -34,7 +34,7 @@ static unsigned int nthreads = 0; +@@ -33,7 +33,7 @@ static unsigned int nthreads = 0; static unsigned int nsecs = 10; /* amount of futexes per thread */ static unsigned int nfutexes = 1024; @@ -3920,7 +3896,7 @@ index 915bf3da7..6e62e7708 100644 static int futex_flag = 0; struct timeval bench__start, bench__end, bench__runtime; -@@ -86,7 +86,10 @@ static void *workerfn(void *arg) +@@ -85,7 +85,10 @@ static void *workerfn(void *arg) * such as internal waitqueue handling, thus enlarging * the critical region protected by hb->lock. */ @@ -3932,7 +3908,7 @@ index 915bf3da7..6e62e7708 100644 if (!silent && (!ret || errno != EAGAIN || errno != EWOULDBLOCK)) warn("Non-expected futex return call"); -@@ -117,7 +120,7 @@ static void print_summary(void) +@@ -116,7 +119,7 @@ static void print_summary(void) (int)bench__runtime.tv_sec); } @@ -3941,7 +3917,7 @@ index 915bf3da7..6e62e7708 100644 { int ret = 0; cpu_set_t cpuset; -@@ -149,7 +152,9 @@ int bench_futex_hash(int argc, const char **argv) +@@ -148,7 +151,9 @@ int bench_futex_hash(int argc, const char **argv) if (!worker) goto errmem; @@ -3952,7 +3928,7 @@ index 915bf3da7..6e62e7708 100644 futex_flag = FUTEX_PRIVATE_FLAG; printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futexes for %d secs.\n\n", -@@ -229,3 +234,14 @@ int bench_futex_hash(int argc, const char **argv) +@@ -228,3 +233,14 @@ int bench_futex_hash(int argc, const char **argv) errmem: err(EXIT_FAILURE, "calloc"); } @@ -3968,7 +3944,7 @@ index 915bf3da7..6e62e7708 100644 + return __bench_futex_hash(argc, argv); +} diff --git 
a/tools/perf/bench/futex-requeue.c b/tools/perf/bench/futex-requeue.c -index 7a15c2e61..4c7486fbe 100644 +index 5fa23295ee5f..6cdd649b54f4 100644 --- a/tools/perf/bench/futex-requeue.c +++ b/tools/perf/bench/futex-requeue.c @@ -2,8 +2,8 @@ @@ -3982,7 +3958,7 @@ index 7a15c2e61..4c7486fbe 100644 * * This program is particularly useful to measure the latency of nthread * requeues without waking up any tasks -- thus mimicking a regular futex_wait. -@@ -29,7 +29,10 @@ +@@ -28,7 +28,10 @@ #include <stdlib.h> #include <sys/time.h> @@ -3994,7 +3970,7 @@ index 7a15c2e61..4c7486fbe 100644 /* * How many tasks to requeue at a time. -@@ -38,7 +41,7 @@ static u_int32_t futex1 = 0, futex2 = 0; +@@ -37,7 +40,7 @@ static u_int32_t futex1 = 0, futex2 = 0; static unsigned int nrequeue = 1; static pthread_t *worker; @@ -4003,7 +3979,7 @@ index 7a15c2e61..4c7486fbe 100644 static pthread_mutex_t thread_lock; static pthread_cond_t thread_parent, thread_worker; static struct stats requeuetime_stats, requeued_stats; -@@ -80,7 +83,11 @@ static void *workerfn(void *arg __maybe_unused) +@@ -79,7 +82,11 @@ static void *workerfn(void *arg __maybe_unused) pthread_cond_wait(&thread_worker, &thread_lock); pthread_mutex_unlock(&thread_lock); @@ -4016,7 +3992,7 @@ index 7a15c2e61..4c7486fbe 100644 return NULL; } -@@ -112,7 +119,7 @@ static void toggle_done(int sig __maybe_unused, +@@ -111,7 +118,7 @@ static void toggle_done(int sig __maybe_unused, done = true; } @@ -4025,7 +4001,7 @@ index 7a15c2e61..4c7486fbe 100644 { int ret = 0; unsigned int i, j; -@@ -140,15 +147,20 @@ int bench_futex_requeue(int argc, const char **argv) +@@ -139,15 +146,20 @@ int bench_futex_requeue(int argc, const char **argv) if (!worker) err(EXIT_FAILURE, "calloc"); @@ -4048,7 +4024,7 @@ index 7a15c2e61..4c7486fbe 100644 init_stats(&requeued_stats); init_stats(&requeuetime_stats); -@@ -177,11 +189,15 @@ int bench_futex_requeue(int argc, const char **argv) +@@ -176,11 +188,15 @@ int bench_futex_requeue(int argc, const char 
**argv) gettimeofday(&start, NULL); while (nrequeued < nthreads) { /* @@ -4067,7 +4043,7 @@ index 7a15c2e61..4c7486fbe 100644 } gettimeofday(&end, NULL); -@@ -195,8 +211,12 @@ int bench_futex_requeue(int argc, const char **argv) +@@ -194,8 +210,12 @@ int bench_futex_requeue(int argc, const char **argv) j + 1, nrequeued, nthreads, runtime.tv_usec / (double)USEC_PER_MSEC); } @@ -4082,7 +4058,7 @@ index 7a15c2e61..4c7486fbe 100644 if (nthreads != nrequeued) warnx("couldn't wakeup all tasks (%d/%d)", nrequeued, nthreads); -@@ -221,3 +241,14 @@ int bench_futex_requeue(int argc, const char **argv) +@@ -220,3 +240,14 @@ int bench_futex_requeue(int argc, const char **argv) usage_with_options(bench_futex_requeue_usage, options); exit(EXIT_FAILURE); } @@ -4098,7 +4074,7 @@ index 7a15c2e61..4c7486fbe 100644 + return __bench_futex_requeue(argc, argv); +} diff --git a/tools/perf/bench/futex-wake-parallel.c b/tools/perf/bench/futex-wake-parallel.c -index cd2b81a84..8a89c6ab9 100644 +index 6e6f5247e1fe..cac90fc0bfb3 100644 --- a/tools/perf/bench/futex-wake-parallel.c +++ b/tools/perf/bench/futex-wake-parallel.c @@ -17,6 +17,12 @@ int bench_futex_wake_parallel(int argc __maybe_unused, const char **argv __maybe @@ -4114,7 +4090,7 @@ index cd2b81a84..8a89c6ab9 100644 #else /* HAVE_PTHREAD_BARRIER */ /* For the CLR_() macros */ #include <string.h> -@@ -48,7 +54,7 @@ static unsigned int nwakes = 1; +@@ -47,7 +53,7 @@ static unsigned int nwakes = 1; static u_int32_t futex = 0; static pthread_t *blocked_worker; @@ -4123,7 +4099,7 @@ index cd2b81a84..8a89c6ab9 100644 static unsigned int nblocked_threads = 0, nwaking_threads = 0; static pthread_mutex_t thread_lock; static pthread_cond_t thread_parent, thread_worker; -@@ -79,7 +85,11 @@ static void *waking_workerfn(void *arg) +@@ -78,7 +84,11 @@ static void *waking_workerfn(void *arg) gettimeofday(&start, NULL); @@ -4136,7 +4112,7 @@ index cd2b81a84..8a89c6ab9 100644 if (waker->nwoken != nwakes) warnx("couldn't wakeup all tasks (%d/%d)", 
waker->nwoken, nwakes); -@@ -130,8 +140,13 @@ static void *blocked_workerfn(void *arg __maybe_unused) +@@ -129,8 +139,13 @@ static void *blocked_workerfn(void *arg __maybe_unused) pthread_mutex_unlock(&thread_lock); while (1) { /* handle spurious wakeups */ @@ -4152,7 +4128,7 @@ index cd2b81a84..8a89c6ab9 100644 } pthread_exit(NULL); -@@ -218,7 +233,7 @@ static void toggle_done(int sig __maybe_unused, +@@ -217,7 +232,7 @@ static void toggle_done(int sig __maybe_unused, done = true; } @@ -4161,7 +4137,7 @@ index cd2b81a84..8a89c6ab9 100644 { int ret = 0; unsigned int i, j; -@@ -262,7 +277,9 @@ int bench_futex_wake_parallel(int argc, const char **argv) +@@ -261,7 +276,9 @@ int bench_futex_wake_parallel(int argc, const char **argv) if (!blocked_worker) err(EXIT_FAILURE, "calloc"); @@ -4172,7 +4148,7 @@ index cd2b81a84..8a89c6ab9 100644 futex_flag = FUTEX_PRIVATE_FLAG; printf("Run summary [PID %d]: blocking on %d threads (at [%s] " -@@ -322,4 +339,16 @@ int bench_futex_wake_parallel(int argc, const char **argv) +@@ -321,4 +338,16 @@ int bench_futex_wake_parallel(int argc, const char **argv) free(blocked_worker); return ret; } @@ -4190,10 +4166,10 @@ index cd2b81a84..8a89c6ab9 100644 + #endif /* HAVE_PTHREAD_BARRIER */ diff --git a/tools/perf/bench/futex-wake.c b/tools/perf/bench/futex-wake.c -index 2dfcef3e3..be4481f5e 100644 +index 6d217868f53c..546d2818eed8 100644 --- a/tools/perf/bench/futex-wake.c +++ b/tools/perf/bench/futex-wake.c -@@ -39,7 +39,7 @@ static u_int32_t futex1 = 0; +@@ -38,7 +38,7 @@ static u_int32_t futex1 = 0; static unsigned int nwakes = 1; pthread_t *worker; @@ -4202,7 +4178,7 @@ index 2dfcef3e3..be4481f5e 100644 static pthread_mutex_t thread_lock; static pthread_cond_t thread_parent, thread_worker; static struct stats waketime_stats, wakeup_stats; -@@ -69,8 +69,13 @@ static void *workerfn(void *arg __maybe_unused) +@@ -68,8 +68,13 @@ static void *workerfn(void *arg __maybe_unused) pthread_mutex_unlock(&thread_lock); while (1) { @@ -4218,7 
+4194,7 @@ index 2dfcef3e3..be4481f5e 100644 } pthread_exit(NULL); -@@ -118,7 +123,7 @@ static void toggle_done(int sig __maybe_unused, +@@ -117,7 +122,7 @@ static void toggle_done(int sig __maybe_unused, done = true; } @@ -4227,7 +4203,7 @@ index 2dfcef3e3..be4481f5e 100644 { int ret = 0; unsigned int i, j; -@@ -148,7 +153,9 @@ int bench_futex_wake(int argc, const char **argv) +@@ -147,7 +152,9 @@ int bench_futex_wake(int argc, const char **argv) if (!worker) err(EXIT_FAILURE, "calloc"); @@ -4238,7 +4214,7 @@ index 2dfcef3e3..be4481f5e 100644 futex_flag = FUTEX_PRIVATE_FLAG; printf("Run summary [PID %d]: blocking on %d threads (at [%s] futex %p), " -@@ -180,9 +187,14 @@ int bench_futex_wake(int argc, const char **argv) +@@ -179,9 +186,14 @@ int bench_futex_wake(int argc, const char **argv) /* Ok, all threads are patiently blocked, start waking folks up */ gettimeofday(&start, NULL); @@ -4255,7 +4231,7 @@ index 2dfcef3e3..be4481f5e 100644 timersub(&end, &start, &runtime); update_stats(&wakeup_stats, nwoken); -@@ -212,3 +224,14 @@ int bench_futex_wake(int argc, const char **argv) +@@ -211,3 +223,14 @@ int bench_futex_wake(int argc, const char **argv) free(worker); return ret; } @@ -4271,7 +4247,7 @@ index 2dfcef3e3..be4481f5e 100644 + return __bench_futex_wake(argc, argv); +} diff --git a/tools/perf/bench/futex.h b/tools/perf/bench/futex.h -index 31b53cc7d..6b2213cf3 100644 +index 31b53cc7d5bc..6b2213cf3f64 100644 --- a/tools/perf/bench/futex.h +++ b/tools/perf/bench/futex.h @@ -86,4 +86,51 @@ futex_cmp_requeue(u_int32_t *uaddr, u_int32_t val, u_int32_t *uaddr2, int nr_wak @@ -4327,7 +4303,7 @@ index 31b53cc7d..6b2213cf3 100644 +} #endif /* _FUTEX_H */ diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c -index 62a7b7420..e41a95ad2 100644 +index 62a7b7420a44..e41a95ad2db6 100644 --- a/tools/perf/builtin-bench.c +++ b/tools/perf/builtin-bench.c @@ -12,10 +12,11 @@ @@ -4370,13 +4346,118 @@ index 62a7b7420..e41a95ad2 100644 {"epoll", "Epoll stressing 
benchmarks", epoll_benchmarks }, #endif -- -2.30.2 +GitLab + +From dcd3dfa6c75d110174c948aadb14e41416a826dd Mon Sep 17 00:00:00 2001 +From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> +Date: Fri, 5 Feb 2021 10:34:02 -0300 +Subject: [PATCH] kernel: Enable waitpid() for futex2 +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit +To make pthreads work as expected if they are using futex2, wake clear_child_tid with futex2 as well. This makes applications that use waitpid() (and clone(CLONE_CHILD_CLEARTID)) wake up when the child they are waiting for terminates. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using. -From ea9a7956b5f6f44f3ee70d82542c64fcb7c86c5e Mon Sep 17 00:00:00 2001 +Signed-off-by: André Almeida <andrealmeid@collabora.com> +--- + +This commit is here with the intent to show what we need to do in order +to get a full NPTL working on top of futex2. It should be merged after +we talk to glibc folks on the details around the futex_wait() side. For +instance, we could use this as an opportunity to use private futexes or +8bit sized futexes, but both sides need to use exactly the same flags.
+--- + include/linux/syscalls.h | 2 ++ + kernel/fork.c | 2 ++ + kernel/futex2.c | 30 ++++++++++++++++++------------ + 3 files changed, 22 insertions(+), 12 deletions(-) + +diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h +index b0675f236066..b07b7d4334a6 100644 +--- a/include/linux/syscalls.h ++++ b/include/linux/syscalls.h +@@ -1316,6 +1316,8 @@ int ksys_ipc(unsigned int call, int first, unsigned long second, + unsigned long third, void __user * ptr, long fifth); + int compat_ksys_ipc(u32 call, int first, int second, + u32 third, u32 ptr, u32 fifth); ++long ksys_futex_wake(void __user *uaddr, unsigned long nr_wake, ++ unsigned int flags); + + /* + * The following kernel syscall equivalents are just wrappers to fs-internal +diff --git a/kernel/fork.c b/kernel/fork.c +index d66cd1014211..e39846a73a43 100644 +--- a/kernel/fork.c ++++ b/kernel/fork.c +@@ -1308,6 +1308,8 @@ static void mm_release(struct task_struct *tsk, struct mm_struct *mm) + put_user(0, tsk->clear_child_tid); + do_futex(tsk->clear_child_tid, FUTEX_WAKE, + 1, NULL, NULL, 0, 0); ++ ksys_futex_wake(tsk->clear_child_tid, 1, ++ FUTEX_32 | FUTEX_SHARED_FLAG); + } + tsk->clear_child_tid = NULL; + } +diff --git a/kernel/futex2.c b/kernel/futex2.c +index 61b81b401e58..b92c3ca5e89f 100644 +--- a/kernel/futex2.c ++++ b/kernel/futex2.c +@@ -940,18 +940,8 @@ static inline bool futex_match(struct futex_key key1, struct futex_key key2) + key1.offset == key2.offset); + } + +-/** +- * sys_futex_wake - Wake a number of futexes waiting on an address +- * @uaddr: Address of futex to be woken up +- * @nr_wake: Number of futexes waiting in uaddr to be woken up +- * @flags: Flags for size and shared +- * +- * Wake `nr_wake` threads waiting at uaddr. +- * +- * Returns the number of woken threads on success, error code otherwise. 
+- */ +-SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, +- unsigned int, flags) ++long ksys_futex_wake(void __user *uaddr, unsigned long nr_wake, ++ unsigned int flags) + { + bool shared = (flags & FUTEX_SHARED_FLAG) ? true : false; + unsigned int size = flags & FUTEX_SIZE_MASK; +@@ -988,6 +978,22 @@ SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, + return ret; + } + ++/** ++ * sys_futex_wake - Wake a number of futexes waiting on an address ++ * @uaddr: Address of futex to be woken up ++ * @nr_wake: Number of futexes waiting in uaddr to be woken up ++ * @flags: Flags for size and shared ++ * ++ * Wake `nr_wake` threads waiting at uaddr. ++ * ++ * Returns the number of woken threads on success, error code otherwise. ++ */ ++SYSCALL_DEFINE3(futex_wake, void __user *, uaddr, unsigned int, nr_wake, ++ unsigned int, flags) ++{ ++ return ksys_futex_wake(uaddr, nr_wake, flags); ++} ++ + static void futex_double_unlock(struct futex_bucket *b1, struct futex_bucket *b2) + { + spin_unlock(&b1->lock); +-- +GitLab + +From 3fe11532bcb2560097e0f19bfd612ca4e19cd098 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Andr=C3=A9=20Almeida?= <andrealmeid@collabora.com> Date: Fri, 5 Feb 2021 10:34:02 -0300 -Subject: [PATCH 13/13] futex2: Add sysfs entry for syscall numbers +Subject: [PATCH] futex2: Add sysfs entry for syscall numbers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit @@ -4388,16 +4469,15 @@ experimenting with futex2 (like Proton/Wine) can test it and set the syscall number at runtime, rather than setting it at compilation time. 
Signed-off-by: André Almeida <andrealmeid@collabora.com> -Signed-off-by: Jan200101 <sentrycraft123@gmail.com> --- kernel/futex2.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/kernel/futex2.c b/kernel/futex2.c -index 8a8b45f98..1eb20410d 100644 +index b92c3ca5e89f..d138340a3f7b 100644 --- a/kernel/futex2.c +++ b/kernel/futex2.c -@@ -1220,6 +1220,48 @@ SYSCALL_DEFINE6(futex_requeue, struct futex_requeue __user *, uaddr1, +@@ -1224,6 +1224,48 @@ SYSCALL_DEFINE6(futex_requeue, struct futex_requeue __user *, uaddr1, return __futex_requeue(rq1, rq2, nr_wake, nr_requeue, cmpval, shared1, shared2); } @@ -4447,5 +4527,4 @@ index 8a8b45f98..1eb20410d 100644 { int i; -- -2.30.2 - +GitLab |