author    | Jan200101 <sentrycraft123@gmail.com> | 2024-09-26 20:08:54 +0200
committer | Jan200101 <sentrycraft123@gmail.com> | 2024-09-26 20:08:54 +0200
commit    | 28cd700b822627cde036087f4c613a234f0d0858 (patch)
tree      | 9b78517a97e9ff6447151b0a4d54977d80931c83 /SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch
parent    | c44454c0aa0ae1b0999ef46e18516670536b122a (diff)
download  | kernel-fsync-28cd700b822627cde036087f4c613a234f0d0858.tar.gz, kernel-fsync-28cd700b822627cde036087f4c613a234f0d0858.zip
kernel 6.9.12 cleaning
Diffstat (limited to 'SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch')
-rw-r--r-- | SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch | 185 |
1 files changed, 67 insertions, 118 deletions
diff --git a/SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch b/SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch
index eb3cf54..ef562c9 100644
--- a/SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch
+++ b/SOURCES/0001-acpi-proc-idle-skip-dummy-wait.patch
@@ -1,125 +1,74 @@
-Processors based on the Zen microarchitecture support IOPORT based deeper
-C-states. The idle driver reads the acpi_gbl_FADT.xpm_timer_block.address
-in the IOPORT based C-state exit path which is claimed to be a
-"Dummy wait op" and has been around since ACPI introduction to Linux
-dating back to Andy Grover's Mar 14, 2002 posting [1].
-The comment above the dummy operation was elaborated by Andreas Mohr back
-in 2006 in commit b488f02156d3d ("ACPI: restore comment justifying 'extra'
-P_LVLx access") [2] where the commit log claims:
-"this dummy read was about: STPCLK# doesn't get asserted in time on
-(some) chipsets, which is why we need to have a dummy I/O read to delay
-further instruction processing until the CPU is fully stopped."
-
-However, sampling certain workloads with IBS on AMD Zen3 system shows
-that a significant amount of time is spent in the dummy op, which
-incorrectly gets accounted as C-State residency. A large C-State
-residency value can prime the cpuidle governor to recommend a deeper
-C-State during the subsequent idle instances, starting a vicious cycle,
-leading to performance degradation on workloads that rapidly switch
-between busy and idle phases.
-
-One such workload is tbench where a massive performance degradation can
-be observed during certain runs. Following are some statistics gathered
-by running tbench with 128 clients, on a dual socket (2 x 64C/128T) Zen3
-system with the baseline kernel, baseline kernel keeping C2 disabled,
-and baseline kernel with this patch applied keeping C2 enabled:
-
-baseline kernel was tip:sched/core at
-commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on
-wakelist if wakee cpu is idle")
-
-Kernel         : baseline     baseline + C2 disabled    baseline + patch
-
-Min (MB/s)     : 2215.06      33072.10 (+1393.05%)      33016.10 (+1390.52%)
-Max (MB/s)     : 32938.80     34399.10                  34774.50
-Median (MB/s)  : 32191.80     33476.60                  33805.70
-AMean (MB/s)   : 22448.55     33649.27 (+49.89%)        33865.43 (+50.85%)
-AMean Stddev   : 17526.70     680.14                    880.72
-AMean CoefVar  : 78.07%       2.02%                     2.60%
-
-The data shows there are edge cases that can cause massive regressions
-in case of tbench. Profiling the bad runs with IBS shows a significant
-amount of time being spent in acpi_idle_do_entry method:
-
-Overhead  Command     Shared Object      Symbol
-  74.76%  swapper     [kernel.kallsyms]  [k] acpi_idle_do_entry
-   0.71%  tbench      [kernel.kallsyms]  [k] update_sd_lb_stats.constprop.0
-   0.69%  tbench_srv  [kernel.kallsyms]  [k] update_sd_lb_stats.constprop.0
-   0.49%  swapper     [kernel.kallsyms]  [k] psi_group_change
-  ...
-
-Annotation of acpi_idle_do_entry method reveals almost all the time in
-acpi_idle_do_entry is spent on the port I/O in wait_for_freeze():
-
-  0.14 │    in (%dx),%al     # <------ First "in" corresponding to inb(cx->address)
-  0.51 │    mov 0x144d64d(%rip),%rax
-  0.00 │    test $0x80000000,%eax
-       │  ↓ jne 62           # <------ Skip if running in guest
-  0.00 │    mov 0x19800c3(%rip),%rdx
- 99.33 │    in (%dx),%eax    # <------ Second "in" corresponding to inl(acpi_gbl_FADT.xpm_timer_block.address)
-  0.00 │62: mov -0x8(%rbp),%r12
-  0.00 │    leave
-  0.00 │  ← ret
-
-This overhead is reflected in the C2 residency on the test system where
-C2 is an IOPORT based C-State. The total C-state residency reported by
-"cpupower idle-info" on CPU0 for good and bad case over the 80s tbench
-run is as follows (all numbers are in microseconds):
-
-                      Good Run       Bad Run
-                      (Baseline)
-
-POLL:                 43338          6231      (-85.62%)
-C1 (MWAIT Based):     23576156       363861    (-98.45%)
-C2 (IOPORT Based):    10781218       77027280  (+614.45%)
-
-The larger residency value in bad case leads to the system recommending
-C2 state again for subsequent idle instances. The pattern lasts till the
-end of the tbench run. Following is the breakdown of "entry_method"
-passed to acpi_idle_do_entry during good run and bad run:
-
-                                                               Good Run    Bad Run
-                                                               (Baseline)
-
-Number of times acpi_idle_do_entry was called:                 6149573     6149050  (-0.01%)
- |-> Number of times entry_method was "ACPI_CSTATE_FFH":       6141494     88144    (-98.56%)
- |-> Number of times entry_method was "ACPI_CSTATE_HALT":      0           0        (+0.00%)
- |-> Number of times entry_method was "ACPI_CSTATE_SYSTEMIO":  8079        6060906  (+74920.49%)
-
-For processors based on the Zen microarchitecture, this dummy wait op is
-unnecessary and can be skipped when choosing IOPORT based C-States to
-avoid polluting the C-state residency information.
-
-Link: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=972c16130d9dc182cedcdd408408d9eacc7d6a2d [1]
-Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b488f02156d3deb08f5ad7816d565c370a8cc6f1 [2]
-
-Suggested-by: Calvin Ong <calvin.ong@amd.com>
-Cc: stable@vger.kernel.org
-Cc: regressions@lists.linux.dev
-Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
+From e400ad8b7e6a1b9102123c6240289a811501f7d9 Mon Sep 17 00:00:00 2001
+From: Dave Hansen <dave.hansen@intel.com>
+Date: Thu, 22 Sep 2022 11:47:45 -0700
+Subject: [PATCH] ACPI: processor idle: Practically limit "Dummy wait"
+ workaround to old Intel systems
+
+Old, circa 2002 chipsets have a bug: they don't go idle when they are
+supposed to. So, a workaround was added to slow the CPU down and
+ensure that the CPU waits a bit for the chipset to actually go idle.
+This workaround is ancient and has been in place in some form since
+the original kernel ACPI implementation.
+
+But, this workaround is very painful on modern systems. The "inl()"
+can take thousands of cycles (see Link: for some more detailed
+numbers and some fun kernel archaeology).
+
+First and foremost, modern systems should not be using this code.
+Typical Intel systems have not used it in over a decade because it is
+horribly inferior to MWAIT-based idle.
+
+Despite this, people do seem to be tripping over this workaround on
+AMD system today.
+
+Limit the "dummy wait" workaround to Intel systems. Keep Modern AMD
+systems from tripping over the workaround. Remotely modern Intel
+systems use intel_idle instead of this code and will, in practice,
+remain unaffected by the dummy wait.
+
+Reported-by: K Prateek Nayak <kprateek.nayak@amd.com>
+Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
+Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
+Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
+Link: https://lore.kernel.org/all/20220921063638.2489-1-kprateek.nayak@amd.com/
+Link: https://lkml.kernel.org/r/20220922184745.3252932-1-dave.hansen@intel.com
---
- drivers/acpi/processor_idle.c | 7 +++++--
- 1 file changed, 5 insertions(+), 2 deletions(-)
+ drivers/acpi/processor_idle.c | 23 ++++++++++++++++++++---
+ 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
-index 16a1663d02d4..18850aa2b79b 100644
+index 16a1663d02d460..9f40917c49efbf 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
-@@ -529,9 +529,11 @@ static __cpuidle void io_idle(unsigned long addr)
- inb(addr);
-
- #ifdef CONFIG_X86
-- /* No delay is needed if we are in guest */
-- if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
-- return;
+@@ -531,10 +531,27 @@ static void wait_for_freeze(void)
+ /* No delay is needed if we are in guest */
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ return;
+ /*
-+ * No delay is needed if we are in guest or on a processor
-+ * based on the Zen microarchitecture.
++ * Modern (>=Nehalem) Intel systems use ACPI via intel_idle,
++ * not this code. Assume that any Intel systems using this
++ * are ancient and may need the dummy wait. This also assumes
++ * that the motivating chipset issue was Intel-only.
+ */
-+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))
++ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
++ return;
+ #endif
-- /* Dummy wait op - must do something useless after P_LVL2 read
-- because chipsets cannot guarantee that STPCLK# signal
-- gets asserted in time to freeze execution properly. */
++ /*
++ * Dummy wait op - must do something useless after P_LVL2 read
++ * because chipsets cannot guarantee that STPCLK# signal gets
++ * asserted in time to freeze execution properly
++ *
++ * This workaround has been in place since the original ACPI
++ * implementation was merged, circa 2002.
++ *
++ * If a profile is pointing to this instruction, please first
++ * consider moving your system to a more modern idle
++ * mechanism.
++ */
+ inl(acpi_gbl_FADT.xpm_timer_block.address);
+ }
+ 
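For orientation outside the diff itself: the two versions of this patch file gate the same "dummy wait" inl() read differently. The dropped variant skipped it on AMD Zen processors via boot_cpu_has(X86_FEATURE_ZEN); the variant now carried skips it on every non-Intel CPU via boot_cpu_data.x86_vendor != X86_VENDOR_INTEL, matching upstream commit e400ad8b7e6a. The following is a minimal, self-contained user-space sketch of that post-change control flow, not kernel code: the stub helpers, the vendor check, and the port numbers (inb_stub, inl_stub, cpu_vendor_is_intel, 0x414, 0x408) are illustrative assumptions, not the kernel's own API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel's port I/O helpers; no real I/O happens here. */
static unsigned char inb_stub(unsigned long addr) { (void)addr; return 0; }
static unsigned int  inl_stub(unsigned long addr) { (void)addr; return 0; }

/* Stand-ins for the kernel's CPU checks (illustrative assumptions). */
static bool running_in_guest(void)    { return false; }
static bool cpu_vendor_is_intel(void) { return false; } /* e.g. an AMD Zen box */

/*
 * Models the post-change flow: the "dummy wait" read of the PM timer port
 * is only reached on bare-metal Intel systems; guests and non-Intel CPUs
 * return right after the port read that actually enters the C-state.
 */
static void io_idle_sketch(unsigned long cstate_port, unsigned long pm_timer_port)
{
    inb_stub(cstate_port);        /* IO port based C-state entry */

    if (running_in_guest())
        return;                   /* no chipset to wait for */
    if (!cpu_vendor_is_intel())
        return;                   /* workaround targeted old Intel chipsets only */

    /* Dummy wait: delay until STPCLK# is asserted on ancient chipsets. */
    inl_stub(pm_timer_port);
}

int main(void)
{
    io_idle_sketch(0x414, 0x408); /* illustrative port addresses */
    puts("idle-exit path simulated");
    return 0;
}
```

On an AMD machine this path returns before the second port read, which is why the dummy wait stops being charged to IOPORT C-state residency in either variant of the patch.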