author | Jan200101 <sentrycraft123@gmail.com> | 2022-02-07 16:57:52 +0100 |
---|---|---|
committer | Jan200101 <sentrycraft123@gmail.com> | 2022-02-07 16:57:52 +0100 |
commit | 0726dba202d0ace4e348d67bb2c378cb883411c5 (patch) | |
tree | 1b1a5cfe1856a285c674b32a95f021f5fd7ab643 /SOURCES/AMD_CPPC.patch | |
parent | 4d6b39f4963c4da7d44cdcd78c457354ff1629e4 (diff) | |
kernel 5.16.5
Diffstat (limited to 'SOURCES/AMD_CPPC.patch')
-rw-r--r-- | SOURCES/AMD_CPPC.patch | 2923 |
1 files changed, 1144 insertions, 1779 deletions
diff --git a/SOURCES/AMD_CPPC.patch b/SOURCES/AMD_CPPC.patch index fe4c48f..8a4b71e 100644 --- a/SOURCES/AMD_CPPC.patch +++ b/SOURCES/AMD_CPPC.patch @@ -1,18 +1,435 @@ -Add Collaborative Processor Performance Control feature flag for AMD -processors. - -This feature flag will be used on the following amd-pstate driver. The -amd-pstate driver has two approaches to implement the frequency control -behavior. That depends on the CPU hardware implementation. One is "Full -MSR Support" and another is "Shared Memory Support". The feature flag -indicates the current processors with "Full MSR Support". - -Acked-by: Borislav Petkov <bp@suse.de> -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - arch/x86/include/asm/cpufeatures.h | 1 + - 1 file changed, 1 insertion(+) - +diff --git a/Documentation/admin-guide/acpi/cppc_sysfs.rst b/Documentation/admin-guide/acpi/cppc_sysfs.rst +index fccf22114e85..e53d76365aa7 100644 +--- a/Documentation/admin-guide/acpi/cppc_sysfs.rst ++++ b/Documentation/admin-guide/acpi/cppc_sysfs.rst +@@ -4,6 +4,8 @@ + Collaborative Processor Performance Control (CPPC) + ================================================== + ++.. _cppc_sysfs: ++ + CPPC + ==== + +diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst +new file mode 100644 +index 000000000000..6bafb9354ba0 +--- /dev/null ++++ b/Documentation/admin-guide/pm/amd-pstate.rst +@@ -0,0 +1,383 @@ ++.. SPDX-License-Identifier: GPL-2.0 ++.. include:: <isonum.txt> ++ ++=============================================== ++``amd-pstate`` CPU Performance Scaling Driver ++=============================================== ++ ++:Copyright: |copy| 2021 Advanced Micro Devices, Inc. ++ ++:Author: Huang Rui <ray.huang@amd.com> ++ ++ ++Introduction ++=================== ++ ++``amd-pstate`` is the AMD CPU performance scaling driver that introduces a ++new CPU frequency control mechanism on modern AMD APU and CPU series in ++Linux kernel. 
The new mechanism is based on Collaborative Processor ++Performance Control (CPPC) which provides finer grain frequency management ++than legacy ACPI hardware P-States. Current AMD CPU/APU platforms are using ++the ACPI P-states driver to manage CPU frequency and clocks with switching ++only in 3 P-states. CPPC replaces the ACPI P-states controls, allows a ++flexible, low-latency interface for the Linux kernel to directly ++communicate the performance hints to hardware. ++ ++``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``, ++``ondemand``, etc. to manage the performance hints which are provided by ++CPPC hardware functionality that internally follows the hardware ++specification (for details refer to AMD64 Architecture Programmer's Manual ++Volume 2: System Programming [1]_). Currently ``amd-pstate`` supports basic ++frequency control function according to kernel governors on some of the ++Zen2 and Zen3 processors, and we will implement more AMD specific functions ++in future after we verify them on the hardware and SBIOS. ++ ++ ++AMD CPPC Overview ++======================= ++ ++Collaborative Processor Performance Control (CPPC) interface enumerates a ++continuous, abstract, and unit-less performance value in a scale that is ++not tied to a specific performance state / frequency. This is an ACPI ++standard [2]_ which software can specify application performance goals and ++hints as a relative target to the infrastructure limits. AMD processors ++provides the low latency register model (MSR) instead of AML code ++interpreter for performance adjustments. ``amd-pstate`` will initialize a ++``struct cpufreq_driver`` instance ``amd_pstate_driver`` with the callbacks ++to manage each performance update behavior. 
:: ++ ++ Highest Perf ------>+-----------------------+ +-----------------------+ ++ | | | | ++ | | | | ++ | | Max Perf ---->| | ++ | | | | ++ | | | | ++ Nominal Perf ------>+-----------------------+ +-----------------------+ ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | Desired Perf ---->| | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ | | | | ++ Lowest non- | | | | ++ linear perf ------>+-----------------------+ +-----------------------+ ++ | | | | ++ | | Lowest perf ---->| | ++ | | | | ++ Lowest perf ------>+-----------------------+ +-----------------------+ ++ | | | | ++ | | | | ++ | | | | ++ 0 ------>+-----------------------+ +-----------------------+ ++ ++ AMD P-States Performance Scale ++ ++ ++.. _perf_cap: ++ ++AMD CPPC Performance Capability ++-------------------------------- ++ ++Highest Performance (RO) ++......................... ++ ++It is the absolute maximum performance an individual processor may reach, ++assuming ideal conditions. This performance level may not be sustainable ++for long durations and may only be achievable if other platform components ++are in a specific state; for example, it may require other processors be in ++an idle state. This would be equivalent to the highest frequencies ++supported by the processor. ++ ++Nominal (Guaranteed) Performance (RO) ++...................................... ++ ++It is the maximum sustained performance level of the processor, assuming ++ideal operating conditions. In absence of an external constraint (power, ++thermal, etc.) this is the performance level the processor is expected to ++be able to maintain continuously. All cores/processors are expected to be ++able to sustain their nominal performance state simultaneously. ++ ++Lowest non-linear Performance (RO) ++................................... 
++ ++It is the lowest performance level at which nonlinear power savings are ++achieved, for example, due to the combined effects of voltage and frequency ++scaling. Above this threshold, lower performance levels should be generally ++more energy efficient than higher performance levels. This register ++effectively conveys the most efficient performance level to ``amd-pstate``. ++ ++Lowest Performance (RO) ++........................ ++ ++It is the absolute lowest performance level of the processor. Selecting a ++performance level lower than the lowest nonlinear performance level may ++cause an efficiency penalty but should reduce the instantaneous power ++consumption of the processor. ++ ++AMD CPPC Performance Control ++------------------------------ ++ ++``amd-pstate`` passes performance goals through these registers. The ++register drives the behavior of the desired performance target. ++ ++Minimum requested performance (RW) ++................................... ++ ++``amd-pstate`` specifies the minimum allowed performance level. ++ ++Maximum requested performance (RW) ++................................... ++ ++``amd-pstate`` specifies a limit the maximum performance that is expected ++to be supplied by the hardware. ++ ++Desired performance target (RW) ++................................... ++ ++``amd-pstate`` specifies a desired target in the CPPC performance scale as ++a relative number. This can be expressed as percentage of nominal ++performance (infrastructure max). Below the nominal sustained performance ++level, desired performance expresses the average performance level of the ++processor subject to hardware. Above the nominal performance level, ++processor must provide at least nominal performance requested and go higher ++if current operating conditions allow. ++ ++Energy Performance Preference (EPP) (RW) ++......................................... 
++ ++Provides a hint to the hardware if software wants to bias toward performance ++(0x0) or energy efficiency (0xff). ++ ++ ++Key Governors Support ++======================= ++ ++``amd-pstate`` can be used with all the (generic) scaling governors listed ++by the ``scaling_available_governors`` policy attribute in ``sysfs``. Then, ++it is responsible for the configuration of policy objects corresponding to ++CPUs and provides the ``CPUFreq`` core (and the scaling governors attached ++to the policy objects) with accurate information on the maximum and minimum ++operating frequencies supported by the hardware. Users can check the ++``scaling_cur_freq`` information comes from the ``CPUFreq`` core. ++ ++``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic ++frequency control. It is to fine tune the processor configuration on ++``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate`` ++registers adjust_perf callback to implement the CPPC similar performance ++update behavior. It is initialized by ``sugov_start`` and then populate the ++CPU's update_util_data pointer to assign ``sugov_update_single_perf`` as ++the utilization update callback function in CPU scheduler. CPU scheduler ++will call ``cpufreq_update_util`` and assign the target performance ++according to the ``struct sugov_cpu`` that utilization update belongs to. ++Then ``amd-pstate`` updates the desired performance according to the CPU ++scheduler assigned. ++ ++ ++Processor Support ++======================= ++ ++The ``amd-pstate`` initialization will fail if the _CPC in ACPI SBIOS is ++not existed at the detected processor, and it uses ``acpi_cpc_valid`` to ++check the _CPC existence. All Zen based processors support legacy ACPI ++hardware P-States function, so while the ``amd-pstate`` fails to be ++initialized, the kernel will fall back to initialize ``acpi-cpufreq`` ++driver. 
++ ++There are two types of hardware implementations for ``amd-pstate``: one is ++`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support ++<perf_cap_>`_. It can use :c:macro:`X86_FEATURE_CPPC` feature flag (for ++details refer to Processor Programming Reference (PPR) for AMD Family ++19h Model 51h, Revision A1 Processors [3]_) to indicate the different ++types. ``amd-pstate`` is to register different ``static_call`` instances ++for different hardware implementations. ++ ++Currently, some of Zen2 and Zen3 processors support ``amd-pstate``. In the ++future, it will be supported on more and more AMD processors. ++ ++Full MSR Support ++----------------- ++ ++Some new Zen3 processors such as Cezanne provide the MSR registers directly ++while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set. ++``amd-pstate`` can handle the MSR register to implement the fast switch ++function in ``CPUFreq`` that can shrink latency of frequency control on the ++interrupt context. The functions with ``pstate_xxx`` prefix represent the ++operations of MSR registers. ++ ++Shared Memory Support ++---------------------- ++ ++If :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, that means the ++processor supports shared memory solution. In this case, ``amd-pstate`` ++uses the ``cppc_acpi`` helper methods to implement the callback functions ++that defined on ``static_call``. The functions with ``cppc_xxx`` prefix ++represent the operations of acpi cppc helpers for shared memory solution. ++ ++ ++AMD P-States and ACPI hardware P-States always can be supported in one ++processor. But AMD P-States has the higher priority and if it is enabled ++with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond ++to the request from AMD P-States. ++ ++ ++User Space Interface in ``sysfs`` ++================================== ++ ++``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to ++control its functionality at the system level. 
They located in the ++``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. :: ++ ++ root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd* ++ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf ++ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq ++ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq ++ ++ ++``amd_pstate_highest_perf / amd_pstate_max_freq`` ++ ++Maximum CPPC performance and CPU frequency that the driver is allowed to ++set in percent of the maximum supported CPPC performance level (the highest ++performance supported in `AMD CPPC Performance Capability <perf_cap_>`_). ++In some of ASICs, the highest CPPC performance is not the one in the _CPC ++table, so we need to expose it to sysfs. If boost is not active but ++supported, this maximum frequency will be larger than the one in ++``cpuinfo``. ++This attribute is read-only. ++ ++``amd_pstate_lowest_nonlinear_freq`` ++ ++The lowest non-linear CPPC CPU frequency that the driver is allowed to set ++in percent of the maximum supported CPPC performance level (Please see the ++lowest non-linear performance in `AMD CPPC Performance Capability ++<perf_cap_>`_). ++This attribute is read-only. ++ ++For other performance and frequency values, we can read them back from ++``/sys/devices/system/cpu/cpuX/acpi_cppc/``, see :ref:`cppc_sysfs`. ++ ++ ++``amd-pstate`` vs ``acpi-cpufreq`` ++====================================== ++ ++On majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables ++provided by the platform firmware used for CPU performance scaling, but ++only provides 3 P-states on AMD processors. ++However, on modern AMD APU and CPU series, it provides the collaborative ++processor performance control according to ACPI protocol and customize this ++for AMD platforms. That is fine-grain and continuous frequency range ++instead of the legacy hardware P-states. 
``amd-pstate`` is the kernel ++module which supports the new AMD P-States mechanism on most of future AMD ++platforms. The AMD P-States mechanism will be the more performance and energy ++efficiency frequency management method on AMD processors. ++ ++Kernel Module Options for ``amd-pstate`` ++========================================= ++ ++``shared_mem`` ++Use a module param (shared_mem) to enable related processors manually with ++**amd_pstate.shared_mem=1**. ++Due to the performance issue on the processors with `Shared Memory Support ++<perf_cap_>`_, so we disable it for the moment and will enable this by default ++once we address performance issue on this solution. ++ ++The way to check whether current processor is `Full MSR Support <perf_cap_>`_ ++or `Shared Memory Support <perf_cap_>`_ : :: ++ ++ ray@hr-test1:~$ lscpu | grep cppc ++ Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm ++ ++If CPU Flags have cppc, then this processor supports `Full MSR Support ++<perf_cap_>`_. Otherwise it supports `Shared Memory Support <perf_cap_>`_. 
++ ++ ++``cpupower`` tool support for ``amd-pstate`` ++=============================================== ++ ++``amd-pstate`` is supported on ``cpupower`` tool that can be used to dump the frequency ++information. And it is in progress to support more and more operations for new ++``amd-pstate`` module with this tool. :: ++ ++ root@hr-test1:/home/ray# cpupower frequency-info ++ analyzing CPU 0: ++ driver: amd-pstate ++ CPUs which run at the same hardware frequency: 0 ++ CPUs which need to have their frequency coordinated by software: 0 ++ maximum transition latency: 131 us ++ hardware limits: 400 MHz - 4.68 GHz ++ available cpufreq governors: ondemand conservative powersave userspace performance schedutil ++ current policy: frequency should be within 400 MHz and 4.68 GHz. ++ The governor "schedutil" may decide which speed to use ++ within this range. ++ current CPU frequency: Unable to call hardware ++ current CPU frequency: 4.02 GHz (asserted by call to kernel) ++ boost state support: ++ Supported: yes ++ Active: yes ++ AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz. ++ AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz. ++ AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz. ++ AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz. ++ ++ ++Diagnostics and Tuning ++======================= ++ ++Trace Events ++-------------- ++ ++There are two static trace events that can be used for ``amd-pstate`` ++diagnostics. One of them is the cpu_frequency trace event generally used ++by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event ++specific to ``amd-pstate``. The following sequence of shell commands can ++be used to enable them and see their output (if the kernel is generally ++configured to support event tracing). 
:: ++ ++ root@hr-test1:/home/ray# cd /sys/kernel/tracing/ ++ root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable ++ root@hr-test1:/sys/kernel/tracing# cat trace ++ # tracer: nop ++ # ++ # entries-in-buffer/entries-written: 47827/42233061 #P:2 ++ # ++ # _-----=> irqs-off ++ # / _----=> need-resched ++ # | / _---=> hardirq/softirq ++ # || / _--=> preempt-depth ++ # ||| / delay ++ # TASK-PID CPU# |||| TIMESTAMP FUNCTION ++ # | | | |||| | | ++ <idle>-0 [015] dN... 4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true ++ <idle>-0 [007] d.h.. 4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true ++ cat-2161 [000] d.... 4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true ++ sshd-2125 [004] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true ++ <idle>-0 [007] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true ++ <idle>-0 [003] d.s.. 4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true ++ <idle>-0 [011] d.s.. 4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true ++ ++The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling ++governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the ++policies with other scaling governors). ++ ++ ++Reference ++=========== ++ ++.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming, ++ https://www.amd.com/system/files/TechDocs/24593.pdf ++ ++.. [2] Advanced Configuration and Power Interface Specification, ++ https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf ++ ++.. 
[3] Processor Programming Reference (PPR) for AMD Family 19h Model 51h, Revision A1 Processors ++ https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip ++ +diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst +index f40994c422dc..5d2757e2de65 100644 +--- a/Documentation/admin-guide/pm/working-state.rst ++++ b/Documentation/admin-guide/pm/working-state.rst +@@ -11,6 +11,7 @@ Working-State Power Management + intel_idle + cpufreq + intel_pstate ++ amd-pstate + cpufreq_drivers + intel_epb + intel-speed-select +diff --git a/MAINTAINERS b/MAINTAINERS +index fe347675fb5c..8e0666a552df 100644 +--- a/MAINTAINERS ++++ b/MAINTAINERS +@@ -975,6 +975,13 @@ S: Supported + T: git https://gitlab.freedesktop.org/agd5f/linux.git + F: drivers/gpu/drm/amd/pm/ + ++AMD PSTATE DRIVER ++M: Huang Rui <ray.huang@amd.com> ++L: linux-pm@vger.kernel.org ++S: Supported ++F: Documentation/admin-guide/pm/amd-pstate.rst ++F: drivers/cpufreq/amd-pstate* ++ + AMD PTDMA DRIVER + M: Sanjay R Mehta <sanju.mehta@amd.com> + L: dmaengine@vger.kernel.org diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d5b5f2ab87a0..18de5f76f198 100644 --- a/arch/x86/include/asm/cpufeatures.h @@ -25,20 +442,8 @@ index d5b5f2ab87a0..18de5f76f198 100644 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ - - - -AMD CPPC (Collaborative Processor Performance Control) function uses MSR -registers to manage the performance hints. So add the MSR register macro -here. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - arch/x86/include/asm/msr-index.h | 17 +++++++++++++++++ - 1 file changed, 17 insertions(+) - diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h -index 01e2650b9585..e7945ef6a8df 100644 +index 01e2650b9585..3faf0f97edb1 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -486,6 +486,23 @@ @@ -52,56 +457,48 @@ index 01e2650b9585..e7945ef6a8df 100644 +#define MSR_AMD_CPPC_REQ 0xc00102b3 +#define MSR_AMD_CPPC_STATUS 0xc00102b4 + -+#define CAP1_LOWEST_PERF(x) (((x) >> 0) & 0xff) -+#define CAP1_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff) -+#define CAP1_NOMINAL_PERF(x) (((x) >> 16) & 0xff) -+#define CAP1_HIGHEST_PERF(x) (((x) >> 24) & 0xff) ++#define AMD_CPPC_LOWEST_PERF(x) (((x) >> 0) & 0xff) ++#define AMD_CPPC_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff) ++#define AMD_CPPC_NOMINAL_PERF(x) (((x) >> 16) & 0xff) ++#define AMD_CPPC_HIGHEST_PERF(x) (((x) >> 24) & 0xff) + -+#define REQ_MAX_PERF(x) (((x) & 0xff) << 0) -+#define REQ_MIN_PERF(x) (((x) & 0xff) << 8) -+#define REQ_DES_PERF(x) (((x) & 0xff) << 16) -+#define REQ_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24) ++#define AMD_CPPC_MAX_PERF(x) (((x) & 0xff) << 0) ++#define AMD_CPPC_MIN_PERF(x) (((x) & 0xff) << 8) ++#define AMD_CPPC_DES_PERF(x) (((x) & 0xff) << 16) ++#define AMD_CPPC_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24) + /* Fam 17h MSRs */ #define MSR_F17H_IRPERF 0xc00000e9 - - - -From: Steven Noonan <steven@valvesoftware.com> - -According to the ACPI v6.2 (and later) specification, SystemIO can be -used for _CPC registers. This teaches cppc_acpi how to handle such -registers. - -This patch was tested using the amd_pstate driver on my Zephyrus G15 -(model GA503QS) using the current version 410 BIOS, which uses -a SystemIO register for the HighestPerformance element in _CPC. 
- -Signed-off-by: Steven Noonan <steven@valvesoftware.com> -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/acpi/cppc_acpi.c | 46 +++++++++++++++++++++++++++++++++++++--- - 1 file changed, 43 insertions(+), 3 deletions(-) - diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c -index a85c351589be..ca62c3dc9899 100644 +index a85c351589be..6c0a55a17dfc 100644 --- a/drivers/acpi/cppc_acpi.c +++ b/drivers/acpi/cppc_acpi.c -@@ -746,9 +746,24 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr) +@@ -118,6 +118,8 @@ static DEFINE_PER_CPU(struct cpc_desc *, cpc_desc_ptr); + */ + #define NUM_RETRIES 500ULL + ++#define OVER_16BTS_MASK ~0xFFFFULL ++ + #define define_one_cppc_ro(_name) \ + static struct kobj_attribute _name = \ + __ATTR(_name, 0444, show_##_name, NULL) +@@ -746,9 +748,26 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr) goto out_free; cpc_ptr->cpc_regs[i-2].sys_mem_vaddr = addr; } + } else if (gas_t->space_id == ACPI_ADR_SPACE_SYSTEM_IO) { + if (gas_t->access_width < 1 || gas_t->access_width > 3) { -+ /* 1 = 8-bit, 2 = 16-bit, and 3 = 32-bit. SystemIO doesn't -+ * implement 64-bit registers. ++ /* ++ * 1 = 8-bit, 2 = 16-bit, and 3 = 32-bit. ++ * SystemIO doesn't implement 64-bit ++ * registers. 
+ */ + pr_debug("Invalid access width %d for SystemIO register\n", + gas_t->access_width); + goto out_free; + } -+ if (gas_t->address & ~0xFFFFULL) { ++ if (gas_t->address & OVER_16BTS_MASK) { + /* SystemIO registers use 16-bit integer addresses */ + pr_debug("Invalid IO port %llu for SystemIO register\n", + gas_t->address); @@ -114,7 +511,7 @@ index a85c351589be..ca62c3dc9899 100644 pr_debug("Unsupported register type: %d\n", gas_t->space_id); goto out_free; } -@@ -923,7 +938,20 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val) +@@ -923,7 +942,21 @@ static int cpc_read(int cpu, struct cpc_register_resource *reg_res, u64 *val) } *val = 0; @@ -124,10 +521,11 @@ index a85c351589be..ca62c3dc9899 100644 + u32 width = 8 << (reg->access_width - 1); + acpi_status status; + -+ status = acpi_os_read_port((acpi_io_address)reg->address, (u32 *)val, width); -+ -+ if (status != AE_OK) { -+ pr_debug("Error: Failed to read SystemIO port %llx\n", reg->address); ++ status = acpi_os_read_port((acpi_io_address)reg->address, ++ (u32 *)val, width); ++ if (ACPI_FAILURE(status)) { ++ pr_debug("Error: Failed to read SystemIO port %llx\n", ++ reg->address); + return -EFAULT; + } + @@ -136,7 +534,7 @@ index a85c351589be..ca62c3dc9899 100644 vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id); else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) vaddr = reg_res->sys_mem_vaddr; -@@ -962,7 +990,19 @@ static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val) +@@ -962,7 +995,20 @@ static int cpc_write(int cpu, struct cpc_register_resource *reg_res, u64 val) int pcc_ss_id = per_cpu(cpu_pcc_subspace_idx, cpu); struct cpc_reg *reg = ®_res->cpc_entry.reg; @@ -145,10 +543,11 @@ index a85c351589be..ca62c3dc9899 100644 + u32 width = 8 << (reg->access_width - 1); + acpi_status status; + -+ status = acpi_os_write_port((acpi_io_address)reg->address, (u32)val, width); -+ -+ if (status != AE_OK) { -+ pr_debug("Error: Failed to write SystemIO port %llx\n", 
reg->address); ++ status = acpi_os_write_port((acpi_io_address)reg->address, ++ (u32)val, width); ++ if (ACPI_FAILURE(status)) { ++ pr_debug("Error: Failed to write SystemIO port %llx\n", ++ reg->address); + return -EFAULT; + } + @@ -157,71 +556,7 @@ index a85c351589be..ca62c3dc9899 100644 vaddr = GET_PCC_VADDR(reg->address, pcc_ss_id); else if (reg->space_id == ACPI_ADR_SPACE_SYSTEM_MEMORY) vaddr = reg_res->sys_mem_vaddr; - - - -From: Mario Limonciello <mario.limonciello@amd.com> - -As this is a static check, it should be based upon what is currently -present on the system. This makes probeing more deterministic. - -While local APIC flags field (lapic_flags) of cpu core in MADT table is -0, then the cpu core won't be enabled. In this case, _CPC won't be found -in this core, and return back to _CPC invalid with walking through -possible cpus (include disable cpus). This is not expected, so switch to -check present CPUs instead. - -Reported-by: Jinzhou Su <Jinzhou.Su@amd.com> -Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/acpi/cppc_acpi.c | 2 +- - 1 file changed, 1 insertion(+), 1 deletion(-) - -diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c -index ca62c3dc9899..a46f227dc254 100644 ---- a/drivers/acpi/cppc_acpi.c -+++ b/drivers/acpi/cppc_acpi.c -@@ -411,7 +411,7 @@ bool acpi_cpc_valid(void) - struct cpc_desc *cpc_ptr; - int cpu; - -- for_each_possible_cpu(cpu) { -+ for_each_present_cpu(cpu) { - cpc_ptr = per_cpu(cpc_desc_ptr, cpu); - if (!cpc_ptr) - return false; - - - -From: Jinzhou Su <Jinzhou.Su@amd.com> - -Add a new function to enable CPPC feature. This function -will write Continuous Performance Control package -EnableRegister field on the processor. - -CPPC EnableRegister register described in section 8.4.7.1 of ACPI 6.4: -This element is optional. 
If supported, contains a resource descriptor -with a single Register() descriptor that describes a register to which -OSPM writes a One to enable CPPC on this processor. Before this register -is set, the processor will be controlled by legacy mechanisms (ACPI -Pstates, firmware, etc.). - -This register will be used for AMD processors to enable amd-pstate -function instead of legacy ACPI P-States. - -Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/acpi/cppc_acpi.c | 45 ++++++++++++++++++++++++++++++++++++++++ - include/acpi/cppc_acpi.h | 5 +++++ - 2 files changed, 50 insertions(+) - -diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c -index a46f227dc254..003df9fba122 100644 ---- a/drivers/acpi/cppc_acpi.c -+++ b/drivers/acpi/cppc_acpi.c -@@ -1262,6 +1262,51 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs) +@@ -1222,6 +1268,51 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs) } EXPORT_SYMBOL_GPL(cppc_get_perf_ctrs); @@ -273,150 +608,8 @@ index a46f227dc254..003df9fba122 100644 /** * cppc_set_perf - Set a CPU's performance controls. * @cpu: CPU for which to set performance controls. 
-diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
-index bc159a9b4a73..92b7ea8d8f5e 100644
---- a/include/acpi/cppc_acpi.h
-+++ b/include/acpi/cppc_acpi.h
-@@ -138,6 +138,7 @@ extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf);
- extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf);
- extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
- extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
-+extern int cppc_set_enable(int cpu, bool enable);
- extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
- extern bool acpi_cpc_valid(void);
- extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data);
-@@ -162,6 +163,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
- {
- 	return -ENOTSUPP;
- }
-+static inline int cppc_set_enable(int cpu, bool enable)
-+{
-+	return -ENOTSUPP;
-+}
- static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps)
- {
- 	return -ENOTSUPP;
-
-
-
-amd-pstate is the AMD CPU performance scaling driver that introduces a
-new CPU frequency control mechanism on AMD Zen based CPU series in Linux
-kernel. The new mechanism is based on Collaborative processor
-performance control (CPPC) which is finer grain frequency management
-than legacy ACPI hardware P-States. Current AMD CPU platforms are using
-the ACPI P-states driver to manage CPU frequency and clocks with
-switching only in 3 P-states. AMD P-States is to replace the ACPI
-P-states controls, allows a flexible, low-latency interface for the
-Linux kernel to directly communicate the performance hints to hardware.
-
-"amd-pstate" leverages the Linux kernel governors such as *schedutil*,
-*ondemand*, etc. to manage the performance hints which are provided by CPPC
-hardware functionality. The first version for amd-pstate is to support one
-of the Zen3 processors, and we will support more in future after we verify
-the hardware and SBIOS functionalities.
-
-There are two types of hardware implementations for amd-pstate: one is full
-MSR support and another is shared memory support. It can use
-X86_FEATURE_CPPC feature flag to distinguish the different types.
-
-Using the new AMD P-States method + kernel governors (*schedutil*,
-*ondemand*, ...) to manage the frequency update is the most appropriate
-bridge between AMD Zen based hardware processor and Linux kernel, the
-processor is able to adjust to the most efficiency frequency according to
-the kernel scheduler loading.
-
-Performance Per Watt (PPW) Calculation:
-
-The PPW calculation is referred by below paper:
-https://software.intel.com/content/dam/develop/external/us/en/documents/performance-per-what-paper.pdf
-
-Below formula is referred from below spec to measure the PPW:
-
-(F / t) / P = F * t / (t * E) = F / E,
-
-"F" is the number of frames per second.
-"P" is power measured in watts.
-"E" is energy measured in joules.
-
-We use the RAPL interface with "perf" tool to get the energy data of the
-package power.
-
-The data comparisons between amd-pstate and acpi-freq module are tested on
-AMD Cezanne processor:
-
-1) TBench CPU benchmark:
-
-+---------------------------------------------------------------------+
-|                                                                     |
-|                    TBench (Performance Per Watt)                    |
-|                                                    Higher is better |
-+-------------------+------------------------+------------------------+
-|                   |  Performance Per Watt  |  Performance Per Watt  |
-|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
-|                   |   Unit: MB / (s * J)   |   Unit: MB / (s * J)   |
-+-------------------+------------------------+------------------------+
-|                   |                        |                        |
-|   acpi-cpufreq    |         3.022          |         2.969          |
-|                   |                        |                        |
-+-------------------+------------------------+------------------------+
-|                   |                        |                        |
-|    amd-pstate     |         3.131          |         3.284          |
-|                   |                        |                        |
-+-------------------+------------------------+------------------------+
-
-2) Gitsource CPU benchmark:
-
-+---------------------------------------------------------------------+
-|                                                                     |
-|                  Gitsource (Performance Per Watt)                   |
-|                                                    Higher is better |
-+-------------------+------------------------+------------------------+
-|                   |  Performance Per Watt  |  Performance Per Watt  |
-|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
-|                   |   Unit: 1 / (s * J)    |   Unit: 1 / (s * J)    |
-+-------------------+------------------------+------------------------+
-|                   |                        |                        |
-|   acpi-cpufreq    |      3.42172E-07       |      2.74508E-07       |
-|                   |                        |                        |
-+-------------------+------------------------+------------------------+
-|                   |                        |                        |
-|    amd-pstate     |      4.09141E-07       |      3.47610E-07       |
-|                   |                        |                        |
-+-------------------+------------------------+------------------------+
-
-3) Speedometer 2.0 CPU benchmark:
-
-+---------------------------------------------------------------------+
-|                                                                     |
-|               Speedometer 2.0 (Performance Per Watt)                |
-|                                                    Higher is better |
-+-------------------+------------------------+------------------------+
-|                   |  Performance Per Watt  |  Performance Per Watt  |
-|   Kernel Module   |      (Schedutil)       |       (Ondemand)       |
-|                   |   Unit: 1 / (s * J)    |   Unit: 1 / (s * J)    |
-+-------------------+------------------------+------------------------+
-|                   |                        |                        |
-|   acpi-cpufreq    |      0.116111767       |      0.110321664       |
-|                   |                        |                        |
-+-------------------+------------------------+------------------------+
-|                   |                        |                        |
-|    amd-pstate     |      0.115825281       |      0.122024299       |
-|                   |                        |                        |
-+-------------------+------------------------+------------------------+
-
-According to above average data, we can see this solution has shown better
-performance per watt scaling on mobile CPU benchmarks in most of cases.
-
-Signed-off-by: Huang Rui <ray.huang@amd.com>
----
- drivers/cpufreq/Kconfig.x86  |  17 ++
- drivers/cpufreq/Makefile     |   1 +
- drivers/cpufreq/amd-pstate.c | 398 +++++++++++++++++++++++++++++++++++
- 3 files changed, 416 insertions(+)
- create mode 100644 drivers/cpufreq/amd-pstate.c
-
 diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
-index 92701a18bdd9..21837eb1698b 100644
+index 92701a18bdd9..a951768c3ebb 100644
 --- a/drivers/cpufreq/Kconfig.x86
 +++ b/drivers/cpufreq/Kconfig.x86
 @@ -34,6 +34,23 @@ config X86_PCC_CPUFREQ
@@ -432,8 +625,8 @@ index 92701a18bdd9..21837eb1698b 100644
 +	help
 +	  This driver adds a CPUFreq driver which utilizes a fine grain
 +	  processor performance frequency control range instead of legacy
-+	  performance levels. This driver supports the AMD processors with
-+	  _CPC object in the SBIOS.
++	  performance levels. _CPC needs to be present in the ACPI tables
++	  of the system.
 +
 +	  For details, take a look at:
 +	  <file:Documentation/admin-guide/pm/amd-pstate.rst>.
@@ -444,23 +637,125 @@ index 92701a18bdd9..21837eb1698b 100644 tristate "ACPI Processor P-States driver" depends on ACPI_PROCESSOR diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile -index 48ee5859030c..c8d307010922 100644 +index 48ee5859030c..285de70af877 100644 --- a/drivers/cpufreq/Makefile +++ b/drivers/cpufreq/Makefile -@@ -25,6 +25,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o +@@ -17,6 +17,10 @@ obj-$(CONFIG_CPU_FREQ_GOV_ATTR_SET) += cpufreq_governor_attr_set.o + obj-$(CONFIG_CPUFREQ_DT) += cpufreq-dt.o + obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o + ++# Traces ++CFLAGS_amd-pstate-trace.o := -I$(src) ++amd_pstate-y := amd-pstate.o amd-pstate-trace.o ++ + ################################################################################## + # x86 drivers. + # Link order matters. K8 is preferred to ACPI because of firmware bugs in early +@@ -25,6 +29,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o # speedstep-* is preferred over p4-clockmod. obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o -+obj-$(CONFIG_X86_AMD_PSTATE) += amd-pstate.o ++obj-$(CONFIG_X86_AMD_PSTATE) += amd_pstate.o obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o +diff --git a/drivers/cpufreq/amd-pstate-trace.c b/drivers/cpufreq/amd-pstate-trace.c +new file mode 100644 +index 000000000000..891b696dcd69 +--- /dev/null ++++ b/drivers/cpufreq/amd-pstate-trace.c +@@ -0,0 +1,2 @@ ++#define CREATE_TRACE_POINTS ++#include "amd-pstate-trace.h" +diff --git a/drivers/cpufreq/amd-pstate-trace.h b/drivers/cpufreq/amd-pstate-trace.h +new file mode 100644 +index 000000000000..647505957d4f +--- /dev/null ++++ b/drivers/cpufreq/amd-pstate-trace.h +@@ -0,0 +1,77 @@ ++/* SPDX-License-Identifier: GPL-2.0 */ ++/* ++ * amd-pstate-trace.h - AMD Processor P-state Frequency Driver Tracer ++ * ++ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved. 
++ * ++ * Author: Huang Rui <ray.huang@amd.com> ++ */ ++ ++#if !defined(_AMD_PSTATE_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) ++#define _AMD_PSTATE_TRACE_H ++ ++#include <linux/cpufreq.h> ++#include <linux/tracepoint.h> ++#include <linux/trace_events.h> ++ ++#undef TRACE_SYSTEM ++#define TRACE_SYSTEM amd_cpu ++ ++#undef TRACE_INCLUDE_FILE ++#define TRACE_INCLUDE_FILE amd-pstate-trace ++ ++#define TPS(x) tracepoint_string(x) ++ ++TRACE_EVENT(amd_pstate_perf, ++ ++ TP_PROTO(unsigned long min_perf, ++ unsigned long target_perf, ++ unsigned long capacity, ++ unsigned int cpu_id, ++ bool changed, ++ bool fast_switch ++ ), ++ ++ TP_ARGS(min_perf, ++ target_perf, ++ capacity, ++ cpu_id, ++ changed, ++ fast_switch ++ ), ++ ++ TP_STRUCT__entry( ++ __field(unsigned long, min_perf) ++ __field(unsigned long, target_perf) ++ __field(unsigned long, capacity) ++ __field(unsigned int, cpu_id) ++ __field(bool, changed) ++ __field(bool, fast_switch) ++ ), ++ ++ TP_fast_assign( ++ __entry->min_perf = min_perf; ++ __entry->target_perf = target_perf; ++ __entry->capacity = capacity; ++ __entry->cpu_id = cpu_id; ++ __entry->changed = changed; ++ __entry->fast_switch = fast_switch; ++ ), ++ ++ TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s", ++ (unsigned long)__entry->min_perf, ++ (unsigned long)__entry->target_perf, ++ (unsigned long)__entry->capacity, ++ (unsigned int)__entry->cpu_id, ++ (__entry->changed) ? "true" : "false", ++ (__entry->fast_switch) ? "true" : "false" ++ ) ++); ++ ++#endif /* _AMD_PSTATE_TRACE_H */ ++ ++/* This part must be outside protection */ ++#undef TRACE_INCLUDE_PATH ++#define TRACE_INCLUDE_PATH . 
++ ++#include <trace/define_trace.h> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c new file mode 100644 -index 000000000000..8b501a72c3dd +index 000000000000..40ceb031abf5 --- /dev/null +++ b/drivers/cpufreq/amd-pstate.c -@@ -0,0 +1,398 @@ +@@ -0,0 +1,643 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * amd-pstate.c - AMD Processor P-state Frequency Driver @@ -468,6 +763,19 @@ index 000000000000..8b501a72c3dd + * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved. + * + * Author: Huang Rui <ray.huang@amd.com> ++ * ++ * AMD P-State introduces a new CPU performance scaling design for AMD ++ * processors using the ACPI Collaborative Performance and Power Control (CPPC) ++ * feature which works with the AMD SMU firmware providing a finer grained ++ * frequency control range. It is to replace the legacy ACPI P-States control, ++ * allows a flexible, low-latency interface for the Linux kernel to directly ++ * communicate the performance hints to hardware. ++ * ++ * AMD P-State is supported on recent AMD Zen base CPU series include some of ++ * Zen2 and Zen3 processors. _CPC needs to be present in the ACPI tables of AMD ++ * P-State supported system. And there are two types of hardware implementations ++ * for AMD P-State: 1) Full MSR Solution and 2) Shared Memory Solution. ++ * X86_FEATURE_CPPC CPU feature flag is used to distinguish the different types. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -494,17 +802,50 @@ index 000000000000..8b501a72c3dd +#include <asm/processor.h> +#include <asm/cpufeature.h> +#include <asm/cpu_device_id.h> ++#include "amd-pstate-trace.h" + +#define AMD_PSTATE_TRANSITION_LATENCY 0x20000 +#define AMD_PSTATE_TRANSITION_DELAY 500 + ++/* ++ * TODO: We need more time to fine tune processors with shared memory solution ++ * with community together. ++ * ++ * There are some performance drops on the CPU benchmarks which reports from ++ * Suse. 
We are co-working with them to fine tune the shared memory solution. So ++ * we disable it by default to go acpi-cpufreq on these processors and add a ++ * module parameter to be able to enable it manually for debugging. ++ */ ++static bool shared_mem = false; ++module_param(shared_mem, bool, 0444); ++MODULE_PARM_DESC(shared_mem, ++ "enable amd-pstate on processors with shared memory solution (false = disabled (default), true = enabled)"); ++ +static struct cpufreq_driver amd_pstate_driver; + ++/** ++ * struct amd_cpudata - private CPU data for AMD P-State ++ * @cpu: CPU number ++ * @cppc_req_cached: cached performance request hints ++ * @highest_perf: the maximum performance an individual processor may reach, ++ * assuming ideal conditions ++ * @nominal_perf: the maximum sustained performance level of the processor, ++ * assuming ideal operating conditions ++ * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power ++ * savings are achieved ++ * @lowest_perf: the absolute lowest performance level of the processor ++ * @max_freq: the frequency that mapped to highest_perf ++ * @min_freq: the frequency that mapped to lowest_perf ++ * @nominal_freq: the frequency that mapped to nominal_perf ++ * @lowest_nonlinear_freq: the frequency that mapped to lowest_nonlinear_perf ++ * ++ * The amd_cpudata is key private data for each CPU thread in AMD P-State, and ++ * represents all the attributes and goals that AMD P-State requests at runtime. ++ */ +struct amd_cpudata { + int cpu; + -+ struct freq_qos_request req[2]; -+ ++ struct freq_qos_request req[2]; + u64 cppc_req_cached; + + u32 highest_perf; @@ -516,11 +857,26 @@ index 000000000000..8b501a72c3dd + u32 min_freq; + u32 nominal_freq; + u32 lowest_nonlinear_freq; ++ ++ bool boost_supported; +}; + +static inline int pstate_enable(bool enable) +{ -+ return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 
1 : 0); ++ return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable); ++} ++ ++static int cppc_enable(bool enable) ++{ ++ int cpu, ret = 0; ++ ++ for_each_present_cpu(cpu) { ++ ret = cppc_set_enable(cpu, enable); ++ if (ret) ++ return ret; ++ } ++ ++ return ret; +} + +DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable); @@ -546,9 +902,27 @@ index 000000000000..8b501a72c3dd + */ + WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf()); + -+ WRITE_ONCE(cpudata->nominal_perf, CAP1_NOMINAL_PERF(cap1)); -+ WRITE_ONCE(cpudata->lowest_nonlinear_perf, CAP1_LOWNONLIN_PERF(cap1)); -+ WRITE_ONCE(cpudata->lowest_perf, CAP1_LOWEST_PERF(cap1)); ++ WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1)); ++ WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1)); ++ WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1)); ++ ++ return 0; ++} ++ ++static int cppc_init_perf(struct amd_cpudata *cpudata) ++{ ++ struct cppc_perf_caps cppc_perf; ++ ++ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); ++ if (ret) ++ return ret; ++ ++ WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf()); ++ ++ WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf); ++ WRITE_ONCE(cpudata->lowest_nonlinear_perf, ++ cppc_perf.lowest_nonlinear_perf); ++ WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf); + + return 0; +} @@ -570,6 +944,19 @@ index 000000000000..8b501a72c3dd + READ_ONCE(cpudata->cppc_req_cached)); +} + ++static void cppc_update_perf(struct amd_cpudata *cpudata, ++ u32 min_perf, u32 des_perf, ++ u32 max_perf, bool fast_switch) ++{ ++ struct cppc_perf_ctrls perf_ctrls; ++ ++ perf_ctrls.max_perf = max_perf; ++ perf_ctrls.min_perf = min_perf; ++ perf_ctrls.desired_perf = des_perf; ++ ++ cppc_set_perf(cpudata->cpu, &perf_ctrls); ++} ++ +DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf); + +static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata, @@ -586,14 +973,17 @@ index 000000000000..8b501a72c3dd + u64 prev = 
READ_ONCE(cpudata->cppc_req_cached); + u64 value = prev; + -+ value &= ~REQ_MIN_PERF(~0L); -+ value |= REQ_MIN_PERF(min_perf); ++ value &= ~AMD_CPPC_MIN_PERF(~0L); ++ value |= AMD_CPPC_MIN_PERF(min_perf); ++ ++ value &= ~AMD_CPPC_DES_PERF(~0L); ++ value |= AMD_CPPC_DES_PERF(des_perf); + -+ value &= ~REQ_DES_PERF(~0L); -+ value |= REQ_DES_PERF(des_perf); ++ value &= ~AMD_CPPC_MAX_PERF(~0L); ++ value |= AMD_CPPC_MAX_PERF(max_perf); + -+ value &= ~REQ_MAX_PERF(~0L); -+ value |= REQ_MAX_PERF(max_perf); ++ trace_amd_pstate_perf(min_perf, des_perf, max_perf, ++ cpudata->cpu, (value != prev), fast_switch); + + if (value == prev) + return; @@ -640,6 +1030,39 @@ index 000000000000..8b501a72c3dd + return 0; +} + ++static void amd_pstate_adjust_perf(unsigned int cpu, ++ unsigned long _min_perf, ++ unsigned long target_perf, ++ unsigned long capacity) ++{ ++ unsigned long max_perf, min_perf, des_perf, ++ cap_perf, lowest_nonlinear_perf; ++ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); ++ struct amd_cpudata *cpudata = policy->driver_data; ++ ++ cap_perf = READ_ONCE(cpudata->highest_perf); ++ lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf); ++ ++ des_perf = cap_perf; ++ if (target_perf < capacity) ++ des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity); ++ ++ min_perf = READ_ONCE(cpudata->highest_perf); ++ if (_min_perf < capacity) ++ min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity); ++ ++ if (min_perf < lowest_nonlinear_perf) ++ min_perf = lowest_nonlinear_perf; ++ ++ max_perf = cap_perf; ++ if (max_perf < min_perf) ++ max_perf = min_perf; ++ ++ des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf); ++ ++ amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true); ++} ++ +static int amd_get_min_freq(struct amd_cpudata *cpudata) +{ + struct cppc_perf_caps cppc_perf; @@ -712,6 +1135,45 @@ index 000000000000..8b501a72c3dd + return lowest_nonlinear_freq * 1000; +} + ++static int amd_pstate_set_boost(struct cpufreq_policy *policy, 
int state) ++{ ++ struct amd_cpudata *cpudata = policy->driver_data; ++ int ret; ++ ++ if (!cpudata->boost_supported) { ++ pr_err("Boost mode is not supported by this processor or SBIOS\n"); ++ return -EINVAL; ++ } ++ ++ if (state) ++ policy->cpuinfo.max_freq = cpudata->max_freq; ++ else ++ policy->cpuinfo.max_freq = cpudata->nominal_freq; ++ ++ policy->max = policy->cpuinfo.max_freq; ++ ++ ret = freq_qos_update_request(&cpudata->req[1], ++ policy->cpuinfo.max_freq); ++ if (ret < 0) ++ return ret; ++ ++ return 0; ++} ++ ++static void amd_pstate_boost_init(struct amd_cpudata *cpudata) ++{ ++ u32 highest_perf, nominal_perf; ++ ++ highest_perf = READ_ONCE(cpudata->highest_perf); ++ nominal_perf = READ_ONCE(cpudata->nominal_perf); ++ ++ if (highest_perf <= nominal_perf) ++ return; ++ ++ cpudata->boost_supported = true; ++ amd_pstate_driver.boost_enabled = true; ++} ++ +static int amd_pstate_cpu_init(struct cpufreq_policy *policy) +{ + int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; @@ -756,6 +1218,9 @@ index 000000000000..8b501a72c3dd + /* It will be updated by governor */ + policy->cur = policy->cpuinfo.min_freq; + ++ if (boot_cpu_has(X86_FEATURE_CPPC)) ++ policy->fast_switch_possible = true; ++ + ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0], + FREQ_QOS_MIN, policy->cpuinfo.min_freq); + if (ret < 0) { @@ -778,6 +1243,8 @@ index 000000000000..8b501a72c3dd + + policy->driver_data = cpudata; + ++ amd_pstate_boost_init(cpudata); ++ + return 0; + +free_cpudata2: @@ -800,541 +1267,10 @@ index 000000000000..8b501a72c3dd + return 0; +} + -+static struct cpufreq_driver amd_pstate_driver = { -+ .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, -+ .verify = amd_pstate_verify, -+ .target = amd_pstate_target, -+ .init = amd_pstate_cpu_init, -+ .exit = amd_pstate_cpu_exit, -+ .name = "amd-pstate", -+}; -+ -+static int __init amd_pstate_init(void) -+{ -+ int ret; -+ -+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) -+ return -ENODEV; 
-+ -+ if (!acpi_cpc_valid()) { -+ pr_debug("the _CPC object is not present in SBIOS\n"); -+ return -ENODEV; -+ } -+ -+ /* don't keep reloading if cpufreq_driver exists */ -+ if (cpufreq_get_current_driver()) -+ return -EEXIST; -+ -+ /* capability check */ -+ if (!boot_cpu_has(X86_FEATURE_CPPC)) { -+ pr_debug("AMD CPPC MSR based functionality is not supported\n"); -+ return -ENODEV; -+ } -+ -+ /* enable amd pstate feature */ -+ ret = amd_pstate_enable(true); -+ if (ret) { -+ pr_err("failed to enable amd-pstate with return %d\n", ret); -+ return ret; -+ } -+ -+ ret = cpufreq_register_driver(&amd_pstate_driver); -+ if (ret) -+ pr_err("failed to register amd_pstate_driver with return %d\n", -+ ret); -+ -+ return ret; -+} -+ -+static void __exit amd_pstate_exit(void) -+{ -+ cpufreq_unregister_driver(&amd_pstate_driver); -+ -+ amd_pstate_enable(false); -+} -+ -+module_init(amd_pstate_init); -+module_exit(amd_pstate_exit); -+ -+MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>"); -+MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver"); -+MODULE_LICENSE("GPL"); - - - -Introduce the fast switch function for amd-pstate on the AMD processors -which support the full MSR register control. It's able to decrease the -latency on interrupt context. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/cpufreq/amd-pstate.c | 35 +++++++++++++++++++++++++++++++++++ - 1 file changed, 35 insertions(+) - -diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c -index 8b501a72c3dd..4a02a42f4113 100644 ---- a/drivers/cpufreq/amd-pstate.c -+++ b/drivers/cpufreq/amd-pstate.c -@@ -177,6 +177,38 @@ static int amd_pstate_target(struct cpufreq_policy *policy, - return 0; - } - -+static void amd_pstate_adjust_perf(unsigned int cpu, -+ unsigned long _min_perf, -+ unsigned long target_perf, -+ unsigned long capacity) -+{ -+ unsigned long max_perf, min_perf, des_perf, -+ cap_perf, lowest_nonlinear_perf; -+ struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); -+ struct amd_cpudata *cpudata = policy->driver_data; -+ -+ cap_perf = READ_ONCE(cpudata->highest_perf); -+ lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf); -+ -+ if (target_perf < capacity) -+ des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity); -+ -+ min_perf = READ_ONCE(cpudata->highest_perf); -+ if (_min_perf < capacity) -+ min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity); -+ -+ if (min_perf < lowest_nonlinear_perf) -+ min_perf = lowest_nonlinear_perf; -+ -+ max_perf = cap_perf; -+ if (max_perf < min_perf) -+ max_perf = min_perf; -+ -+ des_perf = clamp_t(unsigned long, des_perf, min_perf, max_perf); -+ -+ amd_pstate_update(cpudata, min_perf, des_perf, max_perf, true); -+} -+ - static int amd_get_min_freq(struct amd_cpudata *cpudata) - { - struct cppc_perf_caps cppc_perf; -@@ -293,6 +325,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy) - /* It will be updated by governor */ - policy->cur = policy->cpuinfo.min_freq; - -+ policy->fast_switch_possible = true; -+ - ret = freq_qos_add_request(&policy->constraints, &cpudata->req[0], - FREQ_QOS_MIN, policy->cpuinfo.min_freq); - if (ret < 0) { -@@ -341,6 +375,7 @@ static struct cpufreq_driver amd_pstate_driver = { - .flags = CPUFREQ_CONST_LOOPS | 
CPUFREQ_NEED_UPDATE_LIMITS, - .verify = amd_pstate_verify, - .target = amd_pstate_target, -+ .adjust_perf = amd_pstate_adjust_perf, - .init = amd_pstate_cpu_init, - .exit = amd_pstate_cpu_exit, - .name = "amd-pstate", - - - -In some of Zen2 and Zen3 based processors, they are using the shared -memory that exposed from ACPI SBIOS. In this kind of the processors, -there is no MSR support, so we add acpi cppc function as the backend for -them. - -It is using a module param (shared_mem) to enable related processors -manually. We will enable this by default once we address performance -issue on this solution. - -Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com> -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/cpufreq/amd-pstate.c | 71 ++++++++++++++++++++++++++++++++++-- - 1 file changed, 67 insertions(+), 4 deletions(-) - -diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c -index 4a02a42f4113..14a29326ceae 100644 ---- a/drivers/cpufreq/amd-pstate.c -+++ b/drivers/cpufreq/amd-pstate.c -@@ -35,6 +35,19 @@ - #define AMD_PSTATE_TRANSITION_LATENCY 0x20000 - #define AMD_PSTATE_TRANSITION_DELAY 500 - -+/* TODO: We need more time to fine tune processors with shared memory solution -+ * with community together. -+ * -+ * There are some performance drops on the CPU benchmarks which reports from -+ * Suse. We are co-working with them to fine tune the shared memory solution. So -+ * we disable it by default to go acpi-cpufreq on these processors and add a -+ * module parameter to be able to enable it manually for debugging. -+ */ -+static bool shared_mem = false; -+module_param(shared_mem, bool, 0444); -+MODULE_PARM_DESC(shared_mem, -+ "enable amd-pstate on processors with shared memory solution (false = disabled (default), true = enabled)"); -+ - static struct cpufreq_driver amd_pstate_driver; - - struct amd_cpudata { -@@ -60,6 +73,19 @@ static inline int pstate_enable(bool enable) - return wrmsrl_safe(MSR_AMD_CPPC_ENABLE, enable ? 
1 : 0); - } - -+static int cppc_enable(bool enable) -+{ -+ int cpu, ret = 0; -+ -+ for_each_online_cpu(cpu) { -+ ret = cppc_set_enable(cpu, enable ? 1 : 0); -+ if (ret) -+ return ret; -+ } -+ -+ return ret; -+} -+ - DEFINE_STATIC_CALL(amd_pstate_enable, pstate_enable); - - static inline int amd_pstate_enable(bool enable) -@@ -90,6 +116,24 @@ static int pstate_init_perf(struct amd_cpudata *cpudata) - return 0; - } - -+static int cppc_init_perf(struct amd_cpudata *cpudata) -+{ -+ struct cppc_perf_caps cppc_perf; -+ -+ int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); -+ if (ret) -+ return ret; -+ -+ WRITE_ONCE(cpudata->highest_perf, amd_get_highest_perf()); -+ -+ WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf); -+ WRITE_ONCE(cpudata->lowest_nonlinear_perf, -+ cppc_perf.lowest_nonlinear_perf); -+ WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf); -+ -+ return 0; -+} -+ - DEFINE_STATIC_CALL(amd_pstate_init_perf, pstate_init_perf); - - static inline int amd_pstate_init_perf(struct amd_cpudata *cpudata) -@@ -107,6 +151,19 @@ static void pstate_update_perf(struct amd_cpudata *cpudata, u32 min_perf, - READ_ONCE(cpudata->cppc_req_cached)); - } - -+static void cppc_update_perf(struct amd_cpudata *cpudata, -+ u32 min_perf, u32 des_perf, -+ u32 max_perf, bool fast_switch) -+{ -+ struct cppc_perf_ctrls perf_ctrls; -+ -+ perf_ctrls.max_perf = max_perf; -+ perf_ctrls.min_perf = min_perf; -+ perf_ctrls.desired_perf = des_perf; -+ -+ cppc_set_perf(cpudata->cpu, &perf_ctrls); -+} -+ - DEFINE_STATIC_CALL(amd_pstate_update_perf, pstate_update_perf); - - static inline void amd_pstate_update_perf(struct amd_cpudata *cpudata, -@@ -325,7 +382,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy) - /* It will be updated by governor */ - policy->cur = policy->cpuinfo.min_freq; - -- policy->fast_switch_possible = true; -+ if (boot_cpu_has(X86_FEATURE_CPPC)) -+ policy->fast_switch_possible = true; - - ret = freq_qos_add_request(&policy->constraints, 
&cpudata->req[0], - FREQ_QOS_MIN, policy->cpuinfo.min_freq); -@@ -375,7 +433,6 @@ static struct cpufreq_driver amd_pstate_driver = { - .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, - .verify = amd_pstate_verify, - .target = amd_pstate_target, -- .adjust_perf = amd_pstate_adjust_perf, - .init = amd_pstate_cpu_init, - .exit = amd_pstate_cpu_exit, - .name = "amd-pstate", -@@ -398,8 +455,14 @@ static int __init amd_pstate_init(void) - return -EEXIST; - - /* capability check */ -- if (!boot_cpu_has(X86_FEATURE_CPPC)) { -- pr_debug("AMD CPPC MSR based functionality is not supported\n"); -+ if (boot_cpu_has(X86_FEATURE_CPPC)) { -+ pr_debug("AMD CPPC MSR based functionality is supported\n"); -+ amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf; -+ } else if (shared_mem) { -+ static_call_update(amd_pstate_enable, cppc_enable); -+ static_call_update(amd_pstate_init_perf, cppc_init_perf); -+ static_call_update(amd_pstate_update_perf, cppc_update_perf); -+ } else { - return -ENODEV; - } - - - - -Add trace event to monitor the performance value changes which is -controlled by cpu governors. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/cpufreq/Makefile | 6 ++- - drivers/cpufreq/amd-pstate-trace.c | 2 + - drivers/cpufreq/amd-pstate-trace.h | 77 ++++++++++++++++++++++++++++++ - drivers/cpufreq/amd-pstate.c | 4 ++ - 4 files changed, 88 insertions(+), 1 deletion(-) - create mode 100644 drivers/cpufreq/amd-pstate-trace.c - create mode 100644 drivers/cpufreq/amd-pstate-trace.h - -diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile -index c8d307010922..285de70af877 100644 ---- a/drivers/cpufreq/Makefile -+++ b/drivers/cpufreq/Makefile -@@ -17,6 +17,10 @@ obj-$(CONFIG_CPU_FREQ_GOV_ATTR_SET) += cpufreq_governor_attr_set.o - obj-$(CONFIG_CPUFREQ_DT) += cpufreq-dt.o - obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o - -+# Traces -+CFLAGS_amd-pstate-trace.o := -I$(src) -+amd_pstate-y := amd-pstate.o amd-pstate-trace.o -+ - ################################################################################## - # x86 drivers. - # Link order matters. K8 is preferred to ACPI because of firmware bugs in early -@@ -25,7 +29,7 @@ obj-$(CONFIG_CPUFREQ_DT_PLATDEV) += cpufreq-dt-platdev.o - # speedstep-* is preferred over p4-clockmod. 
- - obj-$(CONFIG_X86_ACPI_CPUFREQ) += acpi-cpufreq.o --obj-$(CONFIG_X86_AMD_PSTATE) += amd-pstate.o -+obj-$(CONFIG_X86_AMD_PSTATE) += amd_pstate.o - obj-$(CONFIG_X86_POWERNOW_K8) += powernow-k8.o - obj-$(CONFIG_X86_PCC_CPUFREQ) += pcc-cpufreq.o - obj-$(CONFIG_X86_POWERNOW_K6) += powernow-k6.o -diff --git a/drivers/cpufreq/amd-pstate-trace.c b/drivers/cpufreq/amd-pstate-trace.c -new file mode 100644 -index 000000000000..891b696dcd69 ---- /dev/null -+++ b/drivers/cpufreq/amd-pstate-trace.c -@@ -0,0 +1,2 @@ -+#define CREATE_TRACE_POINTS -+#include "amd-pstate-trace.h" -diff --git a/drivers/cpufreq/amd-pstate-trace.h b/drivers/cpufreq/amd-pstate-trace.h -new file mode 100644 -index 000000000000..647505957d4f ---- /dev/null -+++ b/drivers/cpufreq/amd-pstate-trace.h -@@ -0,0 +1,77 @@ -+/* SPDX-License-Identifier: GPL-2.0 */ -+/* -+ * amd-pstate-trace.h - AMD Processor P-state Frequency Driver Tracer -+ * -+ * Copyright (C) 2021 Advanced Micro Devices, Inc. All Rights Reserved. -+ * -+ * Author: Huang Rui <ray.huang@amd.com> -+ */ -+ -+#if !defined(_AMD_PSTATE_TRACE_H) || defined(TRACE_HEADER_MULTI_READ) -+#define _AMD_PSTATE_TRACE_H -+ -+#include <linux/cpufreq.h> -+#include <linux/tracepoint.h> -+#include <linux/trace_events.h> -+ -+#undef TRACE_SYSTEM -+#define TRACE_SYSTEM amd_cpu -+ -+#undef TRACE_INCLUDE_FILE -+#define TRACE_INCLUDE_FILE amd-pstate-trace -+ -+#define TPS(x) tracepoint_string(x) -+ -+TRACE_EVENT(amd_pstate_perf, -+ -+ TP_PROTO(unsigned long min_perf, -+ unsigned long target_perf, -+ unsigned long capacity, -+ unsigned int cpu_id, -+ bool changed, -+ bool fast_switch -+ ), -+ -+ TP_ARGS(min_perf, -+ target_perf, -+ capacity, -+ cpu_id, -+ changed, -+ fast_switch -+ ), -+ -+ TP_STRUCT__entry( -+ __field(unsigned long, min_perf) -+ __field(unsigned long, target_perf) -+ __field(unsigned long, capacity) -+ __field(unsigned int, cpu_id) -+ __field(bool, changed) -+ __field(bool, fast_switch) -+ ), -+ -+ TP_fast_assign( -+ __entry->min_perf = min_perf; -+ 
__entry->target_perf = target_perf; -+ __entry->capacity = capacity; -+ __entry->cpu_id = cpu_id; -+ __entry->changed = changed; -+ __entry->fast_switch = fast_switch; -+ ), -+ -+ TP_printk("amd_min_perf=%lu amd_des_perf=%lu amd_max_perf=%lu cpu_id=%u changed=%s fast_switch=%s", -+ (unsigned long)__entry->min_perf, -+ (unsigned long)__entry->target_perf, -+ (unsigned long)__entry->capacity, -+ (unsigned int)__entry->cpu_id, -+ (__entry->changed) ? "true" : "false", -+ (__entry->fast_switch) ? "true" : "false" -+ ) -+); -+ -+#endif /* _AMD_PSTATE_TRACE_H */ -+ -+/* This part must be outside protection */ -+#undef TRACE_INCLUDE_PATH -+#define TRACE_INCLUDE_PATH . -+ -+#include <trace/define_trace.h> -diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c -index 14a29326ceae..5e080d0dc45f 100644 ---- a/drivers/cpufreq/amd-pstate.c -+++ b/drivers/cpufreq/amd-pstate.c -@@ -31,6 +31,7 @@ - #include <asm/processor.h> - #include <asm/cpufeature.h> - #include <asm/cpu_device_id.h> -+#include "amd-pstate-trace.h" - - #define AMD_PSTATE_TRANSITION_LATENCY 0x20000 - #define AMD_PSTATE_TRANSITION_DELAY 500 -@@ -189,6 +190,9 @@ static void amd_pstate_update(struct amd_cpudata *cpudata, u32 min_perf, - value &= ~REQ_MAX_PERF(~0L); - value |= REQ_MAX_PERF(max_perf); - -+ trace_amd_pstate_perf(min_perf, des_perf, max_perf, -+ cpudata->cpu, (value != prev), fast_switch); -+ - if (value == prev) - return; - - - - -If the sbios supports the boost mode of amd-pstate, let's switch to -boost enabled by default. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/cpufreq/amd-pstate.c | 44 ++++++++++++++++++++++++++++++++++++ - 1 file changed, 44 insertions(+) - -diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c -index 5e080d0dc45f..0c335a917307 100644 ---- a/drivers/cpufreq/amd-pstate.c -+++ b/drivers/cpufreq/amd-pstate.c -@@ -67,6 +67,8 @@ struct amd_cpudata { - u32 min_freq; - u32 nominal_freq; - u32 lowest_nonlinear_freq; -+ -+ bool boost_supported; - }; - - static inline int pstate_enable(bool enable) -@@ -342,6 +344,45 @@ static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata) - return lowest_nonlinear_freq * 1000; - } - -+static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state) -+{ -+ struct amd_cpudata *cpudata = policy->driver_data; -+ int ret; -+ -+ if (!cpudata->boost_supported) { -+ pr_err("Boost mode is not supported by this processor or SBIOS\n"); -+ return -EINVAL; -+ } -+ -+ if (state) -+ policy->cpuinfo.max_freq = cpudata->max_freq; -+ else -+ policy->cpuinfo.max_freq = cpudata->nominal_freq; -+ -+ policy->max = policy->cpuinfo.max_freq; -+ -+ ret = freq_qos_update_request(&cpudata->req[1], -+ policy->cpuinfo.max_freq); -+ if (ret < 0) -+ return ret; -+ -+ return 0; -+} -+ -+static void amd_pstate_boost_init(struct amd_cpudata *cpudata) -+{ -+ u32 highest_perf, nominal_perf; -+ -+ highest_perf = READ_ONCE(cpudata->highest_perf); -+ nominal_perf = READ_ONCE(cpudata->nominal_perf); -+ -+ if (highest_perf <= nominal_perf) -+ return; -+ -+ cpudata->boost_supported = true; -+ amd_pstate_driver.boost_enabled = true; -+} -+ - static int amd_pstate_cpu_init(struct cpufreq_policy *policy) - { - int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; -@@ -411,6 +452,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy) - - policy->driver_data = cpudata; - -+ amd_pstate_boost_init(cpudata); -+ - return 0; - - free_cpudata2: -@@ -439,6 +482,7 @@ static struct cpufreq_driver 
amd_pstate_driver = { - .target = amd_pstate_target, - .init = amd_pstate_cpu_init, - .exit = amd_pstate_cpu_exit, -+ .set_boost = amd_pstate_set_boost, - .name = "amd-pstate", - }; - - - - -Introduce sysfs attributes to get the different level processor -frequencies. - -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/cpufreq/amd-pstate.c | 46 ++++++++++++++++++++++++++++++++++++ - 1 file changed, 46 insertions(+) - -diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c -index 0c335a917307..09c5fd8bd9da 100644 ---- a/drivers/cpufreq/amd-pstate.c -+++ b/drivers/cpufreq/amd-pstate.c -@@ -476,6 +476,51 @@ static int amd_pstate_cpu_exit(struct cpufreq_policy *policy) - return 0; - } - +/* Sysfs attributes */ + -+/* This frequency is to indicate the maximum hardware frequency. ++/* ++ * This frequency is to indicate the maximum hardware frequency. + * If boost is not active but supported, the frequency will be larger than the + * one in cpuinfo. + */ @@ -1368,46 +1304,8 @@ index 0c335a917307..09c5fd8bd9da 100644 + return sprintf(&buf[0], "%u\n", freq); +} + -+cpufreq_freq_attr_ro(amd_pstate_max_freq); -+cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq); -+ -+static struct freq_attr *amd_pstate_attr[] = { -+ &amd_pstate_max_freq, -+ &amd_pstate_lowest_nonlinear_freq, -+ NULL, -+}; -+ - static struct cpufreq_driver amd_pstate_driver = { - .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, - .verify = amd_pstate_verify, -@@ -484,6 +529,7 @@ static struct cpufreq_driver amd_pstate_driver = { - .exit = amd_pstate_cpu_exit, - .set_boost = amd_pstate_set_boost, - .name = "amd-pstate", -+ .attr = amd_pstate_attr, - }; - - static int __init amd_pstate_init(void) - - - -Introduce sysfs attributes to get the different level amd-pstate -performances. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - drivers/cpufreq/amd-pstate.c | 17 +++++++++++++++++ - 1 file changed, 17 insertions(+) - -diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c -index 09c5fd8bd9da..458313cdba93 100644 ---- a/drivers/cpufreq/amd-pstate.c -+++ b/drivers/cpufreq/amd-pstate.c -@@ -512,12 +512,29 @@ static ssize_t show_amd_pstate_lowest_nonlinear_freq(struct cpufreq_policy *poli - return sprintf(&buf[0], "%u\n", freq); - } - -+/* In some of ASICs, the highest_perf is not the one in the _CPC table, so we ++/* ++ * In some of ASICs, the highest_perf is not the one in the _CPC table, so we + * need to expose it to sysfs. + */ +static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy, @@ -1421,270 +1319,109 @@ index 09c5fd8bd9da..458313cdba93 100644 + return sprintf(&buf[0], "%u\n", perf); +} + - cpufreq_freq_attr_ro(amd_pstate_max_freq); - cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq); - ++cpufreq_freq_attr_ro(amd_pstate_max_freq); ++cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq); ++ +cpufreq_freq_attr_ro(amd_pstate_highest_perf); + - static struct freq_attr *amd_pstate_attr[] = { - &amd_pstate_max_freq, - &amd_pstate_lowest_nonlinear_freq, ++static struct freq_attr *amd_pstate_attr[] = { ++ &amd_pstate_max_freq, ++ &amd_pstate_lowest_nonlinear_freq, + &amd_pstate_highest_perf, - NULL, - }; - - - - -Add AMD P-state capability flag in cpupower to indicate AMD new P-state -kernel module support on Ryzen processors. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/helpers/helpers.h | 1 + - 1 file changed, 1 insertion(+) - -diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h -index 33ffacee7fcb..b4813efdfb00 100644 ---- a/tools/power/cpupower/utils/helpers/helpers.h -+++ b/tools/power/cpupower/utils/helpers/helpers.h -@@ -73,6 +73,7 @@ enum cpupower_cpu_vendor {X86_VENDOR_UNKNOWN = 0, X86_VENDOR_INTEL, - #define CPUPOWER_CAP_AMD_HW_PSTATE 0x00000100 - #define CPUPOWER_CAP_AMD_PSTATEDEF 0x00000200 - #define CPUPOWER_CAP_AMD_CPB_MSR 0x00000400 -+#define CPUPOWER_CAP_AMD_PSTATE 0x00000800 - - #define CPUPOWER_AMD_CPBDIS 0x02000000 - - - - -The processor with amd-pstate function also supports legacy ACPI -hardware P-States feature as well. Once driver sets amd-pstate eanbled, -the processor will respond the finer grain amd-pstate feature instead of -legacy ACPI P-States. So it introduces the cpupower_amd_pstate_enabled() -to check whether the current kernel enables amd-pstate or acpi-cpufreq -module. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/helpers/helpers.h | 10 ++++++++++ - tools/power/cpupower/utils/helpers/misc.c | 18 ++++++++++++++++++ - 2 files changed, 28 insertions(+) - -diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h -index b4813efdfb00..e03cc97297aa 100644 ---- a/tools/power/cpupower/utils/helpers/helpers.h -+++ b/tools/power/cpupower/utils/helpers/helpers.h -@@ -11,6 +11,7 @@ - - #include <libintl.h> - #include <locale.h> -+#include <stdbool.h> - - #include "helpers/bitmask.h" - #include <cpupower.h> -@@ -136,6 +137,12 @@ extern int decode_pstates(unsigned int cpu, int boost_states, - - extern int cpufreq_has_boost_support(unsigned int cpu, int *support, - int *active, int * states); ++ NULL, ++}; + -+/* AMD P-States stuff **************************/ -+extern bool cpupower_amd_pstate_enabled(void); ++static struct cpufreq_driver amd_pstate_driver = { ++ .flags = CPUFREQ_CONST_LOOPS | CPUFREQ_NEED_UPDATE_LIMITS, ++ .verify = amd_pstate_verify, ++ .target = amd_pstate_target, ++ .init = amd_pstate_cpu_init, ++ .exit = amd_pstate_cpu_exit, ++ .set_boost = amd_pstate_set_boost, ++ .name = "amd-pstate", ++ .attr = amd_pstate_attr, ++}; + -+/* AMD P-States stuff **************************/ ++static int __init amd_pstate_init(void) ++{ ++ int ret; + - /* - * CPUID functions returning a single datum - */ -@@ -168,6 +175,9 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support, - int *active, int * states) - { return -1; } - -+static inline bool cpupower_amd_pstate_enabled(void) -+{ return false; } ++ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD) ++ return -ENODEV; + - /* cpuid and cpuinfo helpers **************************/ - - static inline unsigned int cpuid_eax(unsigned int op) { return 0; }; -diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c -index fc6e34511721..0c483cdefcc2 100644 
---- a/tools/power/cpupower/utils/helpers/misc.c -+++ b/tools/power/cpupower/utils/helpers/misc.c -@@ -3,9 +3,11 @@ - #include <stdio.h> - #include <errno.h> - #include <stdlib.h> -+#include <string.h> - - #include "helpers/helpers.h" - #include "helpers/sysfs.h" -+#include "cpufreq.h" - - #if defined(__i386__) || defined(__x86_64__) - -@@ -83,6 +85,22 @@ int cpupower_intel_set_perf_bias(unsigned int cpu, unsigned int val) - return 0; - } - -+bool cpupower_amd_pstate_enabled(void) -+{ -+ char *driver = cpufreq_get_driver(0); -+ bool ret = false; ++ if (!acpi_cpc_valid()) { ++ pr_debug("the _CPC object is not present in SBIOS\n"); ++ return -ENODEV; ++ } + -+ if (!driver) -+ return ret; ++ /* don't keep reloading if cpufreq_driver exists */ ++ if (cpufreq_get_current_driver()) ++ return -EEXIST; + -+ if (!strcmp(driver, "amd-pstate")) -+ ret = true; ++ /* capability check */ ++ if (boot_cpu_has(X86_FEATURE_CPPC)) { ++ pr_debug("AMD CPPC MSR based functionality is supported\n"); ++ amd_pstate_driver.adjust_perf = amd_pstate_adjust_perf; ++ } else if (shared_mem) { ++ static_call_update(amd_pstate_enable, cppc_enable); ++ static_call_update(amd_pstate_init_perf, cppc_init_perf); ++ static_call_update(amd_pstate_update_perf, cppc_update_perf); ++ } else { ++ pr_info("This processor supports shared memory solution, you can enable it with amd_pstate.shared_mem=1\n"); ++ return -ENODEV; ++ } + -+ cpufreq_put_driver(driver); ++ /* enable amd pstate feature */ ++ ret = amd_pstate_enable(true); ++ if (ret) { ++ pr_err("failed to enable amd-pstate with return %d\n", ret); ++ return ret; ++ } ++ ++ ret = cpufreq_register_driver(&amd_pstate_driver); ++ if (ret) ++ pr_err("failed to register amd_pstate_driver with return %d\n", ++ ret); + + return ret; +} + - #endif /* #if defined(__i386__) || defined(__x86_64__) */ - - /* get_cpustate - - - -If kernel starts the amd-pstate module, the cpupower will initial the -capability flag as CPUPOWER_CAP_AMD_PSTATE. 
And once amd-pstate -capability is set, it won't need to set legacy ACPI relative -capabilities anymore. - -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/helpers/cpuid.c | 13 +++++++++++++ - 1 file changed, 13 insertions(+) - -diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c -index 72eb43593180..2a6dc104e76b 100644 ---- a/tools/power/cpupower/utils/helpers/cpuid.c -+++ b/tools/power/cpupower/utils/helpers/cpuid.c -@@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info) - if (ext_cpuid_level >= 0x80000008 && - cpuid_ebx(0x80000008) & (1 << 4)) - cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU; ++static void __exit amd_pstate_exit(void) ++{ ++ cpufreq_unregister_driver(&amd_pstate_driver); + -+ if (cpupower_amd_pstate_enabled()) { -+ cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE; ++ amd_pstate_enable(false); ++} + -+ /* -+ * If AMD P-state is enabled, the firmware will treat -+ * AMD P-state function as high priority. -+ */ -+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB; -+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR; -+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE; -+ cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF; -+ } - } - - if (cpu_info->vendor == X86_VENDOR_INTEL) { - - - -Expose the helper into cpufreq header, then cpufreq driver can use this -function to get the sysfs value if it has any specific sysfs interfaces. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/lib/cpufreq.c | 21 +++++++++++++++------ - tools/power/cpupower/lib/cpufreq.h | 12 ++++++++++++ - 2 files changed, 27 insertions(+), 6 deletions(-) - -diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c -index c3b56db8b921..02719cc400a1 100644 ---- a/tools/power/cpupower/lib/cpufreq.c -+++ b/tools/power/cpupower/lib/cpufreq.c -@@ -83,20 +83,21 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = { - [STATS_NUM_TRANSITIONS] = "stats/total_trans" - }; - -- --static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu, -- enum cpufreq_value which) -+unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu, -+ const char **table, -+ unsigned index, -+ unsigned size) ++module_init(amd_pstate_init); ++module_exit(amd_pstate_exit); ++ ++MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>"); ++MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver"); ++MODULE_LICENSE("GPL"); +diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h +index bc159a9b4a73..92b7ea8d8f5e 100644 +--- a/include/acpi/cppc_acpi.h ++++ b/include/acpi/cppc_acpi.h +@@ -138,6 +138,7 @@ extern int cppc_get_desired_perf(int cpunum, u64 *desired_perf); + extern int cppc_get_nominal_perf(int cpunum, u64 *nominal_perf); + extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs); + extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls); ++extern int cppc_set_enable(int cpu, bool enable); + extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps); + extern bool acpi_cpc_valid(void); + extern int acpi_get_psd_map(unsigned int cpu, struct cppc_cpudata *cpu_data); +@@ -162,6 +163,10 @@ static inline int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls) { - unsigned long value; - unsigned int len; - char linebuf[MAX_LINE_LEN]; - char *endp; - -- if (which >= MAX_CPUFREQ_VALUE_READ_FILES) -+ if (!table && 
!table[index] && index >= size) - return 0; - -- len = sysfs_cpufreq_read_file(cpu, cpufreq_value_files[which], -- linebuf, sizeof(linebuf)); -+ len = sysfs_cpufreq_read_file(cpu, table[index], linebuf, -+ sizeof(linebuf)); - - if (len == 0) - return 0; -@@ -109,6 +110,14 @@ static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu, - return value; + return -ENOTSUPP; } - -+static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu, -+ enum cpufreq_value which) ++static inline int cppc_set_enable(int cpu, bool enable) +{ -+ return cpufreq_get_sysfs_value_from_table(cpu, cpufreq_value_files, -+ which, -+ MAX_CPUFREQ_VALUE_READ_FILES); ++ return -ENOTSUPP; +} -+ - /* read access to files which contain one string */ - - enum cpufreq_string { -diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h -index 95f4fd9e2656..107668c0c454 100644 ---- a/tools/power/cpupower/lib/cpufreq.h -+++ b/tools/power/cpupower/lib/cpufreq.h -@@ -203,6 +203,18 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor); - int cpufreq_set_frequency(unsigned int cpu, - unsigned long target_frequency); - -+/* -+ * get the sysfs value from specific table -+ * -+ * Read the value with the sysfs file name from specific table. Does -+ * only work if the cpufreq driver has the specific sysfs interfaces. -+ */ -+ -+unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu, -+ const char **table, -+ unsigned index, -+ unsigned size); -+ - #ifdef __cplusplus - } - #endif - - - -Kernel ACPI subsytem introduced the sysfs attributes for acpi cppc -library in below path: - -/sys/devices/system/cpu/cpuX/acpi_cppc/ - -And these attributes will be used for amd-pstate driver to provide some -performance and frequency values. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/Makefile | 6 +-- - tools/power/cpupower/lib/acpi_cppc.c | 59 ++++++++++++++++++++++++++++ - tools/power/cpupower/lib/acpi_cppc.h | 21 ++++++++++ - 3 files changed, 83 insertions(+), 3 deletions(-) - create mode 100644 tools/power/cpupower/lib/acpi_cppc.c - create mode 100644 tools/power/cpupower/lib/acpi_cppc.h - + static inline int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps) + { + return -ENOTSUPP; diff --git a/tools/power/cpupower/Makefile b/tools/power/cpupower/Makefile index 3b1594447f29..e9b6de314654 100644 --- a/tools/power/cpupower/Makefile @@ -1794,172 +1531,78 @@ index 000000000000..576291155224 + enum acpi_cppc_value which); + +#endif /* _ACPI_CPPC_H */ - - - -Introduce the marco definitions and access helper function for -amd-pstate sysfs interfaces such as each performance goals and frequency -levels in amd helper file. They will be used to read the sysfs attribute -from amd-pstate cpufreq driver for cpupower utilities. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/helpers/amd.c | 30 ++++++++++++++++++++++++ - 1 file changed, 30 insertions(+) - -diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c -index 97f2c857048e..14c658daba4b 100644 ---- a/tools/power/cpupower/utils/helpers/amd.c -+++ b/tools/power/cpupower/utils/helpers/amd.c -@@ -8,7 +8,10 @@ - #include <pci/pci.h> +diff --git a/tools/power/cpupower/lib/cpufreq.c b/tools/power/cpupower/lib/cpufreq.c +index c3b56db8b921..c011bca27041 100644 +--- a/tools/power/cpupower/lib/cpufreq.c ++++ b/tools/power/cpupower/lib/cpufreq.c +@@ -83,20 +83,21 @@ static const char *cpufreq_value_files[MAX_CPUFREQ_VALUE_READ_FILES] = { + [STATS_NUM_TRANSITIONS] = "stats/total_trans" + }; - #include "helpers/helpers.h" -+#include "cpufreq.h" -+#include "acpi_cppc.h" +- +-static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu, +- enum cpufreq_value which) ++unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu, ++ const char **table, ++ unsigned index, ++ unsigned size) + { + unsigned long value; + unsigned int len; + char linebuf[MAX_LINE_LEN]; + char *endp; -+/* ACPI P-States Helper Functions for AMD Processors ***************/ - #define MSR_AMD_PSTATE_STATUS 0xc0010063 - #define MSR_AMD_PSTATE 0xc0010064 - #define MSR_AMD_PSTATE_LIMIT 0xc0010061 -@@ -146,4 +149,31 @@ int amd_pci_get_num_boost_states(int *active, int *states) - pci_cleanup(pci_acc); - return 0; - } -+ -+/* ACPI P-States Helper Functions for AMD Processors ***************/ -+ -+/* AMD P-States Helper Functions ***************/ -+enum amd_pstate_value { -+ AMD_PSTATE_HIGHEST_PERF, -+ AMD_PSTATE_MAX_FREQ, -+ AMD_PSTATE_LOWEST_NONLINEAR_FREQ, -+ MAX_AMD_PSTATE_VALUE_READ_FILES, -+}; -+ -+static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = { -+ [AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf", -+ [AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq", -+ 
[AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq", -+}; -+ -+static unsigned long amd_pstate_get_data(unsigned int cpu, -+ enum amd_pstate_value value) -+{ -+ return cpufreq_get_sysfs_value_from_table(cpu, -+ amd_pstate_value_files, -+ value, -+ MAX_AMD_PSTATE_VALUE_READ_FILES); -+} -+ -+/* AMD P-States Helper Functions ***************/ - #endif /* defined(__i386__) || defined(__x86_64__) */ - - - -The legacy ACPI hardware P-States function has 3 P-States on ACPI table, -the CPU frequency only can be switched between the 3 P-States. While the -processor supports the boost state, it will have another boost state -that the frequency can be higher than P0 state, and the state can be -decoded by the function of decode_pstates() and read by -amd_pci_get_num_boost_states(). - -However, the new AMD P-States function is different than legacy ACPI -hardware P-State on AMD processors. That has a finer grain frequency -range between the highest and lowest frequency. And boost frequency is -actually the frequency which is mapped on highest performance ratio. The -similiar previous P0 frequency is mapped on nominal performance ratio. -If the highest performance on the processor is higher than nominal -performance, then we think the current processor supports the boost -state. And it uses amd_pstate_boost_init() to initialize boost for AMD -P-States function. 
- -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/helpers/amd.c | 18 ++++++++++++++++++ - tools/power/cpupower/utils/helpers/helpers.h | 5 +++++ - tools/power/cpupower/utils/helpers/misc.c | 2 ++ - 3 files changed, 25 insertions(+) - -diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c -index 14c658daba4b..bde6065cabf4 100644 ---- a/tools/power/cpupower/utils/helpers/amd.c -+++ b/tools/power/cpupower/utils/helpers/amd.c -@@ -175,5 +175,23 @@ static unsigned long amd_pstate_get_data(unsigned int cpu, - MAX_AMD_PSTATE_VALUE_READ_FILES); +- if (which >= MAX_CPUFREQ_VALUE_READ_FILES) ++ if (!table || index >= size || !table[index]) + return 0; + +- len = sysfs_cpufreq_read_file(cpu, cpufreq_value_files[which], +- linebuf, sizeof(linebuf)); ++ len = sysfs_cpufreq_read_file(cpu, table[index], linebuf, ++ sizeof(linebuf)); + + if (len == 0) + return 0; +@@ -109,6 +110,14 @@ static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu, + return value; } -+void amd_pstate_boost_init(unsigned int cpu, int *support, int *active) ++static unsigned long sysfs_cpufreq_get_one_value(unsigned int cpu, ++ enum cpufreq_value which) +{ -+ unsigned long highest_perf, nominal_perf, cpuinfo_min, -+ cpuinfo_max, amd_pstate_max; -+ -+ highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF); -+ nominal_perf = acpi_cppc_get_data(cpu, NOMINAL_PERF); -+ -+ *support = highest_perf > nominal_perf ? 1 : 0; -+ if (!(*support)) -+ return; -+ -+ cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max); -+ amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ); -+ -+ *active = cpuinfo_max == amd_pstate_max ? 
1 : 0; ++ return cpufreq_get_sysfs_value_from_table(cpu, cpufreq_value_files, ++ which, ++ MAX_CPUFREQ_VALUE_READ_FILES); +} + - /* AMD P-States Helper Functions ***************/ - #endif /* defined(__i386__) || defined(__x86_64__) */ -diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h -index e03cc97297aa..c03925bea655 100644 ---- a/tools/power/cpupower/utils/helpers/helpers.h -+++ b/tools/power/cpupower/utils/helpers/helpers.h -@@ -140,6 +140,8 @@ extern int cpufreq_has_boost_support(unsigned int cpu, int *support, - - /* AMD P-States stuff **************************/ - extern bool cpupower_amd_pstate_enabled(void); -+extern void amd_pstate_boost_init(unsigned int cpu, -+ int *support, int *active); - - /* AMD P-States stuff **************************/ - -@@ -177,6 +179,9 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support, - - static inline bool cpupower_amd_pstate_enabled(void) - { return false; } -+static void amd_pstate_boost_init(unsigned int cpu, -+ int *support, int *active) -+{ return; } + /* read access to files which contain one string */ - /* cpuid and cpuinfo helpers **************************/ + enum cpufreq_string { +diff --git a/tools/power/cpupower/lib/cpufreq.h b/tools/power/cpupower/lib/cpufreq.h +index 95f4fd9e2656..107668c0c454 100644 +--- a/tools/power/cpupower/lib/cpufreq.h ++++ b/tools/power/cpupower/lib/cpufreq.h +@@ -203,6 +203,18 @@ int cpufreq_modify_policy_governor(unsigned int cpu, char *governor); + int cpufreq_set_frequency(unsigned int cpu, + unsigned long target_frequency); -diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c -index 0c483cdefcc2..e0d3145434d3 100644 ---- a/tools/power/cpupower/utils/helpers/misc.c -+++ b/tools/power/cpupower/utils/helpers/misc.c -@@ -41,6 +41,8 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active, - if (ret) - return ret; - } -+ } else if 
(cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) { -+ amd_pstate_boost_init(cpu, support, active); - } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA) - *support = *active = 1; - return 0; - - - -The print_speed can be as a common function, and expose it into misc -helper header. Then it can be used on other helper files as well. - -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/cpufreq-info.c | 59 ++++---------------- - tools/power/cpupower/utils/helpers/helpers.h | 1 + - tools/power/cpupower/utils/helpers/misc.c | 42 ++++++++++++++ - 3 files changed, 54 insertions(+), 48 deletions(-) - ++/* ++ * get the sysfs value from specific table ++ * ++ * Read the value with the sysfs file name from specific table. Does ++ * only work if the cpufreq driver has the specific sysfs interfaces. ++ */ ++ ++unsigned long cpufreq_get_sysfs_value_from_table(unsigned int cpu, ++ const char **table, ++ unsigned index, ++ unsigned size); ++ + #ifdef __cplusplus + } + #endif diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c -index f9895e31ff5a..b429454bf3ae 100644 +index f9895e31ff5a..f828f3c35a6f 100644 --- a/tools/power/cpupower/utils/cpufreq-info.c +++ b/tools/power/cpupower/utils/cpufreq-info.c @@ -84,43 +84,6 @@ static void proc_cpufreq_output(void) @@ -2006,7 +1649,23 @@ index f9895e31ff5a..b429454bf3ae 100644 static void print_duration(unsigned long duration) { unsigned long tmp; -@@ -254,11 +217,11 @@ static int get_boost_mode(unsigned int cpu) +@@ -183,9 +146,12 @@ static int get_boost_mode_x86(unsigned int cpu) + printf(_(" Supported: %s\n"), support ? _("yes") : _("no")); + printf(_(" Active: %s\n"), active ? 
_("yes") : _("no")); + +- if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD && +- cpupower_cpu_info.family >= 0x10) || +- cpupower_cpu_info.vendor == X86_VENDOR_HYGON) { ++ if (cpupower_cpu_info.vendor == X86_VENDOR_AMD && ++ cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) { ++ amd_pstate_show_perf_and_freq(cpu, no_rounding); ++ } else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD && ++ cpupower_cpu_info.family >= 0x10) || ++ cpupower_cpu_info.vendor == X86_VENDOR_HYGON) { + ret = decode_pstates(cpu, b_states, pstates, &pstate_no); + if (ret) + return ret; +@@ -254,11 +220,11 @@ static int get_boost_mode(unsigned int cpu) if (freqs) { printf(_(" boost frequency steps: ")); while (freqs->next) { @@ -2020,7 +1679,7 @@ index f9895e31ff5a..b429454bf3ae 100644 printf("\n"); cpufreq_put_available_frequencies(freqs); } -@@ -277,7 +240,7 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human) +@@ -277,7 +243,7 @@ static int get_freq_kernel(unsigned int cpu, unsigned int human) return -EINVAL; } if (human) { @@ -2029,7 +1688,7 @@ index f9895e31ff5a..b429454bf3ae 100644 } else printf("%lu", freq); printf(_(" (asserted by call to kernel)\n")); -@@ -296,7 +259,7 @@ static int get_freq_hardware(unsigned int cpu, unsigned int human) +@@ -296,7 +262,7 @@ static int get_freq_hardware(unsigned int cpu, unsigned int human) return -EINVAL; } if (human) { @@ -2038,7 +1697,7 @@ index f9895e31ff5a..b429454bf3ae 100644 } else printf("%lu", freq); printf(_(" (asserted by call to hardware)\n")); -@@ -316,9 +279,9 @@ static int get_hardware_limits(unsigned int cpu, unsigned int human) +@@ -316,9 +282,9 @@ static int get_hardware_limits(unsigned int cpu, unsigned int human) if (human) { printf(_(" hardware limits: ")); @@ -2050,7 +1709,7 @@ index f9895e31ff5a..b429454bf3ae 100644 printf("\n"); } else { printf("%lu %lu\n", min, max); -@@ -350,9 +313,9 @@ static int get_policy(unsigned int cpu) +@@ -350,9 +316,9 @@ static int get_policy(unsigned int cpu) return -EINVAL; } 
printf(_(" current policy: frequency should be within ")); @@ -2062,7 +1721,7 @@ index f9895e31ff5a..b429454bf3ae 100644 printf(".\n "); printf(_("The governor \"%s\" may decide which speed to use\n" -@@ -436,7 +399,7 @@ static int get_freq_stats(unsigned int cpu, unsigned int human) +@@ -436,7 +402,7 @@ static int get_freq_stats(unsigned int cpu, unsigned int human) struct cpufreq_stats *stats = cpufreq_get_stats(cpu, &total_time); while (stats) { if (human) { @@ -2071,7 +1730,7 @@ index f9895e31ff5a..b429454bf3ae 100644 printf(":%.2f%%", (100.0 * stats->time_in_state) / total_time); } else -@@ -486,11 +449,11 @@ static void debug_output_one(unsigned int cpu) +@@ -486,11 +452,11 @@ static void debug_output_one(unsigned int cpu) if (freqs) { printf(_(" available frequency steps: ")); while (freqs->next) { @@ -2085,11 +1744,169 @@ index f9895e31ff5a..b429454bf3ae 100644 printf("\n"); cpufreq_put_available_frequencies(freqs); } +diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c +index 97f2c857048e..a1115891d76d 100644 +--- a/tools/power/cpupower/utils/helpers/amd.c ++++ b/tools/power/cpupower/utils/helpers/amd.c +@@ -8,7 +8,10 @@ + #include <pci/pci.h> + + #include "helpers/helpers.h" ++#include "cpufreq.h" ++#include "acpi_cppc.h" + ++/* ACPI P-States Helper Functions for AMD Processors ***************/ + #define MSR_AMD_PSTATE_STATUS 0xc0010063 + #define MSR_AMD_PSTATE 0xc0010064 + #define MSR_AMD_PSTATE_LIMIT 0xc0010061 +@@ -146,4 +149,77 @@ int amd_pci_get_num_boost_states(int *active, int *states) + pci_cleanup(pci_acc); + return 0; + } ++ ++/* ACPI P-States Helper Functions for AMD Processors ***************/ ++ ++/* AMD P-States Helper Functions ***************/ ++enum amd_pstate_value { ++ AMD_PSTATE_HIGHEST_PERF, ++ AMD_PSTATE_MAX_FREQ, ++ AMD_PSTATE_LOWEST_NONLINEAR_FREQ, ++ MAX_AMD_PSTATE_VALUE_READ_FILES, ++}; ++ ++static const char *amd_pstate_value_files[MAX_AMD_PSTATE_VALUE_READ_FILES] = { ++ 
[AMD_PSTATE_HIGHEST_PERF] = "amd_pstate_highest_perf", ++ [AMD_PSTATE_MAX_FREQ] = "amd_pstate_max_freq", ++ [AMD_PSTATE_LOWEST_NONLINEAR_FREQ] = "amd_pstate_lowest_nonlinear_freq", ++}; ++ ++static unsigned long amd_pstate_get_data(unsigned int cpu, ++ enum amd_pstate_value value) ++{ ++ return cpufreq_get_sysfs_value_from_table(cpu, ++ amd_pstate_value_files, ++ value, ++ MAX_AMD_PSTATE_VALUE_READ_FILES); ++} ++ ++void amd_pstate_boost_init(unsigned int cpu, int *support, int *active) ++{ ++ unsigned long highest_perf, nominal_perf, cpuinfo_min, ++ cpuinfo_max, amd_pstate_max; ++ ++ highest_perf = amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF); ++ nominal_perf = acpi_cppc_get_data(cpu, NOMINAL_PERF); ++ ++ *support = highest_perf > nominal_perf ? 1 : 0; ++ if (!(*support)) ++ return; ++ ++ cpufreq_get_hardware_limits(cpu, &cpuinfo_min, &cpuinfo_max); ++ amd_pstate_max = amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ); ++ ++ *active = cpuinfo_max == amd_pstate_max ? 1 : 0; ++} ++ ++void amd_pstate_show_perf_and_freq(unsigned int cpu, int no_rounding) ++{ ++ printf(_(" AMD PSTATE Highest Performance: %lu. Maximum Frequency: "), ++ amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF)); ++ /* If boost isn't active, the cpuinfo_max doesn't indicate real max ++ * frequency. So we read it back from amd-pstate sysfs entry. ++ */ ++ print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ), no_rounding); ++ printf(".\n"); ++ ++ printf(_(" AMD PSTATE Nominal Performance: %lu. Nominal Frequency: "), ++ acpi_cppc_get_data(cpu, NOMINAL_PERF)); ++ print_speed(acpi_cppc_get_data(cpu, NOMINAL_FREQ) * 1000, ++ no_rounding); ++ printf(".\n"); ++ ++ printf(_(" AMD PSTATE Lowest Non-linear Performance: %lu. Lowest Non-linear Frequency: "), ++ acpi_cppc_get_data(cpu, LOWEST_NONLINEAR_PERF)); ++ print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_FREQ), ++ no_rounding); ++ printf(".\n"); ++ ++ printf(_(" AMD PSTATE Lowest Performance: %lu. 
Lowest Frequency: "), ++ acpi_cppc_get_data(cpu, LOWEST_PERF)); ++ print_speed(acpi_cppc_get_data(cpu, LOWEST_FREQ) * 1000, no_rounding); ++ printf(".\n"); ++} ++ ++/* AMD P-States Helper Functions ***************/ + #endif /* defined(__i386__) || defined(__x86_64__) */ +diff --git a/tools/power/cpupower/utils/helpers/cpuid.c b/tools/power/cpupower/utils/helpers/cpuid.c +index 72eb43593180..2a6dc104e76b 100644 +--- a/tools/power/cpupower/utils/helpers/cpuid.c ++++ b/tools/power/cpupower/utils/helpers/cpuid.c +@@ -149,6 +149,19 @@ int get_cpu_info(struct cpupower_cpu_info *cpu_info) + if (ext_cpuid_level >= 0x80000008 && + cpuid_ebx(0x80000008) & (1 << 4)) + cpu_info->caps |= CPUPOWER_CAP_AMD_RDPRU; ++ ++ if (cpupower_amd_pstate_enabled()) { ++ cpu_info->caps |= CPUPOWER_CAP_AMD_PSTATE; ++ ++ /* ++ * If AMD P-state is enabled, the firmware will treat ++ * AMD P-state function as high priority. ++ */ ++ cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB; ++ cpu_info->caps &= ~CPUPOWER_CAP_AMD_CPB_MSR; ++ cpu_info->caps &= ~CPUPOWER_CAP_AMD_HW_PSTATE; ++ cpu_info->caps &= ~CPUPOWER_CAP_AMD_PSTATEDEF; ++ } + } + + if (cpu_info->vendor == X86_VENDOR_INTEL) { diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h -index c03925bea655..fbbfa6047c83 100644 +index b4813efdfb00..5f6862502dbf 100644 --- a/tools/power/cpupower/utils/helpers/helpers.h +++ b/tools/power/cpupower/utils/helpers/helpers.h -@@ -200,5 +200,6 @@ extern struct bitmask *offline_cpus; +@@ -11,6 +11,7 @@ + + #include <libintl.h> + #include <locale.h> ++#include <stdbool.h> + + #include "helpers/bitmask.h" + #include <cpupower.h> +@@ -136,6 +137,16 @@ extern int decode_pstates(unsigned int cpu, int boost_states, + + extern int cpufreq_has_boost_support(unsigned int cpu, int *support, + int *active, int * states); ++ ++/* AMD P-States stuff **************************/ ++extern bool cpupower_amd_pstate_enabled(void); ++extern void amd_pstate_boost_init(unsigned int cpu, 
++ int *support, int *active); ++extern void amd_pstate_show_perf_and_freq(unsigned int cpu, ++ int no_rounding); ++ ++/* AMD P-States stuff **************************/ ++ + /* + * CPUID functions returning a single datum + */ +@@ -168,6 +179,15 @@ static inline int cpufreq_has_boost_support(unsigned int cpu, int *support, + int *active, int * states) + { return -1; } + ++static inline bool cpupower_amd_pstate_enabled(void) ++{ return false; } ++static void amd_pstate_boost_init(unsigned int cpu, ++ int *support, int *active) ++{ return; } ++static inline void amd_pstate_show_perf_and_freq(unsigned int cpu, ++ int no_rounding) ++{ return; } ++ + /* cpuid and cpuinfo helpers **************************/ + + static inline unsigned int cpuid_eax(unsigned int op) { return 0; }; +@@ -185,5 +205,6 @@ extern struct bitmask *offline_cpus; void get_cpustate(void); void print_online_cpus(void); void print_offline_cpus(void); @@ -2097,10 +1914,54 @@ index c03925bea655..fbbfa6047c83 100644 #endif /* __CPUPOWERUTILS_HELPERS__ */ diff --git a/tools/power/cpupower/utils/helpers/misc.c b/tools/power/cpupower/utils/helpers/misc.c -index e0d3145434d3..d693c96cd09c 100644 +index fc6e34511721..d693c96cd09c 100644 --- a/tools/power/cpupower/utils/helpers/misc.c +++ b/tools/power/cpupower/utils/helpers/misc.c -@@ -164,3 +164,45 @@ void print_offline_cpus(void) +@@ -3,9 +3,11 @@ + #include <stdio.h> + #include <errno.h> + #include <stdlib.h> ++#include <string.h> + + #include "helpers/helpers.h" + #include "helpers/sysfs.h" ++#include "cpufreq.h" + + #if defined(__i386__) || defined(__x86_64__) + +@@ -39,6 +41,8 @@ int cpufreq_has_boost_support(unsigned int cpu, int *support, int *active, + if (ret) + return ret; + } ++ } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) { ++ amd_pstate_boost_init(cpu, support, active); + } else if (cpupower_cpu_info.caps & CPUPOWER_CAP_INTEL_IDA) + *support = *active = 1; + return 0; +@@ -83,6 +87,22 @@ int cpupower_intel_set_perf_bias(unsigned 
int cpu, unsigned int val) + return 0; + } + ++bool cpupower_amd_pstate_enabled(void) ++{ ++ char *driver = cpufreq_get_driver(0); ++ bool ret = false; ++ ++ if (!driver) ++ return ret; ++ ++ if (!strcmp(driver, "amd-pstate")) ++ ret = true; ++ ++ cpufreq_put_driver(driver); ++ ++ return ret; ++} ++ + #endif /* #if defined(__i386__) || defined(__x86_64__) */ + + /* get_cpustate +@@ -144,3 +164,45 @@ void print_offline_cpus(void) printf(_("cpupower set operation was not performed on them\n")); } } @@ -2146,501 +2007,5 @@ index e0d3145434d3..d693c96cd09c 100644 + + return; +} - - - -amd-pstate kernel module is using the fine grain frequency instead of -acpi hardware pstate. So the performance and frequency values should be -printed in frequency-info. - -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - tools/power/cpupower/utils/cpufreq-info.c | 9 ++++--- - tools/power/cpupower/utils/helpers/amd.c | 28 ++++++++++++++++++++ - tools/power/cpupower/utils/helpers/helpers.h | 5 ++++ - 3 files changed, 39 insertions(+), 3 deletions(-) - -diff --git a/tools/power/cpupower/utils/cpufreq-info.c b/tools/power/cpupower/utils/cpufreq-info.c -index b429454bf3ae..f828f3c35a6f 100644 ---- a/tools/power/cpupower/utils/cpufreq-info.c -+++ b/tools/power/cpupower/utils/cpufreq-info.c -@@ -146,9 +146,12 @@ static int get_boost_mode_x86(unsigned int cpu) - printf(_(" Supported: %s\n"), support ? _("yes") : _("no")); - printf(_(" Active: %s\n"), active ? 
_("yes") : _("no")); - -- if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD && -- cpupower_cpu_info.family >= 0x10) || -- cpupower_cpu_info.vendor == X86_VENDOR_HYGON) { -+ if (cpupower_cpu_info.vendor == X86_VENDOR_AMD && -+ cpupower_cpu_info.caps & CPUPOWER_CAP_AMD_PSTATE) { -+ amd_pstate_show_perf_and_freq(cpu, no_rounding); -+ } else if ((cpupower_cpu_info.vendor == X86_VENDOR_AMD && -+ cpupower_cpu_info.family >= 0x10) || -+ cpupower_cpu_info.vendor == X86_VENDOR_HYGON) { - ret = decode_pstates(cpu, b_states, pstates, &pstate_no); - if (ret) - return ret; -diff --git a/tools/power/cpupower/utils/helpers/amd.c b/tools/power/cpupower/utils/helpers/amd.c -index bde6065cabf4..a1115891d76d 100644 ---- a/tools/power/cpupower/utils/helpers/amd.c -+++ b/tools/power/cpupower/utils/helpers/amd.c -@@ -193,5 +193,33 @@ void amd_pstate_boost_init(unsigned int cpu, int *support, int *active) - *active = cpuinfo_max == amd_pstate_max ? 1 : 0; - } - -+void amd_pstate_show_perf_and_freq(unsigned int cpu, int no_rounding) -+{ -+ printf(_(" AMD PSTATE Highest Performance: %lu. Maximum Frequency: "), -+ amd_pstate_get_data(cpu, AMD_PSTATE_HIGHEST_PERF)); -+ /* If boost isn't active, the cpuinfo_max doesn't indicate real max -+ * frequency. So we read it back from amd-pstate sysfs entry. -+ */ -+ print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_MAX_FREQ), no_rounding); -+ printf(".\n"); -+ -+ printf(_(" AMD PSTATE Nominal Performance: %lu. Nominal Frequency: "), -+ acpi_cppc_get_data(cpu, NOMINAL_PERF)); -+ print_speed(acpi_cppc_get_data(cpu, NOMINAL_FREQ) * 1000, -+ no_rounding); -+ printf(".\n"); -+ -+ printf(_(" AMD PSTATE Lowest Non-linear Performance: %lu. Lowest Non-linear Frequency: "), -+ acpi_cppc_get_data(cpu, LOWEST_NONLINEAR_PERF)); -+ print_speed(amd_pstate_get_data(cpu, AMD_PSTATE_LOWEST_NONLINEAR_FREQ), -+ no_rounding); -+ printf(".\n"); -+ -+ printf(_(" AMD PSTATE Lowest Performance: %lu. 
Lowest Frequency: "), -+ acpi_cppc_get_data(cpu, LOWEST_PERF)); -+ print_speed(acpi_cppc_get_data(cpu, LOWEST_FREQ) * 1000, no_rounding); -+ printf(".\n"); -+} -+ - /* AMD P-States Helper Functions ***************/ - #endif /* defined(__i386__) || defined(__x86_64__) */ -diff --git a/tools/power/cpupower/utils/helpers/helpers.h b/tools/power/cpupower/utils/helpers/helpers.h -index fbbfa6047c83..5f6862502dbf 100644 ---- a/tools/power/cpupower/utils/helpers/helpers.h -+++ b/tools/power/cpupower/utils/helpers/helpers.h -@@ -142,6 +142,8 @@ extern int cpufreq_has_boost_support(unsigned int cpu, int *support, - extern bool cpupower_amd_pstate_enabled(void); - extern void amd_pstate_boost_init(unsigned int cpu, - int *support, int *active); -+extern void amd_pstate_show_perf_and_freq(unsigned int cpu, -+ int no_rounding); - - /* AMD P-States stuff **************************/ - -@@ -182,6 +184,9 @@ static inline bool cpupower_amd_pstate_enabled(void) - static void amd_pstate_boost_init(unsigned int cpu, - int *support, int *active) - { return; } -+static inline void amd_pstate_show_perf_and_freq(unsigned int cpu, -+ int no_rounding) -+{ return; } - - /* cpuid and cpuinfo helpers **************************/ - - - - -Introduce the amd-pstate driver design and implementation. - -Signed-off-by: Huang Rui <ray.huang@amd.com> ---- - Documentation/admin-guide/pm/amd-pstate.rst | 373 ++++++++++++++++++ - .../admin-guide/pm/working-state.rst | 1 + - 2 files changed, 374 insertions(+) - create mode 100644 Documentation/admin-guide/pm/amd-pstate.rst - -diff --git a/Documentation/admin-guide/pm/amd-pstate.rst b/Documentation/admin-guide/pm/amd-pstate.rst -new file mode 100644 -index 000000000000..24a88476fc69 ---- /dev/null -+++ b/Documentation/admin-guide/pm/amd-pstate.rst -@@ -0,0 +1,373 @@ -+.. SPDX-License-Identifier: GPL-2.0 -+.. 
include:: <isonum.txt> -+ -+=============================================== -+``amd-pstate`` CPU Performance Scaling Driver -+=============================================== -+ -+:Copyright: |copy| 2021 Advanced Micro Devices, Inc. -+ -+:Author: Huang Rui <ray.huang@amd.com> -+ -+ -+Introduction -+=================== -+ -+``amd-pstate`` is the AMD CPU performance scaling driver that introduces a -+new CPU frequency control mechanism on modern AMD APU and CPU series in -+Linux kernel. The new mechanism is based on Collaborative Processor -+Performance Control (CPPC) which provides finer grain frequency management -+than legacy ACPI hardware P-States. Current AMD CPU/APU platforms are using -+the ACPI P-states driver to manage CPU frequency and clocks with switching -+only in 3 P-states. CPPC replaces the ACPI P-states controls, allows a -+flexible, low-latency interface for the Linux kernel to directly -+communicate the performance hints to hardware. -+ -+``amd-pstate`` leverages the Linux kernel governors such as ``schedutil``, -+``ondemand``, etc. to manage the performance hints which are provided by -+CPPC hardware functionality that internally follows the hardware -+specification (for details refer to AMD64 Architecture Programmer's Manual -+Volume 2: System Programming [1]_). Currently ``amd-pstate`` supports basic -+frequency control function according to kernel governors on some of the -+Zen2 and Zen3 processors, and we will implement more AMD specific functions -+in future after we verify them on the hardware and SBIOS. -+ -+ -+AMD CPPC Overview -+======================= -+ -+Collaborative Processor Performance Control (CPPC) interface enumerates a -+continuous, abstract, and unit-less performance value in a scale that is -+not tied to a specific performance state / frequency. This is an ACPI -+standard [2]_ which software can specify application performance goals and -+hints as a relative target to the infrastructure limits. 
AMD processors -+provides the low latency register model (MSR) instead of AML code -+interpreter for performance adjustments. ``amd-pstate`` will initialize a -+``struct cpufreq_driver`` instance ``amd_pstate_driver`` with the callbacks -+to manage each performance update behavior. :: -+ -+ Highest Perf ------>+-----------------------+ +-----------------------+ -+ | | | | -+ | | | | -+ | | Max Perf ---->| | -+ | | | | -+ | | | | -+ Nominal Perf ------>+-----------------------+ +-----------------------+ -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | Desired Perf ---->| | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ | | | | -+ Lowest non- | | | | -+ linear perf ------>+-----------------------+ +-----------------------+ -+ | | | | -+ | | Lowest perf ---->| | -+ | | | | -+ Lowest perf ------>+-----------------------+ +-----------------------+ -+ | | | | -+ | | | | -+ | | | | -+ 0 ------>+-----------------------+ +-----------------------+ -+ -+ AMD P-States Performance Scale -+ -+ -+.. _perf_cap: -+ -+AMD CPPC Performance Capability -+-------------------------------- -+ -+Highest Performance (RO) -+......................... -+ -+It is the absolute maximum performance an individual processor may reach, -+assuming ideal conditions. This performance level may not be sustainable -+for long durations and may only be achievable if other platform components -+are in a specific state; for example, it may require other processors be in -+an idle state. This would be equivalent to the highest frequencies -+supported by the processor. -+ -+Nominal (Guaranteed) Performance (RO) -+...................................... -+ -+It is the maximum sustained performance level of the processor, assuming -+ideal operating conditions. In absence of an external constraint (power, -+thermal, etc.) this is the performance level the processor is expected to -+be able to maintain continuously. 
All cores/processors are expected to be -+able to sustain their nominal performance state simultaneously. -+ -+Lowest non-linear Performance (RO) -+................................... -+ -+It is the lowest performance level at which nonlinear power savings are -+achieved, for example, due to the combined effects of voltage and frequency -+scaling. Above this threshold, lower performance levels should be generally -+more energy efficient than higher performance levels. This register -+effectively conveys the most efficient performance level to ``amd-pstate``. -+ -+Lowest Performance (RO) -+........................ -+ -+It is the absolute lowest performance level of the processor. Selecting a -+performance level lower than the lowest nonlinear performance level may -+cause an efficiency penalty but should reduce the instantaneous power -+consumption of the processor. -+ -+AMD CPPC Performance Control -+------------------------------ -+ -+``amd-pstate`` passes performance goals through these registers. The -+register drives the behavior of the desired performance target. -+ -+Minimum requested performance (RW) -+................................... -+ -+``amd-pstate`` specifies the minimum allowed performance level. -+ -+Maximum requested performance (RW) -+................................... -+ -+``amd-pstate`` specifies a limit the maximum performance that is expected -+to be supplied by the hardware. -+ -+Desired performance target (RW) -+................................... -+ -+``amd-pstate`` specifies a desired target in the CPPC performance scale as -+a relative number. This can be expressed as percentage of nominal -+performance (infrastructure max). Below the nominal sustained performance -+level, desired performance expresses the average performance level of the -+processor subject to hardware. Above the nominal performance level, -+processor must provide at least nominal performance requested and go higher -+if current operating conditions allow. 
-+ -+Energy Performance Preference (EPP) (RW) -+......................................... -+ -+Provides a hint to the hardware if software wants to bias toward performance -+(0x0) or energy efficiency (0xff). -+ -+ -+Key Governors Support -+======================= -+ -+``amd-pstate`` can be used with all the (generic) scaling governors listed -+by the ``scaling_available_governors`` policy attribute in ``sysfs``. Then, -+it is responsible for the configuration of policy objects corresponding to -+CPUs and provides the ``CPUFreq`` core (and the scaling governors attached -+to the policy objects) with accurate information on the maximum and minimum -+operating frequencies supported by the hardware. Users can check the -+``scaling_cur_freq`` information comes from the ``CPUFreq`` core. -+ -+``amd-pstate`` mainly supports ``schedutil`` and ``ondemand`` for dynamic -+frequency control. It is to fine tune the processor configuration on -+``amd-pstate`` to the ``schedutil`` with CPU CFS scheduler. ``amd-pstate`` -+registers adjust_perf callback to implement the CPPC similar performance -+update behavior. It is initialized by ``sugov_start`` and then populate the -+CPU's update_util_data pointer to assign ``sugov_update_single_perf`` as -+the utilization update callback function in CPU scheduler. CPU scheduler -+will call ``cpufreq_update_util`` and assign the target performance -+according to the ``struct sugov_cpu`` that utilization update belongs to. -+Then ``amd-pstate`` updates the desired performance according to the CPU -+scheduler assigned. -+ -+ -+Processor Support -+======================= -+ -+The ``amd-pstate`` initialization will fail if the _CPC in ACPI SBIOS is -+not existed at the detected processor, and it uses ``acpi_cpc_valid`` to -+check the _CPC existence. All Zen based processors support legacy ACPI -+hardware P-States function, so while the ``amd-pstate`` fails to be -+initialized, the kernel will fall back to initialize ``acpi-cpufreq`` -+driver. 
-+ -+There are two types of hardware implementations for ``amd-pstate``: one is -+`Full MSR Support <perf_cap_>`_ and another is `Shared Memory Support -+<perf_cap_>`_. It can use :c:macro:`X86_FEATURE_CPPC` feature flag (for -+details refer to Processor Programming Reference (PPR) for AMD Family -+19h Model 21h, Revision B0 Processors [3]_) to indicate the different -+types. ``amd-pstate`` is to register different ``amd_pstate_perf_funcs`` -+instances for different hardware implementations. -+ -+Currently, some of Zen2 and Zen3 processors support ``amd-pstate``. In the -+future, it will be supported on more and more AMD processors. -+ -+Full MSR Support -+----------------- -+ -+Some new Zen3 processors such as Cezanne provide the MSR registers directly -+while the :c:macro:`X86_FEATURE_CPPC` CPU feature flag is set. -+``amd-pstate`` can handle the MSR register to implement the fast switch -+function in ``CPUFreq`` that can shrink latency of frequency control on the -+interrupt context. -+ -+Shared Memory Support -+---------------------- -+ -+If :c:macro:`X86_FEATURE_CPPC` CPU feature flag is not set, that means the -+processor supports shared memory solution. In this case, ``amd-pstate`` -+uses the ``cppc_acpi`` helper methods to implement the callback functions -+of ``amd_pstate_perf_funcs``. -+ -+ -+AMD P-States and ACPI hardware P-States always can be supported in one -+processor. But AMD P-States has the higher priority and if it is enabled -+with :c:macro:`MSR_AMD_CPPC_ENABLE` or ``cppc_set_enable``, it will respond -+to the request from AMD P-States. -+ -+ -+User Space Interface in ``sysfs`` -+================================== -+ -+``amd-pstate`` exposes several global attributes (files) in ``sysfs`` to -+control its functionality at the system level. They located in the -+``/sys/devices/system/cpu/cpufreq/policyX/`` directory and affect all CPUs. 
:: -+ -+ root@hr-test1:/home/ray# ls /sys/devices/system/cpu/cpufreq/policy0/*amd* -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_freq -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_nonlinear_perf -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_lowest_perf -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_max_freq -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_min_freq -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_nominal_freq -+ /sys/devices/system/cpu/cpufreq/policy0/amd_pstate_nominal_perf -+ -+ -+``amd_pstate_highest_perf / amd_pstate_max_freq`` -+ -+Maximum CPPC performance and CPU frequency that the driver is allowed to -+set in percent of the maximum supported CPPC performance level (the highest -+performance supported in `AMD CPPC Performance Capability <perf_cap_>`_). -+This attribute is read-only. -+ -+``amd_pstate_nominal_perf / amd_pstate_nominal_freq`` -+ -+Nominal CPPC performance and CPU frequency that the driver is allowed to -+set in percent of the maximum supported CPPC performance level (Please see -+nominal performance in `AMD CPPC Performance Capability <perf_cap_>`_). -+This attribute is read-only. -+ -+``amd_pstate_lowest_nonlinear_perf / amd_pstate_lowest_nonlinear_freq`` -+ -+The lowest non-linear CPPC performance and CPU frequency that the driver is -+allowed to set in percent of the maximum supported CPPC performance level -+(Please see the lowest non-linear performance in `AMD CPPC Performance -+Capability <perf_cap_>`_). -+This attribute is read-only. -+ -+``amd_pstate_lowest_perf`` -+ -+The lowest physical CPPC performance. The minimum CPU frequency can be read -+back from ``cpuinfo`` member of ``cpufreq_policy``, so we won't expose it -+here. -+This attribute is read-only. 
-+ -+ -+``amd-pstate`` vs ``acpi-cpufreq`` -+====================================== -+ -+On majority of AMD platforms supported by ``acpi-cpufreq``, the ACPI tables -+provided by the platform firmware used for CPU performance scaling, but -+only provides 3 P-states on AMD processors. -+However, on modern AMD APU and CPU series, it provides the collaborative -+processor performance control according to ACPI protocol and customize this -+for AMD platforms. That is fine-grain and continuous frequency range -+instead of the legacy hardware P-states. ``amd-pstate`` is the kernel -+module which supports the new AMD P-States mechanism on most of future AMD -+platforms. The AMD P-States mechanism will be the more performance and energy -+efficiency frequency management method on AMD processors. -+ -+``cpupower`` tool support for ``amd-pstate`` -+=============================================== -+ -+``amd-pstate`` is supported on ``cpupower`` tool that can be used to dump the frequency -+information. And it is in progress to support more and more operations for new -+``amd-pstate`` module with this tool. :: -+ -+ root@hr-test1:/home/ray# cpupower frequency-info -+ analyzing CPU 0: -+ driver: amd-pstate -+ CPUs which run at the same hardware frequency: 0 -+ CPUs which need to have their frequency coordinated by software: 0 -+ maximum transition latency: 131 us -+ hardware limits: 400 MHz - 4.68 GHz -+ available cpufreq governors: ondemand conservative powersave userspace performance schedutil -+ current policy: frequency should be within 400 MHz and 4.68 GHz. -+ The governor "schedutil" may decide which speed to use -+ within this range. -+ current CPU frequency: Unable to call hardware -+ current CPU frequency: 4.02 GHz (asserted by call to kernel) -+ boost state support: -+ Supported: yes -+ Active: yes -+ AMD PSTATE Highest Performance: 166. Maximum Frequency: 4.68 GHz. -+ AMD PSTATE Nominal Performance: 117. Nominal Frequency: 3.30 GHz. 
-+ AMD PSTATE Lowest Non-linear Performance: 39. Lowest Non-linear Frequency: 1.10 GHz. -+ AMD PSTATE Lowest Performance: 15. Lowest Frequency: 400 MHz. -+ -+ -+Diagnostics and Tuning -+======================= -+ -+Trace Events -+-------------- -+ -+There are two static trace events that can be used for ``amd-pstate`` -+diagnostics. One of them is the cpu_frequency trace event generally used -+by ``CPUFreq``, and the other one is the ``amd_pstate_perf`` trace event -+specific to ``amd-pstate``. The following sequence of shell commands can -+be used to enable them and see their output (if the kernel is generally -+configured to support event tracing). :: -+ -+ root@hr-test1:/home/ray# cd /sys/kernel/tracing/ -+ root@hr-test1:/sys/kernel/tracing# echo 1 > events/amd_cpu/enable -+ root@hr-test1:/sys/kernel/tracing# cat trace -+ # tracer: nop -+ # -+ # entries-in-buffer/entries-written: 47827/42233061 #P:2 -+ # -+ # _-----=> irqs-off -+ # / _----=> need-resched -+ # | / _---=> hardirq/softirq -+ # || / _--=> preempt-depth -+ # ||| / delay -+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION -+ # | | | |||| | | -+ <idle>-0 [015] dN... 4995.979886: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=15 changed=false fast_switch=true -+ <idle>-0 [007] d.h.. 4995.979893: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true -+ cat-2161 [000] d.... 4995.980841: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=0 changed=false fast_switch=true -+ sshd-2125 [004] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=4 changed=false fast_switch=true -+ <idle>-0 [007] d.s.. 4995.980968: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=7 changed=false fast_switch=true -+ <idle>-0 [003] d.s.. 4995.980971: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=3 changed=false fast_switch=true -+ <idle>-0 [011] d.s.. 
4995.980996: amd_pstate_perf: amd_min_perf=85 amd_des_perf=85 amd_max_perf=166 cpu_id=11 changed=false fast_switch=true -+ -+The cpu_frequency trace event will be triggered either by the ``schedutil`` scaling -+governor (for the policies it is attached to), or by the ``CPUFreq`` core (for the -+policies with other scaling governors). -+ -+ -+Reference -+=========== -+ -+.. [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming, -+ https://www.amd.com/system/files/TechDocs/24593.pdf -+ -+.. [2] Advanced Configuration and Power Interface Specification, -+ https://uefi.org/sites/default/files/resources/ACPI_Spec_6_4_Jan22.pdf -+ -+.. [3] Processor Programming Reference (PPR) for AMD Family 19h Model 21h, Revision B0 Processors -+ https://www.amd.com/system/files/TechDocs/55898_B1_pub_0.50.zip -+ -diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst -index f40994c422dc..5d2757e2de65 100644 ---- a/Documentation/admin-guide/pm/working-state.rst -+++ b/Documentation/admin-guide/pm/working-state.rst -@@ -11,6 +11,7 @@ Working-State Power Management - intel_idle - cpufreq - intel_pstate -+ amd-pstate - cpufreq_drivers - intel_epb - intel-speed-select +-- +2.34.1 |