[Feature request] Add support to Supermicro M12SWA-TF #22

Open
pktiuk opened this issue Jun 26, 2023 · 8 comments
Labels
enhancement New feature or request

Comments

@pktiuk

pktiuk commented Jun 26, 2023

I see that only X10/X11 motherboards are supported.
Would it take a lot of effort to add support for the M12SWA-TF motherboard?

@petersulyok
Owner

There is compatibility feedback here from @staaled for a Supermicro H13SSL-NT motherboard, on which he managed to configure smfc.

@staaled: How did you configure the CPU zone for AMD in order to read the temperature properly? Could you please share your config?

@staaled

staaled commented Jun 28, 2023

Sorry for the late response @petersulyok

Well... to expand a little on what I wrote at #19 (comment)

I use the k10temp kernel module for AMD CPUs instead of coretemp for Intel CPUs, and the rest is guesswork.


When running sensors (from lm-sensors):

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +39.0 C  
Tccd1:        +33.0 C  
Tccd2:        +32.9 C  
Tccd3:        +34.8 C  
Tccd4:        +34.4 C  

So I just did a quick-and-dirty search for a hwmon temp1_label file containing Tctl in sysfs:
/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon13/temp1_label

and chucked this into smfc.conf under the CPU zone section:
hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input
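The label search described above could be scripted roughly like this. It is only a sketch: the helper names `find_label_files` and `input_file_for` are made up for illustration, and it assumes the k10temp sensors sit under `<device>/hwmon/hwmon*/` as in the path shown earlier.

```python
import glob
import os


def find_label_files(root: str, label: str = "Tctl") -> list[str]:
    """Return the temp*_label files below root whose content equals label."""
    # Assumed layout: <root>/<pci device>/hwmon/hwmon<N>/temp<M>_label
    pattern = os.path.join(root, "*", "hwmon", "hwmon*", "temp*_label")
    matches = []
    for label_file in sorted(glob.glob(pattern)):
        try:
            with open(label_file, encoding="ascii") as handle:
                if handle.read().strip() == label:
                    matches.append(label_file)
        except OSError:
            # Some sysfs files may be unreadable; skip them.
            continue
    return matches


def input_file_for(label_file: str) -> str:
    """Map .../temp1_label to the matching .../temp1_input."""
    head, name = os.path.split(label_file)
    return os.path.join(head, name.replace("_label", "_input"))


if __name__ == "__main__":
    for found in find_label_files("/sys/bus/pci/drivers/k10temp"):
        print(found, "->", input_file_for(found))
```

The sensor index is left as a wildcard because the number attached to Tctl is only known to be `temp1` on the system described here.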

When running smfc this seems to expand properly and it matches the temperature reading from sensors and ipmi:

# systemctl status smfc -n 50 | grep hwmon
Jun 28 06:28:01 localhost smfc.service[9451]:    hwmon_path = ['/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon13/temp1_input']
# 

Please note this is only tested on a single socket EPYC Zen4(Genoa) CPU running a 6.2.0 kernel.
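To double-check what the wildcard resolves to outside of smfc, something like the following can be used. It is a minimal sketch: `read_temperatures` is a hypothetical helper, not part of smfc, and it relies on the hwmon convention that `temp*_input` files report millidegrees Celsius.

```python
import glob


def read_temperatures(pattern: str) -> dict[str, float]:
    """Expand a wildcard path and read every match as degrees Celsius."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="ascii") as handle:
            # hwmon temp*_input values are in millidegrees Celsius.
            readings[path] = int(handle.read().strip()) / 1000.0
    return readings


if __name__ == "__main__":
    pattern = "/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input"
    for path, celsius in read_temperatures(pattern).items():
        print(f"{path}: {celsius:.1f} C")
```

Because the pattern is expanded on every call, it does not depend on the hwmon number staying the same.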

One observation I made is that hwmon13 is NOT stable and may vary between reboots, component changes, etc., so I wouldn't recommend using anything like hwmon_path=/sys/class/hwmon/hwmon13/temp1_input

Full config I'm experimenting with now:

[Ipmi]
command=/usr/bin/ipmitool 
fan_mode_delay=10
fan_level_delay=5
swapped_zones=1

[CPU zone]
enabled=1
count=1
temp_calc=1
steps=6
sensitivity=3.0
polling=2
min_temp=35.0
max_temp=70.0
min_level=10
max_level=100
hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input

[HD zone]
enabled=0

FWIW I replaced my chassis fans with Noctua NF-A9x14's, and disabled the HD zone because I want those silent puppies running full speed (~2100 RPM), as the stock fan on the Dynatron J12 CPU cooler is a little 80mm monster which does 8000 RPM at full tilt and makes me wonder if I can use it as a siren for the burglar alarm... The min_level setting in the above config may not be very safe though.
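To get a feel for what the CPU zone values above imply, here is a rough sketch of one plausible stepped mapping from temperature to fan level. This is an assumption for illustration only: the function `fan_level` is hypothetical and smfc's actual calculation may differ.

```python
def fan_level(temp: float,
              min_temp: float = 35.0, max_temp: float = 70.0,
              min_level: int = 10, max_level: int = 100,
              steps: int = 6) -> int:
    """Map a temperature to a fan level in discrete steps (assumed scheme)."""
    # Clamp outside the configured temperature range.
    if temp <= min_temp:
        return min_level
    if temp >= max_temp:
        return max_level
    temp_step = (max_temp - min_temp) / steps      # degrees per step
    level_step = (max_level - min_level) / steps   # level percent per step
    # Snap to the nearest step so small fluctuations do not change the level.
    step_index = round((temp - min_temp) / temp_step)
    return round(min_level + step_index * level_step)


if __name__ == "__main__":
    for sample in (30.0, 39.0, 52.5, 75.0):
        print(sample, "->", fan_level(sample))
```

Under this assumed scheme each of the 6 steps covers about 5.8 degrees and 15 level points, so the 39.0 C Tctl reading from earlier would land on level 25.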

@petersulyok
Owner

petersulyok commented Jun 28, 2023

@staaled thanks for sharing this!

I'm planning to add AMD CPU support to smfc as well, and I have some further questions:

  • I'm wondering if the k10temp module is visible on the /sys/devices/platform branch in hwmon?
  • What is the enumeration pattern for a second CPU? Do you have any sample for that?
  • Is there any power saving technology for AMD CPUs to recommend in the README.md (e.g. Intel Speed Shift) ?
  • Can you confirm if IPMI FULL MODE is working on AST2600 chip?
  • Can you confirm if you managed to configure the threshold for fans properly on AST2600 chip?

I really appreciate your help.

@staaled

staaled commented Jun 28, 2023

We should perhaps create a separate issue for this; however, here is a quick response to your questions @petersulyok:

  • Couldn't find much in /sys/devices/platform/ for k10temp
  • I don't currently have access to a system with multiple AMD sockets. I would, however, guess one could reliably enumerate them from /sys/module/k10temp/drivers/pci:k10temp/ (same as /sys/bus/pci/drivers/k10temp/) by looking at symlinks pointing to devices. (Ref. https://docs.kernel.org/hwmon/k10temp.html; they should all show up as PCI devices.)
    Putting some sample output here in case it helps:
root@localhost:/sys/module/k10temp/drivers/pci:k10temp# ls -l
total 0
lrwxrwxrwx 1 root root    0 Jun 28 15:11 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
--w------- 1 root root 4096 Jun 28 15:13 bind
lrwxrwxrwx 1 root root    0 Jun 28 15:11 module -> ../../../../module/k10temp
--w------- 1 root root 4096 Jun 28 15:13 new_id
--w------- 1 root root 4096 Jun 28 15:13 remove_id
--w------- 1 root root 4096 Jun 28 15:10 uevent
--w------- 1 root root 4096 Jun 28 15:13 unbind
  • Yes, AMD has a similar set of features, sometimes implemented with the same intel* drivers as on an Intel CPU, along with the standard ones. Enabling and disabling them is a bit of a mess, as BIOSes from various vendors call them different things, and server boards usually won't let you set them directly. For my H13SSL-NT board, the manual (MNL-2545.pdf) is on the resources page; pages 63-66 cover settings such as SMT, Core Performance Boost, C-states, TDP Control, and Package Power Limit Control. I would perhaps start by looking at the CPU specs and https://docs.kernel.org/admin-guide/pm/amd-pstate.html, but this might be out of scope for this project?
  • IPMI FULL MODE confirmed working on H13SSL-NT using the AST2600 BMC :)
  • No. I somehow managed to set the lower thresholds to 140 RPM, but the other settings failed. I still haven't gotten around to experimenting more with the set_ipmi_threshold.sh script and figuring out working ipmitool commands.
    Currently everything in the IPMI web UI reads N/A for Low NR, Low CT, High CT, and High NR, except for Low CT=140 for my connected fans. Since none of my fans go below 200 RPM, this is not much of a problem right now.
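The symlink-based enumeration guessed at in the second bullet could look roughly like this. It is a sketch under the assumption that every device bound to k10temp appears in the driver directory as a symlink named after its PCI address; `bound_devices` is a hypothetical helper.

```python
import os
import re

# PCI address format: domain:bus:device.function, e.g. 0000:00:18.3
PCI_ADDRESS = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def bound_devices(driver_dir: str) -> list[str]:
    """Return the PCI addresses of devices bound to the driver, sorted."""
    addresses = []
    for name in sorted(os.listdir(driver_dir)):
        full_path = os.path.join(driver_dir, name)
        # Skip control files (bind, unbind, ...) and the 'module' symlink.
        if PCI_ADDRESS.match(name) and os.path.islink(full_path):
            addresses.append(name)
    return addresses


if __name__ == "__main__":
    for address in bound_devices("/sys/bus/pci/drivers/k10temp"):
        print(address)
```

How the resulting addresses map to physical sockets is not established in this thread, so the list should be treated as a set of sensors rather than a socket count.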

As a quick side note, fan_measurement.sh requires a much longer delay between changing fan levels to pick up the actual change, in the range of 10-15 seconds; otherwise, in my case, the fans are still speeding up or slowing down when the measurement is taken. Another nice feature would be to detect when it trips the Low CT point and the fans spin up to 100% automatically.
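One way to avoid guessing a fixed delay would be to poll until the fan speed settles. The sketch below only illustrates that idea and is not how fan_measurement.sh works: `wait_until_stable` is hypothetical, and `read_rpm` stands for whatever callable returns the current RPM (for example, a wrapper around an ipmitool query).

```python
import time
from typing import Callable


def wait_until_stable(read_rpm: Callable[[], float],
                      tolerance: float = 50.0,
                      interval: float = 1.0,
                      timeout: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll read_rpm until two consecutive readings differ by <= tolerance.

    Raises TimeoutError if the readings do not settle within timeout seconds.
    """
    previous = read_rpm()
    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        current = read_rpm()
        if abs(current - previous) <= tolerance:
            return current
        previous = current
    raise TimeoutError("fan speed did not settle in time")


if __name__ == "__main__":
    # Simulated fan slowing down, so the example runs without IPMI access.
    samples = iter([8000, 6000, 4500, 4100, 4080])
    print(wait_until_stable(lambda: next(samples), sleep=lambda _: None))
```

The tolerance and interval values are placeholders and would need tuning per fan.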

@staaled

staaled commented Jul 3, 2023

@petersulyok :

So a friend of mine has a dual socket SuperMicro H12 motherboard with 2x EPYC 7551, 5.4 kernel, that outputs:

ls -al /sys/module/k10temp/drivers/pci:k10temp/
total 0
drwxr-xr-x  2 root root    0 Feb 12 20:08 .
drwxr-xr-x 30 root root    0 Feb 12 20:08 ..
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:19.3 -> ../../../../devices/pci0000:00/0000:00:19.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:1a.3 -> ../../../../devices/pci0000:00/0000:00:1a.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:1b.3 -> ../../../../devices/pci0000:00/0000:00:1b.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:1c.3 -> ../../../../devices/pci0000:00/0000:00:1c.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:1d.3 -> ../../../../devices/pci0000:00/0000:00:1d.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:1e.3 -> ../../../../devices/pci0000:00/0000:00:1e.3
lrwxrwxrwx  1 root root    0 Jul  3 12:08 0000:00:1f.3 -> ../../../../devices/pci0000:00/0000:00:1f.3
--w-------  1 root root 4096 Jul  3 12:08 bind
lrwxrwxrwx  1 root root    0 Jul  3 12:08 module -> ../../../../module/k10temp
--w-------  1 root root 4096 Jul  3 12:08 new_id
--w-------  1 root root 4096 Jul  3 12:08 remove_id
--w-------  1 root root 4096 Feb 12 20:08 uevent
--w-------  1 root root 4096 Jul  3 12:08 unbind

Apparently they all have a Tctl sensor, but no Tccd sensors.


I have another single socket EPYC 7302 on a SuperMicro H12SSL-NT board running 5.15 kernel:

ls -al /sys/module/k10temp/drivers/pci:k10temp/
total 0
drwxr-xr-x  2 root root    0 May  1  2022 .
drwxr-xr-x 34 root root    0 May  1  2022 ..
lrwxrwxrwx  1 root root    0 Jul  3 13:16 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
--w-------  1 root root 4096 Jul  3 13:16 bind
lrwxrwxrwx  1 root root    0 Jul  3 13:16 module -> ../../../../module/k10temp
--w-------  1 root root 4096 Jul  3 13:16 new_id
--w-------  1 root root 4096 Jul  3 13:16 remove_id
--w-------  1 root root 4096 May  1  2022 uevent
--w-------  1 root root 4096 Jul  3 13:16 unbind

With standard sensors output:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +42.5 C  
Tccd1:        +39.8 C  
Tccd3:        +40.0 C  
Tccd5:        +42.2 C  
Tccd7:        +39.8 C  

@petersulyok
Owner

Hi @pktiuk, did you manage to set up your system based on the sample here?
I would appreciate hearing your feedback.

@pktiuk
Author

pktiuk commented Aug 17, 2023

I haven't done this yet.
Unfortunately, I don't have much time this month to set this up, but I will keep testing it in mind.

@petersulyok
Owner

Let me know if you need any further help. The documentation of the latest version, v3.0.0, contains recommendations for AMD users.

@petersulyok petersulyok added the enhancement New feature or request label Aug 17, 2023