Quick note to myself (and anyone else who hits this) on getting IPMI working and flashing BMC firmware on the SuperMicro X9DRG nodes in my cluster. Two separate problems, documenting both.
All my X9 nodes had IPMI ports wired up but remote access via ipmitool was failing with:
Error: Unable to establish IPMI v2 / RMCP+ session
Ping worked, nmap showed port 623 open, but no session would establish. The ADMIN user existed but didn’t actually have Administrator privileges set, so it wasn’t authorised to open sessions. Fix from the host OS:
ipmitool user list 1
ipmitool channel setaccess 1 2 callin=on link=on ipmi=on privilege=4
ipmitool user set password 2 yournewpassword
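In channel setaccess 1 2, the first number is the channel (1 is the LAN channel on these boards) and the second is the user ID. User ID 2 is ADMIN on SuperMicro boards. Privilege level 4 is Administrator. That was the real fix for remote IPMI access.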
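To confirm the change took, the privilege column of the user list can be checked by script. This is a minimal sketch, assuming the usual ipmitool table layout where the user ID is the first column and the channel privilege limit is the last. The helper name user_priv is made up for this example.

```shell
# user_priv ID: read `ipmitool user list <channel>` output on stdin and
# print the privilege limit (last column) for the given user ID.
# Assumes the ID is the first whitespace-separated field on each row.
# Only the last field is taken, so a "NO ACCESS" entry prints as ACCESS.
user_priv() {
    awk -v id="$1" '$1 == id { print $NF }'
}

# Intended use on the node itself (needs root and a working local IPMI driver):
#   ipmitool user list 1 | user_priv 2
```

After the fix above, user 2 should come back as ADMINISTRATOR.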
Correct ipmitool syntax for these boards once the user permissions are sorted:
ipmitool -H 10.4.10.x -U ADMIN -P yourpassword -I lanplus -C 3 chassis status
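The -C 3 flag selects cipher suite 3 explicitly, and it matters. Newer ipmitool releases default to cipher suite 17, which old BMC firmware may not support, so without the flag sessions fail even with correct credentials on some firmware versions.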
I keep credentials in /etc/ipmi-secrets.conf (chmod 600) and use a wrapper script called doipmi so I don’t have to type all that out every time:
~/bin/doipmi feronia chassis status
~/bin/doipmi feronia chassis power cycle
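For anyone wanting to build something similar, here is a minimal sketch of a doipmi-style wrapper, written as a shell function. It is not the actual script: the secrets file layout (one node per line as name, address, user, password) and the IPMI_SECRETS and IPMITOOL override variables are assumptions made for the example.

```shell
# doipmi NODE IPMITOOL-ARGS...
# Assumed secrets file layout (not the real one): one node per line as
#   name address user password
# IPMI_SECRETS and IPMITOOL are hypothetical overrides, handy for testing.
doipmi() (
    # The ( ) body runs in a subshell, so the password never leaks into
    # the calling shell's variables.
    secrets="${IPMI_SECRETS:-/etc/ipmi-secrets.conf}"
    tool="${IPMITOOL:-ipmitool}"
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: doipmi NODE IPMITOOL-ARGS..." >&2
        return 2
    fi
    shift

    # Find the node's line and keep the address, user and password fields.
    entry=$(awk -v n="$node" '$1 == n { print $2, $3, $4; exit }' "$secrets")
    if [ -z "$entry" ]; then
        echo "doipmi: no entry for $node in $secrets" >&2
        return 1
    fi
    addr=${entry%% *}
    rest=${entry#* }
    user=${rest%% *}
    pass=${rest#* }

    # -E makes ipmitool read the password from IPMI_PASSWORD, which keeps
    # it out of the process list (unlike -P).
    IPMI_PASSWORD="$pass" "$tool" -H "$addr" -U "$user" -E -I lanplus -C 3 "$@"
)
```

Dropped into a script that ends with doipmi "$@", this gives the same calling convention as the examples above. Passwords containing spaces would need a different file format.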
That’s IPMI sorted. The BMC firmware flash is a separate story.
The X9 nodes shipped with very old BMC firmware (1.73). It worked fine for basic IPMI once the user privileges were fixed, but I wanted to update it because I was trying to get sensible fan control working for the GPUs in these chassis, and the older firmware’s web interface is pretty limited. I went to 3.62, which is the latest for the X9DRG series.
SuperMicro’s lUpdate tool is what you want. Download from the SuperMicro support site (requires an account, which is annoying but necessary). You need the lUpdate Linux binary (v1.21) and SMT_X9_362.bin for the X9DRG series.
Flash via LAN rather than KCS wherever you can. KCS (the local host interface) is painfully slow, especially for dumps. LAN is fast.
Back up the existing firmware first. This is optional but good practice:
./lUpdate -d backup.bin -i lan -h 10.4.10.x 623 -u ADMIN -p yourpassword -r y
Note that on some nodes with very old firmware, the dump fails with ERROR:EXECUTE CHECK DUMP FLASH COMPLETE FAILED. This is a quirk of old firmware. Skip the dump and go straight to flashing if it happens.
Then flash:
./lUpdate -f SMT_X9_362.bin -i lan -h 10.4.10.x 623 -u ADMIN -p yourpassword -r y
The -r y flag preserves existing BMC config (network settings, user accounts). Don’t omit it.
The tool prints its full usage banner before doing anything. This looks like an error but isn’t. Then it connects and shows progress. Be patient, it’s slow to initialise.
Phase1:Wait for BMC !!
Phase1:100% !!
Phase2:...
If you get ERROR:SEND "GetFWUpdateInfo" COMMAND TO BMC FAILED, the BMC is in a stuck state (usually from a failed dump attempt). Reset it and try again:
ipmitool mc reset cold
# wait 60-90 seconds
./lUpdate -f SMT_X9_362.bin -i lan -h 10.4.10.x 623 -u ADMIN -p yourpassword -r y
If it still fails after a cold reset, fall back to KCS:
./lUpdate -f SMT_X9_362.bin -i kcs -r y
KCS is slower but bypasses the network stack entirely and is more reliable when the BMC is being difficult.
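The retry ladder above (LAN flash, cold reset and retry, then KCS) can be sketched as one function. Two caveats: it assumes lUpdate exits non-zero when a flash fails, which is worth checking on your version before relying on it, and LUPDATE, IPMITOOL and BMC_WAIT are hypothetical override variables invented for this example.

```shell
# flash_bmc IMAGE HOST USER PASSWORD
# Tries a LAN flash, then a cold reset plus one more LAN attempt, then KCS.
# Assumes lUpdate returns a non-zero exit status on failure (unverified).
flash_bmc() (
    image="$1"
    host="$2"
    user="$3"
    pass="$4"
    lupdate="${LUPDATE:-./lUpdate}"
    ipmitool="${IPMITOOL:-ipmitool}"

    if "$lupdate" -f "$image" -i lan -h "$host" 623 -u "$user" -p "$pass" -r y; then
        return 0
    fi

    echo "LAN flash failed; cold-resetting the BMC and retrying" >&2
    # Local reset, as in the manual steps; keep going even if it errors.
    "$ipmitool" mc reset cold || echo "warning: cold reset failed" >&2
    sleep "${BMC_WAIT:-90}"

    if "$lupdate" -f "$image" -i lan -h "$host" 623 -u "$user" -p "$pass" -r y; then
        return 0
    fi

    echo "LAN flash failed again; falling back to KCS" >&2
    "$lupdate" -f "$image" -i kcs -r y
)
```

It has to run on the node being flashed, since both the cold reset and the KCS fallback go through the local interface.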
Verify after the flash:
ipmitool mc info
# Firmware Revision should show: 3.62
The BMC takes 60-90 seconds to reinitialise after flashing before it’ll accept connections again, so wait that long before running the check above.
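The wait-then-verify step can be scripted rather than timed by hand. This is a rough sketch that polls ipmitool mc info until the expected revision appears. The function names, the IPMITOOL and POLL_SECS override variables, and the default of 12 tries at 10 seconds each are all choices made for this example.

```shell
# fw_revision: read `ipmitool mc info` output on stdin and print the value
# of the "Firmware Revision" line.
fw_revision() {
    awk -F': *' '/^Firmware Revision/ { print $2; exit }'
}

# wait_for_fw EXPECTED [TRIES]: poll the local BMC until it reports the
# EXPECTED firmware revision, or give up after TRIES attempts.
wait_for_fw() {
    expected="$1"
    tries="${2:-12}"
    got=""
    while [ "$tries" -gt 0 ]; do
        # Errors are expected while the BMC is still rebooting, so hide them.
        got=$("${IPMITOOL:-ipmitool}" mc info 2>/dev/null | fw_revision)
        if [ "$got" = "$expected" ]; then
            echo "BMC firmware is $got"
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_SECS:-10}"
    done
    echo "BMC did not report firmware $expected (last seen: ${got:-nothing})" >&2
    return 1
}
```

Calling wait_for_fw 3.62 right after the flash returns as soon as the BMC is back and reporting the new version.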
The BMC web interface at http://10.4.10.x is now available and much improved on 3.62. Fan control lives under Configuration → Fan, with two modes exposed on these boards: Optimal and Full. No zone control via the web UI.
One gotcha worth noting on the X9DRG-HF/QF+ boards: the BMC cannot read GPU temperatures. The GPU sensor slots exist in the SDR but show “No Reading”. Fan control is therefore blind to GPU load, which matters if you’ve stuffed these chassis with Teslas. Raw IPMI fan control commands (0x30 0x70 0x66…) are also unreliable on these boards even after the firmware update. Getting sensible automatic fan control going needed a separate script. I’ll write that up in another post.
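To see which sensors the BMC can’t read on a given node, the SDR listing can be filtered from the command line. A small sketch, assuming the three-column name | reading | status layout that ipmitool sdr prints, with ns in the status column for sensors that return no reading. The helper name is made up for this example.

```shell
# no_reading_sensors: read `ipmitool sdr` output on stdin and print the
# names of sensors whose status column is "ns" (no reading).
# Assumes the three-column "name | reading | status" layout.
no_reading_sensors() {
    awk -F'|' '{
        name = $1; status = $3
        # Trim the padding ipmitool adds around each column.
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", status)
        if (status == "ns") print name
    }'
}

# Intended use on the node itself:
#   ipmitool sdr | no_reading_sensors
```

On these boards the GPU temperature entries should be among the names it prints.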