Using a BlueField as a Host Machine
A BlueField DPU is pretty much a self-contained computer, and even Amazon uses their own DPUs as the management/control plane host for their datacenter switches. The same thing can be done with these. This is proper jank, so beware, but you can do some cool things. There’s some lack of detail here as it’s pretty much written from notes that I had lying around, but it should be enough for anyone who wants to tinker.
# Powering a BlueField
Super easy and janky: get one of those wacky mining USB-to-PCIe assemblies, like this one, or just search for something like `usb to pci mining` on eBay/AliExpress/wherever else. You can also use a second one to connect the DPU to another PCIe card. You’ll also need to connect the 8-pin ATX connector for power. I didn’t check whether it works without it. I assume it’s all common Pos/GND, but I didn’t test that, so just connect it.
NVIDIA also states that it’ll draw up to 150W, but I’ve only seen it draw about 35W idle and about 48W with all the ARM cores loaded (just running `stress`, so super rudimentary testing). So you can probably get away with a smaller and shittier power supply if you don’t have one that’ll do >=12.5A (150W at 12V).
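For sizing a supply, the current figures follow from I = P / V. A rough sketch, assuming everything is drawn from the 12V rail (a simplification) and using a made-up helper name, `amps_at_12v`:

```shell
# Hypothetical helper: current in amps at 12 V for a given power in watts (I = P / V).
# Assumption: all power comes from the 12 V rail.
amps_at_12v() {
    LC_ALL=C awk -v w="$1" 'BEGIN { printf "%.1f\n", w / 12 }'
}

amps_at_12v 150   # NVIDIA's stated maximum
amps_at_12v 48    # measured with all ARM cores loaded
amps_at_12v 35    # measured idle
```

The measured load works out to roughly a third of the stated maximum, which is why a smaller supply is probably fine.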
# Flashing BF-Bundle (the easy way)
As a note, this was all done on the latest firmware - if you don’t have the DOCA-Host package installed (or don’t want it, or want to rawdog the image onto the DPU), you can do it this way:
Download the latest BFB from here and keep it; you will need it.
SSH to the BMC IP. The default username is `root` and the password is `0penBmc`. It’ll ask you to change the password; don’t lose it. The BMC does DHCP by default, and its MAC address is always the highest one on the card, but it should also identify itself as the BMC with a hostname of `dpu-bmc` or `bluefield-bmc`:
ssh root@<bmc ip>
systemctl start rshim
exit
Now you can SCP the image across. NVIDIA actually seems to indicate that this isn’t possible or shouldn’t be done… but it seems to be the easiest way to get the latest OS + firmware package with the least fucking about.
scp bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb root@<bmc ip>:/dev/rshim0/boot
This will actually take a while, and it’ll be really slow/weird to begin with, because it verifies and flashes each part of the BFB to the SRAM in order: BL1 validates the BL2R in the BFB, which in turn validates/applies BL2, then BL31/BL33, and so on until you get to the initramfs and then the boot image. So if the SCP looks like it has hung, just know it’s not actually broken; it’s working as intended.
Anyway, once this is done it’ll reboot into a freshly flashed BlueField-flavoured Ubuntu. The default username is `ubuntu` and the password is `ubuntu`. It’ll make you change it to something else.
# Configuring PCIe
The reason I mentioned flashing is that some BFs come with an older version of `mlxconfig` which doesn’t support the required `PCI_BUS` variables, so if you skipped to here and it doesn’t work, go back.
Anyway, this is super straightforward and it lets you connect a GPU to a BlueField and run games on it, or whatever else you’d like to do.
It’s probably good to reset the config before doing this. You probably don’t have to, but it’s easy enough:
mlxconfig -d /dev/mst/mt41692_pciconf0 -y reset
Configuring the goldfingers into x16 with the root port on the ARM CPU:
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_WIDTH=5
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_SPEED=4
The same can also be done for the component side connector (silkscreen will say `BLACK CABLE`):
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS10_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS10_WIDTH=5
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS10_SPEED=4
# But what does this mean?
`HIERARCHY_TYPE` basically selects whether you have an external or an internal root port. The values look like this:
Value | Description | Enum name |
---|---|---|
0 | endpoint | PCIE_ENDPOINT |
1 | use host as root port | PCIE_EXTERNAL_HOST_SWITCH |
2 | use ARM as root port | PCIE_INTERNAL_HOST_SWITCH |
`WIDTH` and `SPEED` map to the link width and speed, using the following values:
Config value | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Speed | ? | Gen1 | Gen2 | Gen3 | Gen4/5 |

Config value | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|
Width | ? | x1 | x2 | x4 | x8 | x16 |
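The width values follow a simple pattern: value N means x(2^(N-1)) lanes. A small sketch for decoding settings back into something readable; `width_to_lanes` and `speed_to_gen` are made-up helper names, not part of `mlxconfig`, and they only encode what the tables list:

```shell
# Hypothetical helpers (not part of mlxconfig): decode the config values from the tables.

# width_to_lanes N: WIDTH value -> lane count; values 1..5 follow x(2^(N-1)).
width_to_lanes() {
    case "$1" in
        1|2|3|4|5) echo "x$(( 1 << ($1 - 1) ))" ;;
        *) echo "unknown" ;;
    esac
}

# speed_to_gen N: SPEED value -> PCIe generation, as listed in the table.
speed_to_gen() {
    case "$1" in
        1|2|3) echo "Gen$1" ;;
        4) echo "Gen4/5" ;;
        *) echo "unknown" ;;
    esac
}

echo "WIDTH=5 SPEED=4 -> $(width_to_lanes 5) $(speed_to_gen 4)"
echo "WIDTH=1 SPEED=3 -> $(width_to_lanes 1) $(speed_to_gen 3)"
```

Value 0 is reported as `unknown` here because the tables don’t say what it means.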
You can also bifurcate this into eight x2 links, like so:
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS01_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS01_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS01_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS02_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS02_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS02_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS03_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS03_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS03_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS04_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS04_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS04_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS05_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS05_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS05_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS06_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS06_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS06_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS07_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS07_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS07_SPEED=4
Use `PCI_BUS1n` for the `BLACK CABLE` connector.
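Typing 24 near-identical lines is error-prone, so the bifurcation commands can be generated with a loop instead. This is only a sketch: `gen_bus_cmds` is a made-up helper, the device path is copied from the commands above, and it prints the commands rather than running them, so nothing gets written to the card until you choose to.

```shell
# Hypothetical helper: print (not run) the mlxconfig commands for COUNT links.
# Usage: gen_bus_cmds CONNECTOR COUNT WIDTH SPEED
#   CONNECTOR: 0 = goldfingers (PCI_BUS0n), 1 = BLACK CABLE (PCI_BUS1n)
gen_bus_cmds() {
    dev=/dev/mst/mt41692_pciconf0   # device path as used elsewhere in this post
    i=0
    while [ "$i" -lt "$2" ]; do
        # HIERARCHY_TYPE=2 puts the root port on the ARM side
        for kv in "HIERARCHY_TYPE=2" "WIDTH=$3" "SPEED=$4"; do
            echo "mlxconfig -d $dev s PCI_BUS${1}${i}_${kv}"
        done
        i=$((i + 1))
    done
}

# Eight x2 links (WIDTH=2) at Gen4/5 (SPEED=4) on the goldfingers
gen_bus_cmds 0 8 2 4
```

To actually apply it, redirect the output to a file, read it over, and run that file with `sh`, so `mlxconfig` can still prompt on the terminal for each change.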
Anyway, reboot once you have this configured how you like it, and you should be good to go.
# Connecting Something Else
As mentioned, you can use a second janky mining card to do the other side. The actual trick is finding two different versions where one swaps the TX/RX pairs and the other doesn’t, or flipping the TX/RX through a modified USB cable. Either way, depending on your setup, you may need to configure the bifurcation so that it’s Gen4/5 + x1 for the link to come up (or even use a lower speed, since some of these things only do Gen2/3!):
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_WIDTH=1
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_SPEED=3
This sets the first link to x1 width at Gen3, which should work with any dodgy mining gear you find. Assuming all is good, you should see your PCIe device in `lspci`:
# lspci | grep 3D
04:00.0 3D controller: NVIDIA Corporation GA102 [GeForce RTX 3080 Ti] (rev 01)
Lol, lmao even.
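If the device shows up but you want to know what the link actually negotiated, `lspci -vv` (as root) prints a `LnkSta:` line per device. A minimal sketch for pulling the speed and width out of it; `link_status` is a made-up helper, and the sample line below is illustrative of the usual format, not captured from this setup:

```shell
# Hypothetical helper: extract negotiated speed and width from `lspci -vv` output on stdin.
link_status() {
    sed -n 's/.*LnkSta:[[:space:]]*Speed \([^ ,]*\).*Width \(x[0-9]*\).*/\1 \2/p'
}

# Illustrative sample line (assumed format), matching a Gen3 x1 link:
printf '\t\tLnkSta:\tSpeed 8GT/s (ok), Width x1 (downgraded)\n' | link_status

# On real hardware, something like:
#   lspci -vv -s 04:00.0 | link_status
```

Gen3 runs at 8GT/s per lane, so `8GT/s x1` is what the `SPEED=3` / `WIDTH=1` settings should produce if the link trained as configured.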
# Bonus Round: Removing Vendor Firmware
Between the RJ45 connector and the QSFP slot, there are two ‘holes’ with a square around them and `FNP` (firmware not present) written on the silkscreen. It’s in different places depending on the card you have, but it’s seemingly where the QSFP connectors are, or on the top side of the card. On a BF2 it’s `BF2_FNP` on the silkscreen.
If you short these out, the card goes into livefish mode, which is basically just ConnectX SoC flash recovery mode. This seemed to behave weirdly with the DOCA version of mst though, so if you experience the same, try the open-source version here which did work for me: https://github.com/Mellanox/mstflint
Otherwise, you can do this with ipmitool and the DPU BMC (but it didn’t work for me for whatever reason, maybe there’s another undocumented step before OEM commands work):
# enable livefish
ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P 0penBmc raw 0x32 0x92
# disable
ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P 0penBmc raw 0x32 0x93
More OEM commands here: https://docs.nvidia.com/networking/display/bluefieldbmcv2507/appendix+-+nvidia+oem+ipmi+commands
Anyway, once you do this you should see a `BlueField-3 SoC Flash Recovery` in `lspci`. You can use the bus address, or just continue to use the mst device:
flint -d /dev/mst/mt41692_pciconf0 -ocr hw set Flash0.WriteProtected=Disabled
Then you can write any firmware image:
flint -d /dev/mst/mt41692_pciconf0 --ignore_dev_data --allow_psid_change --ocr -i <firmware.bin> burn
This also lets you go back to upstream NVIDIA firmware if you have some scuffed bullshit vendor firmware ConnectX card that gets delayed/no firmware updates (or you just want to `mlxfwmanager -u --online`). The `--ignore_dev_data --allow_psid_change` part is the critical bit here if you’re doing that; otherwise it’s not necessary.
You can enable write protection again after this (optional, lol). Then make sure those pins aren’t shorted (or `ipmitool` it back to normal), power cycle the card, and you should have your new Fun Firmware.
Anyway, no idea who the audience for this would be. Have fun I guess.