adam's stuff

Using a BlueField as a Host Machine

A BlueField DPU is pretty much a self-contained computer, and even Amazon uses their own DPUs as the management/control plane host for their datacenter switches. The same thing can be done with these. This is proper jank, so beware, but you can do some cool things. There’s some lack of detail here as it’s pretty much written from notes I had lying around, but it should be enough for anyone who wants to tinker.

# Powering a BlueField

It’s super easy and janky: you can get one of those wacky USB-to-PCIe mining assemblies, like this one, or just search for something like “usb to pci mining” on eBay/AliExpress/whatever else. You can also use another one to connect the DPU to another PCIe card. You’ll also need to connect the 8-pin ATX power connector; I didn’t try to see whether it works without it. I assume it’s all common positive/ground, but I didn’t test that either, so probably just connect it.

NVIDIA states that it’ll draw up to 150W, but I’ve only seen it draw about 35W idle and about 48W with all the ARM cores loaded (just running stress, so super rudimentary testing). So you can probably get away with a smaller, shittier power supply if you don’t have one that’ll do >=12.5A on 12V (150W / 12V).
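The 12.5A figure is just P = V × I rearranged for the 12V rail. A tiny sketch for redoing the sum with your own measurements; amps_at_12v is a hypothetical helper, and it assumes (as a simplification) that the card’s whole draw comes off 12V:

```sh
# Convert a power draw in watts to the current needed on a 12 V rail (I = P / V).
# Assumption: all of the card's draw is on 12 V, which is a simplification.
amps_at_12v() {
  awk -v w="$1" 'BEGIN { printf "%.1f\n", w / 12 }'
}

# NVIDIA's rated maximum, then the idle and all-cores-loaded figures measured above.
for w in 150 35 48; do
  printf '%3d W -> %s A\n' "$w" "$(amps_at_12v "$w")"
done
```

The loaded figure works out to about 4A, which is why a smaller supply is probably fine.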

# Flashing BF-Bundle (the easy way)

As a note, this was all done on the latest firmware. If you don’t have the DOCA-Host package installed (or don’t want it, or want to rawdog the image onto the DPU), you can do it this way:

Download the latest BFB from here and keep it; you’ll need it.

SSH to the BMC IP. The default username is root and the password is 0penBmc. It’ll ask you to change the password, so don’t lose the new one. The BMC uses DHCP by default and always has the highest MAC address on the card, but it should also identify itself as the BMC with the hostname dpu-bmc or bluefield-bmc:

```sh
ssh root@<bmc ip>
systemctl start rshim
exit
```
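If you’d rather not open an interactive shell each time, the same rshim start can be done as a one-shot SSH command that also checks the boot node exists. A minimal sketch, assuming you’ve already done the first interactive login to get past the forced password change; start_rshim is a hypothetical helper, and SSH can be overridden (e.g. SSH=echo) to preview the command:

```sh
# Start rshim on the BMC and confirm /dev/rshim0/boot (the scp target below) exists.
# Assumption: the first interactive login and password change are already done.
SSH=${SSH:-ssh}

start_rshim() {
  bmc=$1
  "$SSH" "root@${bmc}" 'systemctl start rshim && test -e /dev/rshim0/boot'
}
```

For example, start_rshim 192.0.2.10 && echo "rshim is up".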

Now you can SCP the image across. NVIDIA actually seems to indicate that this isn’t possible or shouldn’t be done… but it seems to be the easiest way to get the latest OS + firmware package with the least fucking about.

```sh
scp bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb root@<bmc ip>:/dev/rshim0/boot
```

This will take a while, and it’ll be really slow/weird to begin with, because it verifies and loads each part of the BFB into SRAM in order: BL1 validates the BL2R in the BFB, which in turn validates/applies BL2, then BL31/BL33 and so on until you get to the initramfs and then the boot image. So if the SCP appears to hang, it’s not actually broken; it’s working as intended.
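Since a stalled-looking SCP is expected here, it’s worth at least making sure the file exists before pointing it at the rshim boot node. A minimal sketch; push_bfb is a hypothetical helper, and SCP=echo previews the command instead of sending anything:

```sh
# Copy a BFB to the DPU through the BMC's rshim boot node.
SCP=${SCP:-scp}

push_bfb() {
  bfb=$1
  bmc=$2
  if [ ! -f "$bfb" ]; then
    echo "push_bfb: no such file: $bfb" >&2
    return 1
  fi
  # This can look stalled for a long time while each boot stage is validated.
  "$SCP" "$bfb" "root@${bmc}:/dev/rshim0/boot"
}
```

For example, push_bfb bf-bundle-3.1.0-76_25.07_ubuntu-22.04_prod.bfb 192.0.2.10.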

Anyway, once this is done it’ll (re)boot into a freshly flashed, BlueField-flavoured Ubuntu. The default username and password are both ubuntu, and it’ll make you change the password to something else.

# Configuring PCIe

The reason I covered flashing first is that some BFs ship with an older version of mlxconfig that doesn’t support the required PCI_BUS parameters, so if you skip straight to here and it doesn’t work, go back.

Anyway, this is super straightforward and it lets you connect a GPU to a BlueField and run games on it, or whatever else you’d like to do.

It’s probably a good idea to reset the config before doing this. You may not have to, but it’s easy enough:

```sh
# If /dev/mst/ is empty, "mst start" should create the device nodes first.
mlxconfig -d /dev/mst/mt41692_pciconf0 -y reset
```

Configure the goldfingers as x16 Gen4/5, with the root port on the ARM CPU:

```sh
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_WIDTH=5
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_SPEED=4
```

The same can be done for the component-side connector (the silkscreen will say BLACK CABLE):

```sh
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS10_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS10_WIDTH=5
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS10_SPEED=4
```
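mlxconfig accepts several PARAM=VALUE pairs in a single set, so each connector can be configured in one call. A sketch with a hypothetical set_pci_bus helper; -y answers mlxconfig’s confirmation prompt, and MLXCONFIG=echo gives a dry run:

```sh
# Set HIERARCHY_TYPE, WIDTH and SPEED for one PCI_BUSnn group in a single call.
# Usage: set_pci_bus <bus number, e.g. 00 or 10> <type> <width> <speed>
MLXCONFIG=${MLXCONFIG:-mlxconfig}
DEV=${DEV:-/dev/mst/mt41692_pciconf0}

set_pci_bus() {
  bus=$1
  "$MLXCONFIG" -d "$DEV" -y s \
    "PCI_BUS${bus}_HIERARCHY_TYPE=$2" \
    "PCI_BUS${bus}_WIDTH=$3" \
    "PCI_BUS${bus}_SPEED=$4"
}
```

For example, set_pci_bus 00 2 5 4 matches the goldfinger commands above, and set_pci_bus 10 2 5 4 matches the BLACK CABLE ones.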

# But what does this mean?

HIERARCHY_TYPE basically selects whether you have an external or internal root port:

| val | description | enum value |
|-----|-------------|------------|
| 0 | | PCIE_ENDPOINT |
| 1 | use host as root port | PCIE_EXTERNAL_HOST_SWITCH |
| 2 | use ARM as root port | PCIE_INTERNAL_HOST_SWITCH |
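When reading values back, a lookup for the table above keeps the numbers straight. A trivial sketch; hierarchy_type_name is a hypothetical helper:

```sh
# Map a PCI_BUSnn_HIERARCHY_TYPE value to its enum name.
hierarchy_type_name() {
  case $1 in
    0) echo PCIE_ENDPOINT ;;
    1) echo PCIE_EXTERNAL_HOST_SWITCH ;;  # host is the root port
    2) echo PCIE_INTERNAL_HOST_SWITCH ;;  # ARM is the root port
    *) echo "unknown HIERARCHY_TYPE: $1" >&2; return 1 ;;
  esac
}
```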

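WIDTH and SPEED map to the link width and speed as follows:

| Config value | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Speed | ? | Gen1 | Gen2 | Gen3 | Gen4/5 |

| Config value | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Width | ? | x1 | x2 | x4 | x8 | x16 |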
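The same idea works for WIDTH and SPEED. A sketch of hypothetical width_value/speed_value helpers that encode the tables above; 0 is deliberately never produced since its meaning is unknown, and Gen4 and Gen5 share value 4:

```sh
# Encode a link width (x1..x16) as a PCI_BUSnn_WIDTH value.
width_value() {
  case $1 in
    x1) echo 1 ;;
    x2) echo 2 ;;
    x4) echo 3 ;;
    x8) echo 4 ;;
    x16) echo 5 ;;
    *) echo "unsupported width: $1" >&2; return 1 ;;
  esac
}

# Encode a link speed (Gen1..Gen5, either case) as a PCI_BUSnn_SPEED value.
speed_value() {
  case $1 in
    [Gg]en1) echo 1 ;;
    [Gg]en2) echo 2 ;;
    [Gg]en3) echo 3 ;;
    [Gg]en[45]) echo 4 ;;
    *) echo "unsupported speed: $1" >&2; return 1 ;;
  esac
}
```

For example, PCI_BUS00_WIDTH=$(width_value x16) yields the same WIDTH=5 used for the goldfingers earlier.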

You can also bifurcate the x16 into eight x2 links, like so:

```sh
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS01_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS01_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS01_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS02_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS02_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS02_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS03_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS03_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS03_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS04_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS04_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS04_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS05_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS05_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS05_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS06_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS06_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS06_SPEED=4
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS07_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS07_WIDTH=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS07_SPEED=4
```
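Those 24 lines are the same three settings repeated across PCI_BUS00 to PCI_BUS07, so a loop does the same job. A sketch with a hypothetical bifurcate_x2 helper; passing 1 targets PCI_BUS10 to PCI_BUS17 for the BLACK CABLE connector, and MLXCONFIG=echo gives a dry run:

```sh
# Configure eight x2 Gen4/5 links with the ARM as root port.
# Argument: 0 = goldfingers (PCI_BUS0n), 1 = BLACK CABLE connector (PCI_BUS1n).
MLXCONFIG=${MLXCONFIG:-mlxconfig}
DEV=${DEV:-/dev/mst/mt41692_pciconf0}

bifurcate_x2() {
  side=${1:-0}
  for n in 0 1 2 3 4 5 6 7; do
    "$MLXCONFIG" -d "$DEV" -y s \
      "PCI_BUS${side}${n}_HIERARCHY_TYPE=2" \
      "PCI_BUS${side}${n}_WIDTH=2" \
      "PCI_BUS${side}${n}_SPEED=4"
  done
}
```

Setting all three parameters per call also means eight mlxconfig invocations instead of 24.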

Use PCI_BUS1n for the BLACK CABLE connector.

Anyway, reboot once you have this configured how you like it, and you should be good to go.
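Before rebooting, it’s worth reading the values back to catch a typo. mlxconfig’s q (query) command accepts parameter names, so here’s a sketch of a hypothetical show_pci_bus helper that queries only the PCI_BUS settings for the bus numbers you pass (MLXCONFIG=echo prints the command instead):

```sh
# Query HIERARCHY_TYPE/WIDTH/SPEED for each bus number given (e.g. 00 10).
# With no arguments this falls back to a plain "q", i.e. every parameter.
MLXCONFIG=${MLXCONFIG:-mlxconfig}
DEV=${DEV:-/dev/mst/mt41692_pciconf0}

show_pci_bus() {
  params=""
  for bus in "$@"; do
    params="$params PCI_BUS${bus}_HIERARCHY_TYPE PCI_BUS${bus}_WIDTH PCI_BUS${bus}_SPEED"
  done
  # Unquoted on purpose: split into separate arguments (names contain no spaces).
  "$MLXCONFIG" -d "$DEV" q $params
}
```

For example, show_pci_bus 00 10 checks both connectors.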

# Connecting Something Else

As mentioned, you can use a second janky mining card for the other side. The actual trick is finding two different versions where one swaps the TX/RX pairs and the other doesn’t, or flipping TX/RX with a modified USB cable. Either way, depending on how you set this up, you may need to configure the bifurcation as x1 (at Gen4/5) for the link to come up, or even use a lower speed, since some of these things only do Gen2/3!

```sh
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_HIERARCHY_TYPE=2
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_WIDTH=1
mlxconfig -d /dev/mst/mt41692_pciconf0 s PCI_BUS00_SPEED=3
```

This sets the first link to x1 at Gen3, which should work with any dodgy mining gear you find. Assuming all is good, you should see your PCIe device in lspci:

```sh
# lspci | grep 3D
04:00.0 3D controller: NVIDIA Corporation GA102 [GeForce RTX 3080 Ti] (rev 01)
```

Lol, lmao even.
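To check for the GPU specifically, lspci can filter by PCI vendor ID (10de is NVIDIA), which also keeps the BlueField’s own functions out of the output. A sketch; have_nvidia_gpu is a hypothetical helper that assumes the card shows up as a 3D or VGA controller, and LSPCI can be overridden to test it without hardware:

```sh
# Succeed if an NVIDIA display/3D controller has enumerated behind the BlueField.
LSPCI=${LSPCI:-lspci}

have_nvidia_gpu() {
  "$LSPCI" -d 10de: | grep -Eq '3D controller|VGA compatible controller'
}
```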

# Bonus Round: Removing Vendor Firmware

Between the RJ45 connector and the QSFP slot, there are two ‘holes’ with a square around them and FNP (firmware not present) written on the silkscreen. They’re in different places depending on the card you have, but seemingly near the QSFP connectors or on the top side of the card. On a BF2, the silkscreen says BF2_FNP.

If you short these out, the card goes into livefish mode, which is basically just ConnectX SoC flash recovery mode. This seemed to behave weirdly with the DOCA version of mst, though, so if you experience the same, try the open-source version, which did work for me: https://github.com/Mellanox/mstflint

Alternatively, you can do this with ipmitool via the DPU’s BMC (though it didn’t work for me for whatever reason; maybe there’s another undocumented step needed before OEM commands work):

```sh
# enable livefish
ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P 0penBmc raw 0x32 0x92
# disable
ipmitool -C 17 -I lanplus -H <bmc_ip> -U root -P 0penBmc raw 0x32 0x93
```
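The two raw commands differ only in the last byte, so they wrap into one toggle. A sketch under stated assumptions: livefish is a hypothetical helper, BMC_PASS defaults to the factory password used in the commands above (if you changed it at first SSH login, you’ll presumably need the new one), IPMITOOL=echo prints the command instead of sending it, and since this path didn’t work for me, treat it as untested:

```sh
# Toggle livefish via the NVIDIA OEM IPMI raw commands: 0x32 0x92 = on, 0x32 0x93 = off.
IPMITOOL=${IPMITOOL:-ipmitool}
BMC_USER=${BMC_USER:-root}
BMC_PASS=${BMC_PASS:-0penBmc}  # assumption: factory default; substitute your own

livefish() {
  bmc=$1
  case $2 in
    on) cmd=0x92 ;;
    off) cmd=0x93 ;;
    *) echo "usage: livefish <bmc_ip> on|off" >&2; return 1 ;;
  esac
  "$IPMITOOL" -C 17 -I lanplus -H "$bmc" -U "$BMC_USER" -P "$BMC_PASS" raw 0x32 "$cmd"
}
```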

More OEM commands here: https://docs.nvidia.com/networking/display/bluefieldbmcv2507/appendix+-+nvidia+oem+ipmi+commands

Anyway, once you do this, you should see a BlueField-3 SoC Flash Recovery device in lspci. You can use its bus address, or just keep using the mst device, to disable flash write protection:

```sh
flint -d /dev/mst/mt41692_pciconf0 -ocr hw set Flash0.WriteProtected=Disabled
```

Then you can write any firmware image:

```sh
flint -d /dev/mst/mt41692_pciconf0 --ignore_dev_data --allow_psid_change --ocr -i <firmware.bin> burn
```
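The write-protect and burn steps can be wrapped so a missing image file fails before flash protection is touched. A sketch under a couple of assumptions: burn_image is a hypothetical helper, FLINT lets you swap in the open-source package’s mstflint binary (assuming it accepts the same flags) or echo for a dry run, and DEV is the mst device used above:

```sh
# Disable flash write protection, then burn the given image (same flags as above).
FLINT=${FLINT:-flint}  # assumption: FLINT=mstflint if using the open-source tools
DEV=${DEV:-/dev/mst/mt41692_pciconf0}

burn_image() {
  image=$1
  if [ ! -f "$image" ]; then
    echo "burn_image: no such file: $image" >&2
    return 1
  fi
  "$FLINT" -d "$DEV" -ocr hw set Flash0.WriteProtected=Disabled || return 1
  # --ignore_dev_data/--allow_psid_change matter when switching PSIDs,
  # e.g. going from vendor firmware back to upstream NVIDIA firmware.
  "$FLINT" -d "$DEV" --ignore_dev_data --allow_psid_change --ocr -i "$image" burn
}
```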

This also lets you go back to upstream NVIDIA firmware if you have some scuffed bullshit vendor-firmware ConnectX card that gets delayed firmware updates or none at all (or you just want to run mlxfwmanager -u --online). The --ignore_dev_data --allow_psid_change flags are the critical bit if you’re doing that; otherwise they aren’t necessary.
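To confirm the PSID actually changed, you can compare what flint’s q (query) command reports before and after burning. A sketch assuming the query output contains a line starting with PSID: followed by the value; current_psid is a hypothetical helper, and FLINT can be overridden as before:

```sh
# Print the PSID reported by "flint q" (assumes a "PSID:   <value>" line).
FLINT=${FLINT:-flint}
DEV=${DEV:-/dev/mst/mt41692_pciconf0}

current_psid() {
  "$FLINT" -d "$DEV" q | awk '$1 == "PSID:" { print $2; exit }'
}
```

Run it once before burning and once after the power cycle; a vendor PSID should give way to an NVIDIA one.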

You can enable write protection again after this (optional, lol). Then make sure those pins aren’t shorted anymore (or ipmitool it back to normal), power cycle the card, and you should have your new Fun Firmware.

Anyway, no idea who the audience for this would be. Have fun, I guess.