NVIDIA Tesla GPU passthrough to LXC container on Proxmox

Quick note to myself on getting Tesla P100 and P4 cards accessible inside a privileged LXC container on Proxmox 8.4 for PyTorch workloads. This is the LXC approach — not VM passthrough via vfio. LXC shares the host kernel.

Hardware in my case: Tesla P100 PCIe 16GB + Tesla P4, both in a SuperMicro SYS-1027GR-TRF running Proxmox 8.4.

Blacklist nouveau

The open-source nouveau driver will claim the cards at boot if it isn't blacklisted. Append the entries to /etc/modprobe.d/pve-blacklist.conf, rebuild the initramfs, then reboot before installing the NVIDIA driver:

echo "blacklist nouveau" >> /etc/modprobe.d/pve-blacklist.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/pve-blacklist.conf
update-initramfs -u -k all
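The two echo lines append unconditionally, so running them a second time leaves duplicate entries. Below is a minimal idempotent sketch. It assumes a Debian-style host with update-initramfs, and the script and its append_once helper are hypothetical names of my own:

```shell
#!/bin/sh
# Sketch: add the nouveau blacklist lines only if they are not already present,
# so running this step twice doesn't leave duplicate entries.
set -eu

# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Only modify the system when a config file path is passed as the first argument.
if [ "$#" -gt 0 ]; then
    append_once "$1" "blacklist nouveau"
    append_once "$1" "options nouveau modeset=0"
    update-initramfs -u -k all
fi
```

Run it as, for example, sh blacklist-nouveau.sh /etc/modprobe.d/pve-blacklist.conf, then reboot. Afterwards, lsmod | grep nouveau should print nothing.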

Install NVIDIA driver on the host

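The kernel modules live on the Proxmox host, and the container shares them. The installer builds the kernel module against the running kernel, so it needs a compiler toolchain and the matching headers. Install those, then run the Tesla driver installer with the silent flags to skip the interactive prompts:

apt install -y build-essential pve-headers-$(uname -r)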
wget https://us.download.nvidia.com/tesla/535.161.08/NVIDIA-Linux-x86_64-535.161.08.run
chmod +x NVIDIA-Linux-x86_64-535.161.08.run
./NVIDIA-Linux-x86_64-535.161.08.run --no-questions --ui=none
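If the installer fails to build the kernel module, two common causes are a missing compiler and headers that don't match the running kernel. The sketch below checks for both before you run the installer. The helper names are hypothetical, and it assumes the headers are found through the conventional /lib/modules/<release>/build link:

```shell
#!/bin/sh
# Sketch: pre-flight checks for building the NVIDIA kernel module on the host.
set -eu

# missing_commands CMD...: print each command not found on PATH; fail if any are missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# headers_present [RELEASE]: succeed if a kernel build tree exists for RELEASE
# (defaults to the running kernel, as reported by uname -r).
headers_present() {
    [ -d "/lib/modules/${1:-$(uname -r)}/build" ]
}

# Report problems as warnings only; the installer itself gives the definitive error.
if ! missing_commands gcc make >&2; then
    echo "warning: compiler toolchain incomplete (e.g. apt install build-essential)" >&2
fi
if ! headers_present; then
    echo "warning: no kernel headers found for $(uname -r)" >&2
fi
```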

Verify both cards are visible and healthy:

nvidia-smi
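For a scriptable check, nvidia-smi's query mode prints one CSV line per GPU. The sketch below compares that count against what you expect (2 for the P100 + P4 here). The script name check-gpus.sh and the count_gpus helper are hypothetical:

```shell
#!/bin/sh
# Sketch: check that nvidia-smi reports the expected number of GPUs.
set -eu

# count_gpus: count non-empty lines on stdin (one per GPU in csv,noheader output).
count_gpus() {
    grep -c . || true
}

# Usage: check-gpus.sh EXPECTED_COUNT  (the script name is hypothetical)
if [ "$#" -gt 0 ]; then
    query=$(nvidia-smi --query-gpu=index,name,driver_version,memory.total --format=csv,noheader)
    printf '%s\n' "$query"
    found=$(printf '%s\n' "$query" | count_gpus)
    if [ "$found" -ne "$1" ]; then
        echo "expected $1 GPUs but nvidia-smi reports $found" >&2
        exit 1
    fi
fi
```

Run it on the host as sh check-gpus.sh 2.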

If you ever need to uninstall cleanly: nvidia-uninstall

Note: If you were previously running GRID/vGPU drivers (e.g. for P4 vGPU on Windows VMs), uninstall those first with nvidia-uninstall and reboot before installing the standard Tesla driver. GRID and standard drivers conflict — you can’t run both.

Enable persistence mode

Persistence mode keeps the NVIDIA driver initialized even when no process is using the GPUs, instead of tearing down GPU state when the last client exits. On Tesla cards this avoids a re-initialization delay each time a new process (a CUDA program, or nvidia-smi itself) opens the GPU.

nvidia-smi -pm 1

Make it permanent with a systemd oneshot service:

cat > /etc/systemd/system/nvidia-persistence.service << 'EOF'
[Unit]
Description=NVIDIA Persistence Mode

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

systemctl enable nvidia-persistence.service
systemctl start nvidia-persistence.service
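Here is how to confirm the service did its job after a reboot. Because of RemainAfterExit=yes, the oneshot unit should still show as active after nvidia-smi exits, and the persistence_mode query field should read Enabled for every GPU. The sketch below uses a hypothetical all_persistent helper and skips the live check on machines without nvidia-smi:

```shell
#!/bin/sh
# Sketch: confirm the oneshot unit is active and every GPU has persistence mode enabled.
set -eu

# all_persistent: read "index, persistence_mode" CSV lines on stdin and
# succeed only if there is at least one line and every line ends in "Enabled".
all_persistent() {
    seen=0
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        seen=1
        case "$line" in
            *", Enabled") ;;
            *) return 1 ;;
        esac
    done
    [ "$seen" -eq 1 ]
}

# Skip the live check on machines without the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
    # RemainAfterExit=yes keeps the oneshot unit "active" after nvidia-smi exits.
    systemctl is-active --quiet nvidia-persistence.service \
        || echo "warning: nvidia-persistence.service is not active" >&2
    if nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | all_persistent; then
        echo "persistence mode enabled on all GPUs"
    else
        echo "persistence mode is disabled on at least one GPU" >&2
        exit 1
    fi
fi
```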

Check device major numbers

Before configuring the container, check the actual major numbers assigned to the NVIDIA devices — these can vary between systems and kernel versions:

ls -la /dev/nvidia* | grep -v caps

On my system:

crw-rw-rw- 1 root root 195,   0  /dev/nvidia0
crw-rw-rw- 1 root root 195,   1  /dev/nvidia1
crw-rw-rw- 1 root root 195, 255  /dev/nvidiactl
crw-rw-rw- 1 root root 511,   0  /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1  /dev/nvidia-uvm-tools
cr-------- 1 root root 236,   1  /dev/nvidia-caps/nvidia-cap1
cr--r--r-- 1 root root 236,   2  /dev/nvidia-caps/nvidia-cap2

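Major numbers here are 195, 511, and 236. 195 is the major number registered for the core NVIDIA devices, but the nvidia-uvm and nvidia-caps majors are allocated dynamically, so use whatever your system shows rather than assuming they’ll match mine.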
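Rather than reading the majors off by eye, you can pull them straight from the device nodes. This sketch assumes GNU coreutils stat (its %t format prints the major number in hex) and the device names shown above. The helper names are hypothetical. It only prints the cgroup lines, so review them before anything goes into the container config:

```shell
#!/bin/sh
# Sketch: derive lxc.cgroup2.devices.allow lines from the NVIDIA device nodes on the host.
set -eu

# char_major DEV: print the decimal major number of a character device node.
char_major() {
    # GNU stat's %t prints the major device number in hex.
    printf '%d\n' "0x$(stat -c '%t' "$1")"
}

# cgroup_allow_lines DEV...: one allow line per distinct major; non-devices are skipped.
cgroup_allow_lines() {
    for dev in "$@"; do
        [ -c "$dev" ] || continue
        char_major "$dev"
    done | sort -un | while read -r major; do
        printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "$major"
    done
}

# Unmatched globs stay literal and are skipped by the -c test above.
cgroup_allow_lines /dev/nvidia[0-9]* /dev/nvidiactl /dev/nvidia-uvm* /dev/nvidia-caps/*
```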

Configure the LXC container

The container must be privileged. Add the following to /etc/pve/lxc/<CTID>.conf, substituting the major numbers you found above:

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.cgroup2.devices.allow: c 236:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap1 dev/nvidia-caps/nvidia-cap1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-caps/nvidia-cap2 dev/nvidia-caps/nvidia-cap2 none bind,optional,create=file
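The mount entries follow a fixed pattern: the host path, then the same path without its leading slash, which LXC treats as relative to the container root. That means they can be generated from whatever device nodes exist on the host. Here is a sketch, where mount_entry is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: print lxc.mount.entry lines for the NVIDIA device nodes present on the host.
set -eu

# mount_entry DEV: bind-mount DEV to the same path inside the container
# (the target is relative to the container root, hence no leading slash).
mount_entry() {
    printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' "$1" "${1#/}"
}

for dev in /dev/nvidia[0-9]* /dev/nvidiactl /dev/nvidia-uvm* /dev/nvidia-caps/*; do
    [ -c "$dev" ] || continue   # skip unmatched globs and non-device files
    mount_entry "$dev"
done
```

Review the output before appending it to /etc/pve/lxc/<CTID>.conf, and restart the container for the change to take effect.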

Install NVIDIA userspace inside the container

The kernel modules are on the host — the container just needs the userspace libraries. Use the same driver version as the host, with the --no-kernel-modules flag:

wget https://us.download.nvidia.com/tesla/535.161.08/NVIDIA-Linux-x86_64-535.161.08.run
chmod +x NVIDIA-Linux-x86_64-535.161.08.run
./NVIDIA-Linux-x86_64-535.161.08.run --no-kernel-modules --no-questions --ui=none

Important: The userspace version must match the host driver version exactly. Any mismatch, even a minor one, makes nvidia-smi fail with "Failed to initialize NVML: Driver/library version mismatch". This is also why you can’t mix GRID drivers on the host with standard drivers in the container; they’re different builds even at the same version number.
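One way to guard against a mismatch is to compare the installer's version with the host's loaded module before running it. The sketch below assumes the host's nvidia module exposes its version at /sys/module/nvidia/version and that sysfs is visible inside the privileged container. The script name and the installer_version helper are hypothetical:

```shell
#!/bin/sh
# Sketch: refuse to run the container-side installer unless its version matches the host module.
set -eu

# installer_version FILE: "NVIDIA-Linux-x86_64-535.161.08.run" -> "535.161.08"
installer_version() {
    name=$(basename "$1" .run)
    printf '%s\n' "${name##*-}"
}

# Usage: install-userspace.sh NVIDIA-Linux-x86_64-<version>.run  (script name is hypothetical)
if [ "$#" -gt 0 ]; then
    # Assumption: the host's nvidia module version is readable via sysfs inside the container.
    host_ver=$(cat /sys/module/nvidia/version)
    want_ver=$(installer_version "$1")
    if [ "$host_ver" != "$want_ver" ]; then
        echo "host module is $host_ver but installer is $want_ver; not installing" >&2
        exit 1
    fi
    sh "$1" --no-kernel-modules --no-questions --ui=none
fi
```

Inside the container, run it as sh install-userspace.sh NVIDIA-Linux-x86_64-535.161.08.run.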

Verify

From inside the container:

nvidia-smi

Both cards should be visible with full memory. For PyTorch specifically:

python3 -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count()); print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])"

Should print True, 2, and the card names.
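If nvidia-smi fails inside the container, check the device nodes first. The optional flag on each lxc.mount.entry means LXC silently skips any node that doesn't exist on the host when the container starts. On some hosts the nvidia-uvm nodes only appear once the nvidia_uvm module has been loaded, for example by the first CUDA program run on the host. Running nvidia-modprobe -u -c=0 on the host at boot is one commonly used workaround, though that is an assumption worth checking against nvidia-modprobe --help for your driver. Here is a sketch of the check, with a hypothetical script name:

```shell
#!/bin/sh
# Sketch: list which expected NVIDIA device nodes did not make it into the container.
set -eu

# missing_nodes NODE...: print each path that is not a character device; fail if any are missing.
missing_nodes() {
    rc=0
    for node in "$@"; do
        if [ ! -c "$node" ]; then
            printf '%s\n' "$node"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check-nodes.sh /dev/nvidia0 /dev/nvidiactl ...  (script name is hypothetical)
if [ "$#" -gt 0 ]; then
    if missing_nodes "$@"; then
        echo "all listed device nodes are present"
    else
        echo "the nodes listed above are missing; check the host side first" >&2
        exit 1
    fi
fi
```

With the devices from this setup: sh check-nodes.sh /dev/nvidia0 /dev/nvidia1 /dev/nvidiactl /dev/nvidia-uvm /dev/nvidia-uvm-tools /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2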

Why LXC and not VM passthrough?

I spent a long time trying to get these P100s working via vfio passthrough into a KVM VM first, on a different SuperMicro host in the same cluster. The cards kept entering D3cold power states, the QEMU IRQ handler would assert on startup, and even when it did boot the nvidia driver inside the VM would hit Xid 79 and lose the cards. The PCIe power management issues were part of it, but the chassis airflow was also inadequate for 250W cards — something I only realised later. The P100 is a passive card that depends entirely on chassis airflow.

The SYS-1027GR-TRF is a different matter — it’s a 1U chassis purpose-built for GPU compute, with counter-rotating fans and a layout that puts direct airflow across the cards. Thermals are fine. And switching to LXC rather than vfio VM passthrough sidestepped the PCIe layer issues entirely. LXC shares the host kernel, so there’s no virtual PCIe topology for QEMU to mishandle, no FLR reset requirements, no IRQ mapping. The host kernel talks directly to the hardware and the container just uses the same kernel.

The tradeoff is isolation — a privileged LXC container is less isolated than a VM. For a dedicated ML/inference workload node that’s an acceptable tradeoff.

doofer.org