this post was submitted on 19 Dec 2023
submitted 11 months ago (last edited 11 months ago) by [email protected] to c/[email protected]

I've been using a Proxmox home server for quite some time now without many problems. Recently I got an AMD RX 5700 XT (Navi 10) and tried to pass it through to a Windows VM. I mainly followed the official Proxmox guide, but needed some other tutorials to get it running. For now, it works exactly once after I reboot the host: starting the VM is no problem then, but after a VM restart it won't start anymore and shows this error:

```
swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1637: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 98348) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code -1
```

I tried fixing it using this, but it didn't change much.

EDIT: link was not shown

[email protected] · 1 point · 11 months ago

Maybe this?
https://github.com/gnif/vendor-reset
Although I've been passing through a vega64 without needing this.

[email protected] · 2 points · 11 months ago

Yeah, I tried that; the link just wasn't shown in the original post. It didn't really fix it.

[email protected] · 1 point · 11 months ago

Try journalctl to get more details from when it fails?
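A minimal sketch of narrowing the journal down to the lines that usually matter when a passthrough VM fails to start. The helper name `passthrough_log_filter` and the chosen patterns are assumptions for illustration, not anything Proxmox ships; the `journalctl` flags in the comment (`-b`, `--since`, `--no-pager`) are standard.

```bash
# Hypothetical helper: reads journal text on stdin and keeps only lines about
# vfio-pci resets/power states, PCIe link retraining and the final QEMU error.
passthrough_log_filter() {
    grep -E 'vfio-pci|pcieport|power state|QEMU exited|start failed'
}

# Example (as root on the Proxmox host; widen the time window as needed):
#   journalctl -b --since "15min ago" --no-pager | passthrough_log_filter
```

Filtering on stdin rather than calling `journalctl` inside the function keeps it usable on a saved log file as well.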

[email protected] · 2 points · 11 months ago (last edited 11 months ago)

This is the output from journalctl since stopping and rebooting the VM. The main error seems to occur at 16:41:43:

```
Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.
```

[email protected] · 2 points · 11 months ago (last edited 11 months ago)

dmesg also reported `vendor_reset: module verification failed: signature and/or required key missing - tainting kernel`. However, according to https://github.com/gnif/vendor-reset/issues/46#issuecomment-983087796, this error isn't that important...
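The taint message only says the out-of-tree module isn't signed. A separate thing worth checking is whether the module is actually loaded and which reset methods the kernel will try for the card. A minimal sketch, assuming the GPU is at `0000:03:00.0` as in the log above and a kernel new enough to expose the `reset_method` sysfs attribute (5.15 and later, as far as I know). `check_vendor_reset` and the `SYSFS_ROOT` override are hypothetical; the override only exists so the function can be dry-run against a fake directory tree. Community reports for vendor-reset suggest that on such kernels the card's `reset_method` may need to be set to `device_specific`; treat that as something to verify, not as a confirmed fix for this thread.

```bash
# Hypothetical helper: report whether vendor_reset is loaded and which reset
# methods the kernel lists for a device (in the order it will try them).
# SYSFS_ROOT defaults to /sys and exists only to allow dry runs.
check_vendor_reset() {
    local dev="${1:-0000:03:00.0}"   # PCI address of the GPU (assumed)
    local sysfs="${SYSFS_ROOT:-/sys}"
    local attr="$sysfs/bus/pci/devices/$dev/reset_method"

    # Loaded modules show up as directories under /sys/module.
    if [ -d "$sysfs/module/vendor_reset" ]; then
        echo "vendor_reset: loaded"
    else
        echo "vendor_reset: not loaded"
    fi

    if [ -r "$attr" ]; then
        echo "reset_method: $(cat "$attr")"
    else
        echo "reset_method: attribute not available"
    fi
}

# To prefer the device-specific reset (as root; does not persist across reboots):
#   echo device_specific > /sys/bus/pci/devices/0000:03:00.0/reset_method
```

Usage: `check_vendor_reset 0000:03:00.0` on the host, before starting the VM.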

[email protected] · 2 points · 11 months ago (last edited 11 months ago)

To everyone else encountering this error, I finally fixed it this way: This forum entry sent me here, which then helped me resolve the issue. Huge thanks to you, InEnduringGrowStrong, for pushing me in the right direction.

[email protected] · 2 points · 11 months ago

Ah, nice, you got it working.
Once it works, it's great.
I've been running mine for a while now, but have purposely avoided kernel upgrades so far.

[email protected] · 1 point · 11 months ago

Haha, I already started worrying about that :) But you're right, it's great.

[email protected] · 2 points · 11 months ago

Formatted with a code block so it's more readable:

```
Dec 19 16:40:45 pve pvedaemon[1590]: end task UPID:pve:00030675:000E7952:6581B96F:vncshell::root@pam: OK
Dec 19 16:40:47 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:41:03 pve pvedaemon[1590]: starting task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:03 pve pvedaemon[198894]: start VM 195: UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam:
Dec 19 16:41:06 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:41:40 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Dec 19 16:41:41 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:41:41 pve systemd[1]: 195.scope: Consumed 54min 2.778s CPU time.
Dec 19 16:41:41 pve systemd[1]: Started 195.scope.
Dec 19 16:41:41 pve kernel: tap195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered disabled state
Dec 19 16:41:41 pve kernel: fwpr195p0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwpr195p0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered blocking state
Dec 19 16:41:41 pve kernel: vmbr0: port 4(fwpr195p0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered disabled state
Dec 19 16:41:41 pve kernel: fwln195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwln195i0: entered promiscuous mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 1(fwln195i0) entered forwarding state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:41:41 pve kernel: tap195i0: entered allmulticast mode
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered blocking state
Dec 19 16:41:41 pve kernel: fwbr195i0: port 2(tap195i0) entered forwarding state
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:43 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:41:44 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:44 pve pvedaemon[1592]: VM 195 qmp command failed - VM 195 not running
Dec 19 16:41:45 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:46 pve kernel: pcieport 0000:02:00.0: broken device, retraining non-functional downstream link at 2.5GT/s
Dec 19 16:41:47 pve kernel: pcieport 0000:02:00.0: retraining failed
Dec 19 16:41:47 pve kernel: vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; waiting
Dec 19 16:41:48 pve kernel: vfio-pci 0000:03:00.0: not ready 2047ms after bus reset; waiting
Dec 19 16:41:50 pve kernel: vfio-pci 0000:03:00.0: not ready 4095ms after bus reset; waiting
Dec 19 16:41:54 pve kernel: vfio-pci 0000:03:00.0: not ready 8191ms after bus reset; waiting
Dec 19 16:42:03 pve kernel: vfio-pci 0000:03:00.0: not ready 16383ms after bus reset; waiting
Dec 19 16:42:21 pve kernel: vfio-pci 0000:03:00.0: not ready 32767ms after bus reset; waiting
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: not ready 65535ms after bus reset; giving up
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve kernel: tap195i0 (unregistering): left allmulticast mode
Dec 19 16:42:56 pve kernel: fwbr195i0: port 2(tap195i0) entered disabled state
Dec 19 16:42:56 pve pvedaemon[199553]: stopping swtpm instance (pid 199561) due to QEMU startup error
Dec 19 16:42:56 pve pvedaemon[198894]: start failed: QEMU exited with code 1
Dec 19 16:42:56 pve pvedaemon[1590]: end task UPID:pve:000308EE:000E85EB:6581B98F:qmstart:195:root@pam: start failed: QEMU exit>
Dec 19 16:42:56 pve systemd[1]: 195.scope: Deactivated successfully.
Dec 19 16:42:56 pve systemd[1]: 195.scope: Consumed 1.736s CPU time.
```

[email protected] · 2 points · 11 months ago

It does seem a lot like the reset bug, but then you already tried that. :/ Kernel modules aren't that easy to install, and if your kernel is missing the required flags, it might just do nothing. Check with:

```
grep -E '(CONFIG_FTRACE|CONFIG_KPROBES|CONFIG_PCI_QUIRKS|CONFIG_KALLSYMS|CONFIG_KALLSYMS_ALL|CONFIG_FUNCTION_TRACER)\b' "/boot/config-$(uname -r)"
```

It should show all 6 flags set to `=y`.
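The same check as the grep above, but reporting each of the six flags explicitly so a missing one stands out instead of simply not appearing. A sketch; `check_kernel_flags` is a hypothetical name, and it assumes the Debian/Proxmox convention of a kernel config at `/boot/config-<release>`.

```bash
# Hypothetical helper: verify that each kernel option from the grep above is
# built in (=y). Prints one line per flag; returns 1 if any flag is missing.
check_kernel_flags() {
    local config="${1:-/boot/config-$(uname -r)}"
    local missing=0 flag
    for flag in CONFIG_FTRACE CONFIG_KPROBES CONFIG_PCI_QUIRKS \
                CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL CONFIG_FUNCTION_TRACER; do
        # -x: whole-line match, -F: fixed string, so CONFIG_KALLSYMS=y
        # cannot be satisfied by CONFIG_KALLSYMS_ALL=y.
        if grep -qxF "${flag}=y" "$config"; then
            echo "ok       $flag"
        else
            echo "MISSING  $flag"
            missing=1
        fi
    done
    return "$missing"
}
```

Run without arguments to check the running kernel, or pass a config path to check a kernel you are about to boot.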

Or maybe some variation of a manual reset, as discussed here:
https://forum.proxmox.com/threads/issues-with-intel-arc-a770m-gpu-passthrough-on-nuc12snki72-vfio-pci-not-ready-after-flr-or-bus-reset.130667/
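One common variation of a manual reset is to remove the GPU's functions from the PCI tree and rescan the bus before starting the VM. I haven't checked that this is what the linked thread does; it's a sketch assuming the card's functions are `0000:03:00.0` and `0000:03:00.1` as in the log above, run as root with the VM stopped. The helper names and the `DRY_RUN` switch are hypothetical. If the card is stuck in D3cold as in that log, this may not be enough to recover it.

```bash
# Hypothetical helpers. Writing 1 to a device's "remove" attribute detaches it
# from the PCI tree; writing 1 to /sys/bus/pci/rescan re-enumerates the bus.
# Run as root with the VM stopped. DRY_RUN=1 only prints the writes.
pci_write() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

pci_remove_rescan() {
    local dev
    for dev in "$@"; do
        pci_write 1 "/sys/bus/pci/devices/$dev/remove"
    done
    pci_write 1 /sys/bus/pci/rescan
}
```

Usage: `DRY_RUN=1 pci_remove_rescan 0000:03:00.1 0000:03:00.0` to preview, then the same command without `DRY_RUN=1` to perform it.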

[email protected] · 2 points · 11 months ago

Just FYI, all 6 flags were shown as `=y`.

[email protected] · 1 point · 11 months ago

It was intended to be a code block, but somehow it ended up as a wall of text without newlines.