- 2024-12-29
- it
PCI Layout inside virtual machine
At $Dayjob I have managed to pass devices (GPUs and IB) into a VM using PCI passthrough. But for some reason an all_reduce is still slow (roughly 1/10 of what 2 full nodes achieve on bare metal). The first layout I tried looked something like this:

    upstream_port
          ^
    downstream_port
      ^    ^    ^    ^    ^
    GPU1 GPU2 GPU3 .. GPU8 IB1 IB2 IB3 …
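
To compare a layout like this with what the guest actually sees, it can help to print the PCI hierarchy from inside the VM. Below is a minimal sketch, assuming a Linux guest where `/sys/bus/pci/devices` contains symlinks whose resolved paths nest each device under its parent bridge. The helper names (`parent_map`, `render`, `read_sysfs`) are made up for illustration and are not part of any tool mentioned here.

```python
import os
import re
from collections import defaultdict

# PCI address in domain:bus:device.function form, e.g. 0000:3b:00.0
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def parent_map(resolved_paths):
    """Map each PCI address to its parent bridge (None at the top level)."""
    parents = {}
    for path in resolved_paths:
        # Keep only path components that look like PCI addresses;
        # this skips entries such as "pci0000:00".
        chain = [part for part in path.split("/") if BDF.match(part)]
        for i, bdf in enumerate(chain):
            parents[bdf] = chain[i - 1] if i > 0 else None
    return parents


def render(parents):
    """Return the hierarchy as indented lines, children under their parent."""
    children = defaultdict(list)
    for dev, parent in parents.items():
        children[parent].append(dev)
    lines = []

    def walk(node, depth):
        for child in sorted(children[node]):
            lines.append("  " * depth + child)
            walk(child, depth + 1)

    walk(None, 0)
    return lines


def read_sysfs(root="/sys/bus/pci/devices"):
    """Resolve every device symlink to its full sysfs path (assumed layout)."""
    return [os.path.realpath(os.path.join(root, name))
            for name in sorted(os.listdir(root))]


if __name__ == "__main__":
    if os.path.isdir("/sys/bus/pci/devices"):
        print("\n".join(render(parent_map(read_sysfs()))))
```

Running it in the guest and on the bare-metal host gives two indented trees that can be compared side by side. `lspci -tv` shows similar information, so this is mostly useful when the output needs further processing.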