  • 2024-12-29
  • it

PCI Layout inside a virtual machine

At $Dayjob I have managed to pass devices (GPUs and IB, i.e. InfiniBand, adapters) into a VM using PCI passthrough. But for some reason an all_reduce is still slow: roughly 1/10 of what 2 full nodes achieve on bare metal (BM). The first layout I attempted was something like this:

    upstream_port
          ^
    downstream_port
      ^     ^     ^          ^     ^     ^     ^
    GPU1  GPU2  GPU3  ..  GPU8   IB1   IB2   IB3 …
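To see which PCI layout the guest actually ends up with, one option is to read it back from sysfs inside the VM: on Linux, each entry in /sys/bus/pci/devices is a symlink whose resolved path lists every bridge/port between the root bus and the device. NCCL, for instance, builds its view of GPU/NIC locality from this hierarchy, so it is worth checking what it sees. The sketch below assumes a Linux guest; `pci_chain`, `shared_bridges` and `list_devices` are hypothetical helpers written for illustration, not part of any tool. `lspci -tv` and `nvidia-smi topo -m` give a similar view.

```python
import os
from pathlib import Path


def pci_chain(sysfs_path: str) -> list[str]:
    """Return PCI addresses from the root bus down to the device itself.

    Expects a resolved sysfs path such as
    /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0
    """
    # Keep only components shaped like domain:bus:device.function; this
    # drops "sys", "devices" and the "pci0000:00" root-bus directory.
    return [p for p in Path(sysfs_path).parts if p.count(":") == 2 and "." in p]


def shared_bridges(a: str, b: str) -> list[str]:
    """Bridges/ports two devices have in common, closest to the root first."""
    # The last element of each chain is the device itself, so exclude it.
    ca, cb = pci_chain(a)[:-1], pci_chain(b)[:-1]
    common = []
    for x, y in zip(ca, cb):
        if x != y:
            break
        common.append(x)
    return common


def list_devices(root: str = "/sys/bus/pci/devices") -> dict[str, list[str]]:
    """Map every PCI device visible under `root` to its bridge chain."""
    chains: dict[str, list[str]] = {}
    if not os.path.isdir(root):  # e.g. not running on Linux
        return chains
    for name in sorted(os.listdir(root)):
        # Resolve the symlink to get the full path through all bridges.
        chains[name] = pci_chain(os.path.realpath(os.path.join(root, name)))
    return chains


if __name__ == "__main__":
    for addr, chain in list_devices().items():
        print(addr, " -> ".join(chain[:-1]) or "(root bus)")
```

In a layout like the one above, every GPU and IB device would report the same parent chain, so `shared_bridges` returns the same ports for any GPU/IB pair and gives no hint about which NIC is "closer" to which GPU.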