Comprehensive guide to performance optimizations for gaming on virtual machines with KVM/QEMU and PCI passthrough

Preamble

This guide describes performance optimizations for gaming on a virtual machine (VM) with GPU passthrough.

To optimize the user experience of virtualized gaming, I set out to achieve low latency and high performance, especially since my main focus is on demanding online multiplayer games.

The Benchmarking Process

I started by benchmarking several setups (found on Level1Techs, Reddit and the Arch Wiki), using the Superposition demo to compare performance results and LatencyMon 6.70 to measure input lag.

Unfortunately, after several hours of testing I had to learn that those two measurements weren’t enough to guarantee a decent gaming experience. Additionally, even when one game ran smoothly, there was no guarantee that others would not stutter.

I had to extend my observations.

Instead of testing the performance with the Superposition demo (measuring FPS only), I used three games to measure my progress.

  • PlayerUnknown’s Battlegrounds (PUBG)
    • which is CPU demanding
  • Apex Legends
    • which is GPU demanding
  • Blizzard’s Overwatch (OW)
    • which has pretty low hardware demands

As the previous tests had shown, FPS alone is not a reliable performance measurement in a virtual gaming environment.

Thus I came up with the following rules:

  1. Input lag – the smallest possible input latency is crucial for gaming.
  2. Consistency – no freezes and/or stuttering should occur.
  3. Performance – more is better; gaming performance is measured in frames per second (FPS).
  4. Stability – there shall be no crashes!
  5. Compatibility – don’t get banned, work smoothly with anti-cheat tools etc.

The rules are ordered by priority: 1 is more important than 2, which is more important than 3, and so on.

Benchmarking Results

I created two libvirt XML files during the tests, one for the i440fx and one for the Q35 chipset. I got the better gaming experience on the virtual machine with the i440fx chipset and Windows 10 1803.

Passages from QEMU 4.1, i440fx chipset libvirt config file

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>win10-1803-i440fx</name>
  <uuid>073f2a4e-5ab2-4bc7-99c2-2ac006adc87e</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking> <!-- hugepages enable -->
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <emulatorpin cpuset='0-3'/>
    <iothreadpin iothread='1' cpuset='0-1'/>
    <iothreadpin iothread='2' cpuset='2-3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
       [...]
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='1234567890ab'/> <!-- nvidia error code 43 prevention -->
      <frequencies state='on'/>
    </hyperv>
    <kvm>
      <hidden state='on'/> <!-- nvidia error code 43 prevention -->
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/> <!-- required for QEMU 4.0 or later -->
  </features>
  <cpu mode='custom' match='exact' check='none'>
    <model fallback='allow'>EPYC</model>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='svm'/>
    <feature policy='require' name='apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='invtsc'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' present='no' tickpolicy='catchup'/>
    <timer name='pit' present='no' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='tsc' present='yes' mode='native'/>
  </clock>
  [...]
  <devices>
    <emulator>/usr/local/bin/qemu4.1-system-x86_64</emulator>
    [...]
  </devices>
  <qemu:commandline>
    <qemu:env name='QEMU_AUDIO_DRV' value='pa'/>
    <qemu:env name='QEMU_PA_SAMPLES' value='8192'/>
    <qemu:env name='QEMU_AUDIO_TIMER_PERIOD' value='99'/>
    <qemu:env name='QEMU_PA_SERVER' value='/run/user/1000/pulse/native'/>
  </qemu:commandline>
</domain>

With the Q35 settings I reach lower input latency (with stutters though).

These are my settings for Q35, Windows 10 1903.

Passages from QEMU 4.1, Q35 chipset libvirt config file

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>win10-1903-q35</name>
  <uuid>cc37803d-a904-44cd-a333-5830ce22d20f</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking> <!-- hugepages enable -->
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <emulatorpin cpuset='0-3'/>
    <iothreadpin iothread='1' cpuset='0-1'/>
    <iothreadpin iothread='2' cpuset='2-3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-4.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
       [...]
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='1234567890ab'/> <!-- nvidia error code 43 prevention -->
      <frequencies state='on'/>
    </hyperv>
    <kvm>
      <hidden state='on'/> <!-- nvidia error code 43 prevention -->
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/> <!-- required for QEMU 4.0 or later -->
  </features>
  <cpu mode='custom' match='exact' check='none'>
    <model fallback='allow'>EPYC</model>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='svm'/>
    <feature policy='require' name='apic'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='invtsc'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' present='no' tickpolicy='catchup'/>
    <timer name='pit' present='no' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='kvmclock' present='no'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='tsc' present='yes' mode='native'/>
  </clock>
  [...]
  <devices>
    <emulator>/usr/local/bin/qemu4.1-system-x86_64</emulator>
    [...]
  </devices>
  <qemu:commandline>
    <qemu:env name='QEMU_AUDIO_DRV' value='pa'/>
    <qemu:env name='QEMU_PA_SAMPLES' value='8192'/>
    <qemu:env name='QEMU_AUDIO_TIMER_PERIOD' value='99'/>
    <qemu:env name='QEMU_PA_SERVER' value='/run/user/1000/pulse/native'/>
  </qemu:commandline>
</domain>

The following chapters give some optimization tips for the host and the guest, and hopefully some insight into the chosen libvirt settings.

Is this content any helpful? Then please consider supporting me.

If you appreciate the content I create, this is your chance to give something back and earn some good old karma.

Although ads are fun to play with, and very important for content creators, I felt a strong hypocrisy in putting ads on my website. Even though, I always try to minimize the data collecting part to a minimum.

Thus, please consider supporting this website directly 😘

Overview – the GPU Passthrough Setup

I am using an AMD Ryzen platform for my GPU passthrough setup. This means that some optimizations are especially (or only) relevant to Ryzen CPUs, while others are relevant to any system. I will try to mark these settings throughout the article.

Hardware components

  • CPU: AMD Ryzen 7 1800x (8 Core @3.6GHz)
  • RAM: 32GB DDR4 RAM (@2800MHz)
  • Mainboard: ASUS Prime x370 pro (BIOS version 4207)

Attention! The ASUS Prime x370 pro BIOS versions that add RYZEN 3000-series support (up to the current latest version 5220, and possibly later ones) break a PCI passthrough setup with the error “Unknown PCI header type ‘127’”. BIOS versions up to (and including) 4406 work.

Software Components

The Ubuntu host

  • OS: (X)Ubuntu 18.04
  • Kernel: 5.3.6
  • Hypervisor: QEMU version 4.1
  • Manager: Libvirt version 4.7

The Windows virtual machine (guest)

  • Windows 10 version 1903 on the Q35 chipset
  • Windows 10 version 1803 on the i440fx chipset
  • Nvidia Driver version 436.68

Attention! A known bug with libvirt and Windows 10 1903: do not use ich6/ich9 audio devices in the virtual machine, as they cause awful stuttering and performance loss. Using ac97 audio fixes this issue.

Host OS Optimization

CPU Governor Settings

Right after your host system has booted, the CPU governor is usually set to “ondemand”. The CPU is allowed to boost when a process requires it; otherwise it saves energy.

Unfortunately, in my tests the boost was not triggered consistently from within a virtual machine.

Thus, I force the CPU governor to “performance” on the host while the virtual machine is running. The downside is higher energy consumption.

I use this bash script to enable performance.

#!/bin/bash
# Needs root privileges, because it writes to sysfs.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # governors before the change
for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "performance" > "$file"; done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # governors after the change

I use this bash script to switch back to ondemand afterwards.

#!/bin/bash
# Needs root privileges, because it writes to sysfs.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # governors before the change
for file in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo "ondemand" > "$file"; done
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # governors after the change

Unfortunately I have lost the exact source to give credit for the code. source1, source2

QEMU and LIBVIRT Versions

QEMU version 3.1 or higher is recommended, as it adds improved SMT support for Ryzen CPUs. Ubuntu 18.04 LTS currently ships version 2.12.0, so I recommend upgrading, either by building QEMU on your own (read here) or by installing it from a custom PPA.

Why do we need a higher libvirt version? Not all libvirt versions support all XML elements. You can check the libvirt documentation for the supported elements and the versions they require.

Clocksource

Make sure ‘tsc’ is set as clock source. You can check this via:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource

This should return ‘tsc’.
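The same check can be scripted. This is a minimal sketch that assumes the standard Linux sysfs clocksource files; the function name check_clocksource and its optional directory argument are hypothetical and only there for illustration and testing.

```shell
#!/bin/bash
# Report whether the active kernel clock source is tsc.
# The optional argument (a sysfs directory) is a hypothetical hook that only exists to ease testing.
check_clocksource() {
    local dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    local current
    current="$(cat "$dir/current_clocksource")" || return 2
    if [ "$current" = "tsc" ]; then
        echo "ok: clock source is tsc"
    else
        # List the alternatives the kernel offers, so a missing tsc is easy to spot.
        echo "warning: clock source is '$current', available: $(cat "$dir/available_clocksource")"
        return 1
    fi
}
```

Calling check_clocksource without arguments inspects the running host.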

Guest OS Optimization

The optimization recommendations in this chapter are not solely relevant for virtual machines. They can be applied to Windows in general.

Windows 1903 is used on purpose, because it brings better Ryzen SMT support.

Enable MSI Interrupts

One can enable MSI interrupts for passed-through hardware with MSI_util_v2, which is available from CHEF-KOCH’s GitHub repository, or with the MSIInturruptEnabler (thank you, Mark).

Spectre Patches (optional)

It is possible to disable the Spectre patches in the system using the InSpectre tool (downside: the system is less secure). Heiko Sieger has an article about the topic.

Virtual machine Configuration Optimizations

Five sections of the libvirt virtual machine configuration are crucial for optimizing the virtual machine’s performance:

  • CPU pinning
  • CPU model information
  • Hyper-V enlightenments
  • Clock settings
  • Hugepages

CPU Pinning

CPU pinning dedicates CPU cores mainly (or solely) to guest tasks while the guest is running. Ideally, the host will then not use the cores allocated to the guest.

One could go even further and restrict access to the guest cores completely, even if the guest isn’t running, by using the isolcpus kernel command line parameter at boot time. I do not use this feature, as I would like to have maximum host performance when the guest is not running.

AMD Ryzen CPU architecture

The AMD Ryzen 7 1800X houses 8 physical cores, each capable of handling two threads. This leads to a total of 16 logical CPUs available for pinning. The 8 cores are separated into two complexes of 4 cores each, called CCX. Each CCX has its own L3 cache.

Core separation between host and guest system

The plan is to have one CCX for the host and one CCX for the guest. As the host runs first, I’ll assume it uses the first CCX with cores 0-3. The second CCX (cores 4-7, which are the logical CPUs 8-15 in my setup) shall be used for the virtual machine. I used a setup with 12 CPUs pinned to the guest (6 cores) for half a year. Sometimes I encountered micro lag spikes which I couldn’t track down. Then I switched to 8 CPUs (4 cores).

The benchmarks indicated that the 6-core pinning had better CPU mark results and slightly higher FPS, but since I switched to the CCX separation the lag spikes have gone away.

Here are my settings:

AMD Ryzen CPU Pinning Recommendation for Optimal Gaming Performance

To edit the virtual machine’s configuration, use: virsh edit your-windows-vm-name

Once you’re done editing, save the changes and exit the editor (in nano: CTRL+x, then y, then Enter). First of all, find the very first line, which should read:

<domain type='kvm'>

and replace it with:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

Now find the line which ends with </vcpu> and replace it with the following block:

<vcpu placement='static'>8</vcpu>
<iothreads>2</iothreads>
<cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <emulatorpin cpuset='0-1'/>
    <iothreadpin iothread='1' cpuset='0-1'/>
    <iothreadpin iothread='2' cpuset='2-3'/>
 </cputune>

I have tested <vcpusched vcpus='0' scheduler='fifo' priority='1'/> for every pinned core, but eventually removed it. Without it, the guest felt more responsive. I have no hard benchmark numbers to prove this, though.

Remark! Make sure <vcpu>, <iothreads> and <cputune> have the same indent.

Remark! Make sure the pinned cores match the CPU topology described below.
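Logical CPU numbering differs between CPU models, so it is worth checking the host layout before copying the cpuset values. The following is a minimal sketch that reads the Linux sysfs topology files; the function names and the optional directory argument are hypothetical, and it assumes that cache index3 is the L3 cache, which is the usual layout on x86 but not guaranteed.

```shell
#!/bin/bash
# Print the groups of logical CPUs that share one physical core (SMT siblings), one group per line.
# The optional argument (sysfs CPU directory) is a hypothetical hook that only exists to ease testing.
list_core_siblings() {
    local base="${1:-/sys/devices/system/cpu}"
    cat "$base"/cpu[0-9]*/topology/thread_siblings_list | sort -u -V
}

# Print the groups of logical CPUs that share one L3 cache (on Ryzen: one CCX), one group per line.
# Assumption: cache/index3 is the L3 cache.
list_l3_groups() {
    local base="${1:-/sys/devices/system/cpu}"
    cat "$base"/cpu[0-9]*/cache/index3/shared_cpu_list | sort -u -V
}
```

The vCPUs of the guest should all be pinned to logical CPUs from a single L3 group, and sibling threads of one core should end up as neighbouring vCPUs.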

CPU Model Information

The chapter above gave us some insight into the AMD Ryzen CPU structure. It is a good thing if the guest operating system also knows about this structure.

The libvirt CPU model and topology settings are used to make the guest aware of the CPU specifications (such as CCX layout, cache size, etc.).

CPU Mode and Cache

For Ryzen CPUs the model definition “EPYC” can be used; this is recommended for QEMU version 3.1 and below. It defines the structure and caches of the CPU.

For QEMU versions above 3.1, host-passthrough is also feasible.

Attention! Windows 10 releases newer than 1803 require the kvm option ignore_msrs=1 to be enabled on the host, otherwise BSODs occur.
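One way to set and verify this option is sketched below. The function names and the optional file argument are hypothetical; the sketch assumes that the option is persisted through a modprobe configuration file and that the loaded kvm module exposes its parameters under /sys/module/kvm/parameters.

```shell
#!/bin/bash
# Print the modprobe configuration line that makes the kvm module ignore unhandled MSR accesses.
kvm_ignore_msrs_option() {
    echo "options kvm ignore_msrs=1"
}

# Succeed if ignore_msrs is currently active for the loaded kvm module.
# The optional argument (parameter file) is a hypothetical hook that only exists to ease testing.
kvm_ignore_msrs_active() {
    local param="${1:-/sys/module/kvm/parameters/ignore_msrs}"
    [ "$(cat "$param")" = "Y" ]
}
```

As root, kvm_ignore_msrs_option > /etc/modprobe.d/kvm.conf persists the option (the file name kvm.conf is an assumption; any .conf file in that directory should work). It takes effect once the kvm module is reloaded or the host is rebooted, which kvm_ignore_msrs_active can confirm.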

CPU Topology

This is where the actual number of cores is defined. Using 1 socket with 4 cores and 2 threads will tell the guest operating system that it has access to one 4-core CPU with SMT (hyperthreading).

It is important for the CPU topology that the number of vCPUs it defines (sockets × cores × threads) matches the number of pinned vCPUs from above.
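This consistency rule can be checked mechanically. The following is a minimal sketch with a hypothetical function name; it assumes the domain XML has been saved to a file and that every <vcpupin> element sits on its own line, as in the examples of this article.

```shell
#!/bin/bash
# Compare the vCPU count implied by a topology (sockets x cores x threads)
# with the number of lines containing a <vcpupin> element in a libvirt domain XML file.
# Arguments: sockets cores threads xml_file
check_topology() {
    local expected=$(( $1 * $2 * $3 ))
    local pinned
    # grep -c exits non-zero when it counts 0 lines, hence the "|| true".
    pinned="$(grep -c '<vcpupin ' "$4" || true)"
    if [ "$expected" -eq "$pinned" ]; then
        echo "ok: topology defines $expected vCPUs, $pinned are pinned"
    else
        echo "mismatch: topology defines $expected vCPUs, but $pinned are pinned"
        return 1
    fi
}
```

For example, after virsh dumpxml your-windows-vm-name > vm.xml, the call check_topology 1 4 2 vm.xml verifies the topology used in this article.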

CPU Features

In both cases <feature policy='require' name='topoext'/> should be added as a CPU feature (see the Arch Wiki). But as /u/llitz said, using EPYC is the best way to get an optimal setup.

AMD Ryzen CPU model recommendation for optimal gaming performance

Find the <cpu> block and adapt it to look like this:

 <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
   <!-- add additional cpu features here-->
 </cpu>

For QEMU 3.1 and below, “EPYC” is preferred over “host-passthrough”:

<cpu mode='custom' match='exact' check='none'>
    <model fallback='allow'>EPYC</model>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='require' name='topoext'/>
    <!-- add additional cpu features here--> 
</cpu>

Hyper-V Enlightenments

Hyper-V enlightenments help the guest operating system handle virtualization tasks. The operating system has to support those features (Win 10 1903 should handle them better than Win 10 1803).

After extensive testing I went with the settings shown below. My general rule of thumb is “the more the merrier”. The vendor_id setting was only required for Nvidia Error 43 prevention.

The function of each setting (and the version in which it became available) can be looked up in the libvirt documentation.

Add the Hyper-V enlightenments to the <features> block by adding the following block at the same level as the <acpi> element:

Hyper-V Settings Recommendation

<hyperv>
   <relaxed state='on'/>
   <vapic state='on'/>
   <spinlocks state='on' retries='8191'/>
   <vpindex state='on'/>
   <synic state='on'/>
   <stimer state='on'/>
   <reset state='on'/>
   <vendor_id state='on' value='1234567890ab'/> <!-- former nvidia error code 43 now AMD GPU prevention -->
   <frequencies state='on'/>
</hyperv>

Remark!

Make sure <hyperv> and <acpi> have the same indent.

Clock Settings

todo.

Hugepages

Unfortunately, the Hugepages post is outdated. Until I have fixed this, I recommend using the glorious Arch Wiki.
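As a starting point, the number of hugepages to reserve follows from the guest memory size. This is a minimal sketch with a hypothetical function name; it assumes static hugepages and, when no page size is given, reads the host default from /proc/meminfo.

```shell
#!/bin/bash
# Print how many hugepages are needed to back a guest with the given memory size.
# Arguments: guest memory in KiB, hugepage size in KiB (defaults to the host's Hugepagesize).
hugepages_needed() {
    local mem_kib="$1"
    local page_kib="${2:-$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)}"
    # Round up so that the guest memory always fits.
    echo $(( (mem_kib + page_kib - 1) / page_kib ))
}
```

For the 16 GiB guest from the configurations above (16777216 KiB) and 2 MiB hugepages, hugepages_needed 16777216 2048 prints 8192. The pages could then be reserved as root with sysctl vm.nr_hugepages=8192.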

Troubleshooting

A common issues and troubleshooting article exists here.

Updates

  • 06.07.2021 – Added new MSIInterupt Enabler (thank you Mark)
  • 21.08.2019 – Added further information and todos
  • 30.09.2019 – Updated Hyper-V settings
  • 21.11.2019 – Rewrote the article and added further information

42 comments on “Comprehensive guide to performance optimizations for gaming on virtual machines with KVM/QEMU and PCI passthrough”

  1. JK

    Thanks for great configuration. Just bought Ryzen 2700 due to similar reasons – I had problems with guest latency, hope extra cores solve this issue.

    1. Mathias Hueber

      Let me know if it helps, I am always looking for further tweaking suggestions.

      1. Matt

        I am horrified to find iut after getting it to work, there is no video option for vmvga in my drop-down. Stuck at 800×600

  2. VR and Gaming Virtualization on Linux – random($foo)

    […] I used AMD μProf to help with mapping out my 3700X (and this writeup on CPU-pinning): […]

    1. M

      Thanks for the article! Works like a charm with Ryzen 7 3700x!

      FYI anyone reading this, I recommend you check your CPU topology with a tool, instead of copying values blindly from the web. For example, the Core/PU numbers of 3700X differ from the processor in the article. On linux there is a tool called ‘lstopo’ which can show the processor topology in a nice graphical diagram (I guess that was used for this article as well?). Another is ‘lscpu -e’ but it is rather difficult to interpret.

      Thanks again Mathias!

      1. imaru

        Could you add iommu=pt

        I needed to ad this because without i couldn’t geht my system running (ryzen 2700x).

  3. Ian

    Not sure why, but it looks like the physical cores for my 1700 are now arranged differently.
    Core 0 uses threads 0&8, core 1 uses 1&9, core 2 uses 2&10, etc.
    However the logical indexes haven’t changed, so not quite sure which one I must use

    1. Mathias Hueber

      Ohh, that is strange. Which BIOS version are you running? Is it still working though? I have read that some newer BIOS versions might break vfio passthrough altogether. See https://forum.level1techs.com/t/attention-amd-vfio-users-do-not-update-your-bios/142685

  4. Kckhfn

    Thanks for providing these helpful tips Mathias.

  5. Damir

    Hi,

    Great guide. I have a question, do you have a solution that by default it seems TOPOEXT does not handle AMD cpus with 3core/CCX (6 core / 12 core / 24 core CPus) correctly, as it still exposes 8 threads / L3 Cache (instead of 6 threads / L3 Cache) if running SMT. (or 4 instead of 3 if not running SMT).

    1. Mathias Hueber

      Ouhh I wasn’t aware of that.
      What model do you use in the config. If you use host-passthrough maybe try EPYC.

  6. Damir

    I’ve tried both, same. There is one way to make it happen, using the new die parameter, you can specify

    but, of course, there is a bug with this too. It uses sockets in Windows to expose the topology, it looks correct, and performs much better for cache-sensative games (looking at you FC5), but, you can’t do more than 2 CCX (6 cores or 12 threads /w SMT) because consumer Windows only supports 2 Sockets. if you do more, it will add more sockets, but only every other thread will be loaded, and windows will cap at two sockets.

    I’ve submitted a bug for this, but I thought you may have an workaround. For me, it currently performs better disabling TOPOEXT and letting it fake a L3 for each core (not-shared) instead of apps assuming the wrong topology.

    1. Damir

      Forum ate my qemu parameter example:

      qemu:arg value='cores=6,threads=1,dies=2,sockets=1'

  7. Joe

    Never even thought that it is possible to configure a gaming setup on VM, thanks!

  8. Alex

    Hi Mathias, thanks for the great setup.

    Have you every tried running Gears of War 4/5 with such setup? I’m running all games finely apart from Gears of War. It shows terrible performance and internal benchmark shows that there is a CPU bottleneck which doesn’t make sense to me. (I’m running Ryzen 3600 and RTX 2060 Super, assigning 4 cores to VM and passing through GPU).

    1. Mathias Hueber

      Hey Alex,
      I found this
      it seems you are not alone 🙂

  9. Alex

    Hi, I’ve spent like 24 hours literally with various tests and etc. and found what was the issue with Gears of War. It has nothing to do with VM at all.

    I reproduced it with streaming from bare metal Windows setup. Everytime I start streaming (Steam/Moonlight/RDP even) FPS drops by 50% at least. And I’ve figured out with FPSMonitor that GPU load hit no more than 60%. At seems like every DirectX game/benchmark affected, some more some less.
    Unigine benchmarks which I primarly used are not affected for some reason and GPU stays 100% loaded there even when streaming, but 3DMark is affected.

    So I started googling about bad streaming performance and found people using OBS streaming are complaining about fps drops and those who using twitch as well.

    So I’ve finally found one advice that worked for me. It was as simple as disabling Windows aero effects in system settings just by selecting “Adjust for best performance”. I couldn’t believe that, but that worked.

    Now I’m just about 10-20% drop vs bare metal on my headless VM which I’m completely satisfied.

    1. Mathias Hueber

      Ok wow.
      Disabling windows aero for better performance gives me Windows Vista nightmare flashbacks 🙂
      I’ll add it to the recommendations.
      Thank you for the detailed updates.

      1. Nubs Parrot

        Ryzen 3600 is stuck at it’s base clock of 3.6ghz and it won’t boost. Is there any way to allow it to boost?

        1. Mathias Hueber

          Have you set the CPU govenor on the Host to performance?

  10. Vic

    Thnx, the cpu pinning worked also for my ryzen 1700 and gave the Windows VM a 300% performance boost.

  11. Chaython Meredith

    Highest DPC routine execution time (µs): 16470.230289
    Driver with highest DPC routine execution time: rspLLL64.sys – Resplendence Latency Monitoring and Auxiliary Kernel Library, Resplendence Software Projects Sp.

    Highest reported total DPC routine time (%): 0.023670
    Driver with highest DPC total execution time: nvlddmkm.sys – NVIDIA Windows Kernel Mode Driver, Version 441.87 , NVIDIA Corporation

    Total time spent in DPCs (%) 0.105757

    DPC count (execution time >=4000 µs): 0

    Latencymon itself is having high latency, aside from it nvlddmkm.sys 66ms; wdf01000.sys 28ms; TCPIP.sys 8ms, dxgkrnl.sys 7ms, ndis.sys 10ms
    Any idea what’s wrong?
    The only diference between I and you is i7 6700k(so diferent cpu pinning etc] seabios[ I wasn’t given the option during install and I read changing after will break windows] qemu 4 – ubuntu, only pushes security updates…
    If I plugin a usb sound card, my vm drops the usb devices when I exit a game.

  12. Steven

    Hi. Thank you for this great post. I am on a Renoir CPU 4800H with RTX 2060. There is some serious CPU performance downgrade in my Win10 guest. It is especially obvious in PUBG where the FPS is cut by more than half during final circles in ranked mode(Very CPU intensive scenario as 50 players are located within 1 km distance of you). However in the very beginning the FPS is just fine like 130FPS + (Very low CPU computation needed). I tried everything that I can think of(everything in your post included). The issue persist. I had a feeling it’s a latency issue? But my level 3 cache latency looks fine to me(13ns). At this point I wonder if anyone has achieved good guest performance for PUBG with ryzen CPU? Like at least 80% of the performance for low 1% FPS compared to bare metal windows.

    1. Mathias Hueber

      Well, I recently have tested the bare metal performance by directly booting into the nvme system disk. I was very pleased that it showed basivly the same fps results.

      Which Qemu version are you running?
      Have you checked the common issues article?

      1. Steven

        Thanks for the reply! I am using 4.2 that comes with ubuntu 20.04/Fedora 32. Should I switch to 5.0?

        1. Steven

          And also I don’t know which flags should I enable for configuring QEMU before I compile it. I remember last time I tried compiling it, it told me to enable usb; The second time I tired it told me enable sth else. Do you mind share you steps of configuring and compiling QEMU from the source?

        2. Mathias Hueber

          Qemu 4.2 is fine, this should work. If you can pastebin your config and link it, we can look over it.
          Another option is to ask in /r/vfio for help.

  13. rad

    All this recomendations apply to linux as well?

  14. Daniel

    this:

    and MORE IMPORTANT this:
    “-cpu host,kvm=off,hv_vendor_id=nvidia43fix,+topoext -enable-kvm -M q35,kernel_irqchip=on

    i removed the -hypervisor flag but stil only 20 fps on rtx 3080…

    edit: fixed. i added these options. no idea which one is responsible. now i have almost baremetall speed.

    hv-relaxed,hv-vapic,hv-spinlocks=8191,hv-vpindex,hv-runtime,hv-crash,hv-time,hv-synic,hv-stimer,hv-ipi,hv-reset,hv-frequencies,hv-reenlightenment,hv-stimer-direct,hv-no-nonarch-coresharing=auto”

    1. Mathias Hueber

      Glad it worked out.
      Thanks for pinging back.

  15. Stephen Cameron

    I was isolating cores this way

    It worked for smaller CPU’s like the i7 which had smaller core counts

    You might want to see if you get better results by leaving the scheduling up to QEMU
    Something like this maybe?

    1. Stephen Cameron

      I was isolating cores this way
      <vcpupin vcpu='0' cpuset='8'/>
      <vcpupin vcpu='1' cpuset='9'/>
      <vcpupin vcpu='2' cpuset='10'/>
      <vcpupin vcpu='3' cpuset='11'/>
      <vcpupin vcpu='4' cpuset='12'/>
      <vcpupin vcpu='5' cpuset='13'/>
      <vcpupin vcpu='6' cpuset='14'/>
      <vcpupin vcpu='7' cpuset='15'/>
      It worked for smaller CPU’s like the i7 which had smaller core counts

      You might want to see if you get better results by leaving the scheduling up to QEMU
      Something like this maybe?
      <vcpupin vcpu='0' cpuset='8-15'/>
      <vcpupin vcpu='1' cpuset='8-15'/>
      <vcpupin vcpu='2' cpuset='8-15'/>
      <vcpupin vcpu='3' cpuset='8-15'/>
      <vcpupin vcpu='4' cpuset='8-15'/>
      <vcpupin vcpu='5' cpuset='8-15'/>
      <vcpupin vcpu='6' cpuset='8-15'/>
      <vcpupin vcpu='7' cpuset='8-15'/>

  16. Mark

    Seems the MSI Enabler link is no longer valid. I found another MSI Enabler here
    https://github.com/TechtonicSoftware/MSIInturruptEnabler

    1. Mathias Hueber

      Thank you very much <3

    2. ericlee

      This is very helpful, thank you

  17. LugMunoz

    Hi mate thanks for the guide is very useful, I have a little problem and maybe you could help me, when trying to use cpu-passthrough after I click apply it adds migratable=”on” like this:

    and if I try to erase it when I click apply it came back, I tested with “on” and “off”, “off” gives better performance than “on” but I’m still losing performance compare to when I set cpu mode=”host-model” I need it to be passthrough to play games like Rainbow six siege so any advice on how to improve this?
    I’m using Fedora 34 with QEMU emulator version 5.2.0 (qemu-5.2.0-8.fc34) and my vm is currently using Q35 should I try with i440fx?

    1. LugMunoz

      cpu mode="host-passthrough" check="none" migratable="on"

      *edit: just to add this line, don’t know why wasn’t posted in my first comment

  18. Sebastian Fors

    Thank you so much for the guide Mathias, it worked perfectly with my Ryzen 1700x.
    The performance is way better.
