What to remember when using benchmarks

The main purpose of any benchmark is to let users estimate the approximate performance of a computer or server and compare the result with other configurations. Benchmarks are also often used to find bottlenecks: situations where one or more components of the configuration prevent the hardware from reaching its full performance. In some cases, a benchmark can also help verify that a server's actual configuration matches the declared one.

What influences the result

Choose a benchmark for a server carefully: many factors can distort the final result and mislead you. The main ones are:

Use of desktop benchmarks on servers;
Complete trust in published online ratings;
Enabled energy-saving features;
Patches against vulnerabilities like Spectre and Meltdown;
Presence of network drives mounted as local drives.

Desktop benchmarks are primarily designed to run with a local console attached. Most of them target gamers and anyone who wants to squeeze as many FPS as possible out of a GPU at high screen resolutions. Because servers are usually accessed remotely via RDP or VNC, their graphics cards often cannot be switched into the modes the tests require. As a result, tests fail even in benchmarks that are nominally designed for both desktop and server hardware.

For instance, when PassMark PerformanceTest attempts to run its 3D Graphics tests, it displays an error and simply skips every test it cannot perform. Consequently, the overall rating drops as if there were no GPU in the system at all.

Some desktop benchmarks assume a system with a single graphics accelerator, or several accelerators linked with “desktop” technologies such as SLI®. If your server has a video chip built into the motherboard, the benchmark may treat it as the primary GPU and test it instead of a more powerful accelerator. This happens, for example, with Novabench when all the server motherboard drivers are correctly installed.

In addition to local results, some benchmarks maintain online ratings that accumulate the results of submitted tests. These ratings help you interpret your results by comparing them with reference values.

Unfortunately, many such ratings have turned into a competition in which each participant tries to squeeze maximum performance out of their equipment. Some participants even look for vulnerabilities in the benchmark and exploit them to submit inflated values that cannot be achieved by normal means. Relying entirely on such ratings therefore often causes confusion: people compare their results with inflated values and wrongly conclude that their own system has a performance problem. When using data from these ratings, keep in mind that some of it may be fake.
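One way to reduce the influence of such entries is to compare your score with the median of the published results for a similar configuration, after discarding obvious outliers, rather than with the top entries. The sketch below uses Tukey's fences (1.5 × IQR), a generic statistical outlier rule that the rating sites themselves are not known to apply; the helper name and the sample scores are hypothetical.

```python
import statistics


def filter_outliers(scores, k=1.5):
    """Drop scores outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: a generic outlier rule, not part of any
    benchmark's online rating service.
    """
    if len(scores) < 4:
        # Too few results to estimate quartiles meaningfully; keep them all.
        return list(scores)
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if low <= s <= high]


# Hypothetical published results for one server configuration.
published = [7300, 7400, 7450, 7500, 7600, 12000]
kept = filter_outliers(published)
print(kept)                     # [7300, 7400, 7450, 7500, 7600]
print(statistics.median(kept))  # 7450
```

The median is used for the final comparison because, unlike the maximum or the mean, a single inflated entry barely moves it.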

Energy saving modes

Windows Server offers several power schemes that improve energy efficiency at the cost of some performance. For instance, after installing Windows Server 2019/2022, the Balanced scheme is active by default, and it noticeably affects the results of synthetic tests. If we run PassMark PerformanceTest with this default scheme, we get the same score as in the previous screenshot: 7401.2.
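Before each run, it is worth confirming which power scheme is actually active. On Windows, `powercfg /getactivescheme` prints the active scheme; the sketch below parses that output and compares it with the GUID of the built-in High performance scheme. It assumes the English output format `Power Scheme GUID: <guid>  (<name>)`; the helper names are hypothetical, and a sample string stands in for the real command output when the script is not running on Windows.

```python
import re
import subprocess
import sys

# GUIDs of the built-in Windows power schemes.
KNOWN_SCHEMES = {
    "381b4222-f694-41f0-9638-0b5d5d7b6b7d": "Balanced",
    "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c": "High performance",
    "a1841308-3541-4fab-bc81-f71556f20b4a": "Power saver",
}
HIGH_PERFORMANCE = "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c"

GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def active_scheme_guid(powercfg_output):
    """Extract the first GUID from `powercfg /getactivescheme` output."""
    match = GUID_RE.search(powercfg_output)
    return match.group(0).lower() if match else None


def read_active_scheme():
    """Run powercfg (Windows only) and return its raw output."""
    return subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True, text=True, check=True,
    ).stdout


if __name__ == "__main__":
    if sys.platform == "win32":
        output = read_active_scheme()
    else:
        # Assumed sample output, used when not running on Windows.
        output = "Power Scheme GUID: 381b4222-f694-41f0-9638-0b5d5d7b6b7d  (Balanced)"
    guid = active_scheme_guid(output)
    print(KNOWN_SCHEMES.get(guid, "Unknown/custom scheme"))
    if guid != HIGH_PERFORMANCE:
        print("Warning: benchmark results may be lower than with High performance.")
```

Matching on the GUID rather than the scheme name keeps the check independent of the system language. To switch schemes from the command line, `powercfg /setactive SCHEME_MIN` activates the built-in scheme with minimum power savings (High performance).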

Now, let’s switch the scheme to High Performance and repeat the test:

As you can see, the score rises to 7660.3, about 3.5% higher. This clearly shows that such tests depend not only on the hardware itself but also directly on operating system settings.
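The size of the effect is easiest to judge as a relative difference between the two runs. A minimal sketch using the two PassMark scores quoted above (the function name is illustrative):

```python
def relative_gain(baseline, new):
    """Return the percentage change of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


default_scheme = 7401.2    # score with the default power scheme
high_performance = 7660.3  # score with the High Performance scheme
print(f"{relative_gain(default_scheme, high_performance):.1f}%")  # 3.5%
```

A gap of a few percent between servers with otherwise identical hardware can therefore come from the power scheme alone.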

Vulnerability patches

Vendors try to protect against numerous hardware vulnerabilities by releasing new BIOS/UEFI firmware versions or operating system patches. The problem is that these firmware updates and patches improve security at the cost of a significant performance drop. Overall system performance can decrease by 5 to 30 percent, which immediately affects benchmark results. Systems that remain vulnerable to these hardware flaws will therefore score higher in any synthetic test. Keep this in mind when studying system ratings and performance measurements.
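When your score falls short of a published result for similar hardware, a quick sanity check is whether the shortfall lies within the rough 5 to 30 percent range mentioned above, in which case mitigations alone could plausibly explain it. The helpers below are a hypothetical illustration of that arithmetic; the thresholds are the approximate figures from this article, not precise limits.

```python
def shortfall_percent(your_score, reference_score):
    """How far `your_score` is below `reference_score`, in percent."""
    return (reference_score - your_score) * 100 / reference_score


def explainable_by_mitigations(your_score, reference_score, low=5.0, high=30.0):
    """True if the shortfall falls within the rough 5-30% mitigation penalty."""
    return low <= shortfall_percent(your_score, reference_score) <= high


# Hypothetical numbers: a reference result of 10000 and a local result of 8500.
print(shortfall_percent(8500, 10000))           # 15.0
print(explainable_by_mitigations(8500, 10000))  # True
```

A result inside the range does not prove that patches are the cause; it only means they are worth checking before looking for hardware problems.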

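To check whether these mitigations are enabled on your server, you can use Microsoft's SpeculationControl PowerShell module. Install the module: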

Install-Module SpeculationControl

Back up the current user's execution policy:

$SaveExecutionPolicy = Get-ExecutionPolicy -Scope CurrentUser

Temporarily change the execution policy so the module can be loaded:

Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

Import the module and display the current mitigation settings:

Import-Module SpeculationControl

Get-SpeculationControlSettings

Restore the previous execution policy:

Set-ExecutionPolicy $SaveExecutionPolicy -Scope CurrentUser

To disable the patches for the vulnerabilities above (Spectre and Meltdown), open Command Prompt or PowerShell as an administrator and run these two commands:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f

Don’t forget to restart the server.

To re-enable these patches, use the following commands:

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 0 /f

reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f

After this you will also need to restart the server.
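FeatureSettingsOverrideMask selects which bits of FeatureSettingsOverride Windows takes into account, and a set bit in the override turns the corresponding mitigation off. Based on Microsoft's published guidance (treat the exact bit assignment as an assumption and verify it against the current documentation), bit 0 controls the Spectre variant 2 mitigation (CVE-2017-5715) and bit 1 the Meltdown mitigation (CVE-2017-5754). The hypothetical helper below decodes the two value pairs used above; reading the live values needs the Windows-only `winreg` module, so that part only runs on Windows.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

# Assumed bit assignment, based on Microsoft's guidance for these values.
MITIGATION_BITS = {
    0b01: "Spectre variant 2 (CVE-2017-5715)",
    0b10: "Meltdown (CVE-2017-5754)",
}


def disabled_mitigations(override, mask):
    """Return the mitigations that the override/mask pair turns off."""
    effective = override & mask  # only bits selected by the mask count
    return [name for bit, name in MITIGATION_BITS.items() if effective & bit]


def read_values():
    """Read both values from the registry (Windows only)."""
    import winreg  # available only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        override, _ = winreg.QueryValueEx(key, "FeatureSettingsOverride")
        mask, _ = winreg.QueryValueEx(key, "FeatureSettingsOverrideMask")
    return override, mask


if __name__ == "__main__":
    print(disabled_mitigations(3, 3))  # both mitigations disabled
    print(disabled_mitigations(0, 3))  # [] -> mitigations enabled
    if sys.platform == "win32":
        try:
            print(disabled_mitigations(*read_values()))
        except FileNotFoundError:
            print("Override values are not set; OS defaults apply.")
```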

Background services priority

By default, Windows Server adjusts processor scheduling in favor of background services. As a result, interactive applications launched from the GUI, such as benchmarks, get no scheduling advantage over programs running as services. Before running performance tests, switch processor scheduling to favor programs instead. To do this, follow these steps:

Right-click Start, select System, and then open Advanced system settings:

On the Advanced tab, click the Settings button in the Performance section.

In the Performance Options window that opens, select the Advanced tab and, under Adjust for best performance of, select Programs.

Confirm with OK.
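The Programs/Background services choice is stored in the registry value Win32PrioritySeparation under HKLM\SYSTEM\CurrentControlSet\Control\PriorityControl. Reportedly, selecting Programs writes 0x26 and selecting Background services writes 0x18. The sketch below decodes the value's three 2-bit fields (quantum length, fixed or variable quanta, foreground boost) following the layout described in Windows Internals; treat that interpretation as an assumption, and the function names as hypothetical.

```python
import sys

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\PriorityControl"

# Values reportedly written by the "Adjust for best performance of" setting.
PROGRAMS = 0x26
BACKGROUND_SERVICES = 0x18

_LENGTH = {1: "long", 2: "short"}           # bits 4-5; 0 or 3 = OS default
_VARIABILITY = {1: "variable", 2: "fixed"}  # bits 2-3; 0 or 3 = OS default


def decode(value):
    """Split Win32PrioritySeparation into its three 2-bit fields."""
    return {
        "quantum_length": _LENGTH.get((value >> 4) & 0b11, "default"),
        "quantum_type": _VARIABILITY.get((value >> 2) & 0b11, "default"),
        "foreground_boost": value & 0b11,  # 0 = no boost for the foreground app
    }


def read_value():
    """Read the current value from the registry (Windows only)."""
    import winreg  # available only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _ = winreg.QueryValueEx(key, "Win32PrioritySeparation")
    return value


if __name__ == "__main__":
    print("Programs:", decode(PROGRAMS))
    print("Background services:", decode(BACKGROUND_SERVICES))
    if sys.platform == "win32":
        current = read_value()
        print("Current:", hex(current), decode(current))
```

Decoded this way, Programs means short, variable quanta with a boost for the foreground application, while Background services means long, fixed quanta with no foreground boost, which is why an interactive benchmark gains nothing over services under the default server setting.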

Conclusion

This article covers only the basic factors that can affect server performance and, therefore, the results of synthetic benchmarks. If your results differ significantly from those of servers with a similar configuration, start by checking which of the factors above apply to your system.

Finally, we don't recommend disabling protection against vulnerabilities such as Spectre and Meltdown on production servers, despite the noticeable performance cost. In the vast majority of cases, data security should take precedence over performance.