Collecting GPU metrics with Grafana
It is important to note that this setup isn’t secure. We highly recommend reading the documentation for each component used independently and utilizing the built-in security features whenever possible.
In this guide, we will demonstrate how to monitor each GPU on your server and build dashboards in Grafana. Metrics are collected by a Prometheus exporter, which calls the nvidia-smi binary every few seconds to read all available values.
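To make the exporter's job more concrete, here is a minimal sketch of the idea: it takes CSV rows in the shape that nvidia-smi prints and turns them into metric-style lines. The function name gpu_csv_to_metrics and the demo_gpu_* metric names are hypothetical and exist only for this illustration; the real exporter uses its own naming and queries many more fields.

```shell
#!/bin/sh
# Convert CSV rows "index, temperature, utilization" read from stdin
# into simple "name{labels} value" lines.
# gpu_csv_to_metrics and the demo_gpu_* names are hypothetical.
gpu_csv_to_metrics() {
  awk -F', *' 'NF == 3 {
    printf "demo_gpu_temperature_celsius{gpu=\"%s\"} %s\n", $1, $2
    printf "demo_gpu_utilization_percent{gpu=\"%s\"} %s\n", $1, $3
  }'
}

# Sample rows in the shape nvidia-smi prints with csv,noheader,nounits.
printf '0, 45, 12\n1, 51, 97\n' | gpu_csv_to_metrics
```

On a server with the NVIDIA driver installed, real values can be piped in instead of the sample rows: nvidia-smi --query-gpu=index,temperature.gpu,utilization.gpu --format=csv,noheader,nounits | gpu_csv_to_metrics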
Install Grafana
To ensure a smooth installation process, it is recommended to install Grafana from the project’s official repository. Before you add the repository to the package manager’s sources list, verify that all the required packages are present in the operating system. The following command installs any missing ones:
sudo apt -y install apt-transport-https software-properties-common wget
Let’s create a separate directory for keyrings:
sudo mkdir -p /etc/apt/keyrings/
Download and apply the Grafana GPG key:
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
Add the Grafana repository using the downloaded key:
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
Update cache and install Grafana:
sudo apt update && sudo apt -y install grafana
After the process finishes, you can start the application:
sudo systemctl start grafana-server.service
If you want Grafana to start automatically when the operating system boots:
sudo systemctl enable grafana-server.service
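Once the service is running, Grafana's health endpoint (/api/health) gives a quick confirmation that the application and its database are up. The sketch below only parses the response; the function name grafana_db_ok is hypothetical, and the commit and version values in the sample are placeholders rather than real output.

```shell
#!/bin/sh
# Succeed if the JSON on stdin reports "database": "ok".
# grafana_db_ok is a hypothetical helper name.
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Placeholder response in the shape /api/health returns.
sample='{"commit":"abc123","database":"ok","version":"10.0.0"}'
if printf '%s\n' "$sample" | grafana_db_ok; then
  echo "Grafana database: ok"
else
  echo "Grafana database: not ok"
fi
```

Against a live instance, the same check would read: curl -s http://localhost:3000/api/health | grafana_db_ok && echo "Grafana is healthy"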
Install Prometheus
To determine the current version and access the package download link, you should visit the project’s official website. For example, if you need to install Prometheus v2.47.1, you can use the following link:
wget https://github.com/prometheus/prometheus/releases/download/v2.47.1/prometheus-2.47.1.linux-amd64.tar.gz
Unpack the archive:
tar xvfz prometheus-2.47.1.linux-amd64.tar.gz
Open the unpacked directory:
cd prometheus-2.47.1.linux-amd64
Let’s copy two executable files to the system directory:
sudo cp prometheus /usr/local/bin
sudo cp promtool /usr/local/bin
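Create separate directories for the configuration files and for the time-series data (the data directory is referenced later by the service and must exist before ownership is assigned):
sudo mkdir -p /etc/prometheus /var/lib/prometheus
Copy the console templates and libraries to the configuration directory: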
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
Create a configuration file named prometheus.yml, which will contain the main Prometheus settings and targets:
sudo nano /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nvidia'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9835']
The file contains basic options and two targets:
- localhost:9090 - Prometheus itself, which exposes its own metrics on this port by default.
- localhost:9835 - the address from which the metrics of nvidia_gpu_exporter will be scraped.
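YAML is sensitive to indentation, so it can be worth validating the configuration before Prometheus reads it. The sketch below writes the same settings to a scratch file and checks them with promtool check config, using the promtool binary copied earlier. The CONFIG_DRAFT path is a hypothetical choice, and the check is skipped if promtool is not in PATH.

```shell
#!/bin/sh
# Write the configuration to a scratch file first so that a typo never
# reaches /etc/prometheus. CONFIG_DRAFT is a hypothetical path.
CONFIG_DRAFT="${TMPDIR:-/tmp}/prometheus.yml.draft"

cat > "$CONFIG_DRAFT" <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nvidia'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9835']
EOF

# Validate only if promtool is installed; otherwise say so and move on.
if command -v promtool >/dev/null 2>&1; then
  promtool check config "$CONFIG_DRAFT"
else
  echo "promtool not found in PATH; skipping validation"
fi
```

If the check passes, the draft can be copied into place with sudo cp "$CONFIG_DRAFT" /etc/prometheus/prometheus.yml.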
Press Ctrl + X, then Y, then Enter to save the file and exit the text editor. It’s a good idea to run Prometheus as a dedicated user, without shell access. We suggest creating a user like this:
sudo useradd --no-create-home --shell /bin/false prometheus
Good. The executable files have been copied, and the configuration files are in place. Now tell the operating system that Prometheus can run as a daemon managed by systemd. Execute the following command and enter this content in the editor that opens:
sudo systemctl edit --full --force prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
Save and exit by pressing Ctrl + X, then Y, then Enter. Next, you need to make the newly created user the owner of two working directories and their content:
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus
Everything is ready to start Prometheus. Run the daemon:
sudo systemctl start prometheus
Check the status and you’ll get a result like this:
sudo systemctl status prometheus
● prometheus.service - Prometheus
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-10-09 12:44:16 UTC; 5min ago
   Main PID: 2223 (prometheus)
      Tasks: 42 (limit: 425060)
     Memory: 96.4M
        CPU: 1.099s
     CGroup: /system.slice/prometheus.service
             └─2223 /usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml ->
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.263Z caller=head.go:760 l>
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.269Z caller=head.go:760 l>
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.269Z caller=head.go:760 l>
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.270Z caller=head.go:797 l>
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.273Z caller=main.go:1045 >
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.273Z caller=main.go:1048 >
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.273Z caller=main.go:1229 >
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.274Z caller=main.go:1266 >
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.274Z caller=main.go:1009 >
Oct 09 12:44:19 gpuserver prometheus[2223]: ts=2023-10-09T12:44:19.274Z caller=manager.go:10>
The last command tells systemd to start the Prometheus daemon automatically when the operating system boots:
sudo systemctl enable prometheus
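Prometheus reports the state of every configured target through its HTTP API at /api/v1/targets. The sketch below counts targets per health state from such a response. The function name summarize_target_health is hypothetical, the sample is an abridged illustration rather than real output, and the grep-based parsing assumes compact JSON (a JSON-aware tool such as jq would be more robust).

```shell
#!/bin/sh
# Count targets per health state from a /api/v1/targets response on stdin.
# summarize_target_health is a hypothetical helper name.
summarize_target_health() {
  grep -o '"health":"[a-z]*"' |
    awk -F'"' '{ count[$4]++ } END { for (h in count) print h, count[h] }' |
    sort
}

# Abridged, illustrative response: one healthy and one unreachable target.
sample='{"status":"success","data":{"activeTargets":[{"labels":{"job":"prometheus"},"health":"up"},{"labels":{"job":"nvidia"},"health":"down"}]}}'
printf '%s\n' "$sample" | summarize_target_health
```

With the daemon running, the live data comes from curl -s http://localhost:9090/api/v1/targets | summarize_target_health. At this stage the nvidia job is expected to be reported as down, because nothing is listening on port 9835 until the exporter is installed.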
Scraping metrics in Linux
All the previous steps were merely preparation. Even though Grafana and Prometheus have already been installed, they know nothing about each other. Let's add a new data source to Grafana. Open your web browser and type the following address:
http://[LEADERGPU_IP_ADDRESS]:3000/connections/datasources/new
For the first login to Grafana, use the default credentials: admin/admin. These need to be changed immediately afterward. On the opened page, choose Prometheus:

Set the name and URL of the local Prometheus instance. By default, it listens on port 9090 over plain HTTP, so the URL is http://localhost:9090:

Scroll down to the end and click on the Save & test button. Grafana will send a short request to Prometheus and if everything is OK, you’ll receive a message like this:

Now, you need a ready-made official dashboard that works with nvidia_gpu_exporter. Open the following link in your browser:
https://grafana.com/grafana/dashboards/14574-nvidia-gpu-metrics/
Here you have the Download JSON button. You don’t actually need to download a JSON file. Simply right-click on it and select Copy link address:

Open the Grafana web interface at http://[LEADERGPU_IP_ADDRESS]:3000/, then click on the plus symbol and select the Import dashboard menu item:

Paste the copied link into the following field by pressing the Ctrl + V keyboard shortcut, then click on the Load button:

Grafana will download the dashboard, and you’ll need to select the previously created data source, Prometheus, from the drop-down menu. Then, click the Import button:

A nice dashboard will open, but it doesn’t show any data yet:

This is because we haven’t yet installed the final component: nvidia_gpu_exporter. Let’s do that. Open the project’s releases page on GitHub in your web browser and copy the link to the .deb package. You can now download it using the wget utility:
wget https://github.com/utkuozdemir/nvidia_gpu_exporter/releases/download/v1.2.0/nvidia-gpu-exporter_1.2.0_linux_amd64.deb
Finally, install it:
sudo dpkg -i nvidia-gpu-exporter_1.2.0_linux_amd64.deb
Wait a few seconds and you’ll receive the first data on your Grafana dashboard:

Enjoy your new monitoring tool!
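If the dashboard stays empty, it can help to look at the exporter's raw output before suspecting Grafana or Prometheus. The sketch below counts the samples whose names start with a given prefix in Prometheus text format. The function name count_samples is hypothetical, the nvidia_smi_ prefix is an assumption about how the exporter names its metrics, and the sample lines are illustrative rather than real output.

```shell
#!/bin/sh
# Count samples on stdin whose metric name starts with the prefix in $1.
# Lines starting with '#' are HELP/TYPE comments and are ignored.
# count_samples is a hypothetical helper name.
count_samples() {
  grep -v '^#' | grep -c "^$1"
}

# Illustrative exposition text; names and values are placeholders.
sample='# HELP nvidia_smi_temperature_gpu illustrative help text
# TYPE nvidia_smi_temperature_gpu gauge
nvidia_smi_temperature_gpu{uuid="example-0"} 45
nvidia_smi_temperature_gpu{uuid="example-1"} 51
go_goroutines 9'
printf '%s\n' "$sample" | count_samples nvidia_smi_
```

On the GPU server, the live check would be curl -s http://localhost:9835/metrics | count_samples nvidia_smi_. A result of 0 suggests that the exporter is running but cannot obtain data from nvidia-smi.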
Scraping metrics in Windows Server
The components described above don’t need to run on the same server. For instance, you can set up Grafana and Prometheus on one server (even a small single-board computer like a Raspberry Pi will suffice) and scrape GPU metrics from a remote server. The key component of the system, nvidia_gpu_exporter, can operate not only on Linux but also on Windows Server. We have prepared brief instructions for its installation.
Open PowerShell and run the following commands sequentially. Allow the current user to run PowerShell scripts:
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
Get and install the Scoop package manager:
iex "& {$(irm get.scoop.sh)} -RunAsAdmin"
Install the nssm and git packages using Scoop. These packages are necessary to obtain nvidia_gpu_exporter from GitHub and install it as a system service:
scoop install nssm --global
scoop install git
Scoop groups apps into collections called buckets. Let’s add the bucket that contains the required app:
scoop bucket add nvidia_gpu_exporter https://github.com/utkuozdemir/scoop_nvidia_gpu_exporter.git
Now, we are ready to download and install the app:
scoop install nvidia_gpu_exporter/nvidia_gpu_exporter --global
When the app starts, it uses port number 9835. You need to configure Windows Defender Firewall to accept connections from outside:
New-NetFirewallRule -DisplayName "Nvidia GPU Exporter" -Direction Inbound -Action Allow -Protocol TCP -LocalPort 9835
Register nvidia_gpu_exporter as a new service so that it starts automatically when the operating system boots:
nssm install nvidia_gpu_exporter "C:\ProgramData\scoop\apps\nvidia_gpu_exporter\current\nvidia_gpu_exporter.exe"
The final step is to run the service:
Start-Service nvidia_gpu_exporter
Please note that you need to add the remote server as a target in the prometheus.yml file on the server where Prometheus runs:
…
  - job_name: 'nvidia'
    scrape_interval: 5s
    static_configs:
      - targets: ['REMOTE_SERVER_IP:9835']
…
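When several GPU servers are monitored, the targets list simply grows. As a small sketch, the helper below builds that line from a list of hosts, assuming every exporter listens on the default port 9835. The function name exporter_targets is hypothetical, and the 192.0.2.x addresses are documentation placeholders.

```shell
#!/bin/sh
# Print a static_configs "targets" line for the given hosts.
# exporter_targets is a hypothetical helper; port 9835 is the
# exporter's default port used throughout this guide.
exporter_targets() (
  port=9835
  list=""
  for host in "$@"; do
    # Append "'host:port'", separated by ", " after the first entry.
    list="${list:+$list, }'$host:$port'"
  done
  printf "      - targets: [%s]\n" "$list"
)

exporter_targets 192.0.2.10 192.0.2.11
```

The printed line replaces the single-target line in the nvidia job; Prometheus needs a restart (sudo systemctl restart prometheus) to pick up the change.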
Updated: 28.03.2025
Published: 25.06.2024