You ask — we answer!

What is NVIDIA® MIG

In modern computing, GPUs play a crucial role across various fields, from medicine to film production. While GPUs can significantly reduce computing time, often by a factor of tens or hundreds, they also have unique characteristics to consider. For single-workload scenarios where there is no need to share a GPU among users, almost any GPU will suffice.

Many applications support parallel computing, distributing the workload across multiple physical cards. However, challenges arise when there is only one powerful GPU in a server, or when multiple users need to run GPU-intensive calculations exclusively, potentially blocking others until their tasks complete. NVIDIA®’s MIG (Multi-Instance GPU) technology addresses this issue. Available on specific GPU models like A30, A100 (PCIE/SXM4), H100 (PCIE/SXM5/GH200), and H200 (SXM5), MIG allows logical division of a single card into several independently accessible instances.

These GPU instances are isolated at the hardware level, preventing workloads from affecting each other’s performance or VRAM consumption. Each instance is allocated a fixed amount of video memory. If a process attempts to exceed this allocation, the OOM-killer activates, terminating the offending process.

The maximum possible number of instances is seven. This may seem odd, as dividing into an even number of parts would appear more logical. There is a simple explanation: an eighth part is also allocated, but its computing resources are used to manage the division, so it is unavailable for normal use. VRAM is likewise divided into at most eight parts; when a card is split into seven instances, each receives an equal share. Each instance also receives a portion of other resources, such as hardware decoding units.

This feature allows for workload parallelization even with a single card. At the OS level, GPU instances appear as different physical cards, enabling the application to access each separately and ensure parallel computing. This achieves optimal utilization of GPU resources. Cloud providers highly value this feature, as the ability to flexibly manage resources is critical, especially in containerized environments.
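At this level, each instance is identified by its own UUID. As a simple illustration (the UUID placeholder and the train.py script name below are only examples), you can list the available MIG devices and pin a workload to a specific one via the CUDA_VISIBLE_DEVICES variable:

nvidia-smi -L
CUDA_VISIBLE_DEVICES=MIG-<device-uuid> python3 train.py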

MIG features

Before working with MIG, ensure that the server has the necessary GPU drivers installed and that there are no active computing tasks on the card. Enabling MIG is a “heavy” request to the driver API, causing a complete clearing of video memory and a GPU restart. Currently, this only works in Linux, and the user executing such a request must have superuser rights.
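A quick way to verify both points (a minimal check using standard nvidia-smi queries) is to confirm that the driver responds and that no compute processes are currently running; the second command should return an empty list:

nvidia-smi --query-gpu=driver_version --format=csv
nvidia-smi --query-compute-apps=pid,process_name --format=csv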

When working with MIG, there are several important considerations to keep in mind. First, MIG is designed for computing tasks only and doesn’t support graphical applications. If you need to run graphical applications, it’s necessary to disable MIG.

MIG functionality and GPU instance passthrough within virtual machines are only supported on Linux-based operating systems. You won’t be able to use MIG with Microsoft Hyper-V or VMware ESXi. In these cases, it’s recommended to disable MIG and perform full GPU passthrough instead.

It’s worth noting that GPU instances lack P2P connectivity, even when placed in the same container. This limitation, caused by internal isolation mechanisms, can pose significant challenges for infrastructures built using the Kubernetes orchestrator. However, third-party software solutions can help overcome this issue.

MIG is best suited for servers with GPUs of the same model. LeaderGPU users don’t need to worry about this, as all available configurations are designed with MIG compatibility in mind. To enable MIG, you’ll need to run specific commands. If you’re using NVIDIA® System Management and Data Center GPU Manager services, make sure to stop them first:

sudo systemctl stop nvsm dcgm

After ensuring there are no active jobs on the GPU, proceed to switch the mode. For instance, the following command enables MIG on the GPU with ID 0:

sudo nvidia-smi -i 0 -mig 1
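
To confirm that the mode has actually changed, you can query the current and pending MIG state (if the card was still busy, the change remains pending until the GPU is reset or the server is rebooted):

sudo nvidia-smi -i 0 --query-gpu=mig.mode.current,mig.mode.pending --format=csv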

Repeat the enable command for each GPU you want to divide into instances. Next, let’s examine the available profiles for this division:

sudo nvidia-smi mig -lgip

Select the desired profiles and note their IDs. Let’s assume we have an NVIDIA® A100 GPU. We’ll divide the card into four GPU instances: the first will use the 3g.20gb profile (three of the seven compute slices and 20 GB of video memory), while the other three will use the 1g.5gb profile (one compute slice and 5 GB of video memory each).

Note that the system applies profiles sequentially. To prevent potential errors, always specify the largest profile, the one with the most compute slices and memory, first:

sudo nvidia-smi mig -cgi 9,19,19,19 -C
Successfully created GPU instance ID  2 on GPU  0 using profile MIG 3g.20gb (ID  9)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  2 using profile MIG 3g.20gb (ID  2)
Successfully created GPU instance ID  7 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  7 using profile MIG 1g.5gb (ID  0)
Successfully created GPU instance ID  8 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  8 using profile MIG 1g.5gb (ID  0)
Successfully created GPU instance ID  9 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  9 using profile MIG 1g.5gb (ID  0)

Otherwise, you may encounter an error:

Failed to create GPU instances: Insufficient Resources

After applying the profile, check the available GPU instances:

sudo nvidia-smi mig -lgi
+-------------------------------------------------------+
| GPU instances:                                        |
| GPU   Name             Profile  Instance   Placement  |
|                          ID       ID       Start:Size |
|=======================================================|
|   0  MIG 1g.5gb          19        7          0:1     |
+-------------------------------------------------------+
|   0  MIG 1g.5gb          19        8          1:1     |
+-------------------------------------------------------+
|   0  MIG 1g.5gb          19        9          2:1     |
+-------------------------------------------------------+
|   0  MIG 3g.20gb          9        2          4:4     |
+-------------------------------------------------------+
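
You can also list the Compute instances created inside these GPU instances:

sudo nvidia-smi mig -lci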

You can now restore the operation of NVSM and DCGM:

sudo systemctl start nvsm dcgm
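
If you later need to revert the configuration, for example to run graphical applications or to pass the entire GPU into a virtual machine, the rough sequence (again assuming GPU ID 0) is to destroy the Compute instances and GPU instances first, and only then disable MIG mode:

sudo nvidia-smi mig -dci
sudo nvidia-smi mig -dgi
sudo nvidia-smi -i 0 -mig 0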

Alternative management

The standard management method using the nvidia-smi utility has several drawbacks. First, it only divides resources into a fixed set of instance profiles. Second, reconfiguration requires unloading all computing tasks from memory and stopping applications. This approach is ill-suited for cloud computing, as it limits automatic scaling. To fully utilize MIG, additional software such as Run:ai is necessary.

This platform offers more flexible GPU utilization management, complementing MIG with its own fractionation technology. It ensures each running application receives its share of computing power. Through a specialized monitoring application, the platform allocates equal computing resources to each application and redistributes unused resources among other active applications.

Run:ai also guarantees parallel execution of workloads, maximizing resource utilization. As the division is software-based, special attention is given to VRAM management to prevent collisions.

Beyond dynamic management, the platform allows reserving specific computing resources for particular applications. This eliminates the need to unload all other applications, as partitioning occurs on the fly. The platform ensures this process doesn’t disrupt running GPU applications.

Updated: 28.03.2025

Published: 04.10.2024

