Llama 3 using Hugging Face

On April 18, 2024, Meta AI released Llama 3, its newest major language model. Two versions are available: 8B and 70B. Both were pretrained on more than 15T tokens. The 8B version was trained on data with a cutoff of March 2023, while the larger 70B version has a cutoff of December 2023.

Step 1. Prepare operating system

Update cache and packages

Let’s update the package cache and upgrade the operating system before setting up Llama 3. This guide uses Ubuntu 22.04 LTS:

sudo apt update && sudo apt -y upgrade

We also need pip, the Python package installer, if it isn’t already present in the system:

sudo apt install python3-pip

Install NVIDIA drivers

You can use the automated utility that is included in Ubuntu distributions by default:

sudo ubuntu-drivers autoinstall

Alternatively, you can install the NVIDIA drivers manually. In either case, reboot the server afterwards:

sudo shutdown -r now
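After the reboot, it can be worth confirming that the driver actually loaded before moving on. The following is a minimal sketch, assuming the driver package installed the nvidia-smi utility; check_gpu is a hypothetical helper name, not part of any tool used in this guide:

```shell
# Print GPU model, driver version and total memory if the driver is usable.
# check_gpu is a hypothetical helper name chosen for this example.
check_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv \
            || echo "nvidia-smi is installed but could not talk to the driver"
    else
        echo "nvidia-smi not found: the driver is missing or not on PATH"
    fi
}
```

Run check_gpu in the terminal. If it reports that nvidia-smi is missing or cannot reach the driver, revisit the driver installation before continuing.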

Step 2. Get the model

Log in to Hugging Face using your username and password. Go to the page of the desired LLM version: Meta-Llama-3-8B or Meta-Llama-3-70B. At the time of publication, access to the model is granted on an individual basis. Fill out a short form and click the Submit button:

Request access from HF

You will then see a message confirming that your request has been submitted. You will gain access after 30-40 minutes and will be notified by email.

Add SSH key to HF

Generate an SSH key to use with Hugging Face:

cd ~/.ssh && ssh-keygen

When the keypair is generated, you can display the public key in the terminal emulator:

cat id_rsa.pub

Copy the entire line, from ssh-rsa through the trailing comment (usergpu@gpuserver in this example).

Open Hugging Face Profile settings. Then choose SSH and GPG Keys and click on the Add SSH Key button:

Fill in the Key name and paste the copied SSH Public key from the terminal. Save the key by pressing Add key:
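Once the key is saved, you can check that the server authenticates against Hugging Face before cloning anything. This is a sketch under the assumption that the key sits in the default ~/.ssh location; check_hf_ssh is a hypothetical helper name, and the 10-second timeout is an arbitrary choice:

```shell
# Ask the Hugging Face SSH endpoint to identify the account behind the key.
# -T disables terminal allocation, since no interactive shell is offered.
# check_hf_ssh is a hypothetical helper name; the timeout is an arbitrary choice.
check_hf_ssh() {
    ssh -T -o ConnectTimeout=10 git@hf.co
}
```

Run check_hf_ssh in the terminal. On the first connection, ssh asks you to confirm the host key. A greeting that contains your username means the key is recognized. A "Permission denied" message usually means the public key was not added to the account or a different key is being offered.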

Now your HF account is linked to the public SSH key. The private key stays on the server. The next step is to install the Git LFS (Large File Storage) extension, which is used for downloading large files such as neural network models. Open your home directory:

cd ~/

Download and run the shell script. It adds a third-party package repository that provides git-lfs:

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash

Now, you can install it using the standard package manager:

sudo apt-get install git-lfs
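Installing the package does not necessarily register the LFS filters for your user. If they are missing, a later clone fetches small pointer files instead of the model weights. The sketch below registers the filters and prints the resulting setting; setup_git_lfs is a hypothetical helper name:

```shell
# Register the Git LFS filters in the user's global Git config and show
# the smudge filter so you can confirm it is set.
# setup_git_lfs is a hypothetical helper name chosen for this example.
setup_git_lfs() {
    if ! git lfs version >/dev/null 2>&1; then
        echo "git-lfs is not installed"
        return 1
    fi
    # --skip-repo installs the global filters without touching the current directory.
    git lfs install --skip-repo
    git config --global --get filter.lfs.smudge
}
```

Run setup_git_lfs once per user account. The last line of output should be the git-lfs smudge filter command.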

Let’s configure Git with your HF username:

git config --global user.name "John"

And with the email address linked to your HF account:

git config --global user.email "john.doe@example.com"

Download the model

Open the target directory:

cd /mnt/fastdisk
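Model repositories are large, so it may help to check the free space on the target disk before cloning. The following is a minimal sketch; has_free_space is a hypothetical helper name, and the threshold you pass is your own estimate based on the file sizes listed on the model page:

```shell
# Succeed if the filesystem holding DIR has at least NEED_GB GiB available.
# has_free_space is a hypothetical helper name chosen for this example.
has_free_space() {
    dir="$1"
    need_gb="$2"
    # df -Pk prints one line per filesystem; column 4 is available space in KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    avail_gb=$((avail_kb / 1024 / 1024))
    echo "${avail_gb} GiB free on ${dir}"
    [ "$avail_gb" -ge "$need_gb" ]
}
```

For example, has_free_space /mnt/fastdisk 50 || echo "not enough space" prints a warning when less than 50 GiB is available. The value 50 is a placeholder, not a measured requirement. Note that a clone with Git LFS keeps the downloaded objects under .git in addition to the checked-out files.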

Start downloading the repository. For this example, we chose the 8B version:

git clone git@hf.co:meta-llama/Meta-Llama-3-8B

This process takes up to 5 minutes. You can monitor it by running the following command in another SSH session:

watch -n 0.5 df -h

Here, you’ll see the free space on the mounted disk shrink, which confirms that the download is progressing and the data is being saved. The status refreshes every half-second. To stop watching, press Ctrl + C.

Alternatively, you can install btop and monitor the process using this utility:

sudo apt -y install btop && btop

To quit the btop utility, press the Esc key and select Quit.

Step 3. Run the model

Open the directory:

cd /mnt/fastdisk

Download the Llama 3 repository:

git clone https://github.com/meta-llama/llama3

Change the directory:

cd llama3
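The example script imports Python packages declared by the repository. Its README suggests installing them with pip install -e . from the top-level directory, in an environment where PyTorch with CUDA support is available. A minimal sketch of that step follows; install_llama3_deps is a hypothetical helper name, and calling pip through python3 -m pip is a choice made here:

```shell
# Install the repository's Python dependencies in editable mode.
# Must be run from the top level of the cloned llama3 directory.
# install_llama3_deps is a hypothetical helper name chosen for this example.
install_llama3_deps() {
    if [ -f setup.py ]; then
        python3 -m pip install -e .
    else
        echo "setup.py not found: run this from the llama3 repository root"
        return 1
    fi
}
```

Run install_llama3_deps while inside the llama3 directory. The guard avoids installing from the wrong location by mistake.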

Run the example:

torchrun --nproc_per_node 1 example_text_completion.py \
--ckpt_dir /mnt/fastdisk/Meta-Llama-3-8B/original \
--tokenizer_path /mnt/fastdisk/Meta-Llama-3-8B/original/tokenizer.model \
--max_seq_len 128 \
--max_batch_size 4
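If the example fails while loading the checkpoint, one common cause is an incomplete download, for instance when Git LFS was not active and the weight files are only small pointer files. The sketch below checks for that. It assumes the original directory contains params.json, tokenizer.model and consolidated.00.pth, as in the 8B repository at the time of writing; check_ckpt is a hypothetical helper name, and the 1024-byte threshold is an arbitrary cutoff:

```shell
# Verify that the files the example script loads are present and that the
# weights are real data rather than a small Git LFS pointer file.
# check_ckpt is a hypothetical helper; the file names are an assumption.
check_ckpt() {
    ckpt_dir="$1"
    for f in params.json tokenizer.model consolidated.00.pth; do
        if [ ! -f "$ckpt_dir/$f" ]; then
            echo "missing: $ckpt_dir/$f"
            return 1
        fi
    done
    # An LFS pointer is a short text file; real weights are far larger.
    size=$(wc -c < "$ckpt_dir/consolidated.00.pth")
    if [ "$size" -lt 1024 ]; then
        echo "consolidated.00.pth looks like an LFS pointer; try: git lfs pull"
        return 1
    fi
    echo "checkpoint files look complete"
}
```

For example, run check_ckpt /mnt/fastdisk/Meta-Llama-3-8B/original. If it reports a pointer file, running git lfs pull inside the model repository should fetch the actual weights.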

Now you can use Llama 3 in your applications.

Published: 19.04.2024