Building a high performance compute server on Azure and installing KenLM and Cuda/Kaldi with NVIDIA Tesla drivers.

Feb 20, 2022 | Azure, Linux, Server administration, Technology

About a week ago, I was asked to build a new server. It is going to be used for research, so the spec is quite high: 16 dedicated CPU cores, 110GB RAM and an NVIDIA Tesla T4 GPU. It’s running on Azure, and the applications needed on it are a little out of the ordinary, so this was a lot of fun.

First, the VM type: it’s a Standard_NC16as_T4_v3. You can’t just go and buy one of these. You must create a support request with Microsoft so that they release the number of cores required for this specific type of server. This is a painful process! There were 200 processor cores available in that subscription, but not of the right type. There is a very useful category for requesting additional cores when you create a support request in the Azure Portal. What isn’t so useful is that the portal considered that I already had enough cores; it didn’t understand that I needed cores of the specific type for this research server. I spoke to an HPC (High-Performance Computing) specialist about something unrelated during the week and he knew what I was talking about right away. But it took over a week for Azure Support to understand what I was looking for and then make the required changes.

Moving on: once Microsoft did what they needed to, setting up the new server wasn’t difficult. It was created about 10 minutes after I finished the VM creation wizard.

The main requirements of this server are Cuda and KenLM and this is really what this post is about. I don’t spend every day in a Linux environment. So when I need to install something like this that I wouldn’t use often, I rely heavily on documentation. It’s not that I couldn’t go hunt down all the installation sources and dependencies. But that would be a waste of time. And time is not something I really like to waste.

I took notes during this process. These include the commands that I used to install everything and the various sources I read through to learn a bit more about what I was installing and how it could and should be done.

In case anyone copies and pastes the following lines, I am going to precede my comments with #.

# First you need to determine the GPU that you have and the suggested driver. Fortunately, this is way easier than it used to be.
apt install ubuntu-drivers-common
ubuntu-drivers devices
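
# A minimal sketch for pulling the suggested driver out of that output, assuming the
# recommended entry appears on a line such as
# "driver   : nvidia-driver-470 - distro non-free recommended".
# The function name recommended_driver is hypothetical, not part of ubuntu-drivers.

```shell
# Print the package name from the first line marked "recommended".
# Reads the output of `ubuntu-drivers devices` on standard input.
recommended_driver() {
    awk '/recommended/ { print $3; exit }'
}

# Usage (on the server):
#   ubuntu-drivers devices | recommended_driver
```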

# Do not use this next command. It installs way too much and will result in massive dependency issues when you go to install Cuda.
# ubuntu-drivers autoinstall

# The following command will install the NVIDIA GPU driver. It will also install the unmet dependencies.
apt install nvidia-driver-470 libnvidia-gl-470 libnvidia-compute-470 libnvidia-decode-470 libnvidia-encode-470 libnvidia-ifr1-470 libnvidia-fbc1-470

# After installing the GPU driver, you must reboot.
reboot now
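
# Once the driver is installed and the server has rebooted, it is worth confirming that
# the GPU is visible before moving on to Cuda. This is only a sketch: check_gpu is a
# hypothetical helper name, and it assumes the driver package put nvidia-smi on the PATH.

```shell
# Report the GPU model and driver version, or fail if nvidia-smi is unavailable.
check_gpu() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver is not installed or not on PATH" >&2
        return 1
    fi
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}

# Usage (on the server, after the reboot):
#   check_gpu
```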

# This will add the NVIDIA Cuda repository and install Cuda along with all of its dependencies.
mv /etc/apt/preferences.d/cuda-repository-pin-600
apt-key adv --fetch-keys
add-apt-repository "deb /"
apt-get update
apt-get -y install cuda

# Add the Cuda binaries to your path:
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
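
# The ${PATH:+:${PATH}} part of that line only adds a colon and the old value when PATH
# is already set and non-empty, so you never end up with a trailing colon (an empty PATH
# entry means "current directory"). A small sketch of the same expansion; cuda_path and
# the variable "existing" are hypothetical names used for illustration only.

```shell
# Build the new PATH value from an existing one passed as $1 (possibly empty).
cuda_path() {
    existing=$1
    printf '%s\n' "/usr/local/cuda/bin${existing:+:${existing}}"
}

cuda_path ""               # prints /usr/local/cuda/bin
cuda_path "/usr/bin:/bin"  # prints /usr/local/cuda/bin:/usr/bin:/bin
```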

# You can test that Cuda is installed and that the installed version is as expected as follows (open a new shell or run 'source ~/.bashrc' first so that the new PATH takes effect):
nvcc --version

# If at some point you need to start again, this one-liner will remove all the NVIDIA and Cuda packages that you might have installed using apt / apt-get, and then reinstall Cuda.
# apt clean; apt update; apt purge cuda; apt purge 'nvidia-*'; apt autoremove; apt install cuda

# The following lines will install KenLM on Ubuntu 20.04.
apt-get update
apt-get install build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev -y
git clone
cd kenlm/
mkdir build
cd build
cmake ..
make -j 4
make install
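
# To confirm the install worked, you can check that the KenLM command-line tools are on
# the PATH. A minimal sketch, assuming "make install" placed lmplz, build_binary and
# query somewhere on your PATH; missing_tools is a hypothetical helper name.

```shell
# Print the name of every tool given as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (no output means everything was found):
#   missing_tools lmplz build_binary query
```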

