Develop cmake for Kaldi

Recently I started taking time to read the C++ code in Kaldi to deepen my understanding of its internal pipelines. To start with, there are several ways to read Kaldi’s code.

  • read code with <your favourite editors>
  • read code on Github
  • read code on Kaldi’s website
  • read code in an IDE

Previously I read Kaldi’s code mainly on Github and on its website. These are not bad options if we are only interested in reading code. However, I prefer to debug the code with some simple tests while reading it, which limits my options to editors or IDEs. I am not familiar with editor plugins for code navigation and debugging (e.g. ctags), and I am used to developing C++ projects in the CLion IDE, which is supported on most operating systems, so I decided to use CLion as my main IDE for reading and debugging the code.

The problem is that CLion requires the project to be configured with cmake, which is not compatible with Kaldi’s default makefiles. Therefore, I set out to build a cmake environment to replace the makefiles so that I could use CLion. This post covers my investigation of the Kaldi makefiles and explains how I created CMakeLists.txt files from them.

Related Works

There appear to have been several earlier attempts to write CMakeLists.txt files for Kaldi. For instance:

  • brute force: just add all headers and sources to a single Kaldi executable. This is quick to configure but impossible to debug.
  • pykaldi: this is a cool project that implements a python wrapper for Kaldi and has lots of interesting points. However, its cmake files contain a lot of code unrelated to our purpose (e.g. clif bindings).
  • kaldi-cmake: this is actually a good starting point for Kaldi’s cmake. It uses a python script to traverse all of Kaldi’s subdirectories and generate CMakeLists.txt files, and I follow the same strategy in my own version. However, this project changes Kaldi’s original directory structure and divides all files into header, source and test directories. As I want to keep the original structure, I did not use it.

None of them fit my purpose, so I went on to create my own.

Makefiles in Kaldi

There are two large directories under Kaldi’s root path: src and tools.

  • src contains the core part of Kaldi’s own code
  • tools holds third-party code such as the openfst and blas libraries

The install instructions require us to first configure tools and compile the necessary software, and then go to src to configure and compile it. As we are only interested in reading and debugging Kaldi’s core, we will follow the instructions to configure both directories and compile tools, but skip the compilation step for src.

After configuring Kaldi’s src, we can look at the Makefile in each subdirectory of src. The one for kaldi-base looks as follows:

# Whenever make is run in this directory, call ./get_version.sh as the
# first thing. This script regenereates ./version.h if necessary, e.g.
# if it does not already exist or if the version number has changed.
LOG := $(shell ./get_version.sh; echo " $$?")
ifneq ($(strip $(LOG)), 0)
  RC := $(lastword $(LOG))
  OUT := $(wordlist 1,$(shell echo $$(($(words $(LOG))-1))),$(LOG))
  ifeq ($(RC),0)
    $(info $(OUT))
  else
    $(error $(OUT))
  endif
endif

all:

include ../kaldi.mk

TESTFILES = kaldi-math-test io-funcs-test kaldi-error-test timer-test

OBJFILES = kaldi-math.o kaldi-error.o io-funcs.o kaldi-utils.o timer.o

LIBNAME = kaldi-base

ADDLIBS =

include ../makefiles/default_rules.mk

The structure here is simple. First, two Makefiles are included at the top level: ../kaldi.mk and ../makefiles/default_rules.mk

  • kaldi.mk: this appears to contain the environment variables set up by configure
  • makefiles/default_rules.mk: this provides the template rules for compiling files on different operating systems

The meanings of the other fields are as follows:

  • TESTFILES: each listed name corresponds to a cc file that should be compiled into an executable
  • OBJFILES: all listed objects should be compiled into the library specified by LIBNAME
  • LIBNAME: the library name for this directory. Usually each directory corresponds to one LIBNAME
  • ADDLIBS: dependencies on other libraries. (There are none here because kaldi-base is one of the most root-level libraries in Kaldi. However, a lot of other libraries depend on kaldi-base.)
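As a rough illustration of how a generator script could read these four fields, here is a minimal Python sketch. The function name `parse_makefile_fields` is hypothetical and not taken from the actual script; the sketch assumes every field is a plain single-line `NAME = value` assignment and ignores `+=`, conditionals and backslash continuations.

```python
import re

# Fields of interest in a per-directory Kaldi Makefile (see the listing above).
FIELDS = ("TESTFILES", "OBJFILES", "LIBNAME", "ADDLIBS")


def parse_makefile_fields(text):
    """Map each field name to its list of whitespace-separated values.

    Assumption: only plain single-line `NAME = value` assignments are handled.
    """
    fields = {name: [] for name in FIELDS}
    for line in text.splitlines():
        # `NAME := value` lines do not match because ':' precedes '='.
        match = re.match(r"^\s*([A-Z]+)\s*=\s*(.*)$", line)
        if match and match.group(1) in fields:
            fields[match.group(1)] = match.group(2).split()
    return fields


SAMPLE = """\
TESTFILES = kaldi-math-test io-funcs-test kaldi-error-test timer-test

OBJFILES = kaldi-math.o kaldi-error.o io-funcs.o kaldi-utils.o timer.o

LIBNAME = kaldi-base

ADDLIBS =

include ../makefiles/default_rules.mk
"""

parsed = parse_makefile_fields(SAMPLE)
print(parsed["LIBNAME"], parsed["ADDLIBS"])
```

An empty ADDLIBS simply becomes an empty list, which matches the "no dependencies" case of kaldi-base.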

cmake for Kaldi

To create a CMakeLists.txt for each directory, we need to translate the previous Makefile into the following CMakeLists.txt:

set(EXECUTABLE_OUTPUT_PATH "${CMAKE_CURRENT_SOURCE_DIR}")

add_library(kaldi-base kaldi-math.cc kaldi-error.cc io-funcs.cc kaldi-utils.cc timer.cc)

add_executable(kaldi-math-test kaldi-math-test.cc)
add_executable(io-funcs-test io-funcs-test.cc)
add_executable(kaldi-error-test kaldi-error-test.cc)
add_executable(timer-test timer-test.cc)

target_link_libraries(kaldi-math-test kaldi-base)
target_link_libraries(io-funcs-test kaldi-base)
target_link_libraries(kaldi-error-test kaldi-base)
target_link_libraries(timer-test kaldi-base)

The first line ensures that all executables are generated in the same directory as their sources. This keeps things compatible with Kaldi’s default behavior.

The second line builds the library named by LIBNAME from the source files that correspond to the entries in OBJFILES.

The third part builds every entry in TESTFILES, and the last part links those test executables against the current library.
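The translation described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual generator script: the helper name `make_cmakelists` is hypothetical, each `.o` entry is assumed to map to a `.cc` file of the same name, and `addlibs` is assumed to already hold cmake target names (how ADDLIBS entries map to target names is not covered here).

```python
def make_cmakelists(libname, objfiles, testfiles, addlibs=()):
    """Render CMakeLists.txt text for one directory (hypothetical helper)."""
    # Assumption: foo.o is built from foo.cc in the same directory.
    sources = [obj[:-2] + ".cc" for obj in objfiles if obj.endswith(".o")]

    lines = ['set(EXECUTABLE_OUTPUT_PATH "${CMAKE_CURRENT_SOURCE_DIR}")', ""]
    lines.append("add_library({} {})".format(libname, " ".join(sources)))
    if addlibs:
        # Assumption: addlibs are names of cmake targets defined elsewhere.
        lines.append(
            "target_link_libraries({} {})".format(libname, " ".join(addlibs))
        )
    lines.append("")
    for test in testfiles:
        lines.append("add_executable({0} {0}.cc)".format(test))
    lines.append("")
    for test in testfiles:
        lines.append("target_link_libraries({} {})".format(test, libname))
    return "\n".join(lines) + "\n"


text = make_cmakelists(
    "kaldi-base",
    ["kaldi-math.o", "timer.o"],
    ["kaldi-math-test", "timer-test"],
)
print(text)
```

Generating the link line inside the same loop over the test names makes it hard to forget a test executable when the list changes.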

Additionally, we need to make sure that the blas and openfst shared libraries (.so files) are linked correctly.

Special Cases

There are, however, a handful of corner cases in the Kaldi makefiles.

The first one is a naming issue inside nnet1, 2 and 3. Some executables have identical names across nnet1, 2 and 3, which conflicts with cmake because cmake requires every executable to have a distinct id. I handle this by adding an nnet1, nnet2 or nnet3 prefix to every executable’s name.
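The prefixing idea can be sketched in a few lines of Python. The directory names in `PREFIXES`, the `-` separator and the executable name `some-tool` are all assumptions made for illustration; they are not taken from the actual script.

```python
# Hypothetical mapping from directory name to prefix (assumed, see above).
PREFIXES = {"nnetbin": "nnet1", "nnet2bin": "nnet2", "nnet3bin": "nnet3"}


def target_name(directory, executable):
    """Return a cmake target name that is unique across the nnet directories."""
    prefix = PREFIXES.get(directory)
    if prefix is None:
        return executable  # directories without a clash keep the original name
    return "{}-{}".format(prefix, executable)


# The same (made-up) executable name in two directories yields two targets.
names = [target_name(d, "some-tool") for d in ("nnet2bin", "nnet3bin", "bin")]
print(names)
```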

The second one concerns the cuda-related files. Kaldi implements its own cuda matrix library, which is used in its nnet implementations. As I am developing on a machine without a GPU, I would like to disable those cuda features for now, so I simply leave out the two cu files (cu-kernels.cu and chain-kernels.cu).

The third case is the dependency on some third-party libraries. First, there are two directories containing projects that use tensorflow: tfrnnlm and tfrnnlmbin. As these two are not built automatically and I do not want to handle the dependency on tensorflow for now, I do not generate CMakeLists.txt files for them. In addition, I also ignore online and onlinebin because they depend on portaudio. These can be fixed, however, by manually adding the target linking.
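A directory traversal that skips these four directories could look roughly like the Python sketch below. The function name `directories_to_generate` is hypothetical, and the demo runs on a throwaway directory tree rather than a real Kaldi checkout.

```python
import tempfile
from pathlib import Path

# Directories left out for now because of tensorflow / portaudio dependencies.
SKIPPED_DIRS = {"tfrnnlm", "tfrnnlmbin", "online", "onlinebin"}


def directories_to_generate(src_root):
    """Return sorted names of subdirectories that have a Makefile and are not skipped."""
    root = Path(src_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and (child / "Makefile").is_file()
        and child.name not in SKIPPED_DIRS
    )


# Demo on a temporary, made-up tree.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("base", "matrix", "online", "tfrnnlm"):
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "Makefile").write_text("all:\n")
    (Path(tmp) / "doc").mkdir()  # no Makefile, so it is ignored
    found = directories_to_generate(tmp)

print(found)
```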

6 Comments

    1. Hi Ahmet,

      Thanks for your comment!
      As those cmakes should be configured with respect to your own building environment, I guess my generated cmake is not useful for your system.
      Can you try running this python script? I actually generated my cmake with it.
      It should work at least for Linux.

  1. Thanks for the reply. It seems to work on Linux. It only needs the C++11 flag added, and one class was given the cc extension when it should have been cu.
    I tried with Mac and it kinda worked but I had a problem with math libraries (Atlas). I can make a pull request for small fixes.

  2. Hi
    thanks for your work
    I want to modify Kaldi’s code but I am confused about which C++ IDE to choose.
    In your opinion which IDE is better for Kaldi?

    best regards

    1. Hi Akbar,

      Thanks for your comment!
      If you do not have any particular IDE or editor preference, I think clion should be enough in most cases. I have been using clion and this cmake to debug kaldi for a couple of weeks, and it works pretty well so far. Personally, I am also using other jetbrains IDEs (e.g. pycharm), so it’s easy for me to get familiar with clion.
      Unless you are considering developing some native applications such as win32 (visual studio) or iphone apps (xcode), I think clion should be a good choice.
