
Development

Dylon Edwards edited this page Feb 18, 2024 · 4 revisions

Initialization

To ease dependency management during development, Anaconda is used, but it should not be required if you already have the necessary libraries installed. If you do not have a working Anaconda installation, I recommend Miniforge, which ships with Mamba:

# For Linux (x86_64)
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b
# For macOS (arm64)
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh -b
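If you script the installation, the installer name can be derived from uname instead of being hard-coded. The sketch below only follows the naming used in the two URLs above (Linux keeps its name, macOS/Darwin maps to "MacOSX"); the function name miniforge_installer is hypothetical, and other platforms are simply rejected.

```shell
#!/usr/bin/env bash
# Sketch: choose the Miniforge installer name for a platform.
# Assumption: naming follows the URLs above (Linux -> "Linux", Darwin -> "MacOSX").
miniforge_installer() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os" in
    Linux)  echo "Miniforge3-Linux-${arch}.sh" ;;
    Darwin) echo "Miniforge3-MacOSX-${arch}.sh" ;;
    *)      echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}

# Usage: download and run the installer in batch mode (-b), as above.
# installer="$(miniforge_installer)" &&
#   wget "https://github.com/conda-forge/miniforge/releases/latest/download/${installer}" &&
#   bash "$installer" -b
```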

[!TODO] Add instructions for Windows.

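Initialize the base environment in your current bash session. Running ~/miniforge3/bin/conda init bash adds an equivalent block to ~/.bashrc if you want this to persist across sessions: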

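# Load conda's shell functions into the current bash session
# (assumes the default Miniforge install location, $HOME/miniforge3)
__conda_setup="$("$HOME/miniforge3/bin/conda" 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "$HOME/miniforge3/etc/profile.d/conda.sh" ]; then
        source "$HOME/miniforge3/etc/profile.d/conda.sh"
    else
        export PATH="$HOME/miniforge3/bin:$PATH"
    fi
fi
unset __conda_setup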

Clone liblevenshtein-cpp:

git clone https://github.com/universal-automata/liblevenshtein-cpp.git
cd liblevenshtein-cpp

Initialize the conda environment:

mamba env create --force -f environment.yml
conda activate ll-cpp
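The build commands pass $CONDA_PREFIX to CMake as the install prefix, so they only do the right thing while the ll-cpp environment is active. A minimal guard, as a sketch: it relies on conda activate setting CONDA_DEFAULT_ENV (the environment name) and CONDA_PREFIX (its path); the function name require_conda_env is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail unless the expected conda environment is active,
# otherwise print its prefix (for use as CMAKE_INSTALL_PREFIX).
require_conda_env() {
  local expected="${1:-ll-cpp}"
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "CONDA_PREFIX is unset; run: conda activate ${expected}" >&2
    return 1
  fi
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "active env is '${CONDA_DEFAULT_ENV:-}', expected '${expected}'" >&2
    return 1
  fi
  echo "$CONDA_PREFIX"
}

# Usage:
# prefix="$(require_conda_env ll-cpp)" || exit 1
```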

Building and Installation

mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX ..
make
make install
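By default make builds one target at a time. As a sketch for the two platforms covered above, a parallel job count can be taken from nproc (GNU coreutils, Linux) or sysctl -n hw.ncpu (macOS); the fallback of 1 job and the function name build_jobs are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: number of parallel jobs for make, falling back to 1.
build_jobs() {
  local n
  n="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)"
  # Guard against empty or non-numeric output.
  case "$n" in
    ''|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}

# Usage (within the build directory):
# make -j"$(build_jobs)" && make install
```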

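The files will be installed to the following locations, relative to the install prefix (shown as ${CMAKE_INSTALL_PREFIX}; with the command above, this is $CONDA_PREFIX):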

$ tree "${CMAKE_INSTALL_PREFIX}"
${CMAKE_INSTALL_PREFIX}
├── include
│   ├── MurmurHash2.h
│   ├── MurmurHash3.h
│   └── liblevenshtein
│       ├── collection
│       │   ├── dawg.h
│       │   ├── dawg_iterator.h
│       │   ├── dawg_node.h
│       │   ├── prefix.h
│       │   ├── sorted_dawg.h
│       │   └── transition.h
│       ├── distance
│       │   ├── distance.h
│       │   ├── memoized_distance.h
│       │   ├── merge_and_split_distance.h
│       │   ├── standard_distance.h
│       │   ├── symmetric_pair.h
│       │   └── transposition_distance.h
│       ├── proto
│       │   └── liblevenshtein.pb.h
│       ├── serialization
│       │   └── serializer.h
│       └── transducer
│           ├── algorithm.h
│           ├── comparator.h
│           ├── distance.h
│           ├── intersection.h
│           ├── lazy_query.h
│           ├── merge.h
│           ├── position.h
│           ├── position_transition.h
│           ├── state.h
│           ├── state_iterator.h
│           ├── state_transition.h
│           ├── subsumes.h
│           ├── transducer.h
│           └── unsubsume.h
└── lib
    ├── cmake
    │   └── liblevenshtein
    │       ├── liblevenshtein-config-version.cmake
    │       ├── liblevenshtein-config.cmake
    │       ├── liblevenshtein-targets-release.cmake
    │       └── liblevenshtein-targets.cmake
    ├── liblevenshtein.so -> liblevenshtein.so.4.0
    ├── liblevenshtein.so.4.0 -> liblevenshtein.so.4.0.0
    └── liblevenshtein.so.4.0.0

11 directories, 37 files
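To spot-check an installation against the listing above, the sketch below tests for a handful of the files shown. The choice of files is arbitrary, check_install is a hypothetical helper, and the .so entry matches the Linux listing (a macOS build would normally produce a .dylib instead).

```shell
#!/usr/bin/env bash
# Sketch: verify that a sample of the installed files exist under PREFIX.
check_install() {
  local prefix="$1" missing=0 f
  if [ -z "$prefix" ]; then
    echo "usage: check_install PREFIX" >&2
    return 2
  fi
  for f in \
    include/liblevenshtein/transducer/transducer.h \
    include/liblevenshtein/collection/sorted_dawg.h \
    include/liblevenshtein/distance/distance.h \
    lib/cmake/liblevenshtein/liblevenshtein-config.cmake \
    lib/liblevenshtein.so; do
    # -e follows symlinks, so a dangling .so link counts as missing
    if [ ! -e "$prefix/$f" ]; then
      echo "missing: $prefix/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_install "$CONDA_PREFIX" && echo "install looks complete"
```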

Enabling tests

If you want to build the library with tests, use the same instructions but also pass -DBUILD_TESTS=ON to CMake:

cmake -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
      -DBUILD_TESTS=ON \
      ..

Enabling baseline metrics

If you want to enable the baseline metrics for validation, you must pass -DBUILD_BASELINE_METRICS=ON to CMake:

cmake -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
      -DBUILD_BASELINE_METRICS=ON \
      ..

The baseline metrics are intended for validating the search results, but they might also be useful if you need to compute edit distances between individual pairs of terms.

Note

The baseline metrics are required by the tests, so enabling the tests implicitly enables the baseline metrics for them, even if -DBUILD_BASELINE_METRICS=ON is not passed explicitly.
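The options from the sections above can be combined in a single configure step. As a sketch, the helper below assembles the arguments into a bash array so that prefixes with spaces survive quoting; the function build_cmake_args and the toggles WITH_TESTS and WITH_BASELINE are hypothetical names, and only the -D options themselves come from this page.

```shell
#!/usr/bin/env bash
# Sketch: assemble CMake configure arguments into the global array CMAKE_ARGS.
# WITH_TESTS / WITH_BASELINE are illustrative toggles, not project options.
build_cmake_args() {
  CMAKE_ARGS=(-DCMAKE_BUILD_TYPE=Debug)
  if [ -n "${CONDA_PREFIX:-}" ]; then
    CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${CONDA_PREFIX}")
  fi
  if [ "${WITH_TESTS:-OFF}" = ON ]; then
    CMAKE_ARGS+=(-DBUILD_TESTS=ON)
  fi
  # Tests already pull in the baseline metrics, so this is only
  # needed when building the metrics without the tests.
  if [ "${WITH_BASELINE:-OFF}" = ON ]; then
    CMAKE_ARGS+=(-DBUILD_BASELINE_METRICS=ON)
  fi
}

# Usage (within the build directory):
# WITH_TESTS=ON build_cmake_args && cmake "${CMAKE_ARGS[@]}" ..
```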

Testing

# Within the build directory
./test/test-liblevenshtein

Generating Documentation

Documentation is generated with Doxygen. To generate the docs, pass -DGENERATE_DOCS=ON to CMake and then run make doxygen:

# Create and activate the necessary environment
mamba env create --force -f environment-docs.yml
conda activate ll-cpp-docs

# Configure CMake
mkdir -p build
cd build
cmake -DGENERATE_DOCS=ON ..

# Generate the docs and write them to $PWD/doxygen/html/
make doxygen
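Doxygen writes an index.html at the top of its HTML output, so a quick way to confirm the build worked is to look for that file. This is a sketch: docs_index is a hypothetical helper, and the choice of xdg-open (Linux) or open (macOS) in the usage line is an assumption about your desktop setup.

```shell
#!/usr/bin/env bash
# Sketch: print the path to the generated docs' index.html, or fail.
docs_index() {
  local index="${1:-$PWD}/doxygen/html/index.html"
  if [ -f "$index" ]; then
    echo "$index"
  else
    echo "not found: $index (did make doxygen succeed?)" >&2
    return 1
  fi
}

# Usage (within the build directory):
# index="$(docs_index)" && { xdg-open "$index" 2>/dev/null || open "$index"; }
```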

Dependencies

For the most up-to-date list of dependencies, see environment.yml.
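To print that list without opening the file, a small awk sketch can read the entries under the top-level dependencies: key. It assumes the usual conda layout (a "dependencies:" line followed by indented "- name" items); list_deps is a hypothetical helper, and nested entries such as a pip: sub-list are printed flat rather than grouped.

```shell
#!/usr/bin/env bash
# Sketch: list the entries under "dependencies:" in a conda environment file.
list_deps() {
  local file="${1:-environment.yml}"
  awk '
    /^dependencies:/ { in_deps = 1; next }
    # Any other top-level key (not indented, not a comment) ends the section.
    /^[^ \t#]/ { in_deps = 0 }
    in_deps && /^[ \t]+-[ \t]/ {
      sub(/^[ \t]+-[ \t]+/, "")
      print
    }
  ' "$file"
}

# Usage (from the repository root): list_deps environment.yml
```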