Autobuild Python/Pytorch module for derivative code #166

Open · wants to merge 94 commits into base: second_derivative
Conversation

@fymue (Collaborator) commented Jan 6, 2023

Differential Dynamic Programming

Motivation (some of this might not be 100% correct, ignore if that's the case)


Recently, certain DP algorithms such as Needleman-Wunsch (NW) have been proven to be differentiable. This discovery has opened new doors for the usage of such DP algorithms, which can now be employed in deep learning models like DeepBLAST, which aims to learn protein structural similarity from sequence alone. This particular model is written in Pytorch and uses Pytorch's autograd functionality (automatic differentiation) for the backward pass through the NW algorithm. As currently implemented, however, this is extremely slow, because it takes many kernel calls to complete the backward pass. It would be far more efficient if the entire backward pass could be calculated in one function/kernel call. This is where GAPC can help: GAPC can generate C++ code that calculates the score matrices for e.g. differentiable NW in one kernel call. If we were able to call this generated code directly from Python/Pytorch, the entire backward pass could be executed in a single kernel call, which has the potential to significantly speed up the training process of the model.

Implementation

In order to make our GAPC-generated derivative code Python-/Pytorch-compatible and usable in e.g. DeepBLAST, I implemented a method which uses the Pytorch C++ Extension API to automatically create a Python module from our generated code. The module provides a forward and backward function for the first and second derivative (if requested). After installation, it can be imported/used like any other Python module. Currently, these functions take two strings/sequences as arguments (just like the GAP-L NW/Gotoh implementations) and output a list of Pytorch tensors containing all alignment score matrices (1 for NW, 3 for Gotoh).

These functions are provided through an interface (pytorch_interface.cc), which contains generic definitions for the forward/backward functions as well as binding instructions for the module build step. (Side note: since our GAPC-generated code for NW/Gotoh uses double vectors to store all values, I had to use float64 tensors to store these values. This might cause problems if this is supposed to run on a GPU later, so we should maybe check whether our GAPC algorithm also works with 32-bit floats.) The module creation is handled via setuptools. An adjusted Makefile will generate a setup.py file (as well as the pytorch_interface.cc file) with all necessary instructions for the module creation.
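To illustrate the shape of such a binding, here is a minimal sketch using the standard Pytorch C++ extension macros; this is not the actual generated pytorch_interface.cc, which uses generic definitions and supports more features:

// Minimal sketch of a Pytorch C++ extension binding. The forward_D1
// body is a stub; the real function runs the generated DP algorithm.
#include <torch/extension.h>
#include <string>
#include <vector>

std::vector<torch::Tensor> forward_D1(torch::Tensor t,
                                      const std::string &s1,
                                      const std::string &s2) {
  std::vector<torch::Tensor> score_matrices;
  // ... run the GAPC-generated forward pass and collect its matrices ...
  return score_matrices;
}

// TORCH_EXTENSION_NAME is defined during the setup.py build step
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward_D1", &forward_D1, "forward pass of the 1st derivative");
}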

The GAPC-generated code is only minimally modified. The out object(s) get the additional methods get_forward_score_matrices and get_backward_score_matrices, which convert the internal score matrices (std::vector) to torch::Tensor objects and return them. This means that as of right now, the core DP algorithm is unmodified and calculates everything regularly; I merely convert the score matrices to tensors after the algorithm has finished its calculations.
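For illustration, such a conversion can be done roughly like this (a sketch, not the exact generated code):

// Sketch: copy an n x m score matrix stored as a flat std::vector<double>
// into a float64 tensor. torch::from_blob wraps the existing buffer
// without copying, so we clone() to give the tensor its own storage.
#include <torch/extension.h>
#include <vector>

torch::Tensor matrix_to_tensor(std::vector<double> &matrix,
                               int64_t n, int64_t m) {
  return torch::from_blob(matrix.data(), {n, m}, torch::kFloat64).clone();
}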

Prerequisites, Usage, Example

In order to create a Python-/Pytorch module with GAPC instead of the regular binary, you can use the --as-torchmod MODULE_NAME option in combination with the --derivative N option (if the latter is not set, the module creation cannot work, so the user is instructed to set it), e.g.:

gapc -i bothD --derivative 2 --as-torchmod "nw_gapc" alignments.gap

With this option enabled, an adjusted Makefile will be produced, which contains rules that generate the setup.py file as well as the pytorch_interface.cc file, through which all functions of the module will be provided.

The command

make -f out.mf

will generate the setup.py and pytorch_interface.cc files, the command

make -f out.mf install

will generate the Python module and install it locally.

For the installation to work, the user must have the Python modules torch and pip installed (as well as Python itself, obviously). The Python.h header file is also required. This file does not usually come with the default Python installation, as it is part of the python-dev/python-devel package. It is available, though, if the user installs Python/Pytorch in a conda environment (which is recommended for Pytorch anyway). As an example, I will provide a short installation guide on how I set everything up on my system. Since I don't have a GPU on my machine, I installed the CPU-only version of torch; this should also work for CUDA versions of Pytorch, though.

conda create -n pytorch python=3.8  # Pytorch recommends python3.6, python3.7 or python3.8
conda activate pytorch
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu  # for cpu-only
# OR pip3 install torch torchvision torchaudio  # for cuda 11.7
gapc -i bothD --derivative 2 --as-torchmod "nw_gapc" alignments.gap
make -f out.mf
make -f out.mf install  # install "nw_gapc" module locally

After that, you can use the nw_gapc module in Python:

import torch  # need to import torch first so certain linker symbols become visible
import nw_gapc

s1 = "Freizeit"
s2 = "Zeitgeist"
tensor = torch.zeros(1)  # example torch tensor

# all functions currently take a tensor as the first argument
# as an illustration that we can easily pass inputs from Python to these functions
# and process the inputs in our C++ code

forward_D1 = nw_gapc.forward_D1(tensor, s1, s2)  # forward pass 1st derivative (equivalent to run method in out_main.cc)
backward_D1 = nw_gapc.backward_D1(tensor, s1, s2)  # backward pass 1st derivative (equivalent to report_derivative method in out_main.cc)

forward_D2 = nw_gapc.forward_D2(tensor, s1, s2)  # forward pass  2nd derivative
backward_D2 = nw_gapc.backward_D2(tensor, s1, s2)  # backward pass 2nd derivative

# forward_DX and backward_DX return Lists of Pytorch Tensors, which are the score matrices for the respective pass

fymue added 14 commits December 27, 2022 22:16
…ivative GAPC code (still need to add pytorch_interface.cc generation)
…core matrices to pytorch Tensors; add generic_pytorch_interface.cc; can now calculate score matrices in Python for NW
…low errors when adding C++ double vals to tensor (probably need to change our code to use float so tensor can be float32 for GPU compatibility); use make_blob function to make tensors (way faster)
… matrices were returned instead of forward matrices
@fymue requested a review from sjanssen2 · January 6, 2023 14:46
fymue and others added 15 commits January 7, 2023 14:18
…rything if forward gets re-called with different inputs; fix comment
…parisons; module compiles and can be built now
…omments; add some more constructors; refactor indexing
@fymue mentioned this pull request · Jun 13, 2023
@fymue (Collaborator Author) commented Jul 4, 2023

Update (04.07.2023):

I didn't post an update for a while, but here are all the changes since the last update:

  • The 2nd derivative can now be executed as well; the generic interface provides the forward_D2 and backward_D2 functions, which execute the 2nd forward and backward pass and return the respective DP matrices as tensors
  • The dimensions of the tensors that are created from the computed DP matrices are now set individually for every DP table; before, I simply assumed that all DP tables have the same size and dimensions (pretty lazy)
  • I also added a small mechanism to the generated code that checks if a DP table is quadratic (NxN), linear (N) or just a triangular matrix; since the exposure of DP matrices as tensors only works for quadratic or linear tables, the new function convert_triu_to_tensor will properly convert a triangular matrix to a tensor if a triangular DP matrix is detected (see the sketch after this list)
  • If users want to generate an algorithm that can process batched inputs, they will be warned if the sizes of the batch dimension of multiple input tensors are unequal
  • I adjusted the make_index / index_components mechanism to handle an arbitrary number of indices; to ensure that this stays fast and efficient, I had to implement a way to add template arguments to GAP-C-generated functions (not too much new code, most of it is in src/expr/template.hh). This allows us to add template arguments to GAP-C Fn_Def objects, so the objects inside those functions can be provided with template arguments
  • The max batch size can be adjusted by the user at compilation time: e.g. MAX_BATCH_SIZE=32 make -f out.mf would limit the max batch size to 32 instead of the default 128; this can save a huge amount of memory if the user knows that a bigger max batch size isn't required
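For illustration, converting a triangular DP matrix into a full (zero-padded) tensor could look roughly like the following sketch; the storage layout is an assumption here, and the actual convert_triu_to_tensor implementation may differ:

// Sketch: expand an upper-triangular n x n DP matrix, stored row-wise as
// a flat vector of its n*(n+1)/2 entries, into a full float64 tensor with
// zeros below the diagonal.
#include <torch/extension.h>
#include <cstddef>
#include <vector>

torch::Tensor convert_triu_to_tensor(const std::vector<double> &triu,
                                     int64_t n) {
  torch::Tensor t = torch::zeros({n, n}, torch::kFloat64);
  auto acc = t.accessor<double, 2>();
  std::size_t k = 0;
  for (int64_t i = 0; i < n; ++i) {
    for (int64_t j = i; j < n; ++j) {
      acc[i][j] = triu[k++];
    }
  }
  return t;
}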

@fymue (Collaborator Author) commented Jul 9, 2023

Update (09.07.2023):

I added some minor bugfixes and overall improvements:

  • GAP-C will throw an error if users used batch types (e.g. F32batch) in algebra functions but didn't specify batched input tensors; similarly, GAP-C will also report an error if batch-incompatible types were used in combination with batched input tensors
  • Fixed an edge case where the internal DP matrices were deallocated prematurely (the commit msg mistakenly says "premature allocation")
  • Cleaned up the GAP-C help msg for this feature a bit
  • Added the option to use regular dynamic memory allocation for Batch objects with malloc instead of using the memory pool; this option can be turned on at compile time by setting the MALLOC_BATCH environment variable (e.g. MALLOC_BATCH=1 make -f out.mf). This is useful when users aren't sure how big the batch dimension of the input tensors can become or if they simply don't want to set a compile-time limit for the maximum batch size (see the sketch below)
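Conceptually, the switch looks roughly like this (an illustrative sketch with hypothetical names, not the actual rtlib code; MALLOC_BATCH would be forwarded as a preprocessor define by the generated Makefile):

#include <cstdlib>

// assumed memory-pool allocator, provided elsewhere in the runtime library
double *pool_alloc(std::size_t n);

// Sketch of the compile-time switch between malloc and the memory pool
double *alloc_batch(std::size_t n) {
#ifdef MALLOC_BATCH
  return static_cast<double*>(std::malloc(n * sizeof(double)));
#else
  return pool_alloc(n);
#endif
}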

@sjanssen2 (Member) left a comment:

I am not yet through. Here are my first comments.

bool batched = input.find("batched") != input.npos;

// check the number of dims of the input Tensor (default: 2)
std::regex regex("[a-zA-Z0-9]*([0-9]{1,2})D[a-zA-Z0-9]*");
@sjanssen2 (Member): why limit to 0-99? Is this a limit for PyTorch?

@fymue (Collaborator Author): I don't think PyTorch tensors have a dimension limit. But how realistic is it that someone has a tensor with e.g. 90 dimensions? Since I wanted to support at least 10D tensors, this regex needs to parse a 2-digit number, so 99 is the natural hard limit. Would you prefer this to be unlimited instead?

src/tensor.hh Outdated

try {
  n_dims = std::stoi(match[1].str());
} catch (const std::exception &e) {
  std::cerr << "Couldn't convert " << match[1].str() << " to a number.\n";
@sjanssen2 (Member): could you please use the gapc log mechanism to report this error? It would also help if you point the user to the input declaration line (the Loc(ation) mechanism).

@fymue (Collaborator Author): Sure, I will adjust this.

break;
}
}

@sjanssen2 (Member): we should also warn/error the user if he/she provides a "tensor" that is not supported at all.

@fymue (Collaborator Author): If the user misspells "tensor" in the input declaration, this will already be caught in str_to_mode in input.cc. I added another warning for the specified tensor type if the parser can't determine it.

Makefile Outdated
Comment on lines 338 to 344

test-pytorch:
	cd testdata/regresstest &&\
	$(SHELL) test_pytorch_module.sh $(TRUTH_DIR)

test-pytorch-batched:
	cd testdata/regresstest &&\
	$(SHELL) test_pytorch_module.sh $(TRUTH_DIR) ""
@sjanssen2 (Member): could we merge both tests into just one target, to reduce the risk of missing execution of a test a little?

@fymue (Collaborator Author): Sure, I will merge them.


GAPC=../../../gapc
MAKE="make"
PYTHON="python3"
@sjanssen2 (Member): why is it necessary to specify 3 at the end? Shouldn't python be sufficient if we install the correct dependency?

@fymue (Collaborator Author): It should work with just python as well. I think the official interpreter binary name is still python3, so that should always work. Just python probably works in most shells as well, though.

equal: bool = True
for matrix, reference in zip(bw_matrices, reference_matrices):
    # make sure both matrices have the same dtype (float32)
    # and don't contain any inf vals (bug if so?)
@sjanssen2 (Member): you should check what happens for triangular matrices, i.e. single track, for cells i > j. These cells might be flagged as +/-inf.

@fymue (Collaborator Author): I will eventually, but as of right now I don't know of a test program that makes use of derivatives and uses just one track / has triangular matrices. But any inf values will be set to 0 in one of the next couple of lines in this loop, so they wouldn't cause any trouble here.

# tests for regular (single) input and batched inputs
# for 1st and 2nd derivative
# (Truth files are in gapc-test-suite/Truth/Regress)
TESTS = {
@sjanssen2 (Member): could we extend this great framework to variable source input? Sometime in the future, I'd like to also process RNA folding algorithms this way. How much effort would it be to also subject testdata/grammars_outside/nodangle.gap to your mechanism?

@fymue (Collaborator Author): I'm not sure what you mean. Are you asking if this feature can also convert programs like nodangle to Python packages? In general, this feature should be able to convert any GAP-L program that also works with the --derivative option into a Python package. If nodangle can be rewritten in a differentiable way that makes use of the --derivative option, then this feature should also be able to handle it.

  return x * exp(-2.0);
}
float Ers(<alphabet a, alphabet b>, <tensorslice locA, tensorslice locB>, float x) {
  if (equal(a, b)) {
@sjanssen2 (Member): is it possible to overload the == operator appropriately, as this would be more natural for programmers?

@fymue (Collaborator Author): Done.
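For illustration, such an overload could look roughly like this (a sketch that assumes tensorchar wraps a single character; the actual rtlib types differ):

// Sketch: an operator== overload so algebra functions can write `a == b`
// instead of `equal(a, b)`. The member layout of the real tensorchar
// type is an assumption here.
struct tensorchar {
  char value;  // assumed underlying alphabet character
};

inline bool operator==(const tensorchar &lhs, const tensorchar &rhs) {
  return lhs.value == rhs.value;
}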

algebra alg_count auto count;

algebra alg_hessian implements sig_alignments(alphabet=tensorchar, answer=F64batch) {
  F64batch Ins(<alphabet a, void>, <tensorslice locA, tensorslice locB>, F64batch x) {
@sjanssen2 (Member): note to myself (@sjanssen2): shouldn't it be possible to type answer and alphabet here in the source code and later replace all answer occurrences with what is provided in the first line of the algebra definition in the AST, here F64batch?!

rtlib/traces.hh Outdated
#include <utility>
#include <cassert>

#define MAX_INDEX_COMPONENTS 3 // max indices for an index_components object
@sjanssen2 (Member): In general, we need to consider multi track programs for which some DP matrices can be quadratic, linear or even constant. I feel that restricting the number of index components to 3 might be too restrictive.

@fymue (Collaborator Author): This is already fixed in the latest traces.hh adjustment.
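Conceptually, the fix replaces the hard-coded limit with a per-table number of index components; a minimal sketch of the idea (not the actual traces.hh code) could look like this:

#include <array>
#include <cstddef>

// Sketch: an index tuple whose number of components is a template
// parameter instead of a hard-coded MAX_INDEX_COMPONENTS, so quadratic
// (2), linear (1) or constant (0) tables each get exactly the number
// of indices they need. Illustrative only.
template <typename T, std::size_t N>
struct index_components {
  std::array<T, N> idx;
};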

fymue added 6 commits July 10, 2023 21:22
…e PyTorch/Tensor-related GAP-L test programs to testdata/grammar_pytorch; bundle batched/non-batched test into 1; improve tensor input declaration error messages
…tes with differently-sized index components; substitute some copies for moves in the trace mechanism