Transformer-based language model trained on multiple tasks, including summarization, sentiment analysis, question answering, and translation. The implementation in this repo is an adaptation of the onnxt5 repo, which makes exporting and using T5 with ONNX easier.
T5 is a transformer model that aims to offer great flexibility and better semantic understanding by training on multiple tasks at once.
Model | Download | Compressed | ONNX version | Opset version |
---|---|---|---|---|
T5-encoder | 650.6 MB | 205.0 MB | 1.6 | 12 |
T5-decoder-with-lm-head | 304.9 MB | 304.9 MB | 1.6 | 12 |
Huggingface PyTorch T5 + script changes ==> ONNX T5-encoder
Huggingface PyTorch T5 + script changes ==> ONNX T5-decoder-with-lm-head
Script changes include:
- reshaping the Huggingface models to combine the lm head with the decoder to allow for a unified model
- reshaping the encoder to output the hidden state directly
The script for ONNX model conversion and ONNX Runtime inference is here. More complete utilities to export and use the models are maintained in the onnxt5 repo.
This implementation takes as input a prompt that begins with the task at hand. Examples of tasks include summarize: <PROMPT>, translate English to French: <PROMPT>, cola sentence: <PROMPT>, etc.
For the full list of tasks, refer to Appendix D of the original paper.
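As a minimal sketch of the prompt format described above, the helper below prepends a task prefix to the input text. The names `TASK_PREFIXES` and `build_prompt` are illustrative assumptions and are not part of onnxt5 or transformers; only the three prefix strings come from this README.

```python
# Hypothetical helper, not part of onnxt5 or transformers.
# The prefix strings are the three examples listed in this README.
TASK_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_fr": "translate English to French: ",
    "cola": "cola sentence: ",
}


def build_prompt(task: str, text: str) -> str:
    """Return `text` with the T5 task prefix for `task` prepended."""
    if task not in TASK_PREFIXES:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_PREFIXES)}"
        )
    # Strip stray whitespace so the prompt has exactly one space after the colon.
    return TASK_PREFIXES[task] + text.strip()


prompt = build_prompt("translate_en_fr", "I was a victim of a series of accidents.")
print(prompt)
```

The resulting string can be passed as the `prompt` argument in the usage examples that follow.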
The easiest way to use the model is through the onnxt5 utilities (installation: pip install onnxt5).
In that case, you can use the model with the following code:
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."
Or if you wish to produce the embeddings of a sentence:
from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)
Otherwise you can manually create the Generative model with the following:
from onnxruntime import InferenceSession
from transformers import T5Tokenizer
from .dependencies.models import GenerativeT5
tokenizer = T5Tokenizer.from_pretrained('t5-base')
# Starting with ORT 1.10, you must set the providers parameter explicitly when instantiating
# InferenceSession if you want to use execution providers other than the default CPU provider
# (previously, providers were set/registered by default based on the build flags).
# For example, if an NVIDIA GPU is available and the ORT Python package is built with CUDA, call:
# InferenceSession(path/to/model, providers=['CUDAExecutionProvider'])
# path_t5_decoder and path_t5_encoder are the paths to the downloaded ONNX model files.
decoder_sess = InferenceSession(str(path_t5_decoder))
encoder_sess = InferenceSession(str(path_t5_encoder))
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
generative_t5('translate English to French: I was a victim of a series of accidents.', 21, temperature=0.)[0]
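The note on execution providers in the snippet above can be made concrete with a small selection helper. This is only a sketch: `choose_providers` is a hypothetical name, and the available-provider list is simulated here so the example runs without onnxruntime installed. In real use the list would come from `onnxruntime.get_available_providers()`.

```python
def choose_providers(available,
                     preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that are actually available, in preference order.

    Hypothetical helper: it only filters names and does not touch onnxruntime.
    """
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError(f"none of {list(preferred)} found in {list(available)}")
    return chosen


# With onnxruntime installed, the list would come from the library itself:
#   import onnxruntime as ort
#   providers = choose_providers(ort.get_available_providers())
#   decoder_sess = ort.InferenceSession(str(path_t5_decoder), providers=providers)
# Here a CPU-only build is simulated instead.
providers = choose_providers(["CPUExecutionProvider"])
print(providers)
```

Listing the CPU provider last keeps a fallback in place when the CUDA provider is present but cannot be used for a given model.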
For the T5-encoder model:
last_hidden_state: Sequence of hidden-states at the last layer of the model. It's a float tensor of size (batch_size, sequence_length, hidden_size).
For the T5-decoder-with-lm-head model:
logit_predictions: Prediction scores of the language modeling head. It's a float tensor of size (batch_size, sequence_length, vocab_size).
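To make the (batch_size, sequence_length, vocab_size) shape concrete, the sketch below reads the most likely next token from the last sequence position of each batch element. Nested Python lists stand in for the float tensor, and `next_token_ids` is an illustrative name rather than a function from this repo.

```python
def next_token_ids(logits):
    """For each batch element, return the argmax over the vocabulary at the last position.

    `logits` is a nested list shaped (batch_size, sequence_length, vocab_size).
    """
    ids = []
    for sequence in logits:
        last_step = sequence[-1]  # scores for the token that would follow the sequence
        ids.append(max(range(len(last_step)), key=last_step.__getitem__))
    return ids


# Toy tensor: batch_size=2, sequence_length=2, vocab_size=3.
toy_logits = [
    [[0.1, 0.2, 0.7], [2.0, 0.5, 0.1]],
    [[0.3, 0.3, 0.4], [0.0, 1.5, 1.0]],
]
print(next_token_ids(toy_logits))  # [0, 1]
```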
For the T5-encoder model:
last_hidden_states = model(input_ids)[0]
For the T5-decoder-with-lm-head model:
# To generate the encoder's last hidden state
encoder_output = encoder_sess.run(None, {"input_ids": input_ids})[0]
# To generate the full model's embeddings
decoder_output = decoder_sess.run(None, {
    "input_ids": input_ids,
    "encoder_hidden_states": encoder_output
})[0]
For the generative model, to generate a translation:
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
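For intuition about what a call with `temperature=0.` might do internally, here is a sketch of a greedy autoregressive loop. It is an assumption about the general technique, not the actual onnxt5 implementation: `greedy_generate` and `step_fn` are hypothetical names, `step_fn` stands in for running the encoder and decoder sessions, and the defaults assume the usual T5 convention of id 0 as the decoder start token and id 1 as end-of-sequence.

```python
def greedy_generate(step_fn, max_length, start_id=0, eos_id=1):
    """Greedy autoregressive decoding (illustrative sketch).

    `step_fn(generated_ids)` must return one score per vocabulary entry
    for the token that follows `generated_ids`.
    """
    generated = [start_id]
    for _ in range(max_length):
        scores = step_fn(generated)
        # Greedy choice: always take the highest-scoring token.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == eos_id:
            break
        generated.append(next_id)
    return generated[1:]  # drop the start token


# Stub "model" with a 5-token vocabulary: emits 3, then 4, then end-of-sequence.
def toy_step(generated_ids):
    script = {1: 3, 2: 4}  # keyed by how many ids have been generated so far
    target = script.get(len(generated_ids), 1)
    return [1.0 if i == target else 0.0 for i in range(5)]


print(greedy_generate(toy_step, max_length=10))  # [3, 4]
```

In a real pipeline the returned ids would be passed to the tokenizer's decode method to obtain text.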
The original model from Google Brain is pretrained on the Colossal Clean Crawled Corpus (C4). The pretrained model referenced in huggingface/transformers is trained on the same data.
Benchmarking can be run with the following script; initial results are in this post.
This repo is based on the work of Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu from Google, as well as the implementation of T5 from the huggingface team, the work of the Microsoft ONNX and onnxruntime teams (in particular Tianlei Wu), and the work of Thomas Wolf on text generation.
@article{2019t5,
author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
journal = {arXiv e-prints},
year = {2019},
archivePrefix = {arXiv},
eprint = {1910.10683},
}
This model is converted directly from huggingface/transformers.
Apache 2.0 License