This project shows how speedster can accelerate spaCy's WikiNER pipeline.
Speedster is an open-source tool designed to accelerate AI inference of deep learning models in a few lines of code. Within the WikiNER pipeline, speedster optimizes BERT to approach the maximum acceleration achievable on the target hardware. Speedster is built on top of Nebullvm, an open-source framework for building AI-optimization tools. Further info on the WikiNER pipeline can be found in the sections below.
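To make "a few lines of code" concrete, here is a minimal sketch of optimizing a Hugging Face BERT model with speedster's `optimize_model` API. The checkpoint name and sample inputs are illustrative rather than the exact ones used by this pipeline, and the call follows the speedster documentation, so the arguments may vary across versions:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from speedster import optimize_model

# Illustrative checkpoint; the WikiNER pipeline trains its own BERT model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

# speedster traces and benchmarks the model on a few sample inputs.
sample_texts = ["New York is a city in the United States."]
input_data = [tokenizer(text, return_tensors="pt") for text in sample_texts]

# "constrained" restricts speedster to lossless techniques, so the
# optimized model produces the same outputs as the original.
optimized_model = optimize_model(
    model,
    input_data=input_data,
    optimization_time="constrained",
)

# The optimized model keeps the original's calling convention.
with torch.no_grad():
    output = optimized_model(**input_data[0])
```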
The `project.yml` defines the data assets required by the project, as well as the available commands and workflows. For details, see the [Weasel documentation](https://github.com/explosion/weasel).
The following commands are defined by the project. They can be executed using `weasel run [name]`. Commands are only re-run if their inputs have changed.
| Command | Description |
| --- | --- |
| `corpus` | Convert the data to spaCy's format |
| `train` | Train the full pipeline and optimize the transformer model for inference |
| `evaluate` | Evaluate on the test data and save the metrics |
| `clean` | Remove intermediate files |
The following workflows are defined by the project. They can be executed using `weasel run [name]` and will run the specified commands in order. Commands are only re-run if their inputs have changed.
| Workflow | Steps |
| --- | --- |
| `all` | `corpus` → `train` → `evaluate` |
The following assets are defined by the project. They can be fetched by running `weasel assets` in the project directory.
| File | Source | Description |
| --- | --- | --- |
| `assets/aij-wikiner-en-wp2.bz2` | URL |  |
Before running the WikiNER pipeline, speedster must be installed. Speedster can be easily installed using pip:

```bash
pip install speedster
```
Some of the speedster components required for inference optimization can be installed using the `nebullvm` auto-installer module:

```bash
python -m nebullvm.installers.auto_installer --frameworks torch onnx huggingface --compilers all
```
If you want to install only a subset of the compilers supported by speedster, replace the `all` keyword with the names of the desired compilers. Further info can be found in the speedster documentation.
When tested, speedster accelerated the WikiNER pipeline by between 20% and 80% with no impact on model performance. The library can further accelerate deep learning model inference by applying more aggressive optimization techniques, which may result in a slight change in model performance; a sketch of these settings is shown below. For more information, refer to the speedster documentation.
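As a hedged sketch of those more aggressive settings (the parameter names follow the speedster documentation and should be checked against your installed version; `model` and `input_data` are as in the earlier sketch), unconstrained optimization with a bounded accuracy drop looks like this:

```python
from speedster import optimize_model

# "unconstrained" lets speedster try techniques such as quantization that
# can slightly change model outputs; metric_drop_ths caps the accepted
# drop in the evaluation metric (here 5%).
optimized_model = optimize_model(
    model,
    input_data=input_data,
    optimization_time="unconstrained",
    metric_drop_ths=0.05,
)
```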
Below are the response times of the WikiNER pipeline in milliseconds (ms).
| Hardware | Original latency [ms] | Speedster-optimized latency [ms] | Speedster speed-up |
| --- | --- | --- | --- |
| Intel | 139 | 114 | 1.2x |
| AMD | 293 | 162 | 1.8x |
| Nvidia RTX 3090Ti | 24.1 | 14.1 | 1.7x |
| M1 Pro | 143 | 121 | 1.2x |
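Latency figures like these can be reproduced with a simple timing loop over the trained pipeline. Below is a minimal sketch; the model path and sample text are assumptions (the path is spaCy's conventional training output), and hardware, batching, and document length all affect the numbers:

```python
import time
import spacy

# Illustrative path: the pipeline produced by `weasel run all`.
nlp = spacy.load("training/model-best")

texts = ["Pierre Curie was born in Paris."] * 100

# Warm-up so one-time initialization costs do not skew the measurement.
for _ in nlp.pipe(texts[:10]):
    pass

start = time.perf_counter()
for _ in nlp.pipe(texts):
    pass
elapsed = time.perf_counter() - start
print(f"mean latency: {elapsed / len(texts) * 1000:.1f} ms per doc")
```

A more careful benchmark would average over several runs, but this shows the shape of the measurement.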