The mtasklite library provides enjoyable parallelization of iterating through an iterable, with or without a progress bar. It is inspired by the simplicity of the great pqdm library, but it improves upon pqdm in several ways, in particular by supporting object-based (stateful) workers, truly "lazy" iteration, and context managers (i.e., support for the with statement). Stateful workers are implemented using the cool concept of delayed initialization, which is effortlessly enabled by adding the @delayed_init decorator to a worker class definition.
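To illustrate the idea of delayed initialization, here is a minimal, library-independent sketch. It is not the mtasklite implementation: `delayed_init_sketch`, `DelayedCall`, `build`, and `init_log` are hypothetical names introduced only for this example. The point is that "constructing" a decorated class merely records the constructor arguments, and the real object is built later (for instance, inside the worker process or thread).

```python
init_log = []  # records when the real constructor actually runs


class DelayedCall:
    """Remembers a class and its constructor arguments without building the object."""

    def __init__(self, cls, args, kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self):
        # The real constructor runs only here, e.g., inside a worker.
        return self.cls(*self.args, **self.kwargs)


def delayed_init_sketch(cls):
    """Hypothetical stand-in for a delayed-initialization decorator."""

    def factory(*args, **kwargs):
        # Capture the arguments instead of calling cls.__init__ right away.
        return DelayedCall(cls, args, kwargs)

    return factory


@delayed_init_sketch
class Square:
    def __init__(self, worker_id):
        init_log.append(worker_id)  # stands in for expensive setup (model loading, etc.)
        self.worker_id = worker_id

    def __call__(self, x):
        return x * x


spec = Square(0)                 # only a lightweight description is created
built_before = list(init_log)    # the constructor has not run yet
worker = spec.build()            # the constructor runs now
result = worker(7)
```

Because the lightweight description holds only the class and its arguments, it is cheap to hand over to a worker, and the expensive setup happens once, where the worker actually runs.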
Supporting object-based workers enables:
- Using different GPUs, models, or network connections in different workers.
- Efficient initialization of workers: If the worker needs to load a model (which often takes quite a bit of time), it will be done once (per process/thread) before processing input items. See examples/mtasklite_pqdm_spacy_tokenization_demo.ipynb for an example.
- Logging and bookkeeping: Each worker is represented by an object that "lives" as long as we have items to process (data can be stored in the object attributes).
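The points above can be sketched with the standard library alone. The following is an illustration under stated assumptions, not the mtasklite API: `TinyPool` and `CountingSquare` are hypothetical names, the pool uses threads, results come back unordered, and there is no error handling. Each worker object lives for the whole run and keeps its own counter, the bounded input queue means the input iterable is consumed only a few items ahead of processing, and the with statement takes care of shutting the threads down.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker thread to exit


class CountingSquare:
    """A stateful worker: it keeps a per-object counter across items."""

    def __init__(self, worker_id):
        self.worker_id = worker_id
        self.processed = 0  # bookkeeping stored in an object attribute

    def __call__(self, x):
        self.processed += 1
        return x * x


class TinyPool:
    """Hypothetical pool: one worker object per thread, usable in a with statement."""

    def __init__(self, items, workers):
        self.items = items
        self.workers = workers
        self.in_q = queue.Queue(maxsize=len(workers))  # bounded, so input is read lazily
        self.out_q = queue.Queue()
        self.threads = []
        self.stopped = False

    def _run(self, worker):
        while True:
            item = self.in_q.get()
            if item is _STOP:
                return
            self.out_q.put(worker(item))

    def _stop(self):
        if not self.stopped:
            self.stopped = True
            for _ in self.threads:
                self.in_q.put(_STOP)
            for t in self.threads:
                t.join()

    def __enter__(self):
        for w in self.workers:
            t = threading.Thread(target=self._run, args=(w,), daemon=True)
            t.start()
            self.threads.append(t)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop()
        return False

    def __iter__(self):
        for item in self.items:
            self.in_q.put(item)  # blocks while the bounded queue is full
            while not self.out_q.empty():
                yield self.out_q.get()
        self._stop()  # all input submitted: wait for the workers to finish
        while not self.out_q.empty():
            yield self.out_q.get()


workers = [CountingSquare(i) for i in range(3)]
with TinyPool(iter(range(10)), workers) as pool:
    results = sorted(pool)  # sorted because completion order is not deterministic

total_processed = sum(w.processed for w in workers)
```

After the run, each worker's `processed` attribute shows how many items that particular worker handled, which is the kind of per-worker bookkeeping that plain function-based workers cannot keep.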
The mtasklite package provides pqdm-compatibility wrappers, which can be used as a (nearly) drop-in replacement for pqdm. For an overview of the differences and a list of features, please refer to the documentation in the GitHub repository.
This library replaces py_stateful_map. The objective of this replacement is to provide a more convenient and user-friendly interface as well as to fix several issues.
A huge shoutout to the creators of the multiprocess library, a drop-in replacement for the standard Python multiprocessing library that resolves various pickling issues arising on non-Unix platforms (when the standard multiprocessing library is used). Thanks to their effort, mtasklite works across Linux, Windows, and macOS.