The graph transformer can be controlled through options. For example, one can use options to set the level of parallelism to use or to specify the name of the output file.
The method of providing options depends on how the graph transformer is invoked: from a Python environment or from the command line. The same options are available in both cases. Below, we first describe how to pass options from Python and from the command line, then list the most-used options, and finally list all options.
The Python package "xmos-ai-tools", available through PyPI, contains the xcore optimiser (xformer) for optimising suitable tflite models. This module can be imported using:
from xmos_ai_tools import xformer as xf
The main method in xformer is convert, which requires a path to an input model, an output path, and a set of parameters. The parameters should be given as a dictionary that maps option names to their values.
xf.convert("example_int8_model.tflite", "xcore_optimized_int8_model.tflite",
params = {
"xcore-thread-count": 4,
"xcore-reduce-memory": None,
}
)
The possible options are described below in the command line interface section. If the default behaviour is intended, this third argument can be None.
xf.convert("example_int8_model.tflite", "xcore_optimized_int8_model.tflite",
params = None
)
Upon installing "xmos-ai-tools" from PyPI, the program xcore-opt is available on the command line. It is called with at least two arguments (the input model and the output model), and all other configuration options are specified with a -- prefix, e.g.:
xcore-opt example_int8_model.tflite -o output_model.tflite --xcore-thread-count 4 --xcore-reduce-memory
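The correspondence between the Python params dictionary and the command-line flags can be sketched as follows. The helper build_xcore_opt_args is hypothetical and not part of xmos-ai-tools; it assumes that each dictionary key becomes a --key flag and that a value of None denotes a flag that takes no argument, as in the examples above.

```python
def build_xcore_opt_args(input_path, output_path, params=None):
    """Hypothetical helper: build an xcore-opt argument list from a params dict."""
    args = ["xcore-opt", input_path, "-o", output_path]
    for name, value in (params or {}).items():
        # Every option is prefixed with "--".
        args.append(f"--{name}")
        # Assumption: None marks a flag without a value.
        if value is not None:
            args.append(str(value))
    return args


cmd = build_xcore_opt_args(
    "example_int8_model.tflite",
    "output_model.tflite",
    {"xcore-thread-count": 4, "xcore-reduce-memory": None},
)
print(" ".join(cmd))
```

If xcore-opt is on the PATH, a list built this way could be handed to subprocess.run(cmd, check=True) from the Python standard library.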
Name of the file where to place the optimized TFLITE model
Number of threads to translate for (max=5). Defaults to 1.
File to place the learned parameters in. If this option is not specified, the learned parameters are kept by the model. This will increase the amount of RAM required by the model but is very fast. When this option is used, the learned parameters are placed in a file that must be flashed onto the hardware, and the learned parameters will be streamed from flash. This can be slower but allows large numbers of learned parameters to be used.
Sets a threshold below which learned parameters are not placed in flash. The default is 96 bytes: for anything smaller than that, the overhead of lowering to flash outweighs the benefit gained. This option is only meaningful if xcore-weights-file has been used. You can experiment with this parameter to get a different trade-off between speed and memory requirements.
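The effect of the threshold can be illustrated with a small sketch. It is not the tool's implementation: the function split_by_threshold and the tensor names and sizes are hypothetical, and it assumes that a tensor exactly at the threshold is placed in flash.

```python
DEFAULT_THRESHOLD_BYTES = 96  # default value stated above


def split_by_threshold(tensor_sizes, threshold=DEFAULT_THRESHOLD_BYTES):
    """Split a {name: size_in_bytes} dict into (kept_in_ram, placed_in_flash)."""
    # Assumption: tensors strictly smaller than the threshold stay in RAM.
    in_ram = [name for name, size in tensor_sizes.items() if size < threshold]
    in_flash = [name for name, size in tensor_sizes.items() if size >= threshold]
    return in_ram, in_flash


# Hypothetical tensor sizes, for illustration only.
sizes = {"bias_0": 64, "conv_0_weights": 4608, "dense_weights": 16384}

default_split = split_by_threshold(sizes)
raised_split = split_by_threshold(sizes, threshold=8192)
print(default_split)
print(raised_split)
```

Raising the threshold keeps more tensors in RAM, which is faster to access but increases the RAM required; lowering it moves more tensors to flash.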
Try to reduce memory usage, possibly at the cost of increased execution time. Default is true.
When optimising convolutions, small inaccuracies are introduced due to the nature of fixed-point computations. These errors are typically small and happen infrequently. The default threshold is 0.25, meaning that the largest acceptable error is two bits below the binary point (in integer arithmetic). If an error higher than this occurs, the compiler will fall back on a less optimal convolution that produces a better result.
You can adjust this parameter to get a different trade-off between execution speed and accuracy of the result.
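The relation between the threshold value and "two bits below the binary point" is simply 0.25 = 2^-2. The sketch below shows that arithmetic together with the fall-back decision described above; both function names are hypothetical and the code does not reproduce the compiler's actual check.

```python
import math


def fractional_bits(threshold):
    """Number of bits below the binary point that the threshold corresponds to."""
    return -math.log2(threshold)


def keep_optimised_convolution(max_error, threshold=0.25):
    """Hypothetical decision: fall back only if the error exceeds the threshold."""
    return max_error <= threshold


print(fractional_bits(0.25))
print(keep_optimised_convolution(0.10))
print(keep_optimised_convolution(0.30))
```

A larger threshold accepts more of the optimised convolutions (faster, less accurate); a smaller threshold causes more fall-backs (slower, more accurate).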
By default, the above option calculates an upper bound for the error. Setting this option calculates the precise maximum error at the cost of (significant) extra compile time.
There are networks where large errors in a layer can be fixed by changing the quantization. This option limits outliers in the multipliers of a convolution to a factor of N larger than the minimum. The default for N is 0x7fffffff (i.e., no limit).
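One way to read "a factor of N larger than the minimum" is as a clamp on each multiplier. The sketch below illustrates that reading only; clamp_multipliers is a hypothetical function, the multiplier values are made up, and the tool's actual treatment of outliers may differ. It assumes all multipliers are positive.

```python
NO_LIMIT = 0x7FFFFFFF  # default value of N stated above


def clamp_multipliers(multipliers, factor=NO_LIMIT):
    """Clamp each multiplier to at most `factor` times the smallest multiplier."""
    limit = min(multipliers) * factor
    return [min(m, limit) for m in multipliers]


# Hypothetical per-channel multipliers with one outlier.
multipliers = [1.0, 2.0, 50.0]

unlimited = clamp_multipliers(multipliers)
limited = clamp_multipliers(multipliers, factor=8)
print(unlimited)
print(limited)
```

With the default N the list is unchanged, which matches the "no limit" behaviour described above.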
Normally the TFLITE model is minified by reducing string lengths. Using this option keeps the original strings.