Increase the inference speed of the model
Trying to write a mini Triton backend in Rust.
Deploy KoGPT with Triton Inference Server
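Deploying a model such as KoGPT on Triton Inference Server starts from a model configuration file in the model repository. The snippet below is a minimal, hypothetical `config.pbtxt` sketch; the model name, backend, and tensor shapes are illustrative assumptions, not taken from the repository above.

```
name: "kogpt"
backend: "python"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
instance_group [ { kind: KIND_GPU, count: 1 } ]
```

Triton reads this file from `<model_repository>/<model_name>/config.pbtxt` and uses it to validate request and response tensors.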
A library for interfacing with Triton.
This bootcamp is designed to give NLP researchers an end-to-end overview of the fundamentals of the NVIDIA NeMo framework, a complete solution for building large language models. It also includes hands-on exercises complemented by tutorials, code snippets, and presentations to help researchers get started with the NeMo LLM Service and Guardrails.
Learnings and experimentation with GPU programming
WPF application for editing XML based configuration files
Run CI jobs in Manta when triggered by pull requests.
Package for running NVIDIA Triton within Python tests, with features like a Dockerfile DSL and building images on the fly.
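A Dockerfile DSL of the kind described above can be sketched as a small builder that assembles Dockerfile text from chained Python calls. The class and method names below are hypothetical illustrations, not the package's actual API.

```python
class Dockerfile:
    """Minimal, hypothetical Dockerfile DSL: collects instructions as text."""

    def __init__(self, base_image: str):
        # Every Dockerfile starts with a FROM instruction.
        self._lines = [f"FROM {base_image}"]

    def run(self, command: str) -> "Dockerfile":
        self._lines.append(f"RUN {command}")
        return self  # return self to allow chaining

    def copy(self, src: str, dst: str) -> "Dockerfile":
        self._lines.append(f"COPY {src} {dst}")
        return self

    def render(self) -> str:
        # Produce the final Dockerfile text, one instruction per line.
        return "\n".join(self._lines) + "\n"


# Example: assemble a Dockerfile for a Triton server image on the fly.
dockerfile = (
    Dockerfile("nvcr.io/nvidia/tritonserver:23.10-py3")
    .run("pip install numpy")
    .copy("model_repository/", "/models")
    .render()
)
print(dockerfile)
```

The rendered string could then be handed to a container build step (e.g. via the Docker SDK) without ever writing a Dockerfile to disk by hand.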
This repository contains everything regarding the bachelor thesis: NLPiP (NLP in Production).
Framework, Model & Kernel Optimizations for Distributed Deep Learning - Data Hack Summit
Manta adapter for Spine models running in NodeJS
Adds some extra features to transformers.
The benchmark for OpenAI Triton.
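Kernel benchmarks like the one above generally time a callable over repeated runs after a warm-up phase and report a robust statistic such as the median. OpenAI Triton ships its own helper for this (`triton.testing.do_bench`, which additionally handles GPU synchronization); the plain-Python sketch below only illustrates the basic idea and is not the benchmark's actual code.

```python
import statistics
import time


def bench(fn, warmup=5, reps=20):
    """Time fn() over `reps` runs after `warmup` runs; return median milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches / JIT before measuring
    times = []
    for _ in range(reps):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1e3)  # seconds -> ms
    return statistics.median(times)


# Example: benchmark a small pure-Python workload.
ms = bench(lambda: sum(i * i for i in range(10_000)))
```

The median is preferred over the mean here because one-off scheduler hiccups would otherwise skew the result.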