ToDo:
- Add images
- Info
- Update references to this file
- HuggingFace
- hdf5
Guide to add pretrained ML models (from Hugging Face, hdf5 format, pickle format) to do inferences through an API.
- Select a Pretrained Model:
- Choose a pretrained model from the Hugging Face Model Hub that suits your task and language requirements. For example, you can choose models for text classification, named entity recognition, question answering, etc.
Hugging Face Model Hub.
- In this case we will use the "Pythia-70m" model, a recent text and code generation model and developed by EleutherAI. See https://huggingface.co/EleutherAI/pythia-70m.
Hugging Face: Pythia-70m by EleutherAI.
- Most of the models in HuggingFace have a 'Quickstart' section where you can see how to load the model and use it in a python script. In this type of models you need a model and a tokenizer to make inferences using it.
Hugging Face: Pythia-70m code example.
- Load the Pretrained Model:
- Use the Hugging Face library to load the pretrained model of your choice into your API code. You can use
pip
to install thetransformers
library.
- Use the Hugging Face library to load the pretrained model of your choice into your API code. You can use
Note: if you need to export model to use it with onnx or ov, verify its model_type or architecture is supported with currently transformers version. See ../experiments/re_onnxruntime/check_architecture_support.py
-
Add new model class. File: ../app/models.py
- Add a new class according to your new model, parent class is
Model()
. In this case we will addPythia_70mModel(Model)
. - Make sure
NewModel.predict()
method is implemented according to the model. See below the predict() method, it receives a string and returns a dictionary with the prediction.- Load model
- Load tokenizer
- Tokenize input
- Inference using the loaded model and the input
- Return prediction
- generic class for text-generation models: SLMModel()
class Pythia_70mModel(Model): """_summary_ Creates a Pythia model. Inherits from Model() Args: Model (_type_): _description_ """ def __init__(self): super().__init__(models_names.Pythia_70m, ML_task.CODE) def predict(self, user_input: str): model = GPTNeoXForCausalLM.from_pretrained( "EleutherAI/pythia-70m", revision="step3000", cache_dir="./pythia-70m/step3000", ) tokenizer = AutoTokenizer.from_pretrained( "EleutherAI/pythia-70m", revision="step3000", cache_dir="./pythia-70m/step3000", ) text = "def get_random_element(dictionary):" text = user_input inputs = tokenizer(text, return_tensors="pt") tokens = model.generate(**inputs) response = { "prediction" : tokenizer.decode(tokens[0]), } return response
- Add a new class according to your new model, parent class is
-
Create a model schema. File: ../app/schemas.py
- Create a schema with one example. See below example for Pythia-70m model.
- Generic text-generation schema: PredictSLM()
class PredictPythia_70m(BaseModel): input_text: str class Config: schema_extra = { "example": { "input_text": "def hello_world():", } }
-
Add endpoint. File: ../app/api.py
- According to the new model information (Model Class and schema), add the endpoint to enable POST requests and make predictions using the model.
- Here you need to create an instance of the Model, pass the user input to the
model.predict()
method and return the prediction in the response dictionary that the POST method will return.
@app.post("/huggingface_models/Pythia_70m", tags=["Hugging Face Models"]) @construct_response def _predict_pythia_70m(request: Request, payload: PredictPythia_70m): """T5 model.""" input_text = payload.input_text print("Input text") print(input_text) model = Pythia_70mModel() print(f"Model: {model.name}") if input_text: prediction = model.predict(input_text) response = { "message": HTTPStatus.OK.phrase, "status-code": HTTPStatus.OK, "data": { "model-type": model.name, "input_text": input_text, "prediction": prediction, }, } else: response = { "message": "Model not found", "status-code": HTTPStatus.BAD_REQUEST, } return response
-
Deploy and Test the API:
- python start_server.py or check below:
- Deploy your API to a server or cloud platform such as GCP, AWS, or Azure. Test the API by sending requests with sample input to ensure it is functioning as expected.
- See manuals to check our guides to create the API or deploy the API.
- In this case we run the API using:
-
uvicorn app.api:app --host 0.0.0.0 --port 8000 --reload --reload-dir . --reload-dir app
- Go to the Swagger UI
- Select the model and click on the 'Try it out' button. Note that the endpoint must previously be defined in api.py
Swagger UI.
Execute prediction using Pythia-70m.
- Verify response
Response using Pythia-70m.
-
Create a new directory for your API project called
models
and add here your pretrained model (h5 file). -
Load the Pretrained Model:
- Use the keras library to load the pretrained model in h5 format into your API code. You can use
pip
to install thetensorflow.keras
library.
- Use the keras library to load the pretrained model in h5 format into your API code. You can use
-
Add new model class. File: ../app/models.py
- Add a new class according to your new model, parent class is
Model()
. In this case we will addCNNModel(Model)
. - Make sure
NewModel.predict()
method is implemented according to the model. See below thepredict()
method, it receives a string and returns a dictionary with the prediction. - In this case, we are using a CNN model for Image Classification using Fashion-MNIST dataset. The input is a number from the user and inside the definition of predict() method, there is a process to select one input from a test set according to the input number, parse the image in order to have the input shape of the model and pass it to the model to get a prediction, which will be the output class.
- Load model
- Parse input to meet the input criteria
- Inference using the loaded model and the input
- Return prediction
class CNNModel(Model): """ Creates a LM Bert model. Inherits from Model() """ def __init__(self): super().__init__(models_names.CNN, ML_task.CV) def predict(self, user_input: str, n = 5): dataset = "fashion" label_names = { 0: "T-shirt/top", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat", 5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle boot" } saved_model_dir = f"models/model_{dataset}.h5" # Get test set fashion_mnist=keras.datasets.fashion_mnist (_, _), (x_test, y_test) = fashion_mnist.load_data() # load model try: print(saved_model_dir) model = keras.models.load_model(saved_model_dir) print("Model loaded correctly") except: print("There is a problem with the file path") def see_x_image(x,y,name=None,caption=True, save_dir="."): ''' See image ''' plt.figure() plt.imshow((x.reshape((28,28))).astype("uint8")) title=str(y) if name: title += " "+name plt.title(title) if caption: plt.title(title) print(save_dir) plt.savefig(save_dir+"/"+dataset+"_image"+ ".png") plt.axis("off") if int(user_input) <= len(x_test): ran = int(user_input) print(" User entered ",user_input) else: # Get random number between 0 and len(x_test) ran = random.randint(0, len(y_test)) print("Using random") print(ran) print(f"label of selected input: {y_test[ran]}") #print(list(y_test[ran]).index(max(y_test[ran]))) label_name = label_names[y_test[ran]] see_x_image(x_test[ran],y_test[ran],label_name,save_dir="./") # Inference # predict with that random x_test = x_test.reshape(-1, 28, 28, 1) print(x_test[ran:ran+1].shape) model_predict = model.predict(x_test[ran:ran+1]) print(f"model_predict: {model_predict}") cat_pred = np.argmax(model_predict) print(f"argmax: {cat_pred}") label = f"{cat_pred} : {label_names[cat_pred]}" is_correct = False if y_test[ran] == cat_pred: is_correct = True print("Prediction: ",cat_pred) print("Prediction clothes: ", label_names[cat_pred]) print("Correct label: ",y_test[ran]) print(f"is_correct: ", is_correct) response = { "prediction": label, "is_correct": is_correct, } return response
- Add a new class according to your new model, parent class is
-
Create a model schema. File: ../app/schemas.py
- Create a schema with one example. See below example for CNN model.
class PredictCNN(BaseModel): input_text: str class Config: schema_extra = { "example": { "input_text": "10", } }
-
Add endpoint. File: ../app/api.py
- According to the new model information (Model Class and schema), add the endpoint to enable POST requests and make predictions using the model.
- Here you need to create an instance of the Model, pass the user input to the
model.predict()
method and return the prediction in the response dictionary that the POST method will return.
@app.post("/h5_models/cnn_fashion", tags=["H5 Models"]) @construct_response def _predict_cnn(request: Request, payload: PredictCNN): """CNN model.""" input_text = payload.input_text print("Input text") print(input_text) model = CNNModel() print(f"Model: {model.name}") if input_text: model_response = model.predict(input_text) response = { "message": HTTPStatus.OK.phrase, "status-code": HTTPStatus.OK, "data": { "model-type": model.name, "input_text": input_text, "prediction": model_response['prediction'], "is_correct": model_response['is_correct'], }, } else: response = { "message": "Model not found", "status-code": HTTPStatus.BAD_REQUEST, } return response
-
Deploy and Test the API:
- Deploy your API to a server or cloud platform such as GCP, AWS, or Azure. Test the API by sending requests with sample input to ensure it is functioning as expected.
- See manuals to check our guides to create the API or deploy the API.
- In this case we run the API using:
-
uvicorn app.api:app --host 0.0.0.0 --port 8000 --reload --reload-dir . --reload-dir app
- Go to the Swagger UI
- Select the model and click on the 'Try it out' button. Note that the endpoint must previously be defined in api.py
Swagger UI: CNN model.
Execute prediction using h5 CNN model.
- Verify response
Response using h5 CNN model.