▮ Advantages of Packaging
Packaging an ML model means putting it into a container, which gives you the following advantages:
- You can run the container locally, as long as a container runtime is installed
- You can easily distribute and share your container
- There are many options for deploying a container in the cloud and scaling it if necessary
- Deployment becomes less complicated, which makes the service easier to debug and maintain
Here is an example of containerizing an NLP model using Docker and Flask.
Reference: Practical MLOps
▮ Directory Structure
This is the final directory structure we want to end up with.
Working Directory: .
│   Dockerfile
│   requirements.txt
│
└───webapp
        app.py
        roberta-sequence-classification-9.onnx
▮ Dockerfile
First, we will create a Dockerfile to install everything inside the container.
# Use the official Python 3.8 image as the base
FROM python:3.8
# Install the dependencies first so this layer can be cached
COPY ./requirements.txt /webapp/requirements.txt
WORKDIR /webapp
RUN pip install -r requirements.txt
# Copy the app code and the model into the image
COPY webapp/* /webapp
# Start the Flask app when the container runs
ENTRYPOINT [ "python" ]
CMD ["app.py"]
▮ requirements.txt
Here we specify the libraries we need to package the model.
simpletransformers==0.4.0
tensorboardX==1.9
transformers==2.1.0
flask==2.1.0
torch==1.7.1
onnxruntime==1.7.0
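Pinning exact versions like this keeps the image reproducible: rebuilding the container later installs the same library versions instead of whatever is newest.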
▮ app.py
We will create a simple Flask app with a predict() function bound to the /predict route.
from flask import Flask, request, jsonify
import torch
import numpy as np
from transformers import RobertaTokenizer
import onnxruntime

app = Flask(__name__)
# Load the tokenizer and the ONNX inference session once at startup
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
session = onnxruntime.InferenceSession(
    "roberta-sequence-classification-9.onnx")


@app.route("/predict", methods=["POST"])
def predict():
    # Tokenize the first string in the JSON body and add a batch dimension
    input_ids = torch.tensor(tokenizer.encode(
        request.json[0], add_special_tokens=True)).unsqueeze(0)
    # Convert the tensor to a NumPy array for ONNX Runtime
    if input_ids.requires_grad:
        numpy_input = input_ids.detach().cpu().numpy()
    else:
        numpy_input = input_ids.cpu().numpy()
    inputs = {session.get_inputs()[0].name: numpy_input}
    out = session.run(None, inputs)
    # The index of the largest logit decides the sentiment
    result = np.argmax(out)
    return jsonify({"positive": bool(result)})


if __name__ == "__main__":
    # host="0.0.0.0" makes the server reachable from outside the container;
    # debug=True is convenient for development but should be off in production
    app.run(host="0.0.0.0", port=5000, debug=True)
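Once the model file from the next section is in place, you can sanity-check the route without starting a server by using Flask's built-in test client. A minimal sketch, assuming it is run from the webapp directory so the .onnx path resolves:

# Exercise /predict in-process with Flask's test client
from app import app

with app.test_client() as client:
    response = client.post("/predict", json=["Packing ML Models is important"])
    print(response.get_json())  # e.g. {'positive': True}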
▮ ONNX Model
For this post, I am using an ONNX model called RoBERTa (a sequence classification model). Download the model from this link and place it under the webapp directory.
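To verify the download, you can load the model with onnxruntime and print the input it expects. A quick sketch, assuming it is run from the working directory:

# Sanity check: load the model and inspect its expected input
import onnxruntime

session = onnxruntime.InferenceSession(
    "webapp/roberta-sequence-classification-9.onnx")
for model_input in session.get_inputs():
    print(model_input.name, model_input.shape, model_input.type)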
▮ Getting Inference Result
Now that everything is set, we will first build the Docker image and tag it as "roberta".
docker build -t roberta .
Then run the container.
docker run -it -p 5000:5000 --rm roberta
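Here -p 5000:5000 maps port 5000 on the host to port 5000 inside the container, and --rm removes the container once it exits.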
If the container is running successfully, it should print a log like the one below.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://172.17.0.2:5000
Finally, use curl to send an HTTP request.
I am using the Windows Command Prompt, so the quoting below might change slightly depending on your OS.
curl -X POST -H "Content-Type: application/json" --data "[\"Packing ML Models is important\"]" http://127.0.0.1:5000/predict
The Flask app should return a response like the one below.
{
"positive": true
}
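If you would rather not fight shell quoting, the same request can be sent from Python with the requests library. A minimal sketch; requests is assumed to be installed separately (pip install requests):

# POST the same payload with the requests library
import requests

response = requests.post(
    "http://127.0.0.1:5000/predict",
    json=["Packing ML Models is important"],  # json= sets the Content-Type header
)
print(response.json())  # e.g. {'positive': True}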