How to Deploy GPT-JT to Production on Serverless GPUs

We're going to walk through how to deploy GPT-JT to production. This tutorial is suitable for all levels of AI knowledge and experience. We'll deploy GPT-JT using Banana's deployment framework, which you can think of as a "template" for running most ML models on serverless GPUs.

We use the GPT-JT model hosted on Hugging Face for this demo. Let's get into it!

What is GPT-JT?

GPT-JT is a fork of GPT-J-6B that was fine-tuned on 3.53 billion tokens, yet it outperforms most models with 100 billion+ parameters at classification. Basically, this model packs a freaking punch. It was developed by Together, who published a wonderful release article that explains the model's performance in detail.
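To make "classification" concrete: a common way to prompt a model like GPT-JT for classification is to show a handful of labeled examples followed by the text you want labeled. The sketch below assembles such a few-shot prompt. The "Input:/Output:" layout, the helper name build_classification_prompt, and the sample reviews are our own illustrative assumptions, not a format prescribed by Together or by the template repo.

```python
from typing import List, Sequence, Tuple


def build_classification_prompt(
    task: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot classification prompt (hypothetical format).

    task: one-line instruction naming the allowed labels.
    examples: (text, label) pairs shown to the model as demonstrations.
    query: the new text we want the model to label.
    """
    lines: List[str] = [task.strip(), ""]
    for text, label in examples:
        lines.append(f"Input: {text.strip()}")
        lines.append(f"Output: {label.strip()}")
        lines.append("")  # blank line between demonstrations
    # Leave the final "Output:" open so the model's next tokens are the label.
    lines.append(f"Input: {query.strip()}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_classification_prompt(
    task="Label the sentiment of each review as positive or negative.",
    examples=[
        ("I loved every minute of it.", "positive"),
        ("The battery died after a day.", "negative"),
    ],
    query="Setup took thirty seconds and it just works.",
)
print(prompt)
```

Ending the prompt on an open "Output:" line nudges the model to emit just the label as its next tokens; in practice you would also cap generation at a few tokens.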

How to Deploy GPT-JT on Serverless GPUs

1. Fork Banana's GPT-JT Serverless Framework Repo

Fork this repository to your own private repo. A Banana Discord member (s/o @lucataco) built this GPT-JT repo for anyone to use. Pretty sweet! Because the repo is already set up for the GPT-JT model, this is going to be a super simple tutorial.

That said, we highly recommend reviewing the documentation for our Serverless Framework. We break down and explain the framework's components so you can better understand how it all works, in case you want to deploy other models that may not have as straightforward a template repo available.
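Before deploying other models, it helps to know the shape the framework expects. As a rough sketch (check your fork's app.py for the real thing), Banana's template repos revolve around two functions: init(), which loads the model once when the container starts, and inference(model_inputs), which handles each request and returns a JSON-serializable dict. The version below swaps GPT-JT for a stand-in function so it runs without a GPU; the prompt and max_new_tokens input keys and the output return key are assumptions.

```python
from typing import Any, Callable, Dict, Optional

# Module-level handle so the model is loaded once per container, not per request.
model: Optional[Callable[[str, int], str]] = None


def init() -> None:
    """Load the model once at startup.

    In the real repo this is where GPT-JT would be loaded onto the GPU.
    Here we use a stand-in that just echoes the prompt so the sketch runs anywhere.
    """
    global model

    def fake_generate(prompt: str, max_new_tokens: int) -> str:
        # Placeholder for real text generation; ignores max_new_tokens.
        return prompt + " [generated text]"

    model = fake_generate


def inference(model_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Handle one request. The input and output keys here are hypothetical."""
    if model is None:
        raise RuntimeError("init() must be called before inference()")
    prompt = model_inputs.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        return {"message": "No prompt provided"}
    max_new_tokens = int(model_inputs.get("max_new_tokens", 32))
    return {"output": model(prompt, max_new_tokens)}
```

Loading in init() rather than inside inference() is the important design choice: a 6B-parameter model takes a long time to load, so you want to pay that cost once per container, not once per request.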

2. Create a Banana Account and Deploy GPT-JT

Now that the GPT-JT repository is forked into your own private repo, you'll want to test the code before deploying it to Banana in production. To do that, we suggest using Brev (follow this tutorial).
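When testing in Brev (or on any GPU machine), a quick smoke test is to start the repo's server and POST a prompt to it. The sketch below uses only the Python standard library. The localhost:8000 address, the "/" route, and the prompt/max_new_tokens payload keys are assumptions, so match them to the server code in your fork.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: the container's HTTP server listens on localhost:8000 and
# accepts a JSON POST at "/". Adjust to match your fork's server code.
LOCAL_URL = "http://localhost:8000/"


def smoke_test(
    payload: Dict[str, Any],
    url: str = LOCAL_URL,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """POST a JSON payload to the locally running model and return the parsed reply."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any system proxy: this request should go straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # Generous default timeout: the first call may be slow while weights load.
    with opener.open(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, call smoke_test({"prompt": "The capital of France is", "max_new_tokens": 8}) and check that the reply contains generated text rather than an error message before moving on to deployment.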

Once you've tested it, log in to your Banana Dashboard and navigate to the "Deploy" tab.

screenshot of the Banana dev deployment page.

Select "Deploy from GitHub" and choose your GPT-JT repository. Click "Deploy" and the model will start to build. The build process can take up to 1 hour (usually much less), so please be patient.

You'll see the Model Status change from "Building" to "Deployed" when it's ready to be called.

screenshot of model building status.

screenshot of model deployed status.

You can also monitor the status of your build in the Model Logs tab.

screenshot of banana model build logs.

3. Call your GPT-JT Model

After your model has built, it's ready to run in production! Jump over to the Banana SDK docs and select the programming language of your choice. There you'll find example code snippets showing how to call your GPT-JT model.
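The snippets on the SDK page handle the raw call; in your own application it's often tidier to wrap that call in a small function that builds the inputs and pulls out the generated text. The sketch below takes the SDK call as a parameter (run) instead of importing a specific SDK, because the exact function name and response format should come from the SDK page. The input keys, the modelOutputs wrapper, and the output key are all assumptions.

```python
from typing import Any, Callable, Dict

# Stand-in type for the SDK call shown on the Banana SDK page (hypothetical):
# it takes your API key, your model key, and a dict of model inputs,
# and returns the JSON response as a dict.
RunFn = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def generate(
    run: RunFn,
    api_key: str,
    model_key: str,
    prompt: str,
    max_new_tokens: int = 32,
) -> str:
    """Send one prompt to the deployed model and return the generated text.

    Assumptions (verify against your repo and the SDK page):
    - the model accepts {"prompt", "max_new_tokens"} as inputs;
    - the response wraps the app's return value in a "modelOutputs" list;
    - that return value stores the text under "output".
    """
    model_inputs = {"prompt": prompt, "max_new_tokens": max_new_tokens}
    response = run(api_key, model_key, model_inputs)
    outputs = response.get("modelOutputs") or []
    if not outputs:
        raise RuntimeError(f"Unexpected response: {response!r}")
    first = outputs[0]
    if "output" not in first:
        raise RuntimeError(f"No 'output' key in model response: {first!r}")
    return first["output"]
```

In production you'd pass the SDK's call as run and read your API key and model key from environment variables rather than hard-coding them; in unit tests you can pass a fake run instead, which is the main payoff of injecting it.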

That's it! Congratulations on running GPT-JT on serverless GPUs. You are officially deployed in production!


Wrap Up

Reach out to us if you have any questions or want to talk about GPT-JT. We're around on our Discord, or you can tweet at us on Twitter. What other machine learning models would you like to see a deployment tutorial for? Let us know!