How to Deploy and Run Pygmalion 6B model

April 29, 2023

Deprecated: This blog article is deprecated. We strive to rapidly improve our product and some of the information contained in this post may no longer be accurate or applicable. For the most current instructions on deploying a model like Pygmalion 6B to Banana, please check our updated documentation.

In this tutorial we're going to demo how you can run Pygmalion 6B on Banana's serverless GPU infrastructure. This tutorial is very straightforward; it should take you approximately 15 minutes from start to finish.

Basic Python coding experience is helpful, but not required for this. AI beginners are welcome!

We're using a Community Template for this demo. Templates are user-submitted model repositories that are optimized and set up to run on Banana's serverless GPU infrastructure. All you have to do is click deploy, and in a few minutes your model is ready to use on Banana. This is the Pygmalion 6B model template we will be using.

What is Pygmalion?

Pygmalion 6B is a model developed by the PygmalionAI team. It is a proof-of-concept dialogue model based on EleutherAI's GPT-J-6B. This model is NOT suitable for use by minors; it will output X-rated content under certain circumstances.
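As a dialogue model, Pygmalion works best when the prompt includes a character persona and conversation history. The helper below sketches one common way to assemble such a prompt; the exact format (persona line, `<START>` marker, "You:"/character turns) is an assumption based on community conventions for Pygmalion-style models, not official documentation.

```python
# Hypothetical helper: builds a persona + dialogue-history prompt of the
# style commonly used with Pygmalion-like dialogue models. The exact
# format is an assumption; adjust it to match your model's expectations.
def build_prompt(char_name, persona, history, user_message):
    """Assemble a Pygmalion-style prompt string."""
    lines = [f"{char_name}'s Persona: {persona}", "<START>"]
    lines.extend(history)                  # prior "You:" / character turns
    lines.append(f"You: {user_message}")   # the new user turn
    lines.append(f"{char_name}:")          # the model continues from here
    return "\n".join(lines)
```

The trailing `"{char_name}:"` line is what cues the model to respond in character rather than continuing your message.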

Pygmalion Deployment Tutorial

Step 1: Deploy the Pygmalion template

Sign up for Banana (if you haven't already), and deploy this Pygmalion 6B template. It takes one click and a few minutes to deploy.

Step 2: Write your test file for Pygmalion

Now that your model has been deployed, whip open your code editor and create your test file. You can take the sample code from the model template, write your own test file, or copy and paste our code snippet below (be sure to add your model key and API key).


import banana_dev as banana

api_key = "insert API key"
model_key = "insert model key"

print("This is Pygmalion! What is your prompt?")


def call_model(prompt):
    # Call the deployed model on Banana's serverless GPUs
    out =, model_key, prompt)
    return out


while True:
    print("type your prompt:")
    user_prompt = input('')

    # Type "stop" to exit the loop
    if user_prompt == "stop":
        break

    model_inputs = {
        "max_new_tokens": 5,
        "prompt": user_prompt,
    }

    model_response = call_model(prompt=model_inputs)
    print(model_response)



This is a slightly more complex code snippet than you need to test with. The main benefit of this snippet is the while loop, which continuously asks for and takes in user prompts for your Pygmalion model.
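If you'd rather print just the generated text instead of the raw response, a small helper can unwrap it. The response shape assumed below (a dict with a "modelOutputs" list of dicts) is based on older versions of the banana_dev SDK; check the keys your deployment actually returns and adjust accordingly.

```python
# Hypothetical helper for pulling generated text out of a Banana response.
# Assumes the response is a dict containing a "modelOutputs" list, which
# was the shape returned by older banana_dev SDK versions.
def extract_text(model_response):
    outputs = model_response.get("modelOutputs") or []
    if not outputs:
        return ""
    first = outputs[0]
    # Output key names vary by template; try a few common ones.
    for key in ("text", "output", "generated_text"):
        if isinstance(first, dict) and key in first:
            return first[key]
    return str(first)
```

You could then replace `print(model_response)` in the loop above with `print(extract_text(model_response))`.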

Make sure your formatting matches the screenshot here:

[Screenshot: code snippet of the demo code you need to run Pygmalion 6B on Banana's serverless GPU infrastructure.]

Step 3: Run Pygmalion! (on Serverless GPUs)

We've made it to the end of the demo! All that's left is to run your Pygmalion model in production. To do this, make sure you:

- create and activate a virtual environment (see the hint below)
- install the Banana Python SDK (pip install banana-dev)
- run your test file

Hint - to start a virtual environment, use these consecutive commands:

python3 -m venv venv

source venv/bin/activate
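Putting the hint together with the install and run steps, a typical session might look like the following. The filename app.py is a placeholder for whatever you named your test file.

```shell
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install the Banana Python SDK
pip install banana-dev

# Run your test file (app.py is a placeholder name)
python3 app.py
```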


Congrats! You have successfully deployed and run the Pygmalion 6B model on serverless GPUs. Thanks for following along!

Wrap Up

How did we do? Were you able to deploy Pygmalion? Drop us a message on our Discord or by tweeting us on Twitter. What other machine learning models would you like to see a deployment tutorial for? Let us know!