This API runs carrot, a state-of-the-art vision-language model.
It can perform image captioning, image question answering (QA), and image-text similarity calculation.

Deprecation Note:

This free API was deprecated after SDK version 3.0.1. Calls made from a more recent SDK version will not return.

To install a compatible SDK version, run:
pip3 install banana-dev==3.0.1
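
Before calling the API, it can help to confirm which SDK version is actually installed. The following is a minimal sketch, not part of the SDK: `parse_version` and `is_compatible` are hypothetical helper names, and the rule that any version up to and including 3.0.1 counts as compatible is an assumption drawn from the note above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 3.0.1 is the last SDK version that can reach this API.
MAX_COMPATIBLE = (3, 0, 1)


def parse_version(text):
    """Turn a version string such as '3.0.1' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def is_compatible(installed):
    """Return True if the installed version is 3.0.1 or older."""
    parsed = parse_version(installed)
    return bool(parsed) and parsed <= MAX_COMPATIBLE


if __name__ == "__main__":
    try:
        installed = version("banana-dev")  # distribution name from the pip command
    except PackageNotFoundError:
        print("banana-dev is not installed")
    else:
        status = "compatible" if is_compatible(installed) else "too new for this API"
        print(installed, status)
```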

We keep this API live to support the builders who have been using it, but cannot promise constant uptime or maintenance, as we've shifted our engineering focus to our Serverless GPUs platform.
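
Because uptime is not guaranteed and a call may never return, one option is to bound how long your code waits. The sketch below is a generic pattern rather than anything provided by the SDK: `call_with_timeout` is a hypothetical helper that runs any callable in a daemon thread and raises `TimeoutError` if it does not finish in time. It does not cancel the underlying request; the abandoned thread simply keeps running in the background until the process exits.

```python
import threading


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs), waiting at most timeout_s seconds for it."""
    result = {}

    def worker():
        try:
            result["value"] = fn(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller's thread
            result["error"] = exc

    # A daemon thread will not block interpreter shutdown if the call hangs.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        raise TimeoutError(f"call did not return within {timeout_s} seconds")
    if "error" in result:
        raise result["error"]
    return result["value"]


# Hypothetical usage, assuming the SDK call is available as a plain function:
# out = call_with_timeout(some_api_call, 30, api_key, model_key, model_parameters)
```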

Add this to your Python code:

import banana_dev as banana

api_key = "YOUR API KEY"  # found on the User Dashboard
model_key = "YOUR MODEL KEY"  # this model's name
model_parameters = {
    "text": "is this a banana?",  # text for QA / similarity
    "imageURL": "",  # image for the model
    "similarity": False,  # whether to return text-image similarity
    "maxLength": 100,  # max length of the generation
    "minLength": 30,  # min length of the generation
}

# To generate captions, send only the image in model_parameters.

out = banana.run(api_key, model_key, model_parameters)
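
The three tasks differ only in which keys go into `model_parameters`. As an illustration, the sketch below builds the dictionary for each task. `build_parameters` is a hypothetical helper, not part of the SDK, and the example URL is a placeholder. Sending only `imageURL` for captioning follows the comment above; which keys the service honors in the other two modes is not documented here, so the helper simply mirrors the sample dictionary.

```python
def build_parameters(image_url, text=None, similarity=False,
                     max_length=100, min_length=30):
    """Build a model_parameters dict for captioning, QA, or similarity."""
    if not image_url:
        raise ValueError("image_url is required")

    if text is None:
        if similarity:
            raise ValueError("similarity needs a text to compare with the image")
        # Captioning: send only the image.
        return {"imageURL": image_url}

    if min_length > max_length:
        raise ValueError("min_length must not exceed max_length")

    # QA (similarity=False) or text-image similarity (similarity=True).
    return {
        "text": text,
        "imageURL": image_url,
        "similarity": similarity,
        "maxLength": max_length,
        "minLength": min_length,
    }


url = "https://example.com/banana.jpg"  # placeholder image URL
caption_params = build_parameters(url)
qa_params = build_parameters(url, text="is this a banana?")
similarity_params = build_parameters(url, text="banana", similarity=True)
```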


| Arg | Description | Required | Type | Example |
| --- | --- | --- | --- | --- |
| api_key | Your API key, found on the User Dashboard | True | string | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" |
| model_key | This model's name | True | string | "clip" |
| model_parameters | Dictionary of custom tuning parameters | False | dict | {"text": "banana", "imageURL": ""} |
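
The argument types listed above can be checked locally before a request is sent. This is a minimal sketch under the assumption that the table is the complete contract: `validate_call_args` is a hypothetical helper, not an SDK function, and it checks only the types and required flags shown in the table.

```python
def validate_call_args(api_key, model_key, model_parameters=None):
    """Return a list of problems; an empty list means the arguments look fine."""
    problems = []
    if not isinstance(api_key, str) or not api_key:
        problems.append("api_key is required and must be a non-empty string")
    if not isinstance(model_key, str) or not model_key:
        problems.append("model_key is required and must be a non-empty string")
    # model_parameters is optional, but must be a dict when given.
    if model_parameters is not None and not isinstance(model_parameters, dict):
        problems.append("model_parameters must be a dict when provided")
    return problems


issues = validate_call_args("", "clip", ["not", "a", "dict"])
for issue in issues:
    print(issue)
```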