This API runs carrot, a state-of-the-art vision-language model.
It can perform image captioning, image question answering (QA), and image-text similarity calculation.

Add this to your Python code:

import banana_dev as banana

api_key = "YOUR_API_KEY"  # your API key, found on the User Dashboard
model_key = "YOUR_MODEL_KEY"  # this model's name
model_parameters = {
    "text": "is this a banana?",  # text for QA / similarity
    "imageURL": "",  # image for the model
    "similarity": False,  # whether to return text-image similarity
    "maxLength": 100,  # max length of the generation
    "minLength": 30,  # min length of the generation
}

# To generate captions, only send the image in model_parameters

out = banana.run(api_key, model_key, model_parameters)
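The three tasks appear to differ only in which keys are sent in model_parameters. As a minimal sketch, assuming the keys behave the way the comments in the snippet above describe, the dictionaries could be built with small helpers like the ones below. The helper names (caption_params, qa_params, similarity_params) are hypothetical and not part of banana_dev, and whether the service accepts the length limits for every task is an assumption.

```python
from typing import Any, Dict


def caption_params(image_url: str) -> Dict[str, Any]:
    """Captioning: per the README comment, send only the image."""
    return {"imageURL": image_url}


def qa_params(
    image_url: str,
    question: str,
    max_length: int = 100,
    min_length: int = 30,
) -> Dict[str, Any]:
    """Image QA: send the question as "text" and leave similarity off.

    The length defaults mirror the values in the README snippet; that the
    service applies them to QA answers is an assumption.
    """
    return {
        "text": question,
        "imageURL": image_url,
        "similarity": False,
        "maxLength": max_length,
        "minLength": min_length,
    }


def similarity_params(image_url: str, text: str) -> Dict[str, Any]:
    """Image-text similarity: send the text and switch similarity on."""
    return {"text": text, "imageURL": image_url, "similarity": True}
```

Any of these dictionaries can then be passed as the model_parameters argument of the call shown above, for example qa_params(my_image_url, "is this a banana?"), where my_image_url stands for your own image link.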


| Arg | Description | Required | Type | Example |
| --- | --- | --- | --- | --- |
| api_key | Your API key, found on the User Dashboard | True | string | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" |
| model_key | This model's name | True | string | "clip" |
| model_parameters | Dictionary of custom tuning parameters | False | dict | {"text": "banana", "imageURL": ""} |
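The argument table above can also be checked locally before a request is sent. The following is only a sketch: check_call_args is a hypothetical helper, not part of banana_dev, and treating a missing model_parameters as an empty dict is an assumption about what the service accepts.

```python
from typing import Any, Dict, Optional


def check_call_args(
    api_key: str,
    model_key: str,
    model_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate the three arguments against the table and return the parameters.

    api_key and model_key are required strings; model_parameters is an
    optional dict (an empty dict is returned when it is omitted, which is
    an assumption rather than documented behavior).
    """
    if not isinstance(api_key, str) or not api_key:
        raise ValueError("api_key is required and must be a non-empty string")
    if not isinstance(model_key, str) or not model_key:
        raise ValueError("model_key is required and must be a non-empty string")
    if model_parameters is None:
        return {}
    if not isinstance(model_parameters, dict):
        raise TypeError("model_parameters must be a dict")
    return model_parameters
```

Calling this first turns a missing key or a wrongly typed parameter into an immediate local error instead of a failed remote call.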