Banana

Carrot

This API runs Carrot, a vision-language model.
It can perform image captioning, image question answering (QA), and image-text similarity scoring.

Add this to your Python code:

import banana_dev as banana

api_key = "YOUR_API_KEY"  # replace with your API key from the User Dashboard
model_key = "carrot"
model_parameters = {
    "text": "is this a banana?",  # text for QA / similarity
    "imageURL": "https://demo-images-banana.s3.us-west-1.amazonaws.com/image1.jpg",  # image for the model
    "similarity": False,  # whether to return text-image similarity
    "maxLength": 100,  # max length of the generation
    "minLength": 30,  # min length of the generation
}

# To generate a caption, send only the image ("imageURL") in model_parameters

out = banana.run(api_key, model_key, model_parameters)
print(out)
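
The three tasks differ only in which fields go into model_parameters. The sketch below assembles the dictionary for each task. It assumes that captioning is selected by sending only "imageURL" (as the comment in the snippet states) and that QA and similarity are told apart by the "similarity" flag. build_carrot_params is a hypothetical helper for illustration, not part of banana_dev, and the "a banana" query text is made up.

```python
def build_carrot_params(image_url, text=None, similarity=False,
                        max_length=100, min_length=30):
    """Hypothetical helper: assemble model_parameters for one Carrot task."""
    if text is None:
        if similarity:
            raise ValueError("similarity needs a text to compare with the image")
        # Captioning: send only the image, as the documentation describes.
        return {"imageURL": image_url}
    # QA or similarity: same field names as the documented example.
    return {
        "text": text,
        "imageURL": image_url,
        "similarity": similarity,
        "maxLength": max_length,
        "minLength": min_length,
    }


IMAGE = "https://demo-images-banana.s3.us-west-1.amazonaws.com/image1.jpg"

caption_params = build_carrot_params(IMAGE)
qa_params = build_carrot_params(IMAGE, text="is this a banana?")
similarity_params = build_carrot_params(IMAGE, text="a banana", similarity=True)
```

Any of these dictionaries can be passed as the third argument to banana.run in place of model_parameters.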

Arguments:

Arg | Description | Required | Type | Example
api_key | Your API key, found on the User Dashboard | True | string | "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
model_key | This model's name | True | string | "carrot"
model_parameters | Dictionary of custom tuning parameters | False | dict | {"text": "banana", "imageURL": "https://demo-images-banana.s3.us-west-1.amazonaws.com/image1.jpg"}
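
The types listed above can be checked locally before a request is sent. The sketch below is a hypothetical pre-flight check, not part of banana_dev. It assumes that an omitted model_parameters can be replaced by an empty dictionary, which the table implies by marking it optional but which has not been verified against the service.

```python
def check_run_args(api_key, model_key, model_parameters=None):
    """Hypothetical check mirroring the argument table; returns the arguments."""
    if not isinstance(api_key, str):
        raise TypeError("api_key must be a string")
    if not isinstance(model_key, str):
        raise TypeError("model_key must be a string")
    if model_parameters is None:
        # Assumption: an empty dict stands in for the optional argument.
        model_parameters = {}
    if not isinstance(model_parameters, dict):
        raise TypeError("model_parameters must be a dict")
    return api_key, model_key, model_parameters


args = check_run_args(
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "carrot",
    {"text": "banana"},
)
```

The returned tuple is in the same order as the arguments of banana.run, so it can be unpacked straight into the call.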