
TaskMatrix connects ChatGPT and a series of Visual Foundation Models to enable sending and receiving images during chatting.

See our paper: Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

Updates:

Now TaskMatrix supports GroundingDINO and segment-anything! Thanks for his efforts. For the image editing case, GroundingDINO is first used to locate bounding boxes guided by the given text, then segment-anything is used to generate the related mask, and finally stable diffusion inpainting is used to edit the image based on the mask. Firstly, run `python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"`. Then, say "find xxx in the image" or "segment xxx in the image" — TaskMatrix will return the detection or segmentation result!

On the one hand, ChatGPT (or LLMs) serves as a general interface that provides a broad and diverse understanding of a wide range of topics. On the other hand, Foundation Models serve as domain experts by providing deep knowledge in specific domains. By leveraging both general and deep knowledge, we aim at building an AI that is capable of handling various tasks.

TaskMatrix needs the effort of the community! We crave your contribution to add new and interesting features!

Prepare your private OpenAI key (for Linux): `export OPENAI_API_KEY=`

You can specify the GPU/CPU assignment by `--load`; the parameter indicates which Visual Foundation Model to use and where it will be loaded to. The model and device are separated by underline `_`, and the different models are separated by comma `,`. The available Visual Foundation Models can be found in the following table. For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0, you can use: `"ImageCaptioning_cpu,Text2Image_cuda:0"`.
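To make the `--load` format concrete, here is a small sketch of how such a specification could be parsed. `parse_load_spec` is a hypothetical helper written for illustration — it is not part of the repo's actual code:

```python
def parse_load_spec(spec):
    """Map each model name in a "--load"-style string to its target device.

    Entries are separated by ',', and within an entry the model name and
    device are separated by the first '_' (a device like 'cuda:0' contains
    no underscore, so a single split is enough).
    """
    assignment = {}
    for entry in spec.split(","):
        model, device = entry.split("_", 1)
        assignment[model] = device
    return assignment

# Example from the text above:
print(parse_load_spec("ImageCaptioning_cpu,Text2Image_cuda:0"))
# {'ImageCaptioning': 'cpu', 'Text2Image': 'cuda:0'}
```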

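The image editing flow described in the update (locate boxes, generate a mask, then inpaint) can be sketched as a plain data-flow. Every function below is a hypothetical stub standing in for GroundingDINO, segment-anything, and stable diffusion inpainting respectively — this shows only the orchestration, not the project's actual API:

```python
def locate_boxes(image, text):
    # GroundingDINO step: find bounding boxes matching the text prompt.
    return [{"box": (10, 10, 50, 50), "label": text}]

def generate_mask(image, boxes):
    # segment-anything step: turn the boxes into a pixel mask.
    return {"mask_for": [b["label"] for b in boxes]}

def inpaint(image, mask, replacement_text):
    # Stable diffusion inpainting step: redraw the masked region.
    return f"{image} with {mask['mask_for'][0]} replaced by {replacement_text}"

def edit_image(image, target_text, replacement_text):
    # The three stages chained in the order the update describes.
    boxes = locate_boxes(image, target_text)
    mask = generate_mask(image, boxes)
    return inpaint(image, mask, replacement_text)

print(edit_image("photo.png", "dog", "cat"))
# photo.png with dog replaced by cat
```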