This project is a FastAPI server that provides endpoints for generating text, image, and combined (text + image) embeddings using the clip-ViT-B-32 model from SentenceTransformers.
Make sure you have Python 3.7+ installed.
Install all dependencies using the provided batch file:
requirementsInstall.bat
To start the FastAPI server, run:
start.bat
Once the server is running, you can access the API locally at:
http://127.0.0.1:8000
The automatic interactive Swagger UI is available at:
http://127.0.0.1:8000/docs
- Method: POST
- Body (form):
  text: string
- Returns: Normalized text embedding

- Method: POST
- Body (form-data):
  file: image file (JPG, PNG, etc.)
- Returns: Normalized image embedding

- Method: POST
- Body (form-data):
  text: string
  file: image file
- Returns: Averaged embedding from both image and text
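
To illustrate what an "averaged embedding" could look like, here is a minimal sketch in plain Python. It assumes "normalized" means L2-normalized and that the two vectors are averaged element-wise; the server's actual implementation may differ (for example, it may re-normalize the result). The function names `l2_normalize` and `average_embeddings` are hypothetical and do not come from the project code.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def average_embeddings(text_embedding, image_embedding):
    """Element-wise mean of two L2-normalized embeddings of equal length."""
    if len(text_embedding) != len(image_embedding):
        raise ValueError("embeddings must have the same length")
    text_unit = l2_normalize(text_embedding)
    image_unit = l2_normalize(image_embedding)
    # Note: the mean of two unit vectors is generally shorter than unit length.
    return [(t + i) / 2.0 for t, i in zip(text_unit, image_unit)]
```

The mean of two unit vectors is usually not itself unit length, so a client that relies on cosine similarity may want to normalize the combined vector again before comparing.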
- Model used: clip-ViT-B-32
- Supports GPU acceleration if available (automatically detected).
- CORS is enabled for * (can be adjusted in code).
curl -X POST http://127.0.0.1:8000/embed-text -F "text=A dog playing guitar"
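
The same request can be sketched in Python using only the standard library. This is an illustration under assumptions: it sends the `text` field as URL-encoded form data (the curl command above uses multipart form data; FastAPI form fields typically accept both), and it assumes the response body is JSON, since the response schema is not documented here. The helper names `build_embed_text_request` and `embed_text` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local address given in this README


def build_embed_text_request(text, base_url=BASE_URL):
    """Build a POST request to /embed-text carrying `text` as a form field."""
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed-text",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def embed_text(text, base_url=BASE_URL):
    """Send the request and parse the response (assumed to be JSON)."""
    request = build_embed_text_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `embed_text("A dog playing guitar")` should return the parsed response for the same input as the curl example.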