Cog containers are Docker containers that run an HTTP server for your model. You can deploy them anywhere that Docker containers run.
The server inside Cog containers is coglet, a Rust-based inference server that handles HTTP requests, worker process management, and run execution.
This guide assumes you have a model packaged with Cog. If you don't, follow our getting started guide, or use an example model.
First, build your model:
cog build -t my-model
You can serve your model locally with cog serve:
cog serve
# or, from a built image:
cog serve my-model
Alternatively, start the Docker container directly:
# If your model uses a CPU:
docker run -d -p 5001:5000 my-model
# If your model uses a GPU:
docker run -d -p 5001:5000 --gpus all my-model
The server listens on port 5000 inside the container; the commands above map it to port 5001 on the host.
To view the OpenAPI schema, open localhost:5001/openapi.json in your browser or use cURL to make a request:
curl http://localhost:5001/openapi.json
To stop the server, run:
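docker kill my-model
docker kill expects a container name or ID rather than an image name, so this works only if the container was started with --name my-model. Otherwise, pass the container ID that docker run -d printed.
To run the model, call the /predictions endpoint, passing input in the format expected by your model: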
curl http://localhost:5001/predictions -X POST \
--header "Content-Type: application/json" \
--data '{"input": {"image": "https://.../input.jpg"}}'
For more details about the HTTP API, see the HTTP API reference documentation.
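The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names build_prediction_request and run_prediction are hypothetical, and the base URL assumes the 5001:5000 port mapping used above. The shape of the response body is not described here, so the sketch returns the decoded JSON as-is.

```python
import json
import urllib.request


def build_prediction_request(inputs, base_url="http://localhost:5001"):
    """Build (but do not send) a POST /predictions request.

    `inputs` is a dict in whatever format your model expects; it is
    wrapped in the {"input": ...} envelope shown in the cURL example.
    """
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_prediction(inputs, base_url="http://localhost:5001", timeout=60):
    """Send the request and return the decoded JSON response body."""
    request = build_prediction_request(inputs, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the container running, run_prediction({"image": "https://.../input.jpg"}) sends the same payload as the cURL command. Building the request separately from sending it makes the payload easy to inspect without a live server.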
The server exposes a GET /health-check endpoint that returns the current status of the model container. Use this for readiness probes in orchestration systems like Kubernetes.
curl http://localhost:5001/health-check
The response includes a status field with values such as STARTING, READY, BUSY, SETUP_FAILED, or DEFUNCT. See the HTTP API reference for full details.
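A deployment script can poll this endpoint before sending traffic. The sketch below is one possible approach, not part of Cog: the function names are hypothetical, it assumes the response is a JSON object with a top-level status field, and it assumes SETUP_FAILED and DEFUNCT are states the container will not recover from. BUSY is treated as "keep waiting".

```python
import json
import time
import urllib.request

# Assumption: these statuses mean the container will never become READY.
TERMINAL_FAILURES = {"SETUP_FAILED", "DEFUNCT"}


def fetch_status(base_url="http://localhost:5001"):
    """Return the status string reported by GET /health-check."""
    with urllib.request.urlopen(f"{base_url}/health-check", timeout=5) as resp:
        return json.load(resp)["status"]


def wait_until_ready(get_status=fetch_status, timeout=60.0, interval=1.0,
                     sleep=time.sleep):
    """Poll until the status is READY.

    Returns True once READY is seen, False if `timeout` seconds pass first,
    and raises RuntimeError on a terminal failure status.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            status = get_status()
        except OSError:
            # Covers connection errors while the server is still starting.
            status = None
        if status == "READY":
            return True
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"model container reported {status}")
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The get_status and sleep parameters exist so the loop can be exercised without a running container; in normal use, wait_until_ready() with the defaults is enough.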
By default, the server processes one run at a time. To enable concurrent runs, set the concurrency.max option in cog.yaml:
concurrency:
  max: 4
See the cog.yaml reference for more details.
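With concurrency.max set to 4, a client benefits only if it actually sends requests in parallel. The following sketch shows one way to do that with a thread pool sized to match; post_prediction and run_batch are hypothetical helpers, and the base URL assumes the port mapping used earlier in this guide.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_prediction(inputs, base_url="http://localhost:5001"):
    """POST one input dict to /predictions and return the decoded JSON."""
    request = urllib.request.Request(
        f"{base_url}/predictions",
        data=json.dumps({"input": inputs}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def run_batch(batch, send=post_prediction, max_workers=4):
    """Send every item in `batch` using at most `max_workers` threads.

    Results come back in the same order as `batch`. Keeping `max_workers`
    equal to concurrency.max avoids sending more simultaneous requests
    than the server is configured to run.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, batch))
```

The send parameter defaults to the real HTTP call but can be swapped for a stub when testing the batching logic on its own.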
You can configure runtime behavior with environment variables:
COG_SETUP_TIMEOUT: Maximum time in seconds for the setup() method (default: no timeout).
See the environment variables reference for the full list.
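When you start the container with docker run, environment variables such as COG_SETUP_TIMEOUT can be passed with Docker's -e flag. The sketch below assembles that command line from Python; docker_run_args and start_container are hypothetical helpers, and the value 300 in the usage note is an arbitrary illustration rather than a recommended setting.

```python
import subprocess


def docker_run_args(image, host_port=5001, env=None, gpu=False):
    """Return the argument list for a detached `docker run` of a Cog image.

    The container port is fixed at 5000, where the server listens.
    """
    args = ["docker", "run", "-d", "-p", f"{host_port}:5000"]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]  # set a variable inside the container
    if gpu:
        args += ["--gpus", "all"]
    args.append(image)
    return args


def start_container(image, **options):
    """Start the container and return the ID that `docker run -d` prints."""
    result = subprocess.run(
        docker_run_args(image, **options),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, docker_run_args("my-model", env={"COG_SETUP_TIMEOUT": 300}) produces the equivalent of docker run -d -p 5001:5000 -e COG_SETUP_TIMEOUT=300 my-model.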