replicate · markphelps · May 20, 2026 · May 20, 2026
@@ -12,7 +12,7 @@ You can deploy your packaged model to your own infrastructure, or to [Replicate]
 
 - ✅ **Define the inputs and outputs for your model with standard Python.** Then, Cog generates an OpenAPI schema and validates the inputs and outputs.
 
-- 🎁 **Automatic HTTP prediction server**: Your model's types are used to dynamically generate a RESTful HTTP API using a high-performance Rust/Axum server.
+- 🎁 **Automatic HTTP inference server**: Your model's types are used to dynamically generate a RESTful HTTP API using a high-performance Rust/Axum server.
 
 - 🚀 **Ready for production.** Deploy your model anywhere that Docker images run. Your own infrastructure, or [Replicate](https://replicate.com).
 
@@ -31,35 +31,35 @@ build:
 run: "run.py:Runner"
 ```
 
-Define how predictions are run on your model with `run.py`:
+Define how your model runs with `run.py`:
 
 ```python
 from cog import BaseRunner, Input, Path
 import torch
 
 class Runner(BaseRunner):
     def setup(self):
-        """Load the model into memory to make running multiple predictions efficient"""
+        """Load the model into memory to make running multiple inferences efficient"""
         self.model = torch.load("./weights.pth")
 
     # The arguments and types the model takes as input
     def run(self,
           image: Path = Input(description="Grayscale input image")
     ) -> Path:
-        """Run a single prediction on the model"""
+        """Run the model"""
         processed_image = preprocess(image)
         output = self.model(processed_image)
         return postprocess(output)
 ```
 
 In the above we accept a path to the image as an input, and return a path to our transformed image after running it through our model.
 
-Now, you can run predictions on this model:
+Now, you can run the model:
 
 ```console
 $ cog run -i image=@input.jpg
 --> Building Docker image...
---> Running Prediction...
+--> Running...
 --> Output written to output.jpg
 ```
 

@@ -215,13 +215,13 @@ cog push [IMAGE] [flags]
 
 ## `cog run`
 
-Run a prediction.
+Run the model.
 
-If 'image' is passed, it will run the prediction on that Docker image.
+If 'image' is passed, it will run the model on that Docker image.
 It must be an image that has been built by Cog.
 
 Otherwise, it will build the model in the current directory and run
-the prediction on that.
+it.
 
 ```
 cog run [image] [flags]
@@ -230,7 +230,7 @@ cog run [image] [flags]
 **Examples**
 
 ```
-  # Run a prediction with named inputs
+  # Run the model with named inputs
   cog run -i prompt="a photo of a cat"
 
   # Pass a file as input
@@ -268,7 +268,7 @@ cog run [image] [flags]
 
 ## `cog serve`
 
-Run a prediction HTTP server.
+Run an HTTP inference server.
 
 Builds the model and starts an HTTP server that exposes the model's inputs
 and outputs as a REST API. Compatible with the Cog HTTP protocol.

@@ -1,11 +1,11 @@
 # Deploy models with Cog
 
 Cog containers are Docker containers that serve an HTTP server
-for running predictions on your model.
+for running your model.
 You can deploy them anywhere that Docker containers run.
 
-The server inside Cog containers is **coglet**, a Rust-based prediction server
-that handles HTTP requests, worker process management, and prediction execution.
+The server inside Cog containers is **coglet**, a Rust-based inference server
+that handles HTTP requests, worker process management, and run execution.
 
 This guide assumes you have a model packaged with Cog.
 If you don't, [follow our getting started guide](getting-started-own-model.md),
@@ -19,7 +19,7 @@ First, build your model:
 cog build -t my-model
 ```
 
-You can serve predictions locally with `cog serve`:
+You can serve your model locally with `cog serve`:
 
 ```console
 cog serve
@@ -54,7 +54,7 @@ To stop the server, run:
 docker kill my-model
 ```
 
-To run a prediction on the model,
+To run the model,
 call the `/predictions` endpoint,
 passing input in the format expected by your model:
 
@@ -79,7 +79,7 @@ The response includes a `status` field with values like `STARTING`, `READY`, `BU
 
 ## Concurrency
 
-By default, the server processes one prediction at a time. To enable concurrent predictions, set the `concurrency.max` option in `cog.yaml`:
+By default, the server processes one run at a time. To enable concurrent runs, set the `concurrency.max` option in `cog.yaml`:
 
 ```yaml
 concurrency:

@@ -44,7 +44,7 @@ The `dist` option searches for wheels in:
 
 ### `COGLET_WHEEL`
 
-Controls which coglet wheel is installed in the Docker image. Coglet is the Rust-based prediction server.
+Controls which coglet wheel is installed in the Docker image. Coglet is the Rust-based inference server.
 
 **Supported values:** Same as `COG_SDK_WHEEL`
 

@@ -27,7 +27,7 @@ sudo chmod +x /usr/local/bin/cog
 To configure your project for use with Cog, you'll need to add two files:
 
 - [`cog.yaml`](yaml.md) defines system requirements, Python package dependencies, etc
-- [`run.py`](python.md) describes the prediction interface for your model
+- [`run.py`](python.md) describes the run interface for your model
 
 Use the `cog init` command to generate these files in your project:
 
@@ -74,31 +74,31 @@ This is handy for ensuring a consistent environment for development or training.
 
 With `cog.yaml`, you can also install system packages and other things. [Take a look at the full reference to see what else you can do.](yaml.md)
 
-## Define how to run predictions
+## Define how to run your model
 
-The next step is to update `run.py` to define the interface for running predictions on your model. The `run.py` generated by `cog init` looks something like this:
+The next step is to update `run.py` to define the interface for running your model. The `run.py` generated by `cog init` looks something like this:
 
 ```python
 from cog import BaseRunner, Path, Input
 import torch
 
 class Runner(BaseRunner):
     def setup(self):
-        """Load the model into memory to make running multiple predictions efficient"""
+        """Load the model into memory to make running multiple inferences efficient"""
         self.net = torch.load("weights.pth")
 
     def run(self,
             image: Path = Input(description="Image to enlarge"),
             scale: float = Input(description="Factor to scale image by", default=1.5)
     ) -> Path:
-        """Run a single prediction on the model"""
+        """Run the model"""
         # ... pre-processing ...
         output = self.net(input)
         # ... post-processing ...
         return output
 ```
 
-Edit your `run.py` file and fill in the functions with your own model's setup and prediction code. You might need to import parts of your model from another file.
+Edit your `run.py` file and fill in the functions with your own model's setup and run code. You might need to import parts of your model from another file.
 
 You also need to define the inputs to your model as arguments to the `run()` function, as demonstrated above. For each argument, you need to annotate with a type. The supported types are:
 
@@ -121,7 +121,7 @@ You can provide more information about the input with the `Input()` function, as
 - `choices`: For `str` or `int` types, a list of possible values for this input.
 - `deprecated`: Mark this input as deprecated with a message explaining what to use instead.
 
-There are some more advanced options you can pass, too. For more details, [take a look at the prediction interface documentation](python.md).
+There are some more advanced options you can pass, too. For more details, [take a look at the run interface documentation](python.md).
 
 Next, add the line `run: "run.py:Runner"` to your `cog.yaml`, so it looks something like this:
 
@@ -132,7 +132,7 @@ build:
 run: "run.py:Runner"
 ```
 
-That's it! To test this works, try running a prediction on the model:
+That's it! To test that this works, try running the model:
 
 ```
 $ cog run -i image=@input.jpg

@@ -85,11 +85,11 @@ Type "help", "copyright", "credits" or "license" for more information.
 
 Inside this Docker environment you can do anything – run a Jupyter notebook, your training script, your evaluation script, and so on.
 
-## Run predictions on a model
+## Run a model
 
-Let's pretend we've trained a model. With Cog, we can define how to run predictions on it in a standard way, so other people can easily run predictions on it without having to hunt around for a prediction script.
+Let's pretend we've trained a model. With Cog, we can define how to run it in a standard way, so other people can easily run it without having to hunt around for a script.
 
-We need to write some code to describe how predictions are run on the model.
+We need to write some code to describe how the model runs.
 
 Save this to `run.py`:
 
@@ -107,13 +107,13 @@ WEIGHTS = models.ResNet50_Weights.IMAGENET1K_V1
 
 class Runner(BaseRunner):
     def setup(self):
-        """Load the model into memory to make running multiple predictions efficient"""
+        """Load the model into memory to make running multiple inferences efficient"""
         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
         self.model = models.resnet50(weights=WEIGHTS).to(self.device)
         self.model.eval()
 
     def run(self, image: Path = Input(description="Image to classify")) -> dict:
-        """Run a single prediction on the model"""
+        """Run the model"""
         img = Image.open(image).convert("RGB")
         preds = self.model(WEIGHTS.transforms()(img).unsqueeze(0).to(self.device))
         top3 = preds[0].softmax(0).topk(3)
@@ -174,7 +174,7 @@ Note: The first time you run `cog run`, the build process will be triggered to g
 
 ## Build an image
 
-We can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image serves predictions with an HTTP server, and can be deployed to anywhere that Docker runs to serve real-time predictions.
+We can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image runs an HTTP server, and can be deployed anywhere that Docker runs to serve real-time inference.
 
 ```bash
 cog build -t resnet