2 changes: 1 addition & 1 deletion AGENTS.md
@@ -231,7 +231,7 @@ For distributed development:
3. Build Store: `mvn clean package -pl hugegraph-store -am -DskipTests`
4. Build Server with HStore backend: `mvn clean package -pl hugegraph-server -am -DskipTests`

- See Docker Compose example: `hugegraph-server/hugegraph-dist/docker/example/`
+ See Docker Compose examples: `docker/` directory. Single-node quickstart (pre-built images): `docker/docker-compose.yml`. Single-node dev build (from source): `docker/docker-compose-dev.yml`. 3-node cluster: `docker/docker-compose-3pd-3store-3server.yml`. See `docker/README.md` for full setup guide.

Copilot AI Mar 9, 2026

This paragraph references docker/docker-compose-dev.yml, but the docker/ directory in the repo doesn’t contain that file. Please add the missing compose file or update this guidance to only reference existing compose files.

### Debugging Tips

7 changes: 5 additions & 2 deletions README.md
@@ -209,8 +209,11 @@ docker run -itd --name=hugegraph -e PASSWORD=your_password -p 8080:8080 hugegrap

For advanced Docker configurations, see:
- [Docker Documentation](https://hugegraph.apache.org/docs/quickstart/hugegraph-server/#3-deploy)
- - [Docker Compose Example](./hugegraph-server/hugegraph-dist/docker/example)
- - [Docker README](hugegraph-server/hugegraph-dist/docker/README.md)
+ - [Docker Compose Examples](./docker/)
+ - [Docker README](./docker/README.md)
+ - [Server Docker README](hugegraph-server/hugegraph-dist/docker/README.md)
+
+ > **Docker Desktop (Mac/Windows)**: The 3-node distributed cluster (`docker/docker-compose-3pd-3store-3server.yml`) uses Docker bridge networking and works on all platforms including Docker Desktop. Allocate at least 12 GB memory to Docker Desktop.
Comment on lines +212 to +216
Copilot AI Mar 9, 2026

This note says the 3-node distributed cluster compose uses Docker bridge networking and works on Docker Desktop, but the current docker/docker-compose-3pd-3store-3server.yml still uses network_mode: host. Please update this statement (or update the compose file) so the root README doesn’t promise cross-platform behavior that the shipped compose file can’t provide.

> **Note**: Docker images are convenience releases, not **official ASF distribution artifacts**. See [ASF Release Distribution Policy](https://infra.apache.org/release-distribution.html#dockerhub) for details.
>
259 changes: 259 additions & 0 deletions docker/README.md
@@ -0,0 +1,259 @@
# HugeGraph Docker Deployment

This directory contains Docker Compose files for running HugeGraph:

| File | Description |
|------|-------------|
| `docker-compose.yml` | Single-node cluster using pre-built images from Docker Hub |
| `docker-compose-dev.yml` | Single-node cluster built from source (for developers) |
| `docker-compose-3pd-3store-3server.yml` | 3-node distributed cluster (PD + Store + Server) |
Comment on lines +5 to +9
Copilot AI Mar 9, 2026

docker-compose-dev.yml is referenced in the file table, but it doesn't exist under docker/ in the repo. Either add the missing compose file or remove/update these references so users don't follow a broken path.

## Prerequisites

- **Docker Engine** 20.10+ (or Docker Desktop 4.x+)
- **Docker Compose** v2 (included in Docker Desktop)
- **Memory**: Allocate at least **12 GB** to Docker Desktop (Settings → Resources → Memory). The 3-node cluster runs 9 JVM processes (3 PD + 3 Store + 3 Server) which are memory-intensive. Insufficient memory causes OOM kills that appear as silent Raft failures.

> [!IMPORTANT]
> The 12 GB minimum is for Docker Desktop (Mac/Windows). On Linux with native Docker, ensure the host has at least 12 GB of free memory.

> [!WARNING]
> **Temporary workaround — source clone currently required.** The `docker-compose.yml` (quickstart) and `docker-compose-3pd-3store-3server.yml` compose files currently mount entrypoint scripts directly from the source tree because the published Docker Hub images do not yet include the updated entrypoints. This means these compose files currently require a full repository clone to run. This requirement will be removed in a follow-up image release once updated images are published to Docker Hub with the new entrypoints baked in. The `docker-compose-dev.yml` (dev build) is unaffected since it builds images from source.

Comment on lines +20 to +22
Copilot AI Mar 9, 2026

The warning says the compose files mount updated entrypoint scripts from the source tree, but in the current repo the compose files mount config files under docker/configs/ (e.g. ./configs/application-pd0.yml:/hugegraph-pd/conf/application.yml) and don’t reference entrypoint script mounts. Please adjust this text (or update the compose files) so the workaround description matches what users actually need to mount.

## Why Bridge Networking (Not Host Mode)

Previous versions used `network_mode: host`, which only works on Linux and is incompatible with Docker Desktop on Mac/Windows. The cluster now uses a proper Docker bridge network (`hg-net`) where services communicate via container hostnames (`pd0`, `pd1`, `store0`, etc.) instead of `127.0.0.1`. This makes the cluster portable across all platforms.
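
Whether a given compose file is still on host networking can be checked by looking for `network_mode: host` in it. The helper below is a minimal sketch under stated assumptions: the function name `uses_host_network` is hypothetical (it is not part of the repository), and the pattern only matches the simple one-line form of the setting.

```bash
# Hypothetical helper, not part of the HugeGraph repository.
# Succeeds (exit 0) if the compose file sets `network_mode: host`
# on at least one service, and fails otherwise.
uses_host_network() {
    grep -Eq '^[[:space:]]*network_mode:[[:space:]]*"?host"?[[:space:]]*$' "$1"
}

# Example usage (path taken from this README):
#   if uses_host_network docker-compose-3pd-3store-3server.yml; then
#       echo "still on host networking: Linux only"
#   fi
```

A text match like this is only a rough check; `docker compose config` prints the fully resolved configuration if a more reliable answer is needed.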

Comment on lines +23 to +26
Copilot AI Mar 9, 2026

This section states the cluster now uses a bridge network (hg-net) and container hostnames, but the current docker/docker-compose.yml and docker/docker-compose-3pd-3store-3server.yml still configure network_mode: host. Please align the documentation with the actual compose files (or update the compose files in the same PR) to avoid misleading Docker Desktop users.

---

## Single-Node Setup

Two compose files are available for running a single-node cluster (1 PD + 1 Store + 1 Server):

### Option A: Quick Start (pre-built images)

Uses pre-built images from Docker Hub. Best for **end users** who want to run HugeGraph quickly.

```bash
cd docker
docker compose up -d
```

- Images: `hugegraph/pd:latest`, `hugegraph/store:latest`, `hugegraph/server:latest`
- `pull_policy: always` — always pulls the latest image
- PD healthcheck endpoint: `/` (root)
- Single PD, single Store (`HG_PD_INITIAL_STORE_LIST: store:8500`), single Server
- No `STORE_REST` or `wait-partition.sh` — simpler startup

### Option B: Development Build (build from source)

Builds images locally from source Dockerfiles. Best for **developers** who want to test local changes.

```bash
cd docker
docker compose -f docker-compose-dev.yml up -d
```

- Images: built from source via `build: context: ..` with Dockerfiles
- No `pull_policy` — builds locally, doesn't pull
- Entrypoint scripts are baked into the built image (no volume mounts)
- PD healthcheck endpoint: `/v1/health`
- Otherwise identical env vars and structure to the quickstart file

### Key Differences

| | `docker-compose.yml` (quickstart) | `docker-compose-dev.yml` (dev build) |
|---|---|---|
| **Images** | Pull from Docker Hub | Build from source |
| **Who it's for** | End users | Developers |
| **pull_policy** | `always` | not set (build) |

**Verify** (both options):
```bash
curl http://localhost:8080/versions
```

---

## 3-Node Cluster Quickstart

```bash
cd docker
docker compose -f docker-compose-3pd-3store-3server.yml up -d
```

**Startup ordering** is enforced via `depends_on` with `condition: service_healthy`:

1. **PD nodes** start first and must pass healthchecks (`/v1/health`)
2. **Store nodes** start after all PD nodes are healthy
3. **Server nodes** start after all Store nodes are healthy

This ensures Raft leader election and partition assignment complete before dependent services attempt connections.
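
When scripting against the cluster from the host, the same "wait until healthy" gating can be approximated with a small retry loop. This is a sketch only: `wait_for` is a hypothetical helper name, and the retry count and delay in the usage comment are arbitrary values, not settings taken from the compose file.

```bash
# Hypothetical helper: run a command until it succeeds or the
# retry budget is exhausted.
# Usage: wait_for <retries> <delay-seconds> <command> [args...]
wait_for() {
    retries="$1"
    delay="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # Success of the probe command ends the wait immediately.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Example usage, mirroring the startup order above:
#   wait_for 30 5 curl -fsS http://localhost:8620/v1/health   # PD
#   wait_for 30 5 curl -fsS http://localhost:8520/v1/health   # Store
#   wait_for 30 5 curl -fsS http://localhost:8080/versions    # Server
```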

**Verify the cluster is healthy**:

```bash
# Check PD health
curl http://localhost:8620/v1/health

# Check Store health
curl http://localhost:8520/v1/health

# Check Server (Graph API)
curl http://localhost:8080/versions

# List registered stores via PD
curl http://localhost:8620/v1/stores

# List partitions
curl http://localhost:8620/v1/partitions
```

---

## Environment Variable Reference

Configuration is injected via environment variables. The old `docker/configs/application-pd*.yml` and `docker/configs/application-store*.yml` files are no longer used.

Comment on lines +114 to +117
Copilot AI Mar 9, 2026

This states docker/configs/application-pd*.yml and docker/configs/application-store*.yml are no longer used, but docker/docker-compose-3pd-3store-3server.yml still mounts those files into the PD/Store containers. Please correct the documentation or update the compose files to actually stop using docker/configs/.

### PD Environment Variables
Comment on lines +114 to +118
Copilot AI Mar 9, 2026

This section claims configuration is injected via HG_* env vars, but the current docker entrypoints in this repo (e.g. hugegraph-pd/hg-pd-dist/docker/docker-entrypoint.sh, hugegraph-store/hg-store-dist/docker/docker-entrypoint.sh, hugegraph-server/hugegraph-dist/docker/docker-entrypoint.sh) don’t read HG_* variables and just start the services. Please update the docs to match the current images/entrypoints, or include the corresponding entrypoint changes that implement HG_* injection.

| Variable | Required | Default | Maps To (`application.yml`) | Description |
|----------|----------|---------|-----------------------------|-------------|
| `HG_PD_GRPC_HOST` | Yes | — | `grpc.host` | This node's hostname/IP for gRPC |
| `HG_PD_RAFT_ADDRESS` | Yes | — | `raft.address` | This node's Raft address (e.g. `pd0:8610`) |
| `HG_PD_RAFT_PEERS_LIST` | Yes | — | `raft.peers-list` | All PD peers (e.g. `pd0:8610,pd1:8610,pd2:8610`) |
| `HG_PD_INITIAL_STORE_LIST` | Yes | — | `pd.initial-store-list` | Expected stores (e.g. `store0:8500,store1:8500,store2:8500`) |
| `HG_PD_GRPC_PORT` | No | `8686` | `grpc.port` | gRPC server port |
| `HG_PD_REST_PORT` | No | `8620` | `server.port` | REST API port |
| `HG_PD_DATA_PATH` | No | `/hugegraph-pd/pd_data` | `pd.data-path` | Metadata storage path |
| `HG_PD_INITIAL_STORE_COUNT` | No | `1` | `pd.initial-store-count` | Min stores for cluster availability |
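
An entrypoint that honours the "Required" column would need to fail fast when one of those variables is missing. The following is a hypothetical sketch of such a check, not code from the HugeGraph entrypoint scripts; the function name `require_env` is an assumption made for illustration.

```bash
# Hypothetical helper, not taken from the HugeGraph entrypoints.
# Returns non-zero if any of the named variables is unset or empty,
# and reports each missing name on stderr.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup: read the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "ERROR: required variable $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with the required PD variables from the table above:
#   require_env HG_PD_GRPC_HOST HG_PD_RAFT_ADDRESS \
#       HG_PD_RAFT_PEERS_LIST HG_PD_INITIAL_STORE_LIST || exit 1
```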

**Deprecated aliases** (still work but log a warning):

| Deprecated | Use Instead |
|------------|-------------|
| `GRPC_HOST` | `HG_PD_GRPC_HOST` |
| `RAFT_ADDRESS` | `HG_PD_RAFT_ADDRESS` |
| `RAFT_PEERS` | `HG_PD_RAFT_PEERS_LIST` |
| `PD_INITIAL_STORE_LIST` | `HG_PD_INITIAL_STORE_LIST` |
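
The "still work but log a warning" behaviour could be implemented roughly as follows. This is a hypothetical sketch of one way to do it, not the logic of any shipped entrypoint; `migrate_env` is an invented name, and the warning text is illustrative.

```bash
# Hypothetical helper, not taken from the HugeGraph entrypoints.
# If the deprecated variable is set and the new one is not, copy the
# value across and warn on stderr. A value already present under the
# new name always wins.
migrate_env() {
    old_name="$1"
    new_name="$2"
    eval "old_val=\${$old_name:-}"
    eval "new_val=\${$new_name:-}"
    if [ -n "$old_val" ] && [ -z "$new_val" ]; then
        echo "WARN: $old_name is deprecated; use $new_name instead" >&2
        export "$new_name=$old_val"
    fi
}

# Example usage with the PD aliases from the table above:
#   migrate_env GRPC_HOST HG_PD_GRPC_HOST
#   migrate_env RAFT_ADDRESS HG_PD_RAFT_ADDRESS
#   migrate_env RAFT_PEERS HG_PD_RAFT_PEERS_LIST
#   migrate_env PD_INITIAL_STORE_LIST HG_PD_INITIAL_STORE_LIST
```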

Comment on lines +131 to +139
Copilot AI Mar 9, 2026

These tables claim the deprecated env var aliases “still work but log a warning”, but there’s no deprecation/alias handling in the current docker entrypoint scripts in this repo (they don’t parse env vars at all). Please remove the warning claim or update it to match the actual behavior implemented by the images.

### Store Environment Variables

| Variable | Required | Default | Maps To (`application.yml`) | Description |
|----------|----------|---------|-----------------------------|-------------|
| `HG_STORE_PD_ADDRESS` | Yes | — | `pdserver.address` | PD gRPC addresses (e.g. `pd0:8686,pd1:8686,pd2:8686`) |
| `HG_STORE_GRPC_HOST` | Yes | — | `grpc.host` | This node's hostname (e.g. `store0`) |
| `HG_STORE_RAFT_ADDRESS` | Yes | — | `raft.address` | This node's Raft address (e.g. `store0:8510`) |
| `HG_STORE_GRPC_PORT` | No | `8500` | `grpc.port` | gRPC server port |
| `HG_STORE_REST_PORT` | No | `8520` | `server.port` | REST API port |
| `HG_STORE_DATA_PATH` | No | `/hugegraph-store/storage` | `app.data-path` | Data storage path |

**Deprecated aliases** (still work but log a warning):

| Deprecated | Use Instead |
|------------|-------------|
| `PD_ADDRESS` | `HG_STORE_PD_ADDRESS` |
| `GRPC_HOST` | `HG_STORE_GRPC_HOST` |
| `RAFT_ADDRESS` | `HG_STORE_RAFT_ADDRESS` |

### Server Environment Variables

| Variable | Required | Default | Maps To | Description |
|----------|----------|---------|-----------------------------|-------------|
| `HG_SERVER_BACKEND` | Yes | — | `backend` in `hugegraph.properties` | Storage backend (e.g. `hstore`) |
| `HG_SERVER_PD_PEERS` | Yes | — | `pd.peers` | PD cluster addresses (e.g. `pd0:8686,pd1:8686,pd2:8686`) |
| `STORE_REST` | No | — | Used by `wait-partition.sh` | Store REST endpoint for partition verification (e.g. `store0:8520`) |
| `PASSWORD` | No | — | Enables auth mode | Optional authentication password |

**Deprecated aliases** (still work but log a warning):

| Deprecated | Use Instead |
|------------|-------------|
| `BACKEND` | `HG_SERVER_BACKEND` |
| `PD_PEERS` | `HG_SERVER_PD_PEERS` |

---

## Port Reference

| Service | Container Port | Host Port | Protocol | Purpose |
|---------|---------------|-----------|----------|---------|
| pd0 | 8620 | 8620 | HTTP | REST API |
| pd0 | 8686 | 8686 | gRPC | PD gRPC |
| pd0 | 8610 | — | TCP | Raft (internal only) |
| pd1 | 8620 | 8621 | HTTP | REST API |
| pd1 | 8686 | 8687 | gRPC | PD gRPC |
| pd2 | 8620 | 8622 | HTTP | REST API |
| pd2 | 8686 | 8688 | gRPC | PD gRPC |
| store0 | 8500 | 8500 | gRPC | Store gRPC |
| store0 | 8510 | 8510 | TCP | Raft |
| store0 | 8520 | 8520 | HTTP | REST API |
| store1 | 8500 | 8501 | gRPC | Store gRPC |
| store1 | 8510 | 8511 | TCP | Raft |
| store1 | 8520 | 8521 | HTTP | REST API |
| store2 | 8500 | 8502 | gRPC | Store gRPC |
| store2 | 8510 | 8512 | TCP | Raft |
| store2 | 8520 | 8522 | HTTP | REST API |
| server0 | 8080 | 8080 | HTTP | Graph API |
| server1 | 8080 | 8081 | HTTP | Graph API |
| server2 | 8080 | 8082 | HTTP | Graph API |

---

## Healthcheck Endpoints

| Service | Endpoint | Expected |
|---------|----------|----------|
| PD | `GET /v1/health` | `200 OK` |
| Store | `GET /v1/health` | `200 OK` |
| Server | `GET /versions` | `200 OK` with version JSON |
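
Combining these endpoints with the host ports from the Port Reference gives nine URLs to probe in the 3-node cluster. The sketch below assumes the compose file publishes exactly the host ports listed in that table (PD REST on 8620 to 8622, Store REST on 8520 to 8522, Server on 8080 to 8082); `health_urls` is a hypothetical helper name.

```bash
# Hypothetical helper: print one health URL per service instance,
# assuming the host-port mapping documented in the Port Reference.
health_urls() {
    for i in 0 1 2; do
        echo "http://localhost:$((8620 + i))/v1/health"   # PD REST
        echo "http://localhost:$((8520 + i))/v1/health"   # Store REST
        echo "http://localhost:$((8080 + i))/versions"    # Server Graph API
    done
}

# Example usage: report each endpoint as OK or FAIL.
#   health_urls | while read -r url; do
#       if curl -fsS -o /dev/null "$url"; then echo "OK   $url"
#       else echo "FAIL $url"; fi
#   done
```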

---

## Troubleshooting

### Containers Exiting or Restarting (OOM Kills)

**Symptom**: Containers exit with code 137 or enter restart loops. Raft logs show election timeouts.

**Cause**: Docker Desktop does not have enough memory. The 9 JVM processes require at least 12 GB.

**Fix**: Docker Desktop → Settings → Resources → Memory → set to **12 GB** or higher. Restart Docker Desktop.

```bash
# Check if containers were OOM killed
docker inspect hg-pd0 | grep -i oom
docker stats --no-stream
```
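
To check several containers at once, the `OOMKilled` flag can be read directly from `docker inspect` instead of grepping its output. This is a minimal sketch: `oom_killed` is a hypothetical helper name, and the container names in the usage comment other than `hg-pd0` and `hg-server0` are assumptions about how the compose file names its containers.

```bash
# Hypothetical helper: print the name of every listed container whose
# state reports an OOM kill. Containers that do not exist are skipped.
oom_killed() {
    for name in "$@"; do
        # .State.OOMKilled is a boolean field in `docker inspect` output.
        killed=$(docker inspect --format '{{.State.OOMKilled}}' "$name" 2>/dev/null)
        if [ "$killed" = "true" ]; then
            echo "$name"
        fi
    done
}

# Example usage (names beyond hg-pd0 and hg-server0 are assumed):
#   oom_killed hg-pd0 hg-pd1 hg-pd2 hg-server0
```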

### Raft Leader Election Failure

**Symptom**: PD logs show repeated `Leader election timeout`. Store nodes cannot register.

**Cause**: PD nodes cannot reach each other on the Raft port (8610), or `HG_PD_RAFT_PEERS_LIST` is misconfigured.

**Fix**:
1. Verify all PD containers are running: `docker compose -f docker-compose-3pd-3store-3server.yml ps`
2. Check PD logs: `docker logs hg-pd0`
3. Verify network connectivity: `docker exec hg-pd0 ping pd1`
4. Ensure `HG_PD_RAFT_PEERS_LIST` is identical on all PD nodes

### Partition Assignment Not Completing

**Symptom**: Server starts but graph operations fail. Store logs show `partition not found`.

**Cause**: PD has not finished assigning partitions to stores, or stores did not register successfully.

**Fix**:
1. Check registered stores: `curl http://localhost:8620/v1/stores`
2. Check partition status: `curl http://localhost:8620/v1/partitions`
3. Wait for partition assignment (can take 1–3 minutes after all stores register)
4. Check server logs for the `wait-partition.sh` script output: `docker logs hg-server0`

### Connection Refused Errors

**Symptom**: Stores cannot connect to PD, or Server cannot connect to Store.

**Cause**: Services are using `127.0.0.1` instead of container hostnames, or the `hg-net` bridge network is misconfigured.

**Fix**: Ensure all `HG_*` env vars use container hostnames (`pd0`, `store0`, etc.), not `127.0.0.1` or `localhost`.
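
One way to spot offending values is to filter a container's environment for `HG_*` variables that contain a loopback address. The filter below is a sketch under the assumption that the containers actually receive `HG_*` variables; `find_loopback_vars` is a hypothetical helper name.

```bash
# Hypothetical helper: read `NAME=value` lines on stdin and print the
# HG_* entries whose value mentions 127.0.0.1 or localhost.
# Always exits 0; empty output means nothing suspicious was found.
find_loopback_vars() {
    grep -E '^HG_[A-Z0-9_]+=.*(127\.0\.0\.1|localhost)' || true
}

# Example usage against a running container:
#   docker exec hg-pd0 env | find_loopback_vars
```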
4 changes: 2 additions & 2 deletions hugegraph-pd/AGENTS.md
@@ -247,7 +247,7 @@ store:
### Common Configuration Errors

1. **Raft peer discovery failure**: `raft.peers-list` must include all PD nodes' `raft.address` values
- 2. **Store connection issues**: `grpc.host` must be a reachable IP (not `127.0.0.1`) for distributed deployments
+ 2. **Store connection issues**: `grpc.host` must be a reachable IP (not `127.0.0.1`) for distributed deployments. In Docker bridge networking, use the container hostname (e.g., `pd0`) set via `HG_PD_GRPC_HOST` env var.

Copilot AI Mar 9, 2026

This suggests using HG_PD_GRPC_HOST for Docker bridge networking, but the current PD code/entrypoint in this repo doesn’t consume HG_PD_ variables. Please ensure the docs reference the env var names that are actually supported by the published images (or land the entrypoint changes that implement HG_PD_*).

3. **Split-brain scenarios**: Always run 3 or 5 PD nodes in production for Raft quorum
4. **Partition imbalance**: Adjust `patrol-interval` for faster/slower rebalancing

@@ -331,7 +331,7 @@ docker run -d -p 8620:8620 -p 8686:8686 -p 8610:8610 \
hugegraph-pd:latest

# For production clusters, use Docker Compose or Kubernetes
- # See: hugegraph-server/hugegraph-dist/docker/example/
+ # See: docker/docker-compose-3pd-3store-3server.yml and docker/README.md
```

Exposed ports: 8620 (REST), 8686 (gRPC), 8610 (Raft)
39 changes: 36 additions & 3 deletions hugegraph-pd/README.md
@@ -154,6 +154,36 @@ raft:

For detailed configuration options and production tuning, see [Configuration Guide](docs/configuration.md).

#### Docker Bridge Network Example

When running PD in Docker with bridge networking (e.g., `docker/docker-compose-3pd-3store-3server.yml`), configuration is injected via environment variables instead of editing `application.yml` directly. Container hostnames are used instead of IP addresses:

Comment on lines +159 to +160
Copilot AI Mar 9, 2026

This section says PD configuration is injected via HG_PD_* environment variables in Docker bridge mode, but the current hugegraph-pd/hg-pd-dist/docker/docker-entrypoint.sh doesn’t process these env vars (it just starts PD). Please adjust the doc to match the current image behavior, or land the entrypoint changes that implement HG_PD_* parsing alongside this doc update.

**pd0** container:
```bash
HG_PD_GRPC_HOST=pd0
HG_PD_RAFT_ADDRESS=pd0:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
```

**pd1** container:
```bash
HG_PD_GRPC_HOST=pd1
HG_PD_RAFT_ADDRESS=pd1:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
```

**pd2** container:
```bash
HG_PD_GRPC_HOST=pd2
HG_PD_RAFT_ADDRESS=pd2:8610
HG_PD_RAFT_PEERS_LIST=pd0:8610,pd1:8610,pd2:8610
HG_PD_INITIAL_STORE_LIST=store0:8500,store1:8500,store2:8500
```
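
The three blocks differ only in the node's own hostname, so they can be generated from the node index. The function below is an illustrative sketch (`pd_env` is a hypothetical name); it prints the same four variables shown above and assumes the peer and store lists stay fixed at three nodes each.

```bash
# Hypothetical helper: print the PD environment for node index $1 (0, 1 or 2).
# Only the node's own hostname varies; the peer and store lists are
# identical on every node.
pd_env() {
    node="pd$1"
    printf 'HG_PD_GRPC_HOST=%s\n' "$node"
    printf 'HG_PD_RAFT_ADDRESS=%s:8610\n' "$node"
    printf 'HG_PD_RAFT_PEERS_LIST=%s\n' 'pd0:8610,pd1:8610,pd2:8610'
    printf 'HG_PD_INITIAL_STORE_LIST=%s\n' 'store0:8500,store1:8500,store2:8500'
}

# Example usage: write one env file per PD node.
#   for i in 0 1 2; do pd_env "$i" > "pd$i.env"; done
```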

See [docker/README.md](../docker/README.md) for the full environment variable reference.

### Verify Deployment

Check if PD is running:
@@ -210,15 +240,18 @@ docker run -d \
-p 8620:8620 \
-p 8686:8686 \
-p 8610:8610 \
- -v /path/to/conf:/hugegraph-pd/conf \
+ -e HG_PD_GRPC_HOST=<your-ip> \
+ -e HG_PD_RAFT_ADDRESS=<your-ip>:8610 \
+ -e HG_PD_RAFT_PEERS_LIST=<your-ip>:8610 \
+ -e HG_PD_INITIAL_STORE_LIST=<store-ip>:8500 \
-v /path/to/data:/hugegraph-pd/pd_data \
--name hugegraph-pd \
- hugegraph-pd:latest
+ hugegraph/pd:latest
```

For Docker Compose examples with HugeGraph Store and Server, see:
```
- hugegraph-server/hugegraph-dist/docker/example/
+ docker/docker-compose-3pd-3store-3server.yml
```

## Documentation