Closed
4 changes: 4 additions & 0 deletions .gitignore
@@ -34,3 +34,7 @@ tests/data_*.h5
tests/data_*/
tests/tmp.*
tests/.coverage

# local dev artifact
uv.lock
.venv/
142 changes: 142 additions & 0 deletions skills/dpdata-driver/SKILL.md
@@ -0,0 +1,142 @@
---
name: dpdata-driver
description: Use dpdata Python Driver plugins to label systems (energies/forces/virials) via System.predict(), list available drivers, and build Driver objects (ase/deepmd/gaussian/sqm/hybrid). Use when working with dpdata Python API (not CLI) and you need driver-based energy/force prediction, plugin registration keys, or examples of using dpdata with ASE calculators or DeePMD models.
---

# dpdata-driver

Use dpdata “driver plugins” to **label** a `dpdata.System` (predict energies/forces/virials) and obtain a `dpdata.LabeledSystem`.

## Key idea

- A **Driver** converts an unlabeled `System` into a `LabeledSystem` by computing:
  - `energies` (required)
  - `forces` (optional but common)
  - `virials` (optional)

In dpdata, this is exposed as:

- `System.predict(*args, driver="dp", **kwargs) -> LabeledSystem`

`driver` can be:

- a **string key** (plugin name), e.g. `"ase"`, `"dp"`, `"gaussian"`
- a **Driver object**, e.g. `Driver.get_driver("ase")(...)`

## List supported driver keys (runtime)

When unsure what drivers exist in *this* dpdata version/env, query them at runtime:

```python
from dpdata.driver import Driver

print(sorted(Driver.get_drivers().keys()))
```

In the current repo state, keys include:

- `ase`
- `dp` / `deepmd` / `deepmd-kit`
- `gaussian`
- `sqm`
- `hybrid`

(Exact set depends on dpdata version and installed extras.)

## Minimal workflow

```python
from dpdata.system import System

sys = System("input.xyz", fmt="xyz")
# `calculator=...` is a placeholder: pass a real ASE calculator instance here
ls = sys.predict(driver="ase", calculator=...)  # returns dpdata.LabeledSystem
```

### Verify you got a labeled system

```python
assert "energies" in ls.data
# optional:
# assert "forces" in ls.data
# assert "virials" in ls.data
```

## Example: use the ASE driver with an ASE calculator (runnable)

This is the easiest *fully runnable* example because it doesn’t require external QM software.

Dependencies (recommended): use `uv`.

Option A (one-off invocation):

```bash
uv run --with dpdata --with numpy --with ase python3 your_script.py
```

Option B (recommended for shareable scripts): declare dependencies in the script via inline metadata, then run `uv run script.py`.
See: https://docs.astral.sh/uv/guides/scripts/#inline-metadata
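
A minimal sketch of the inline-metadata header, assuming the same three dependencies as Option A. The `requires-python` bound and the `missing_dependencies` helper are illustrative assumptions, not something this skill prescribes; the helper only reports which declared packages fail to resolve in the current environment.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "dpdata",
#     "numpy",
#     "ase",
# ]
# ///
"""Header sketch: `uv run script.py` reads the comment block above to build the env."""
from importlib.util import find_spec

# Import names of the packages declared above (hypothetical sanity check).
DECLARED = ("dpdata", "numpy", "ase")


def missing_dependencies(names=DECLARED):
    """Return the top-level packages from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("missing:", missing if missing else "none")
```

With the header in place, the rest of the script body can follow the imports unchanged.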

Script:

```python
import numpy as np
from ase.calculators.emt import EMT
from dpdata.system import System

# write a tiny H2 molecule in XYZ format (atom count, comment line, coordinates)
with open("tmp.xyz", "w") as f:
    f.write("2\n\nH 0 0 0\nH 0 0 0.74\n")

sys = System("tmp.xyz", fmt="xyz")
ls = sys.predict(driver="ase", calculator=EMT())

print("energies", np.array(ls.data["energies"]))
print("forces shape", np.array(ls.data["forces"]).shape)
if "virials" in ls.data:
    print("virials shape", np.array(ls.data["virials"]).shape)
else:
    print("virials: <not provided by this driver/calculator>")
```

## Example: pass a Driver object instead of a string

```python
from ase.calculators.emt import EMT
from dpdata.driver import Driver
from dpdata.system import System

sys = System("tmp.xyz", fmt="xyz")
ase_driver = Driver.get_driver("ase")(calculator=EMT())
ls = sys.predict(driver=ase_driver)
```

## Hybrid driver

Use `driver="hybrid"` to sum energies/forces/virials from multiple drivers.

The `HybridDriver` accepts `drivers=[ ... ]` where each item is either:

- a `Driver` instance
- a dict like `{"type": "sqm", ...}` (type is the driver key)

Example (structure only; may require external executables):

```python
from dpdata.driver import Driver

hyb = Driver.get_driver("hybrid")(
    drivers=[
        {"type": "sqm", "qm_theory": "DFTB3"},
        {"type": "dp", "dp": "frozen_model.pb"},
    ]
)
# ls = sys.predict(driver=hyb)
```
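
To make "sum energies/forces/virials" concrete, here is a toy sketch that needs no external executables. It is not `HybridDriver`'s code: `sum_labels` is a hypothetical helper, and the policy of dropping a label that any contribution lacks is an assumption chosen for the illustration.

```python
import numpy as np


def sum_labels(label_dicts, keys=("energies", "forces", "virials")):
    """Sum each label that every contribution provides; drop labels any one lacks."""
    if not label_dicts:
        raise ValueError("need at least one contribution")
    total = {}
    for key in keys:
        if all(key in labels for labels in label_dicts):
            # Element-wise sum over contributions; shapes must already agree.
            total[key] = sum(np.asarray(labels[key], dtype=float) for labels in label_dicts)
    return total


# Two fake contributions for 2 frames of a 1-atom system (made-up numbers).
part_a = {"energies": [1.0, 2.0], "forces": np.ones((2, 1, 3))}
part_b = {
    "energies": [0.5, 0.5],
    "forces": np.full((2, 1, 3), 0.25),
    "virials": np.zeros((2, 3, 3)),
}

total = sum_labels([part_a, part_b])
print(sorted(total))
print(total["energies"])
```

Here `virials` is omitted from the result because only one of the two contributions supplies it.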

## Notes / gotchas

- Many drivers require extra dependencies or external programs:
  - `dp` requires `deepmd-kit` + a model file
  - `gaussian` requires Gaussian and a valid executable (default `g16`)
  - `sqm` requires AmberTools `sqm`
- If you just need file format conversion, use the existing **dpdata CLI** skill instead.
113 changes: 113 additions & 0 deletions skills/dpdata-plugin/SKILL.md
@@ -0,0 +1,113 @@
---
name: dpdata-plugin
description: Create and install dpdata plugins (especially custom Format readers/writers) using Format.register(...) and pyproject.toml entry_points under 'dpdata.plugins'. Use when extending dpdata with new formats or distributing plugins as separate Python packages.
---

# dpdata-plugin

dpdata loads plugins in two ways:

1. **Built-in plugins** in `dpdata.plugins.*` (imported automatically)
1. **External plugins** exposed via Python package entry points: `dpdata.plugins`

This skill focuses on **external plugin packages**, the recommended way to add new formats without modifying dpdata itself.

## What can be extended?

Most commonly: add a new **Format** (file reader/writer) via:

```python
from dpdata.format import Format


@Format.register("myfmt")
class MyFormat(Format): ...
```

## How dpdata discovers plugins

dpdata imports `dpdata.plugins` during normal use (e.g. `dpdata.system` imports it). That module:

- imports every built-in module in `dpdata/plugins/*.py`
- then loads all **entry points** in group `dpdata.plugins`

So an external plugin package only needs to ensure that importing the entry-point target triggers the `@Format.register(...)` side effects.
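
The import-time side effect can be sketched with a toy registry. This is not dpdata's implementation: `ToyFormat`, its `_registry` dict, and `CsvLikeFormat` are hypothetical names used only to show why importing a module is enough to register a class.

```python
class ToyFormat:
    """Hypothetical stand-in for a plugin base class with a key -> class registry."""

    _registry = {}

    @classmethod
    def register(cls, key):
        """Return a class decorator that stores the decorated class under `key`."""

        def decorator(plugin_cls):
            cls._registry[key] = plugin_cls
            return plugin_cls

        return decorator

    @classmethod
    def get_formats(cls):
        """Return a copy of the registry so callers cannot mutate it."""
        return dict(cls._registry)


# The decorator runs when this class statement executes, i.e. when the
# module that contains it is imported. No explicit "install" call is needed.
@ToyFormat.register("csvlike")
class CsvLikeFormat(ToyFormat):
    pass


print(sorted(ToyFormat.get_formats()))  # prints ['csvlike']
```

Loading an entry point imports its target module, so any decorated classes in that module end up in the registry the same way.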

## Minimal external plugin package (based on plugin_example/)

### 1) Create a new Python package

Example layout:

```text
dpdata_random/
  pyproject.toml
  dpdata_random/
    __init__.py
```

### 2) Implement and register your Format

In `dpdata_random/__init__.py` (shortened example):

```python
from __future__ import annotations

import numpy as np
from dpdata.format import Format


@Format.register("random")
class RandomFormat(Format):
    def from_system(self, N, **kwargs):
        return {
            "atom_numbs": [20],
            "atom_names": ["X"],
            "atom_types": np.zeros(20, dtype=int),
            "cells": np.repeat(np.eye(3)[None, ...], N, axis=0) * 100.0,
            "coords": np.random.rand(N, 20, 3) * 100.0,
            "orig": np.zeros(3),
            "nopbc": False,
        }
```

The returned dict must match dpdata’s expected schema (`cells`, `coords`, `atom_names`, `atom_types`, ...).
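
A quick consistency check can catch shape mistakes before dpdata sees the dict. The sketch below is an assumption-laden helper, not part of dpdata: `check_system_dict` is a hypothetical name, and the shape conventions it enforces are inferred from the example dict in this section rather than from a formal schema.

```python
import numpy as np


def check_system_dict(data):
    """Raise ValueError if frame/atom dimensions in `data` disagree (assumed conventions)."""
    coords = np.asarray(data["coords"])
    cells = np.asarray(data["cells"])
    atom_types = np.asarray(data["atom_types"])

    if coords.ndim != 3 or coords.shape[-1] != 3:
        raise ValueError(f"coords should be (nframes, natoms, 3), got {coords.shape}")
    nframes, natoms, _ = coords.shape
    if cells.shape != (nframes, 3, 3):
        raise ValueError(f"cells should be ({nframes}, 3, 3), got {cells.shape}")
    if atom_types.shape != (natoms,):
        raise ValueError(f"atom_types should be ({natoms},), got {atom_types.shape}")
    if sum(data["atom_numbs"]) != natoms:
        raise ValueError("sum(atom_numbs) does not match the number of atoms in coords")
    if len(data["atom_names"]) != len(data["atom_numbs"]):
        raise ValueError("atom_names and atom_numbs should have one entry per type")


# Same layout as the RandomFormat example, with N = 2 frames.
N = 2
example = {
    "atom_numbs": [20],
    "atom_names": ["X"],
    "atom_types": np.zeros(20, dtype=int),
    "cells": np.repeat(np.eye(3)[None, ...], N, axis=0) * 100.0,
    "coords": np.random.rand(N, 20, 3) * 100.0,
    "orig": np.zeros(3),
    "nopbc": False,
}
check_system_dict(example)
print("example dict is self-consistent")
```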

### 3) Expose an entry point

In `pyproject.toml`:

```toml
[project]
name = "dpdata_random"
version = "0.0.0"
dependencies = ["numpy", "dpdata"]

[project.entry-points.'dpdata.plugins']
random = "dpdata_random:RandomFormat"
```

Any importable target works; this pattern points directly at the class.

### 4) Install and test

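In a clean env (recommended via `uv`), run from the directory that contains the `dpdata_random/` project so the plugin itself is installed alongside dpdata:

```bash
# `--with ./dpdata_random` assumes the plugin project lives at that relative path
uv run --with dpdata --with numpy --with ./dpdata_random python3 - <<'PY'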
import dpdata
from dpdata.format import Format

# importing dpdata will load entry points (dpdata.plugins)
print('random' in Format.get_formats())
PY
```

If it prints `True`, your plugin was discovered.

## Debug checklist

- Did you install the plugin package into the same environment where you run dpdata?
- Does `pyproject.toml` contain `[project.entry-points.'dpdata.plugins']`?
- Does importing the entry point module/class execute the `@Format.register(...)` decorator?
- If using `uv run`, remember each command runs in its own environment unless you’re in a `uv` project (or you rely on `uv run --with ...`).
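
The first three checks can be partly automated by listing what the current environment advertises under the `dpdata.plugins` group. This sketch uses only the standard library; `list_plugin_entry_points` is a hypothetical helper name, and it reports installed metadata only, not whether the target actually imports cleanly.

```python
from importlib.metadata import entry_points


def list_plugin_entry_points(group="dpdata.plugins"):
    """Return sorted (name, target) pairs for entry points registered under `group`."""
    try:
        eps = entry_points(group=group)  # selection API, Python 3.10+
    except TypeError:
        eps = entry_points().get(group, [])  # dict-style API on Python 3.8/3.9
    return sorted((ep.name, ep.value) for ep in eps)


if __name__ == "__main__":
    found = list_plugin_entry_points()
    if not found:
        print("no entry points found in group 'dpdata.plugins'")
    for name, target in found:
        print(f"{name} -> {target}")
```

An empty result usually means the plugin package is not installed in the interpreter that ran the check.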