Add collection export commands (create, get, cancel). #159
Merged
13 commits
a233402 Add collection export commands (create, get, cancel). (jfrancoa)
f645a45 Adapt to latest UX changes. (jfrancoa)
d1a13b1 Adapt to path removal from python-client. (jfrancoa)
3e73a6e Address PR review feedback (round 1) (jfrancoa)
8746028 Pin RC version for dev/1.37 (jfrancoa)
bd8150b Update weaviate-client version to use latest 4.21.0 (jfrancoa)
182a7a5 Fix test_create_export_with_exclude to leave an exportable collection (jfrancoa)
feb44f6 Address PR review feedback (round 4) (jfrancoa)
8458b99 Initial plan for addressing export manager review feedback (Copilot)
9c8c8c5 Fix backend enum serialization and export_id test assertions in Expor… (Copilot)
f8fea82 Revert result.backend.value access — backend is a str, not an enum (jfrancoa)
a34c857 Reflect actual export status in create_export JSON output (jfrancoa)
065c6fc Address PR review feedback (round 5) (jfrancoa)
53 changes: 53 additions & 0 deletions
.claude/skills/operating-weaviate-cli/references/exports.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # Collection Export Reference | ||
|
|
||
| Export collections from Weaviate to external storage backends in Parquet format. | ||
|
|
||
| ## Create Export | ||
| ```bash | ||
| weaviate-cli create export-collection --export_id my-export --backend s3 --file_format parquet --wait --json | ||
| weaviate-cli create export-collection --export_id my-export --backend s3 --include "Movies,Books" --json | ||
| weaviate-cli create export-collection --export_id my-export --backend gcs --exclude "TempData" --json | ||
| ``` | ||
|
|
||
|
|
||
| ## Check Export Status | ||
| ```bash | ||
| weaviate-cli get export-collection --export_id my-export --backend s3 --json | ||
| ``` | ||
|
|
||
| Returns shard-level progress including objects exported per shard, errors, and timing. | ||
|
|
||
| ## Cancel Export | ||
| ```bash | ||
| weaviate-cli cancel export-collection --export_id my-export --backend s3 --json | ||
| ``` | ||
|
|
||
| Only works while the export is in progress. Returns an error if the export has already completed. | ||
|
|
||
| ## Options | ||
|
|
||
| **Create:** | ||
| - `--export_id` -- Export identifier (default: "test-export") | ||
| - `--backend` -- filesystem, s3, gcs, azure (default: filesystem) | ||
| - `--file_format` -- Export format: parquet (default: parquet) | ||
| - `--include` -- Comma-separated collections to include | ||
| - `--exclude` -- Comma-separated collections to exclude | ||
| - `--wait` -- Wait for completion | ||
|
|
||
| **Get Status:** | ||
| - `--export_id`, `--backend` -- Same as create | ||
|
|
||
| **Cancel:** | ||
| - `--export_id`, `--backend` -- Same as create | ||
|
|
||
|
|
||
| ## Prerequisites | ||
|
|
||
| 1. The export backend must be configured on the Weaviate cluster | ||
| 2. For local-k8s, deploy with `COLLECTION_EXPORT=true` (provisions MinIO, creates the `weaviate-export` bucket, and sets `EXPORT_DEFAULT_BUCKET`) | ||
| 3. `--include` and `--exclude` are mutually exclusive | ||
|
|
||
| ## Notes | ||
|
|
||
| - `--wait` blocks until the export completes (SUCCESS, FAILED, or CANCELED) | ||
| - Without `--wait`, the command returns immediately with status STARTED | ||
| - Poll progress with `get export-collection` to monitor shard-level status | ||
| - Export uses the same storage backends as backups (S3, GCS, Azure, filesystem) | ||
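The polling note in the reference above can be sketched in Python. This is a minimal sketch and not part of the PR: `wait_for_export` and its `fetch_status` callable are hypothetical names, and the status dict is assumed to carry a `status` key, as the `--json` output checked in the integration tests does. The terminal states are the ones the notes list (SUCCESS, FAILED, CANCELED).

```python
import time
from typing import Callable, Dict

# Terminal states as listed in the reference notes.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELED"}


def wait_for_export(
    fetch_status: Callable[[], Dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll fetch_status() until the export reaches a terminal state.

    fetch_status is a hypothetical callable supplied by the caller, for
    example one that runs `weaviate-cli get export-collection ... --json`
    and parses the result. It is assumed to return a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"export still {status.get('status')!r} after {timeout}s"
            )
        sleep(interval)


# Usage with a stubbed status source, so no Weaviate cluster is needed.
_responses = iter(
    [{"status": "STARTED"}, {"status": "STARTED"}, {"status": "SUCCESS"}]
)
final = wait_for_export(lambda: next(_responses), sleep=lambda _seconds: None)
print(final["status"])  # prints SUCCESS
```

Passing `sleep` as a parameter keeps the loop testable without real delays; a caller that only needs blocking behaviour can use `--wait` instead.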
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| -weaviate-client>=4.20.4 | ||
| +weaviate-client>=4.21.0 | ||
| click==8.1.7 | ||
| twine | ||
| pytest | ||
|
|
||
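To check an environment against the raised minimum before running the export tests, something like the following could be used. It is a rough sketch under stated assumptions: `release_tuple`, `meets_minimum`, and `installed_client_meets_pin` are hypothetical helpers, not part of weaviate-cli, and the comparison ignores pre-release suffixes (so `4.21.0rc1` is treated like `4.21.0`), which is looser than full PEP 440 ordering.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver: str, width: int = 3) -> tuple:
    """Turn "4.21.0" into (4, 21, 0), ignoring any non-numeric suffix."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that "4.21" compares equal to "4.21.0".
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.21.0") -> bool:
    """Return True if the installed release is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


def installed_client_meets_pin() -> bool:
    """Check the installed weaviate-client distribution, if there is one."""
    try:
        return meets_minimum(version("weaviate-client"))
    except PackageNotFoundError:
        return False
```

For strict handling of release candidates, a PEP 440 aware parser would be the safer choice.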
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,196 @@ | ||
| import json | ||
| import click | ||
| import pytest | ||
| import weaviate | ||
| from weaviate_cli.managers.collection_manager import CollectionManager | ||
| from weaviate_cli.managers.config_manager import ConfigManager | ||
| from weaviate_cli.managers.data_manager import DataManager | ||
| from weaviate_cli.managers.export_manager import ExportManager | ||
|
|
||
|
|
||
|
|
||
| EXPORT_COLLECTION = "ExportTestCollection" | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def client() -> weaviate.WeaviateClient: | ||
|     config = ConfigManager() | ||
|     return config.get_client() | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def collection_manager(client: weaviate.WeaviateClient) -> CollectionManager: | ||
|     return CollectionManager(client) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def data_manager(client: weaviate.WeaviateClient) -> DataManager: | ||
|     return DataManager(client) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def export_manager(client: weaviate.WeaviateClient) -> ExportManager: | ||
|     return ExportManager(client) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def setup_collection(collection_manager, data_manager): | ||
|     """Create a collection with data for export tests.""" | ||
|     try: | ||
|         collection_manager.create_collection( | ||
|             collection=EXPORT_COLLECTION, | ||
|             replication_factor=1, | ||
|             vectorizer="none", | ||
|             force_auto_schema=True, | ||
|         ) | ||
|         data_manager.create_data( | ||
|             collection=EXPORT_COLLECTION, | ||
|             limit=100, | ||
|             randomize=True, | ||
|             consistency_level="one", | ||
|         ) | ||
|         yield | ||
|     finally: | ||
|         if collection_manager.client.collections.exists(EXPORT_COLLECTION): | ||
|             collection_manager.delete_collection(collection=EXPORT_COLLECTION) | ||
|
|
||
|
|
||
| def test_create_export_and_get_status( | ||
|     export_manager: ExportManager, setup_collection, capsys | ||
| ): | ||
|     """Test creating an export and getting its status.""" | ||
|     export_manager.create_export( | ||
|         export_id="integration-test-export", | ||
|         backend="s3", | ||
|         file_format="parquet", | ||
|         include=EXPORT_COLLECTION, | ||
|         wait=True, | ||
|         json_output=False, | ||
|     ) | ||
|
|
||
|     out = capsys.readouterr().out | ||
|     assert "integration-test-export" in out | ||
|     assert "created successfully" in out | ||
|
|
||
|     export_manager.get_export_status( | ||
|         export_id="integration-test-export", | ||
|         backend="s3", | ||
|         json_output=True, | ||
|     ) | ||
|
|
||
|     out = capsys.readouterr().out | ||
|     data = json.loads(out) | ||
|     assert data["export_id"] == "integration-test-export" | ||
|     assert data["status"] == "SUCCESS" | ||
|     assert EXPORT_COLLECTION in data["collections"] | ||
|     assert "shard_status" in data | ||
|
|
||
|
|
||
| def test_create_export_json_output( | ||
|     export_manager: ExportManager, setup_collection, capsys | ||
| ): | ||
|     """Test creating an export with JSON output.""" | ||
|     export_manager.create_export( | ||
|         export_id="integration-json-export", | ||
|         backend="s3", | ||
|         file_format="parquet", | ||
|         wait=True, | ||
|         json_output=True, | ||
|     ) | ||
|
|
||
|     out = capsys.readouterr().out | ||
|     data = json.loads(out) | ||
|     assert data["status"] == "SUCCESS" | ||
|     assert data["export_id"] == "integration-json-export" | ||
|
|
||
|
|
||
| def test_create_export_with_exclude( | ||
|     export_manager: ExportManager, | ||
|     collection_manager: CollectionManager, | ||
|     data_manager: DataManager, | ||
|     setup_collection, | ||
|     capsys, | ||
| ): | ||
|     """Test creating an export with exclude filter. | ||
|
|
||
|     Creates a second collection so that excluding it still leaves | ||
|     EXPORT_COLLECTION exportable (the server rejects an export with no | ||
|     exportable classes). | ||
|     """ | ||
|     second_collection = "ExportTestCollection_Excluded" | ||
|     try: | ||
|         collection_manager.create_collection( | ||
|             collection=second_collection, | ||
|             replication_factor=1, | ||
|             vectorizer="none", | ||
|             force_auto_schema=True, | ||
|         ) | ||
|         data_manager.create_data( | ||
|             collection=second_collection, | ||
|             limit=10, | ||
|             randomize=True, | ||
|             consistency_level="one", | ||
|         ) | ||
|         capsys.readouterr()  # Clear setup output | ||
|
|
||
|         export_manager.create_export( | ||
|             export_id="integration-exclude-export", | ||
|             backend="s3", | ||
|             file_format="parquet", | ||
|             exclude=second_collection, | ||
|             wait=True, | ||
|             json_output=True, | ||
|         ) | ||
|
|
||
|         out = capsys.readouterr().out | ||
|         data = json.loads(out) | ||
|         assert data["status"] == "SUCCESS" | ||
|         assert second_collection not in data.get("collections", []) | ||
|         assert EXPORT_COLLECTION in data.get("collections", []) | ||
|     finally: | ||
|         if collection_manager.client.collections.exists(second_collection): | ||
|             collection_manager.delete_collection(collection=second_collection) | ||
|
|
||
|
|
||
| def test_create_export_include_and_exclude_raises( | ||
|     export_manager: ExportManager, setup_collection | ||
| ): | ||
|     """Test that specifying both include and exclude raises an error.""" | ||
|     with pytest.raises(click.ClickException) as exc_info: | ||
|         export_manager.create_export( | ||
|             export_id="should-fail", | ||
|             backend="s3", | ||
|             file_format="parquet", | ||
|             include=EXPORT_COLLECTION, | ||
|             exclude="OtherCollection", | ||
|         ) | ||
|     assert "include" in str(exc_info.value).lower() | ||
|     assert "exclude" in str(exc_info.value).lower() | ||
|
|
||
|
|
||
| def test_cancel_export(export_manager: ExportManager, setup_collection, capsys): | ||
|     """Test canceling an export.""" | ||
|     # Create export without waiting | ||
|     export_manager.create_export( | ||
|         export_id="integration-cancel-export", | ||
|         backend="s3", | ||
|         file_format="parquet", | ||
|         wait=False, | ||
|     ) | ||
|     capsys.readouterr()  # Clear output | ||
|
|
||
|     # Try to cancel — may succeed or fail depending on timing. Only tolerate | ||
|     # the specific "could not be canceled" path (export already finished); | ||
|     # anything else is a real failure. | ||
|     try: | ||
|         export_manager.cancel_export( | ||
|             export_id="integration-cancel-export", | ||
|             backend="s3", | ||
|             json_output=True, | ||
|         ) | ||
|     except click.ClickException as e: | ||
|         assert "could not be canceled" in str(e) | ||
|         return | ||
|
|
||
|     out = capsys.readouterr().out | ||
|     data = json.loads(out) | ||
|     assert data["status"] == "success" | ||
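The include/exclude rule exercised by `test_create_export_include_and_exclude_raises` can be illustrated with a small standalone sketch. The `ExportManager` implementation is not shown in this diff, so the helpers below (`parse_collections`, `resolve_filters`) are hypothetical, and the sketch raises `ValueError` to stay dependency-free, whereas the test expects a `click.ClickException`.

```python
from typing import List, Optional, Tuple


def parse_collections(value: Optional[str]) -> List[str]:
    """Split a comma-separated option value into collection names."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]


def resolve_filters(
    include: Optional[str] = None, exclude: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Return (include, exclude) lists, rejecting calls that set both."""
    included = parse_collections(include)
    excluded = parse_collections(exclude)
    if included and excluded:
        # Assumption: the real manager reports this via click.ClickException.
        raise ValueError(
            "--include and --exclude are mutually exclusive; pass only one."
        )
    return included, excluded


print(resolve_filters(include="Movies,Books"))  # prints (['Movies', 'Books'], [])
```

Validating before any request is sent means the error surfaces without needing a reachable cluster.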