Merged
2 changes: 1 addition & 1 deletion .github/workflows/run-tests.yml
@@ -32,7 +32,7 @@ jobs:
        platform:
          - ubuntu-latest
          - macos-latest
          - windows-latest
          # - windows-latest
    runs-on: ${{ matrix.platform }}
    name: Python ${{ matrix.python }}, ${{ matrix.platform }}
    steps:
6 changes: 2 additions & 4 deletions CHANGELOG.md
@@ -1,7 +1,5 @@
# Changelog

## Version 0.1 (development)
## Version 0.0.1

- Feature A added
- FIX: nasty bug #1729 fixed
- add your changes here!
- Initial implementation to access Ehub's resources and load data using `rds2py`.
89 changes: 86 additions & 3 deletions README.md
@@ -1,11 +1,20 @@
[![PyPI-Server](https://img.shields.io/pypi/v/experimenthub.svg)](https://pypi.org/project/experimenthub/)
![Unit tests](https://github.com/YOUR_ORG_OR_USERNAME/experimenthub/actions/workflows/run-tests.yml/badge.svg)
![Unit tests](https://github.com/biocpy/experimenthub/actions/workflows/run-tests.yml/badge.svg)

# experimenthub

> Access Bioconductor's experimenthub resources
**ExperimentHub** provides an interface to access and manage data from the Bioconductor [ExperimentHub](https://bioconductor.org/packages/ExperimentHub/) service directly in Python.

A longer description of your project goes here...
It is designed to work within the **BiocPy** ecosystem, converting R data objects (like `SingleCellExperiment` or `SummarizedExperiment`) into their Python equivalents (e.g., `SummarizedExperiment`) using [rds2py](https://github.com/biocpy/rds2py).

> [!NOTE]
>
> This is an ***experimental*** package. It may not work with all RDS files from ExperimentHub.
> Currently, this package filters ExperimentHub resources to provide access to:
> - **File Formats:** `.rds`
> - **R Classes:** `SingleCellExperiment`, `SummarizedExperiment`, `RangedSummarizedExperiment`, `GRanges`, etc.
>
> Files are converted to their respective BiocPy representations or common Python formats.

## Install

@@ -15,6 +24,80 @@ To get started, install the package from [PyPI](https://pypi.org/project/experim
pip install experimenthub
```

## Usage

### Initialize the Registry

The registry manages the local cache of `ExperimentHub` metadata and resources. On the first run, it downloads the metadata database.

```py
from experimenthub import ExperimentHubRegistry

# Initialize the registry (downloads metadata if needed)
eh = ExperimentHubRegistry()
```

### Searching for Resources

ExperimentHub contains thousands of datasets. Use the `search()` method to find resources by title, description, or species.

```py
# Search for mouse-related datasets
results = eh.search("mus musculus")

# Print the first few matches
for record in results[:5]:
    print(f"{record.ehub_id}: {record.title}")
# Output:
# EH1041: Brain scRNA-seq data, sample ...,
# EH1042: Brain scRNA-seq data, gene ...,
# ...
```
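
How `search()` matches records is not shown in this diff. As a rough sketch only, a case-insensitive substring search over title, description, and species could look like the following. The `Record` class and the `search_records` helper are hypothetical stand-ins, not the package's API, and the sample records are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for an ExperimentHub record."""

    ehub_id: str
    title: str
    species: Optional[str] = None
    description: Optional[str] = None


def search_records(records: List[Record], query: str) -> List[Record]:
    """Return records whose title, description, or species contains `query` (case-insensitive)."""
    needle = query.lower()
    matches = []
    for rec in records:
        # Treat missing (None) fields as empty strings so they never match.
        haystack = (rec.title, rec.description or "", rec.species or "")
        if any(needle in field.lower() for field in haystack):
            matches.append(rec)
    return matches


# Made-up records, for illustration only.
records = [
    Record("EH1", "Example brain dataset", species="Mus musculus"),
    Record("EH2", "Example liver dataset", species="Homo sapiens"),
]
hits = search_records(records, "mus musculus")
print([rec.ehub_id for rec in hits])
```

Lower-casing both sides keeps the match insensitive to how species names are capitalized in the metadata.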

### Inspecting Metadata

You can retrieve detailed metadata for a specific ID.

```py
record = eh.get_record("EH4663")

print(f"Title: {record.title}")
print(f"Species: {record.species}")
print(f"Genome: {record.genome}")
print(f"Description: {record.description}")
print(f"R Class: {record.preparer_dataclass}")

# Output:
# Title: Lohoff biorXiv spatial coordinates (sample 2)
# Species: Mus musculus
# Genome: mm10
# Description: Cell spatial coordinates for sample 2 for the E8.5 seqFISH dataset from biorXiv
# R Class: character
```
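
The `ehub_id` strings follow the `EH<number>` pattern that `ExperimentHubRecord.from_db_row` builds from the numeric database id. A small sketch of going the other way, using a hypothetical `parse_ehub_id` helper that is not part of the package:

```python
import re


def parse_ehub_id(ehub_id: str) -> int:
    """Extract the numeric database id from an 'EH<number>' string."""
    match = re.fullmatch(r"EH(\d+)", ehub_id.strip())
    if match is None:
        raise ValueError(f"not a valid ExperimentHub ID: {ehub_id!r}")
    return int(match.group(1))


print(parse_ehub_id("EH4663"))  # 4663
```

Validating the prefix up front gives a clear error for malformed identifiers instead of a silent miss in a later lookup.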

### Loading Data

The `load()` method handles the download, caching, and loading of the dataset.

If the resource is an R data file (`.rds`) containing a supported Bioconductor object (e.g., `SingleCellExperiment`), it is automatically read and converted to an equivalent Python object using `rds2py`.

```py
# Load a data.frame as a BiocFrame object
data = eh.load("EH4663")

print(data)
# BiocFrame with 8425 rows and 3 columns
# x y z
# <FloatList> <FloatList> <IntegerList>
# embryo1_Pos0_cell10_z5 0.7084368794499625 -2.7071263060540645 5
# embryo1_Pos0_cell100_z5 0.9763043488304248 -2.517971233335359 5
# embryo1_Pos0_cell101_z5 0.9749347757408557 -2.6739635081030855 5
# ... ... ...
# embryo1_Pos28_cell97_z5 -1.3992279805347039 3.1761928631722824 5
# embryo1_Pos28_cell98_z5 -1.389353519722718 3.1349508225406666 5
# embryo1_Pos28_cell99_z5 -1.394992277928857 2.5812717935734355 5
```
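
The download-then-cache behaviour of `load()` can be pictured roughly as follows. This is a standard-library sketch, not the package's implementation (which declares `pybiocfilecache` as a dependency). `fetch_cached`, `cache_dir`, and the `downloader` callable are hypothetical names, and the fake downloader stands in for a real HTTP request.

```python
import tempfile
from pathlib import Path
from typing import Callable


def fetch_cached(ehub_id: str, cache_dir: Path, downloader: Callable[[str], bytes]) -> Path:
    """Return the cached file for `ehub_id`, downloading it only if missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"{ehub_id}.rds"
    if not target.exists():
        # Cache miss: fetch the bytes once and store them on disk.
        target.write_bytes(downloader(ehub_id))
    return target


calls = []


def fake_downloader(ehub_id: str) -> bytes:
    """Stand-in for a real HTTP download; records each call."""
    calls.append(ehub_id)
    return b"placeholder bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("EH4663", Path(tmp), fake_downloader)
    second = fetch_cached("EH4663", Path(tmp), fake_downloader)
    print(first == second, len(calls))  # True 1
```

The second call returns the same path without invoking the downloader, which is the property a cache needs so that repeated loads avoid repeated downloads.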

<!-- biocsetup-notes -->

## Note
24 changes: 15 additions & 9 deletions docs/index.md
@@ -1,19 +1,25 @@
# experimenthub

Access Bioconductor's experimenthub resources
**ExperimentHub** provides an interface to access and manage data from the Bioconductor [ExperimentHub](https://bioconductor.org/packages/ExperimentHub/) service directly in Python.

It is designed to work within the **BiocPy** ecosystem, converting R data objects (like `SingleCellExperiment` or `SummarizedExperiment`) into their Python equivalents (e.g., `SummarizedExperiment`) using [rds2py](https://github.com/biocpy/rds2py).

## Note

> This is the main page of your project's [Sphinx] documentation. It is
> formatted in [Markdown]. Add additional pages by creating md-files in
> `docs` or rst-files (formatted in [reStructuredText]) and adding links to
> them in the `Contents` section below.
> [!NOTE]
>
> This is an ***experimental*** package. It may not work with all RDS files from ExperimentHub.
> Currently, this package filters ExperimentHub resources to provide access to:
> - **File Formats:** `.rds`
> - **R Classes:** `SingleCellExperiment`, `SummarizedExperiment`, `RangedSummarizedExperiment`, `GRanges`, etc.
>
> Please check [Sphinx] and [MyST] for more information
> about how to document your project and how to configure your preferences.
> Files are converted to their respective BiocPy representations or common Python formats.

## Install

To get started, install the package from [PyPI](https://pypi.org/project/experimenthub/)

```bash
pip install experimenthub
```
## Contents

```{toctree}
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -11,7 +11,7 @@ version_scheme = "no-guess-dev"
[tool.ruff]
line-length = 120
src = ["src"]
exclude = ["tests"]
# exclude = ["tests"]
lint.extend-ignore = ["F821"]

[tool.ruff.lint.pydocstyle]
11 changes: 7 additions & 4 deletions setup.cfg
@@ -5,18 +5,18 @@

[metadata]
name = experimenthub
description = Access Bioconductor's experimenthub resources
description = Access Bioconductors experimenthub resources
author = Jayaram Kancherla
author_email = jayaram.kancherla@gmail.com
license = MIT
license_files = LICENSE.txt
long_description = file: README.md
long_description_content_type = text/markdown; charset=UTF-8; variant=GFM
url = https://github.com/pyscaffold/pyscaffold/
url = https://github.com/biocpy/experimenthub
# Add here related links, for example:
project_urls =
Documentation = https://pyscaffold.org/
# Source = https://github.com/pyscaffold/pyscaffold/
Documentation = https://github.com/biocpy/experimenthub
Source = https://github.com/biocpy/experimenthub
# Changelog = https://pyscaffold.org/en/latest/changelog.html
# Tracker = https://github.com/pyscaffold/pyscaffold/issues
# Conda-Forge = https://anaconda.org/conda-forge/pyscaffold
@@ -49,6 +49,9 @@ package_dir =
# For more information, check out https://semver.org/.
install_requires =
importlib-metadata; python_version<"3.8"
rds2py
biocframe
pybiocfilecache


[options.packages.find]
3 changes: 3 additions & 0 deletions src/experimenthub/__init__.py
@@ -14,3 +14,6 @@
    __version__ = "unknown"
finally:
    del version, PackageNotFoundError

from .registry import ExperimentHubRegistry
from .record import ExperimentHubRecord
5 changes: 5 additions & 0 deletions src/experimenthub/_ehub.py
@@ -0,0 +1,5 @@
__author__ = "Jayaram Kancherla"
__copyright__ = "Jayaram Kancherla"
__license__ = "MIT"

EHUB_METADATA_URL = "https://experimenthub.bioconductor.org/metadata/experimenthub.sqlite3"
53 changes: 53 additions & 0 deletions src/experimenthub/record.py
@@ -0,0 +1,53 @@
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional

__author__ = "Jayaram Kancherla"
__copyright__ = "Jayaram Kancherla"
__license__ = "MIT"


@dataclass(frozen=True)
class ExperimentHubRecord:
    """Container for a single ExperimentHub entry."""

    ehub_id: str
    title: str
    species: Optional[str]
    taxonomy_id: Optional[str]
    genome: Optional[str]
    description: Optional[str]
    url: str
    release_date: Optional[date]
    preparer_dataclass: Optional[str]

    @classmethod
    def from_db_row(cls, row: tuple) -> "ExperimentHubRecord":
        """Build a record from a database query row.

        Expected row format:
        (id, title, species, taxonomyid, genome, description, full_url, date_added, rdataclass)
        """
        rid, title, species, tax_id, genome, desc, url, date_str, rdataclass = row
        ehub_id = f"EH{rid}"

        rel_date: Optional[date] = None
        if date_str:
            try:
                rel_date = datetime.strptime(str(date_str).split(" ")[0], "%Y-%m-%d").date()
            except ValueError:
                pass

        return cls(
            ehub_id=ehub_id,
            title=title or "",
            species=species,
            taxonomy_id=str(tax_id) if tax_id else None,
            genome=genome,
            description=desc,
            url=url,
            release_date=rel_date,
            preparer_dataclass=rdataclass,
        )
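
The date handling in `from_db_row` can be exercised in isolation. The sketch below copies that logic into a hypothetical standalone helper, `parse_release_date`, to show which inputs yield a `date` and which fall back to `None`. The helper name and the sample strings are assumptions made for illustration.

```python
from datetime import date, datetime
from typing import Optional


def parse_release_date(date_str: Optional[str]) -> Optional[date]:
    """Mirror of the date parsing in `from_db_row`: keep the date part, ignore any time part."""
    if not date_str:
        return None
    try:
        return datetime.strptime(str(date_str).split(" ")[0], "%Y-%m-%d").date()
    except ValueError:
        # Unparseable values are treated as a missing date.
        return None


print(parse_release_date("2021-05-18 10:30:00"))  # 2021-05-18
print(parse_release_date("2021-05-18"))  # 2021-05-18
print(parse_release_date("not a date"))  # None
print(parse_release_date(None))  # None
```

Swallowing the `ValueError` means a malformed `date_added` value leaves `release_date` empty rather than failing the whole record.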