Conversation
…o check and test the training. To be able to download the dataset, the `verify` parameter in `requests.get()` inside `download_file_from_link` should be set to `False`. Note also that the run script on this data currently does not run: it fails to download the data even with `verify=False`.
…ll_to_dict and process_mat. I have also modified download_file_from_link by specifying verify=False in requests.get()
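For context, a minimal sketch of what a streaming downloader with SSL-verification control and retry/backoff could look like — the signature, chunk size, and retry defaults here are assumptions for illustration, not the repo's actual implementation:

```python
import time

import requests


def download_file_from_link(url, dest_path, verify=False, chunk_size=8192,
                            max_retries=3, timeout=30):
    """Stream a remote file to dest_path in chunks, with retry/backoff.

    verify=False disables SSL certificate verification, which is what the
    discussion above requires for this particular dataset host.
    """
    for attempt in range(max_retries):
        try:
            with requests.get(url, stream=True, verify=verify,
                              timeout=timeout) as resp:
                resp.raise_for_status()
                with open(dest_path, "wb") as f:
                    for chunk in resp.iter_content(chunk_size=chunk_size):
                        if chunk:  # skip keep-alive chunks
                            f.write(chunk)
            return dest_path
        except requests.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```

Note that disabling verification trades security for convenience and is best gated behind a config flag rather than hard-coded.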
…is flag in the config file
- Implement TriangleClassifier utility for extracting and classifying triangles
- Add create_triangle_classification_task() to generate 7-class role predictions
- Support triangle task via dataset loader (task_type='triangle_task')
- Add config section with triangle task settings (7 classes, 3 features)
- Include comprehensive test suite for triangle classification pipeline

Added triangle common-neighbours classification task: predicting the number of common neighbours based on triangle intra-data:
- Dataset creation
- Support triangle task via dataset loader (task_type='triangle_common_task')
- Add config section
- Include comprehensive test suite
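For illustration, the triangle-extraction step underlying such a task can be sketched as below; `extract_triangles` is a hypothetical helper, not the actual `TriangleClassifier` API:

```python
def extract_triangles(edges):
    """Enumerate triangles (3-cliques) in an undirected graph.

    edges: iterable of (u, v) node pairs.
    Returns a sorted list of sorted node triples, each triangle once.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # any common neighbour w of u and v closes a triangle (u, v, w)
        for w in adj[u] & adj[v]:
            triangles.add(tuple(sorted((u, v, w))))
    return sorted(triangles)
```

Per-triangle features (e.g. the common-neighbour counts mentioned above) would then be computed from these triples.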
… to the datasets (used for the triangle options), as the classes are imbalanced. Ideally, class weights need to be added to the model, or a focal loss used instead.
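A NumPy sketch of a focal loss with optional class weights, for reference — the actual model code presumably lives in PyTorch, and the names here are illustrative:

```python
import numpy as np


def focal_loss(logits, targets, gamma=2.0, class_weights=None):
    """Multi-class focal loss: scales cross-entropy by (1 - p_t)^gamma,
    so confidently-correct (easy) samples contribute less and the model
    focuses on hard, typically minority-class, examples."""
    z = logits - logits.max(axis=1, keepdims=True)       # stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    p_t = probs[np.arange(len(targets)), targets]        # true-class prob
    ce = -np.log(p_t)                                    # per-sample CE
    if class_weights is not None:
        ce = ce * np.asarray(class_weights)[targets]     # rare-class boost
    return float(((1.0 - p_t) ** gamma * ce).mean())
```

With `gamma=0` and no weights this reduces to plain cross-entropy, which makes it easy to A/B against the current loss.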
…sed the number of samples for downsampling
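Per-class downsampling of the kind referred to above can be sketched as follows; the helper name and signature are assumptions, not the repo's actual code:

```python
import numpy as np


def downsample_per_class(labels, n_per_class, seed=0):
    """Return sorted indices keeping at most n_per_class samples of each
    class, chosen uniformly at random, to soften class imbalance."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        keep.extend(idx[:n_per_class].tolist())
    return sorted(keep)
```

Fixing the seed keeps the subsampled dataset reproducible across runs.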
… TriangleClassifier into utils for reusability and created a new class inheriting from it, specialized for the a123 task
…ion. Moved the test class to the appropriate file. Note that the same changes were made in PR #241 (they are duplicated here, as the script would not run otherwise and would require additional adaptation to the old download_file_from_link function).
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Bugbot Autofix prepared fixes for all 4 issues found in the latest run.
- ✅ Fixed: TestPipeline requires args, breaking pytest discovery
  - Removed the required constructor argument from `TestPipeline` and restored default dataset/model configuration so pytest can collect the class.
- ✅ Fixed: Edge attributes misaligned after to_undirected sorting
  - Edge weights are now indexed directly from `corr` using the final `to_undirected` `edge_index` order, ensuring correct alignment.
- ✅ Fixed: Download move logic uses wrong directory name
  - The download reorganization now targets the extracted dataset directory name (`Auditory cortex data`, with underscore fallback) instead of `self.name`.
- ✅ Fixed: Loader uses parent path inconsistent with other loaders
  - The A123 loader now uses `root=str(self.root_data_dir)` so dataset storage matches `AbstractLoader.get_data_dir()`.
Or push these changes by commenting:
@cursor push 2951dab532
Preview (2951dab532)
diff --git a/test/pipeline/test_pipeline.py b/test/pipeline/test_pipeline.py
--- a/test/pipeline/test_pipeline.py
+++ b/test/pipeline/test_pipeline.py
@@ -1,31 +1,29 @@
"""Test pipeline for a particular dataset and model."""
+import argparse
import sys
-import os
from pathlib import Path
+
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
import hydra
+
from test._utils.simplified_pipeline import run
-import argparse
+DATASET = "graph/MUTAG"
+MODELS = ["graph/gcn"]
+
+
class TestPipeline:
- """Test pipeline for a particular dataset and model.
+ """Test pipeline for a particular dataset and model."""
- Parameters
- ----------
- args : argparse.Namespace
- Command-line arguments containing dataset and models configuration.
- """
+ dataset = DATASET
+ models = MODELS
- def __init__(self, args):
- self.dataset = args.dataset
- self.models = args.models if isinstance(args.models, list) else [args.models]
-
def setup_method(self):
"""Setup method."""
hydra.core.global_hydra.GlobalHydra.instance().clear()
-
+
def test_pipeline(self):
"""Test pipeline."""
with hydra.initialize(config_path="../../configs", job_name="job"):
@@ -34,17 +32,18 @@
config_name="run.yaml",
overrides=[
f"model={MODEL}",
- f"dataset={self.dataset}", # IF YOU IMPLEMENT A LARGE DATASET WITH AN OPTION TO USE A SLICE OF IT, ADD BELOW THE CORRESPONDING OPTION
+ f"dataset={self.dataset}", # IF YOU IMPLEMENT A LARGE DATASET WITH AN OPTION TO USE A SLICE OF IT, ADD BELOW THE CORRESPONDING OPTION
"trainer.max_epochs=2",
"trainer.min_epochs=1",
"trainer.check_val_every_n_epoch=1",
"paths=test",
"callbacks=model_checkpoint",
],
- return_hydra_config=True
+ return_hydra_config=True,
)
run(cfg)
+
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Test Pipeline")
@@ -53,18 +52,22 @@
type=str,
required=True,
help="Path to data root directory",
- choices = ['graph/MUTAG', 'graph/a123'] # ADD YOUR DATASET HERE
+ choices=["graph/MUTAG", "graph/a123"], # ADD YOUR DATASET HERE
)
parser.add_argument(
"--models",
type=str,
- nargs='+',
+ nargs="+",
required=False,
default=["graph/gcn"],
help="Model(s) to use in the pipeline",
)
args = parser.parse_args()
- test_pipeline = TestPipeline(args)
\ No newline at end of file
+ test_pipeline = TestPipeline()
+ test_pipeline.dataset = args.dataset
+ test_pipeline.models = (
+ args.models if isinstance(args.models, list) else [args.models]
+ )
test_pipeline.setup_method()
- test_pipeline.test_pipeline()
+ test_pipeline.test_pipeline()
diff --git a/topobench/data/datasets/a123.py b/topobench/data/datasets/a123.py
--- a/topobench/data/datasets/a123.py
+++ b/topobench/data/datasets/a123.py
@@ -190,7 +190,9 @@
os.unlink(path)
# Move files from extracted "Auditory cortex data/" directory to raw_dir
- downloaded_dir = osp.join(folder, self.name)
+ downloaded_dir = osp.join(folder, dataset_key)
+ if not osp.exists(downloaded_dir):
+ downloaded_dir = osp.join(folder, dataset_key.replace(" ", "_"))
if osp.exists(downloaded_dir):
for file in os.listdir(downloaded_dir):
src = osp.join(downloaded_dir, file)
@@ -320,13 +322,9 @@
edge_index = torch.tensor(edge_index_np, dtype=torch.long)
# make undirected
edge_index = to_undirected(edge_index)
- # edge_attr: corresponding corr weights (for both directions, if made undirected)
- weights = corr[rows, cols]
- weights = (
- np.repeat(weights, 2)
- if edge_index.size(1) == weights.size * 2
- else weights
- )
+ # Align edge weights with the final (possibly sorted) edge order.
+ edge_index_np = edge_index.cpu().numpy()
+ weights = corr[edge_index_np[0], edge_index_np[1]]
edge_attr = torch.tensor(
weights.reshape(-1, 1), dtype=torch.float
)
diff --git a/topobench/data/loaders/graph/a123_loader.py b/topobench/data/loaders/graph/a123_loader.py
--- a/topobench/data/loaders/graph/a123_loader.py
+++ b/topobench/data/loaders/graph/a123_loader.py
@@ -91,9 +91,8 @@
# determine dataset name from parameters, fallback to expected id
name = self.parameters.data_name
- # root path for dataset: use the parent of root_data_dir since the dataset
- # constructs its own subdirectory based on name
- root = str(self.root_data_dir.parent)
+ # root path for dataset: align with AbstractLoader.get_data_dir().
+ root = str(self.root_data_dir)
# Construct dataset; A123CortexMDataset expects (root, name, parameters)
        self.dataset = A123CortexMDataset(
| """Test pipeline for a particular dataset and model.""" | ||
| def __init__(self, args): | ||
| self.dataset = args.dataset | ||
| self.models = args.models if isinstance(args.models, list) else [args.models] |
TestPipeline requires args, breaking pytest discovery
High Severity
TestPipeline.__init__ now requires an args parameter, but pytest discovers test classes and instantiates them with no constructor arguments. This causes a TypeError when pytest tries to collect any test in this class, effectively breaking the entire test pipeline. Previously, the class had no __init__ and used module-level constants DATASET and MODELS, which allowed pytest to work normally.
topobench/data/datasets/a123.py
Outdated
    )
    edge_attr = torch.tensor(
        weights.reshape(-1, 1), dtype=torch.float
    )
Edge attributes misaligned after to_undirected sorting
Medium Severity
After to_undirected(edge_index) sorts edges via internal coalesce, the edge ordering changes. However, np.repeat(weights, 2) assumes the doubled edges retain the original-then-reversed interleaving. Since to_undirected sorts all edges by (row, col), the resulting edge_attr values are misaligned with their corresponding edges, assigning incorrect correlation weights to most edges.
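The misalignment is easy to see on a toy example (the matrix values and edges below are made up for illustration; the sorted array mimics what torch_geometric's `to_undirected` produces):

```python
import numpy as np

# symmetric correlation matrix and two directed edges: (0, 1) and (2, 0)
corr = np.array([[0.0, 0.9, 0.2],
                 [0.9, 0.0, 0.5],
                 [0.2, 0.5, 0.0]])
src = np.array([0, 2])
dst = np.array([1, 0])

# what to_undirected yields: both directions, sorted by (row, col)
und = np.array([[0, 0, 1, 2],
                [1, 2, 0, 0]])

# buggy: np.repeat assumes original-then-reversed interleaving
wrong = np.repeat(corr[src, dst], 2)   # [0.9, 0.9, 0.2, 0.2]

# fixed: gather weights directly from the final, sorted edge order
right = corr[und[0], und[1]]           # [0.9, 0.2, 0.9, 0.2]
```

Here `wrong` assigns 0.9 to edge (0, 2) and 0.2 to edge (1, 0), while the direct gather is correct by construction for any edge ordering.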
topobench/data/datasets/a123.py
Outdated
    else:
        shutil.move(src, dst)
    # Delete the extracted top-level directory
    shutil.rmtree(downloaded_dir)
Download move logic uses wrong directory name
Medium Severity
The downloaded_dir is constructed as osp.join(folder, self.name) which resolves to <raw_dir>/a123_cortex_m, but the zip file from URL Auditory_cortex_data.zip extracts to a directory like Auditory cortex data/ (matching raw_file_names), not a123_cortex_m/. The osp.exists(downloaded_dir) check will always be False, so the intended file reorganization never happens. The comment on line 192 even says "Move files from extracted 'Auditory cortex data/' directory" but the code checks for self.name instead.
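The fix in the diff boils down to resolving the extracted directory by its real name, with an underscore fallback. A standalone sketch of that resolution logic (the helper name is hypothetical):

```python
import os.path as osp


def resolve_extracted_dir(folder, dataset_key):
    """Locate the unzipped directory: try the name as-is first
    (e.g. 'Auditory cortex data'), then the underscore variant."""
    candidate = osp.join(folder, dataset_key)
    if not osp.exists(candidate):
        candidate = osp.join(folder, dataset_key.replace(" ", "_"))
    return candidate
```

Like the diff, this returns the underscore candidate even when neither exists, so the caller's `osp.exists` check still decides whether to move files.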
    # root path for dataset: use the parent of root_data_dir since the dataset
    # constructs its own subdirectory based on name
    root = str(self.root_data_dir.parent)
Loader uses parent path inconsistent with other loaders
Medium Severity
root = str(self.root_data_dir.parent) deviates from every other loader in the codebase, which all use root = str(self.root_data_dir). This causes the dataset to store files at a different path than what get_data_dir() (inherited from AbstractLoader) reports. When run.py calls dataset_loader.load(), the returned dataset_dir points to a non-existent directory, which can break downstream processing in PreProcessor.
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #268      +/-   ##
==========================================
- Coverage   94.01%   92.54%   -1.47%
==========================================
  Files         184      185       +1
  Lines        6665     6859     +194
==========================================
+ Hits         6266     6348      +82
- Misses        399      511     +112

☔ View full report in Codecov by Sentry.



Checklist
Description
This PR adds the A123CortexM dataset.
The code was taken from https://github.com/marindigen/TopoBenchForNeuro, but I removed the novel triangle classification tasks.
Note
Medium Risk
Adds a new externally-downloaded dataset pipeline and significantly changes `download_file_from_link` behavior (streaming, retries, SSL verification control), which could affect all dataset downloads and processing stability.

Overview

Adds support for the Bowen et al. A123 auditory cortex calcium imaging dataset as a new graph classification dataset, including a Hydra dataset config (`configs/dataset/graph/a123.yaml`), a new `A123CortexMDataset` that downloads/unzips raw data and converts correlation submatrices into PyG graphs, and a corresponding `A123DatasetLoader`.

Upgrades `download_file_from_link` to a streaming, chunked downloader with directory creation, progress reporting, configurable SSL verification and per-chunk timeouts, plus retry/backoff logic; adds MATLAB `.mat` helpers (`collect_mat_files`, `process_mat`, etc.) to `io_utils` and expands unit tests to cover the new download behavior.

Updates the pipeline test harness to accept dataset/model selection via CLI, and ignores `lightning_logs/` in `.gitignore`.

Written by Cursor Bugbot for commit 5afc5ca. This will update automatically on new commits. Configure here.
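As an illustration of the `.mat` helpers mentioned in the overview, a minimal `collect_mat_files` might look like this — the real signature and behavior in `io_utils` may differ:

```python
import os


def collect_mat_files(root_dir):
    """Recursively collect paths of MATLAB .mat files under root_dir,
    returned in a deterministic sorted order."""
    mat_files = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(".mat"):
                mat_files.append(os.path.join(dirpath, name))
    return sorted(mat_files)
```

A companion `process_mat` would then typically load each path with `scipy.io.loadmat` and strip MATLAB's metadata keys before further processing.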