Outliers detection-imputation Standardization by claude-marie · Pull Request #50 · BLSQ/snt_development

claude-marie · 2026-03-16T16:20:09Z

This PR is about standardizing the outputs of the outliers detection pipelines.
Please find the ticket here.

…and pipeline; add reporting functionality and configuration management

EstebanMontandon · 2026-03-17T12:33:16Z

snt_dhis2_outliers_imputation_magic_glasses/pipeline.py

 )


+def _materialize_standard_outputs_from_legacy(


Hey Claude,
I don't think it's a good approach to add data transformation logic in Python on top of the R script output. This splits the computation logic across two languages. In this Python pipeline, the only responsibilities should be to:
1. Pass parameters to the R script.
2. Execute the R notebook.
3. Load the results into the dataset.

By adding transformations in Python, we're mixing computation logic between languages. This makes it harder for data scientists to understand, modify, or debug the full computation, since they'd need to look at both the R script and the Python code to see what's actually being produced.

I think all formatting and transformation logic should stay within the R code, keeping the computation self-contained in a single language.
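A minimal sketch of that split, assuming the notebook runner and the dataset loader are passed in as callables (`run_pipeline`, `execute_notebook` and `load_to_dataset` are hypothetical names, not the pipeline's actual functions). Python only orchestrates; the files are loaded exactly as the R notebook wrote them.

```python
from pathlib import Path
from typing import Callable, Mapping

# Hypothetical signatures: the real pipeline would plug in its own
# notebook runner and dataset uploader here.
NotebookRunner = Callable[[Path, Mapping[str, object]], None]
DatasetLoader = Callable[[Path], None]


def run_pipeline(
    notebook_path: Path,
    output_files: list[Path],
    params: Mapping[str, object],
    execute_notebook: NotebookRunner,
    load_to_dataset: DatasetLoader,
) -> None:
    """Orchestrate only: no data transformation happens in Python."""
    # 1. Pass parameters and 2. execute the R notebook, which owns all computation.
    execute_notebook(notebook_path, params)

    # Check every expected file first so nothing is loaded if one is missing.
    missing = [p for p in output_files if not p.exists()]
    if missing:
        raise FileNotFoundError(f"R notebook did not produce: {missing}")

    # 3. Load the files the R notebook produced, unchanged.
    for path in output_files:
        load_to_dataset(path)
```

If a format needs to change, the change goes into the R notebook; this function stays the same.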

EstebanMontandon · 2026-03-17T12:35:30Z

snt_dhis2_outliers_imputation_magic_glasses/pipeline.py

Let's keep it simple by using a bool parameter, "Complete: True/False".
By default the run is partial, with the option to run complete if selected.

EstebanMontandon · 2026-03-17T12:39:45Z

snt_dhis2_outliers_imputation_magic_glasses/pipeline.py

By using a bool parameter, we can get rid of all this Python checking code:

    mode_clean = (mode or "partial").strip().lower()
    if mode_clean not in ("partial", "complete"):
        raise ValueError('mode must be "partial" or "complete".')
    run_mg_partial = True
    run_mg_complete = mode_clean == "complete"

We only need run_mg_complete True or False.

NOTE: This will probably require changes to how we inject the parameters and how they are used in the R computation notebook.
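A minimal sketch of the boolean version, assuming the R notebook is updated to read only `RUN_MAGIC_GLASSES_COMPLETE` (dropping `RUN_MAGIC_GLASSES_PARTIAL` is an assumption based on "We only need run_mg_complete"; `build_input_params` is a hypothetical helper name):

```python
from pathlib import Path


def build_input_params(root_path: str, run_mg_complete: bool = False) -> dict:
    """Build the parameters injected into the R notebook.

    A single boolean replaces the free-text ``mode`` string, so there is
    nothing to normalize or validate: partial is the default, complete is opt-in.
    """
    return {
        "ROOT_PATH": Path(root_path).as_posix(),
        # Assumption: the R notebook always runs partial and checks only this flag.
        "RUN_MAGIC_GLASSES_COMPLETE": bool(run_mg_complete),
    }
```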

EstebanMontandon · 2026-03-17T12:44:12Z

snt_dhis2_outliers_imputation_magic_glasses/pipeline.py

+                # Fallback for legacy MG notebooks still writing legacy file names.
+                legacy_data_path = root_path / "data" / "dhis2" / "outliers_detection"
+                legacy_candidates = [
+                    data_path / f"{country_code}_flagged_outliers_magic_glasses.parquet",


Related to my previous comment:

We should not produce or read this file anymore:
f"{country_code}_flagged_outliers_magic_glasses.parquet"

The outputs are directly produced by the R notebook:
data_path / f"{country_code}_routine_outliers_detected.parquet"
data_path / f"{country_code}_routine_outliers_imputed.parquet"
data_path / f"{country_code}_routine_outliers_removed.parquet"
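A minimal sketch of reading only the standard outputs, with no legacy fallback, assuming `data_path` and `country_code` are the same values the pipeline already uses (`standard_output_paths` and `check_standard_outputs` are hypothetical helper names):

```python
from pathlib import Path

# Standard suffixes, taken from the three file names listed in this comment.
STANDARD_SUFFIXES = (
    "routine_outliers_detected",
    "routine_outliers_imputed",
    "routine_outliers_removed",
)


def standard_output_paths(data_path: Path, country_code: str) -> list[Path]:
    """Return the three parquet files the R notebook is expected to write."""
    return [data_path / f"{country_code}_{suffix}.parquet" for suffix in STANDARD_SUFFIXES]


def check_standard_outputs(data_path: Path, country_code: str) -> list[Path]:
    """Fail loudly instead of falling back to legacy file names."""
    paths = standard_output_paths(data_path, country_code)
    missing = [p.name for p in paths if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Missing R notebook outputs: {', '.join(missing)}")
    return paths
```

A missing file then points straight at the R notebook instead of being silently replaced by `{country_code}_flagged_outliers_magic_glasses.parquet`.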


EstebanMontandon · 2026-03-17T12:55:22Z

snt_dhis2_outliers_imputation_magic_glasses/pipeline.py

            input_params = {
                "ROOT_PATH": Path(workspace.files_path).as_posix(),
                "RUN_MAGIC_GLASSES_PARTIAL": run_mg_partial,
                "RUN_MAGIC_GLASSES_COMPLETE": run_mg_complete,


I think we need to standardize the way we provide information in the parameters file, so the other pipelines know how to handle it depending on the method being used (if needed).

I propose defining, right after ROOT_PATH, a standard node for all outlier methods called OUTLIERS_METHOD. In this case:
OUTLIERS_METHOD: "MG_COMPLETE" ## or "MG_PARTIAL", "IQR", "MEAN"... etc
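A minimal sketch of the proposed node, assuming the Magic Glasses pipeline derives the label from its boolean flag and that the method list is maintained in one place (`KNOWN_METHODS`, `mg_outliers_method`, `build_standard_params` and `uses_magic_glasses` are all hypothetical names):

```python
from pathlib import Path

# Values from the proposal; extend this set as new methods are added.
KNOWN_METHODS = {"MG_COMPLETE", "MG_PARTIAL", "IQR", "MEAN"}


def mg_outliers_method(run_mg_complete: bool) -> str:
    """Map the Magic Glasses boolean flag to the standard method label."""
    return "MG_COMPLETE" if run_mg_complete else "MG_PARTIAL"


def build_standard_params(root_path: str, outliers_method: str) -> dict:
    """Put the standard OUTLIERS_METHOD node right after ROOT_PATH."""
    if outliers_method not in KNOWN_METHODS:
        raise ValueError(f"Unknown OUTLIERS_METHOD: {outliers_method!r}")
    return {
        "ROOT_PATH": Path(root_path).as_posix(),
        "OUTLIERS_METHOD": outliers_method,
    }


def uses_magic_glasses(params: dict) -> bool:
    """Example of how a downstream pipeline could branch on the method."""
    return str(params.get("OUTLIERS_METHOD", "")).startswith("MG_")
```

Downstream pipelines would then read a single key instead of inspecting method-specific flags.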

claude-marie added 5 commits March 12, 2026 17:04

1st step done

fe06e8a

separate mg pipeline

6b401e9

some fixes

fd152c2

feat(outliers): introduce Magic Glasses outlier imputation notebooks …

7dc891e

…and pipeline; add reporting functionality and configuration management

kernel stuff

8df1853

claude-marie requested a review from EstebanMontandon March 16, 2026 16:20

EstebanMontandon requested changes Mar 17, 2026


claude-marie added 2 commits March 17, 2026 15:19

Merge remote-tracking branch 'origin/main' into SNT25-384_outliers

6c3ccfb

test

b439a59

claude-marie requested a review from EstebanMontandon March 17, 2026 14:25

claude-marie added 3 commits March 18, 2026 09:39

same output for all

1c105c3

fix

8f7ff86

some

b0a6536

claude-marie wants to merge 10 commits into main from SNT25-384_outliers