mponsclo/README.md

Hi, I'm Marcel 👋

Senior Data Scientist & Data Engineer based in Barcelona. I work at the intersection of data science and data engineering, building the pipelines that make models possible, and the models that make data useful.

Currently at Madbox, a mobile gaming studio, where I've spent 4 years going from BI Engineer to Senior Data Scientist. My background spans gaming, biomedical research, and programmatic advertising, which means I've had to think carefully about data in very different contexts.


Featured projects

| Project | Use case | Stack | Key result |
| --- | --- | --- | --- |
| bigquery-air-quality-forecasting | Air quality forecasting + anomaly detection | LightGBM, dbt, BigQuery, Terraform, Cloud Run | 25-station ensemble, deployed to Cloud Run |
| banking-fraud-detection-pipeline | Fraud detection + expense forecasting | LightGBM, dbt, DuckDB, LangChain, BigQuery | Fraud BA=0.97, forecasting R2=0.76 |
| session-recommender-lambdarank | Session-based product recommendations | LightGBM LambdaRank, Item2Vec, dbt, DuckDB | NDCG@5 = 0.377, Hit Rate@5 = 76% |
| music-streaming-churn-prediction | Subscription churn prediction (KKBox, WSDM Cup 2018) | LightGBM, Optuna, SHAP, dbt, DuckDB | ROC-AUC 0.924 on temporal holdout |
| temporal-association-rules-multimorbidity | Clinical pattern mining from EHR data | Python, Apriori extensions, Fleiss Kappa | MSc thesis, validated with physicians |
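
For readers less familiar with the ranking metrics quoted for session-recommender-lambdarank, here is a minimal sketch of how NDCG@5 and Hit Rate@5 are commonly computed for a single session. It assumes binary relevance (an item is either relevant or not). The function names `hit_rate_at_k` and `ndcg_at_k` and the item IDs are hypothetical and not taken from the repository, whose exact evaluation code may differ.

```python
import math


def hit_rate_at_k(ranked, relevant, k=5):
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k=5):
    """NDCG@k under binary relevance (an assumption of this sketch)."""
    # Discounted gain: a relevant item at 0-based position p adds 1 / log2(p + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal DCG: every relevant item (at most k of them) placed at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical session: the model ranked five items, one of which was relevant.
ranked_items = ["p9", "p4", "p7", "p1", "p3"]
relevant_items = {"p4"}
print(hit_rate_at_k(ranked_items, relevant_items))
print(ndcg_at_k(ranked_items, relevant_items))
```

Averaging these per-session values over an evaluation set gives aggregate figures of the kind reported in the table.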

Right now

  • Deepening DS/DE knowledge by applying new concepts to real use cases at Madbox
  • Exploring how far AI can go in replacing or augmenting data work (and where it falls short)

Domains I've worked in

Gaming analytics · Clinical & biomedical data · Programmatic advertising · NLP research (UPC/TALP)


Background

MSc Data Science at UPC. My thesis developed two algorithms extending the Apriori framework with temporal dimensions to extract sequential disease co-occurrence patterns from real EHR data -- the kind of problem where getting the definition of "pattern" wrong has clinical consequences. Validated with physicians using Fleiss Kappa.
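
As an illustration of the agreement statistic mentioned above, here is a minimal sketch of Fleiss' kappa in plain Python. It is not the thesis code: the function name `fleiss_kappa`, the input layout (one row per rated item, one column per category, each cell counting the raters who chose that category), and the toy ratings are all assumptions made for this example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned item i to category j. Every item needs the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item must be rated by the same number (>= 2) of raters")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of rater pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)  # agreement expected by chance

    if p_expected == 1.0:
        return float("nan")  # undefined when every rating falls in one category
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical example: 3 raters judge 3 candidate patterns as (valid, not valid).
toy_ratings = [[3, 0], [0, 3], [2, 1]]
print(fleiss_kappa(toy_ratings))
```

Values near 1 indicate strong agreement among raters, while values near 0 indicate agreement no better than chance.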

Before that: NLP research at TALP/UPC, programmatic advertising analytics at Smadex, and a clinical ML internship at Predictheon.


Stack

  • Cloud: BigQuery, Pub/Sub, Dataflow, GCS, Cloud Run
  • Engineering: dbt, Terraform, Python, Docker, GitHub Actions
  • Modeling: LightGBM, scikit-learn, SHAP, Optuna
  • Serving: FastAPI, Streamlit

📎 LinkedIn · 🌐 mponscloq.com

Pinned

  1. temporal-association-rules-multimorbidity (Public)

    Final Master's Project 2022. Development of two algorithms extending the Apriori framework with temporal dimensions to mine sequential clinical patterns from electronic health records of multimorbi…

    Python

  2. drug-interaction-nlp (Public)

    Repository for the Advanced Human Language Technologies course at UPC. Named Entity Recognition and retrieval of drug interactions from raw text using Natural Language Processing techniques.

    Jupyter Notebook

  3. bigquery-air-quality-forecasting (Public)

    End-to-end air quality forecasting & anomaly detection for 25 Seoul stations: dbt pipeline, LightGBM ensemble, FastAPI serving, Streamlit dashboard

    Python

  4. banking-fraud-detection-pipeline (Public)

    End-to-end fraud detection pipeline: LightGBM + Focal Loss, dbt on BigQuery, Terraform-managed GCP, FastAPI on Cloud Run

    Python

  5. session-recommender-lambdarank (Public)

    Two-stage session-based recommender (LightGBM LambdaRank) for an Inditex hackathon. Cold-start focused: 93% of sessions have no user history. NDCG@5 = 0.377, Hit Rate@5 = 76%.

    Python

  6. music-streaming-churn-prediction (Public)

    End-to-end churn prediction on KKBox WSDM 2018: DuckDB + dbt feature engineering, LightGBM + Optuna, SHAP, temporal holdout evaluation.

    Python