Senior Data Scientist & Data Engineer based in Barcelona. I work at the intersection of data science and data engineering, building the pipelines that make models possible and the models that make data useful.
Currently at Madbox, a mobile gaming studio, where I've spent 4 years going from BI Engineer to Senior Data Scientist. My background spans gaming, biomedical research, and programmatic advertising, which means I've had to think carefully about data in very different contexts.
Featured projects
| Project | Use case | Stack | Key result |
|---|---|---|---|
| bigquery-air-quality-forecasting | Air quality forecasting + anomaly detection | LightGBM, dbt, BigQuery, Terraform, Cloud Run | 25-station ensemble, deployed to Cloud Run |
| banking-fraud-detection-pipeline | Fraud detection + expense forecasting | LightGBM, dbt, DuckDB, LangChain, BigQuery | Fraud BA = 0.97, forecasting R² = 0.76 |
| session-recommender-lambdarank | Session-based product recommendations | LightGBM LambdaRank, Item2Vec, dbt, DuckDB | NDCG@5 = 0.377, Hit Rate@5 = 76% |
| music-streaming-churn-prediction | Subscription churn prediction (KKBox, WSDM Cup 2018) | LightGBM, Optuna, SHAP, dbt, DuckDB | ROC-AUC 0.924 on temporal holdout |
| temporal-association-rules-multimorbidity | Clinical pattern mining from EHR data | Python, Apriori extensions, Fleiss' kappa | MSc thesis, validated with physicians |
Right now
- Deepening my data science and data engineering knowledge by applying new concepts to real use cases at Madbox
- Exploring how far AI can go in replacing or augmenting data work (and where it falls short)
Domains I've worked in
Gaming analytics · Clinical & biomedical data · Programmatic advertising · NLP research (UPC/TALP)
Background
MSc Data Science at UPC. My thesis developed two algorithms that extend the Apriori framework with temporal dimensions to extract sequential disease co-occurrence patterns from real EHR data, the kind of problem where getting the definition of "pattern" wrong has clinical consequences. The results were validated with physicians, using Fleiss' kappa to measure their agreement.
Before that: NLP research at TALP/UPC, programmatic advertising analytics at Smadex, and a clinical ML internship at Predictheon.
Stack
| Area | Tools |
|---|---|
| Cloud | BigQuery · Pub/Sub · Dataflow · GCS · Cloud Run |
| Engineering | dbt · Terraform · Python · Docker · GitHub Actions |
| Modeling | LightGBM · scikit-learn · SHAP · Optuna |
| Serving | FastAPI · Streamlit |
📎 LinkedIn · 🌐 mponscloq.com


