Machine Learning & Data Science Portfolio

Showcasing hands‑on projects that combine statistical rigor with domain expertise in finance and engineering.

Explore my work

Featured Projects


NASA Turbofan Engine Anomaly Detection

An end‑to‑end predictive maintenance project built on NASA’s Commercial Modular Aero‑Propulsion System Simulation (C‑MAPSS) dataset, which simulates realistic turbofan engine flights, recording 30 engine parameters at a 1 Hz sampling rate across multiple flight conditions. Faults are injected into components such as the fan, compressors and turbines to support remaining‑useful‑life estimation and early anomaly detection.

  • Data acquisition: High‑fidelity simulations capture full flight recordings across seven distinct flight conditions for ascent and descent.
  • Cleaning & preprocessing: Removed irrelevant observations, fixed structural issues, handled outliers and imputed missing values to ensure clean sensor signals.
  • Modeling: Trained unsupervised models (Isolation Forest and autoencoders) on healthy data to detect deviations indicating degradation.
  • Result: Delivered a dashboard for real‑time anomaly scoring and projected remaining useful life, enabling proactive maintenance decisions.
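The modeling step above can be sketched roughly as follows. This is a minimal illustration, not the project’s actual code: the sensor matrix is synthetic stand-in data, and the hyperparameters are placeholders. It shows the core idea of training an Isolation Forest only on healthy readings so that degraded readings score as anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Stand-in for healthy-engine sensor readings (rows = 1 Hz samples,
# columns = sensor channels); the real project reads C-MAPSS files.
healthy = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))

# Fit on healthy data only, so deviations from it flag as anomalies.
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(healthy)

# A degraded reading drifts far outside the healthy distribution.
degraded = np.full((1, 5), 6.0)
print(model.predict(degraded))  # -1 flags an anomaly, +1 is normal
```

The same fit/predict pattern applies to an autoencoder, where the anomaly score is the reconstruction error on new readings rather than an isolation depth.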

Data Ingestion & Cleaning Pipeline

Demonstrates a robust pipeline for ingesting heterogeneous data sources and preparing them for analysis. Data ingestion is the first step of any analytics workflow: it involves collecting and importing data from diverse sources into a centralized system for storage and analysis. Clean data drastically improves model performance—hence the adage “better data beats fancier algorithms.”

  • Ingestion process: Collected raw CSVs, API feeds and database extracts, then transferred and loaded them into a data lake. Both batch and real‑time ingestion modes were supported.
  • Cleaning steps: Removed unwanted records, standardized formats, handled outliers and imputed missing values. Documented every transformation for transparency and reproducibility.
  • Automation: Utilized Python and SQL to automate ingestion and cleaning, with scheduled ETL jobs monitoring pipeline health.
  • Outcome: Delivered curated datasets ready for exploratory data analysis, modeling and visualization.
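The cleaning steps above can be sketched as a small pandas routine. This is an illustrative sketch, not the pipeline itself: the column names and thresholds are hypothetical. It shows the sequence described in the bullets—deduplicate, cap outliers, then impute remaining gaps.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for one ingested source; columns are illustrative.
raw = pd.DataFrame({
    "sensor_id": ["A", "A", "B", "B", "B"],
    "reading": [1.2, np.nan, 0.9, 250.0, 1.1],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.drop_duplicates().copy()
    # Cap outliers to the 1st-99th percentile band.
    lo, hi = out["reading"].quantile([0.01, 0.99])
    out["reading"] = out["reading"].clip(lo, hi)
    # Impute remaining gaps with the median of the capped values.
    out["reading"] = out["reading"].fillna(out["reading"].median())
    return out

cleaned = clean(raw)
print(cleaned["reading"].isna().sum())  # 0 — no missing values remain
```

In the real pipeline each transformation like this would be a logged, scheduled ETL step rather than an inline function, so every change to the data stays auditable.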

About Me

I am a data scientist with a background in the finance industry. My passion lies in leveraging machine learning and statistical techniques to extract actionable insights. In my previous roles I developed predictive models for risk assessment, automated reporting pipelines and implemented anomaly detection systems to safeguard mission‑critical operations. Beyond my professional work, I enjoy exploring open‑source projects and translating technical breakthroughs for my audience on social media.

Contact

Interested in collaborating? Feel free to reach out via email or connect with me on LinkedIn.