AI Automation
Big Data Pipeline — Hadoop HDFS on Cloud Server
Big Data Engineer
Business summary
Operations automation • Automated document handling and duplicate elimination
- Eliminated repetitive manual steps across the workflow
- Faster response time and fewer missed handoffs
- Error handling and edge-case coverage to reduce operational risk
What I built
Deployed and configured a Hadoop HDFS cluster on a remote Ubuntu server for processing large-scale datasets. Set up the HDFS file system, implemented MapReduce jobs for data transformation, processed compressed datasets of 189 MB+, and automated daily ingestion pipelines. Used for academic and applied data engineering work as part of my MSc in Big Data Management & Analytics at Griffith College Dublin. Demonstrates hands-on distributed computing and server administration skills alongside automation work.
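The MapReduce transformation step described above can be sketched as a Hadoop Streaming job, where the mapper and reducer are plain Python functions that consume lines and emit key/value pairs. The word-count logic below is an illustrative assumption, not the actual transformation used in the project:

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for each token.
    Hadoop Streaming would feed raw text lines on stdin."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum counts per key.
    The framework's shuffle/sort delivers mapper output grouped by key;
    sorted() simulates that here."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of the map -> shuffle/sort -> reduce flow.
    sample = ["big data big pipeline", "data pipeline"]
    print(dict(reducer(mapper(sample))))
```

On the cluster, scripts like these would typically be submitted with the Hadoop Streaming jar (`hadoop jar hadoop-streaming-*.jar -mapper ... -reducer ...`) against input already ingested into HDFS; the exact job invocation and dataset fields are project-specific and not reproduced here.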
Tech stack
Hadoop • Big Data • Python • Linux / Ubuntu • Data Engineering • HDFS • MapReduce • Data pipeline