AI Automation

Big Data Pipeline — Hadoop HDFS on Cloud Server

Big Data Engineer

Business summary
Operations automation • Automate document handling and eliminate duplicates
  • Eliminated repetitive manual steps across the workflow
  • Faster response times and fewer missed handoffs
  • Error handling and edge-case coverage to reduce operational risk

What I built

Deployed and configured a Hadoop HDFS cluster on a remote Ubuntu server for processing large-scale datasets. Set up the HDFS file system, implemented MapReduce jobs for data transformation, processed compressed datasets of 189MB+, and automated daily ingestion pipelines. Built for academic and applied data engineering work as part of my MSc in Big Data Management & Analytics at Griffith College Dublin, the project demonstrates hands-on distributed computing and server administration skills alongside automation work.
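A MapReduce transformation like the one described is commonly written for Hadoop Streaming as a pair of Python scripts that read from stdin and emit tab-separated key/value pairs. The sketch below is illustrative, not the project's actual job: the word-count logic, function names, and tokenization are assumptions standing in for the real transformation.

```python
import sys
from itertools import groupby

def map_line(line):
    """Mapper logic: emit a (key, value) pair per token in one input line.
    Here the key is the lowercased word and the value is a count of 1."""
    for word in line.strip().split():
        yield word.lower(), 1

def reduce_pairs(pairs):
    """Reducer logic: sum values per key. Hadoop's shuffle phase delivers
    pairs grouped by key; sorting here mimics that for local runs."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(v for _, v in group)

if __name__ == "__main__":
    # As a streaming mapper: read lines from stdin, write "key<TAB>value".
    for input_line in sys.stdin:
        for k, v in map_line(input_line):
            sys.stdout.write(f"{k}\t{v}\n")
```

In a streaming setup, scripts like these would be submitted with the `hadoop-streaming` jar (mapper and reducer passed via `-mapper` and `-reducer`), with HDFS input and output paths; the exact invocation depends on the cluster's Hadoop version and paths.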

Tech stack

Hadoop • Big Data • Python • Linux / Ubuntu • Data Engineering • HDFS • MapReduce • Data pipeline