Jabir B

Data Engineer
🪙 10000 / month
June 7, 1999

About Candidate

Data Engineer with 4+ years of hands-on experience designing, optimizing, and governing high-throughput, scalable data solutions, primarily in the Banking, Telecommunications, and Transport domains. Expertise in PySpark/Spark, Scala, SQL, Kafka, and the AWS ecosystem, with a proven history of migrating legacy systems to modern, high-throughput pipelines, achieving up to 50% faster execution times. Proficient in Data Quality and Governance, ensuring production data is reliable and accurate for key stakeholders, including Analysts and ML Engineers.

Location

Education

B.E. in Computer Science and Engineering, 2021
P A College of Engineering

Work & Experience

Data Engineer 3 December 2025 - 23 January 2026
Roads and Transport Authority, Dubai

Developed data pipelines leveraging Kafka and Spark Streaming with Scala to process real-time data, efficiently distributing it to multiple destinations, including MongoDB and PostgreSQL.
Interacted with and performed data operations on various databases, including MongoDB and MSSQL.
Migrated large-scale datasets from MongoDB to the Big Data platform by developing Python-based ingestion pipelines, converting semi-structured documents into optimized Parquet files for efficient downstream processing and analytics.
Developed a Python-based data ingestion job to consume streaming data from Kafka topics and load it into MongoDB and downstream Kafka topics.
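The document-flattening step in the MongoDB-to-Parquet migration above could be sketched roughly as follows. This is an illustrative sketch only, not the actual pipeline code: the function name, field names, and nesting structure are hypothetical, and a plain dict stands in for the record that would be written out as Parquet.

```python
import json

def flatten(doc, prefix=""):
    """Recursively flatten a nested, MongoDB-style document into a flat record
    with dotted column names, suitable for a columnar format such as Parquet."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Hypothetical semi-structured document, as it might arrive from a Mongo export
raw = '{"trip_id": 42, "fare": {"amount": 12.5, "currency": "AED"}, "mode": "metro"}'
record = flatten(json.loads(raw))
# record -> {"trip_id": 42, "fare.amount": 12.5, "fare.currency": "AED", "mode": "metro"}
```

In a real ingestion job, batches of such flat records would then be written to Parquet (e.g. via a dataframe library), giving downstream Spark jobs a predictable columnar schema.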

Data Engineering Management and Governance Analyst October 2021 - August 2025
Accenture India Pvt Ltd

Developed highly scalable ETL pipelines using PySpark on AWS Glue to process and transform
large data volumes from multiple source systems
Automated data ingestion workflows via AWS Lambda, Glue, S3, and Athena, reducing manual
intervention and improving overall pipeline reliability
Pioneered migration of legacy SAS processes to PySpark, achieving a seamless transition with
100% output accuracy (Tableau) and a significant 30%+ enhancement in processing speed
Implemented advanced data transformation logic in PySpark (Spark with Python), including window functions, aggregations, and joins, to meet complex business requirements.
Optimized PySpark jobs using effective caching and resource management strategies, resulting
in a 50% reduction in job execution time and minimized cluster resource consumption
Implemented data quality checks, monitoring, and validation routines using SQL to ensure data accuracy and reliability throughout the pipeline.
Worked with Hive and Impala for querying and storing large datasets, optimizing queries for
faster retrieval.
Collaborated with product managers and engineers to gather and define technical
requirements, conducted code reviews and provided mentorship to junior developers, fostering
a culture of continuous improvement and knowledge sharing within the team.
Developed and maintained data pipelines using ETL processes and tools
Provided technical support for big data systems and applications
Analyzed code and data to troubleshoot dashboard exceptions and errors
Developed and implemented various shell scripts to automate daily jobs, reducing roughly 7 hours/week of manual intervention by 80%
Imported and exported data into HDFS and Hive using Sqoop
Participated in the development/implementation of Cloudera Hadoop environment
Loaded and transformed large structured and semi-structured data from UNIX file systems into HDFS.
Developed and optimized Hive queries for data visualization and reporting
Converted Pig scripts to shell scripts to streamline data processing workflows.
Worked with PySpark, including working with RDDs, DataFrames, and optimization techniques.
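The SQL-based data-quality checks described above (null and uniqueness validation) might look like this minimal sketch. It is illustrative only: SQLite stands in for the production warehouse, and the table name, columns, and sample rows are all hypothetical, not from the actual pipelines.

```python
import sqlite3

# In-memory table standing in for a pipeline output; schema and values are illustrative
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_id INTEGER, account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "A-100", 250.0), (2, "A-101", None), (2, "A-101", None), (3, None, 75.5)],
)

# Completeness check: key business columns must be populated
nulls = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE account IS NULL OR amount IS NULL"
).fetchone()[0]

# Uniqueness check: txn_id must not repeat
dupes = conn.execute(
    "SELECT COUNT(*) FROM "
    "(SELECT txn_id FROM transactions GROUP BY txn_id HAVING COUNT(*) > 1)"
).fetchone()[0]

print(nulls, dupes)  # 3 rows fail the null check; 1 txn_id is duplicated
```

In production such checks would typically run after each load, with the counts surfaced to monitoring so failing batches can be quarantined before analysts or ML pipelines consume them.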

Skills

Data Engineering
PySpark
SQL
Hadoop
Kafka
AWS
Hive
HBase
Apache Oozie
Sqoop
Cloudera Data Platform

Awards

Be Brilliant Award 2025

Award for excellence