Job Type
Full-time
Description
Key Responsibilities
- Build and maintain scalable data pipelines in AWS to support ingestion, transformation, and enrichment of structured and semi-structured data.
- Design and implement ACID-compliant Delta Lake tables optimized for partition pruning, schema enforcement, and query performance across large datasets.
- Develop ETL and ELT workflows that integrate multiple source systems into a centralized, query-optimized data warehouse architecture.
- Leverage AWS tools to implement business rules, dimensional joins, and aggregation logic aligned to warehouse modeling best practices.
- Collaborate with data architects and engineers to implement cloud-native data solutions on AWS using S3, Glue, RDS, and IAM for secure, scalable storage and access control.
- Optimize pipeline performance through intelligent partitioning, caching, broadcast joins, and adaptive query tuning.
- Deploy and version data engineering assets using Git-integrated development workflows and automate deployment with CI/CD tools such as GitLab or Jenkins.
- Monitor pipeline health, job execution, and cluster utilization using AWS CloudWatch, identifying bottlenecks and optimizing cost-performance tradeoffs.
- Conduct technical discovery and mapping of legacy source systems, identifying required transformations and designing end-to-end data flows.
- Implement governance practices including metadata tagging, data quality validation, audit logging, and lineage tracking using platform-native features and custom logic.
- Support ad hoc data access requests, develop reusable data assets, and maintain shared notebooks that meet operational reporting and analytics needs across teams.
Disclaimer "The responsibilities and duties outlined in this job description are intended to describe the general nature and level of work performed by employees within this role. However, they are not exhaustive and may be subject to change or modification at any time to meet the evolving needs of the organization and client.
Requirements
Required Qualifications
- This role requires knowledge and/or experience with Spark, Delta Lake, and distributed data pipelines
- The ideal candidate brings both engineering expertise and strategic insight into enterprise data modernization
- 8+ years of experience in data engineering and Agile analytics
- 5+ years of experience building scalable ETL and ELT workflows for reporting and analytics
- 3+ years of experience building enterprise data engineering solutions in the cloud; experience with AWS cloud-native technologies preferred
- Hands-on experience in the following:
- Glue / Spark SQL–based data transformations
- S3 partitioning strategy
- Step Functions–based orchestration
- Infrastructure as Code (Terraform and/or CloudFormation)
- Deployment automation for data pipelines
- AWS services integration
- Experience with data quality, validation frameworks, and storage optimization strategies
- BA or BS degree
- Excellent communication and organizational skills with the ability to manage multiple priorities
- U.S. Citizenship is required by the Federal Client
- Must have, or be able to obtain, a DoD Public Trust clearance
Preferred Qualifications
- Experience building data pipelines using Spark with Java or Scala
- Strong Java development experience in enterprise environments
Salary Description
$145,000 - $160,000