The successful candidate will be responsible for:
Creating data feeds from on-premise systems to the AWS Cloud
Supporting data feeds in production on a break/fix basis
Creating data marts using Talend or similar ETL development tool
Manipulating data using Python and PySpark
Processing data using the Hadoop paradigm, particularly on Amazon EMR, AWS’s managed Hadoop platform
Applying DevOps practices to Big Data and Business Intelligence, including automated testing and deployment
Designing and developing data feeds from on-premise systems into a data lake hosted in the AWS Cloud
Designing and developing programmatic transformations of the solution by correctly partitioning, formatting and validating data quality
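To illustrate the last responsibility, here is a minimal sketch of a partition/format/validate transformation in plain Python. The record schema (an "event_date" string and an "amount" field) and the "dt=" partition-key convention are assumptions for illustration only; in practice this logic would typically be expressed in PySpark against the actual feed schema.

```python
from datetime import datetime

def transform(records):
    """Partition records by date, normalise their format, and set aside
    rows that fail basic data-quality checks.

    Assumed (hypothetical) schema: each record is a dict with an
    "event_date" ISO string and a numeric "amount" field.
    """
    partitions = {}   # partition key -> list of cleaned records
    rejects = []      # rows failing validation, kept for inspection
    for rec in records:
        try:
            # Validate: date must parse, amount must be numeric and non-negative
            day = datetime.strptime(rec["event_date"], "%Y-%m-%d").date()
            amount = float(rec["amount"])
            if amount < 0:
                raise ValueError("negative amount")
        except (KeyError, ValueError):
            rejects.append(rec)
            continue
        # Format: emit a normalised record with consistent types
        cleaned = {"event_date": day.isoformat(), "amount": round(amount, 2)}
        # Partition: group by date, mirroring a data-lake layout (dt=YYYY-MM-DD)
        partitions.setdefault(f"dt={day.isoformat()}", []).append(cleaned)
    return partitions, rejects
```

A quick usage example: feeding in one valid and one malformed row yields one populated partition and one rejected record, so bad rows never silently reach the data lake.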