Job Description
As a Big Data Engineer, you will develop, maintain, evaluate, and test big data solutions. You will be involved in data engineering activities such as creating source-to-target pipelines/workflows and implementing solutions that address the client's needs.
Your primary responsibilities include:
- Design, build, optimize, and support new and existing data models and ETL processes based on our clients' business requirements (a minimal PySpark sketch follows this list).
- Build, deploy, and manage data infrastructure that can adequately handle the needs of a rapidly growing, data-driven organization.
- Coordinate data access and security so that data scientists and analysts can easily access data whenever they need to.
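For illustration, here is a minimal PySpark sketch of the kind of source-to-target ETL process described above. The paths, column names, and business rule are hypothetical placeholders, not part of any actual client pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("source-to-target-etl").getOrCreate()

# Extract: read raw source data (assumed to be CSV with a header row;
# the bucket and path are hypothetical).
raw = spark.read.option("header", "true").csv("s3://example-bucket/raw/orders/")

# Transform: apply a simple business rule -- keep completed orders
# and derive a total from hypothetical quantity/unit_price columns.
orders = (
    raw.filter(F.col("status") == "COMPLETE")
       .withColumn(
           "order_total",
           F.col("quantity").cast("double") * F.col("unit_price").cast("double"),
       )
)

# Load: write the curated data model as partitioned Parquet.
orders.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders/"
)

spark.stop()
```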
Required Technical and Professional Expertise
- Experience developing PySpark code for AWS Glue jobs and for EMR (see the Glue job skeleton after this list).
- Experience working on scalable distributed data systems using the Hadoop ecosystem on AWS EMR and the MapR distribution.
- Experience developing Python and PySpark programs for data analysis, including using Python to build a custom framework for generating rules (similar to a rules engine).
- Experience developing Hadoop streaming jobs in Python to integrate applications with Python API support.
- Experience developing Python code to gather data from HBase and designing solutions implemented in PySpark.
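As one concrete reference point, this is the standard skeleton of an AWS Glue PySpark job; the Data Catalog database and table names are hypothetical, and the column selection is only a placeholder transformation.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve job arguments and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read a table registered in the Glue Data Catalog
# (database and table names are hypothetical).
dyf = glueContext.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# Work with the data as a regular Spark DataFrame,
# e.g. a simple column selection.
df = dyf.toDF().select("id", "event_ts")
df.show(5)

job.commit()
```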
Preferred Technical and Professional Expertise
- Experience applying business transformations with Apache Spark DataFrames/RDDs and using HiveContext objects to perform read/write operations.
- Experience rewriting Hive queries in Spark SQL to reduce overall batch time (see the sketch after this list).
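Below is a minimal sketch of moving a Hive batch query to Spark SQL using a Hive-enabled SparkSession (which supersedes the older HiveContext API); the table names and the aggregation query are hypothetical.

```python
from pyspark.sql import SparkSession

# A SparkSession with Hive support replaces the older HiveContext,
# giving Spark SQL access to tables in the Hive metastore.
spark = (
    SparkSession.builder
    .appName("hive-to-spark-sql")
    .enableHiveSupport()
    .getOrCreate()
)

# The same aggregation previously run as a Hive batch query
# (table names hypothetical), now executed by Spark SQL.
daily_totals = spark.sql("""
    SELECT order_date, SUM(order_total) AS daily_total
    FROM curated.orders
    GROUP BY order_date
""")

# Persist the result back to the metastore as a managed table.
daily_totals.write.mode("overwrite").saveAsTable("curated.daily_totals")
```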