We are looking for an experienced Data Engineer to develop DataGalaxy's premium connectivity features on top of modern data stack services, starting with cloud data lake and data warehouse solutions.
The product you will be working on aims to deliver insights at different metadata levels, such as:
- Urbanization: data should be organized according to urbanization rules. This scope measures how well these rules are enforced, and drives how data is aggregated for all other insights
- Storage: metrics on storage capacity and distribution, with drill-down capabilities along the urbanization model
- Access: a global access map, including permissions granted recursively through inheritance
- Usage: metrics on usage and actions taken by users and applications
- Cost: global cost metrics and their distribution along the urbanization model, which can be used to reconstruct service usage internally
On top of these scopes, data science algorithms and features are planned to identify, anticipate, and leverage usage behaviors.
The end goal is to feed our data governance platform so it can organize and surface all this information for our users.
🌟Your team and colleagues
Your role is to lead data engineering developments within the team.
You'll be working in close collaboration with a dedicated Python Dev and Ops Lead who will help you build reliable artifacts with automated pipelines. Other data engineers and scientists will also be involved: collaborating across teams is a key skill for this position.
We use the Agile SCRUM methodology; you'll be accountable to your team's Product Owner and will receive product guidance from your Product Manager.
The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up.
The Data Engineer will support our software developers, database architects, data analysts, and data scientists on data initiatives, and will ensure the data delivery architecture remains consistent across ongoing projects.
- You have at least 3 years of experience in a relevant role
- You are 100% customer-oriented
- You know how to build reliable and scalable algorithms
- You have strong Python and Spark skills, with relevant hands-on experience
- You have experience with a cloud vendor (Azure, AWS, or GCP) and with Databricks
- Create and maintain optimal data systems and pipeline architecture
- Build processes supporting data transformation, data structures, metadata, dependency, and workload management
- Conduct complex data analyses and report on the results
- Identify opportunities for data acquisition
- Design and evaluate open source and vendor tools for data acquisition and data lineage
- Collaborate with data scientists and architects on several projects
- You are experienced in designing end-to-end data flows, including defining complex functional rules and the best ways to implement them with PySpark. You can split the work and delegate tasks to other members of the team.
- You are familiar with TDD applied to data flows and with Spark unit testing
By joining DataGalaxy, you become part of a vibrant team that is constantly evolving in a fast-paced environment. We are continuing our international expansion, so we're looking for curious and passionate individuals who are aligned with our values: "Be Intentional: Be Clear, Be Bold, Be Humble". Ready to start your new adventure?
🚀What can you expect:
- Offices in the heart of Lyon and Paris
- Flexible working hours (“forfait jour”)
- Remote work at will
- Health Insurance Apicil
- Meal Vouchers (Swile)
- Daily coffee & snacks
- Semi-annual team events