Job description
We are looking for an experienced Data Engineer to develop DataGalaxy's premium connectivity features on top of modern data stack services, starting with cloud data lake and data warehouse solutions.
The product
The product you will be working on aims at delivering insights at different metadata levels, such as:
- Urbanization: data should be organized according to urbanization rules. This scope measures how well those rules are enforced and drives how data is aggregated for all other insights
- Storage: metrics on storage capacity and distribution, with drill-down capabilities along the urbanization model
- Access: a global access map, including recursive permissions granted through inheritance
- Usage: metrics on usage and actions taken by users and applications
- Cost: global cost metrics and their distribution along the urbanization model, which can be used to rebuild service usage internally
On top of these scopes, data science algorithms and features are planned to identify, anticipate, and leverage usage patterns.
The end goal is to feed our data governance platform so that all this information is organized and surfaced for our users.
Your team and colleagues
Your role is to lead data engineering developments within the team.
You'll work in close collaboration with a dedicated Python Dev & Ops Lead, who will help you build reliable artifacts with automated pipelines. Other data engineers and data scientists will be involved: the ability to collaborate across teams is a key skill for this position.
We use the Agile Scrum methodology; you'll be accountable to your team's Product Owner and will receive product guidance from your Product Manager.
Requirements
The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up.
The Data Engineer will support our software developers, database architects, data analysts, and data scientists on data initiatives, and will ensure that an optimal data delivery architecture is applied consistently across ongoing projects.
- You have at least 3 years of experience in a relevant role
- You are 100% customer-oriented
- You know how to build reliable and scalable algorithms
- You have strong Python and Spark skills, backed by relevant experience
- You have experience with a cloud vendor (Azure, AWS, GCP) and with Databricks
- Create and maintain optimal data systems and pipeline architecture
- Build processes supporting data transformation, data structures, metadata, dependency and workload management.
- Conduct complex data analyses and report on the results
- Identify opportunities for data acquisition
- Design and evaluate open-source and vendor tools for data acquisition and data lineage
- Collaborate with data scientists and architects on several projects
- You are experienced in designing end-to-end data flows, including defining complex functional rules and the best ways to implement them with PySpark. You can split the work and delegate tasks to other members of the team.
- You are familiar with TDD applied to data flows and Spark unit testing
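As a flavor of the last point, one common TDD pattern for data flows is to isolate transformation rules in pure functions so they can be unit-tested without a Spark session. This is a minimal sketch; the record schema and function names are hypothetical, not DataGalaxy's actual rules:

```python
def normalize_storage_record(record: dict) -> dict:
    """Normalize a raw storage-metrics record (hypothetical schema):
    lower-case the zone name and convert bytes to gigabytes."""
    return {
        "zone": record["zone"].strip().lower(),
        "size_gb": round(record["size_bytes"] / 1024 ** 3, 2),
    }

def test_normalize_storage_record():
    # Test written first (TDD): the rule is pinned down before wiring it into Spark.
    raw = {"zone": "  Bronze ", "size_bytes": 3 * 1024 ** 3}
    assert normalize_storage_record(raw) == {"zone": "bronze", "size_gb": 3.0}
```

In a Spark job, such a function can then be applied per row (e.g. via a UDF or an RDD map), keeping the business rule testable in isolation from cluster infrastructure.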
By joining DataGalaxy, you become part of a vibrant team constantly evolving in a fast-paced environment. We are continuing our international expansion, so we're looking for curious and passionate individuals who are aligned with our values: "Be Intentional: Be Clear, Be Bold, Be Humble". Ready to start your new adventure?
What you can expect:
- Offices in the heart of Lyon and Paris
- Flexible working hours ("forfait jour")
- Remote work at will
- Health Insurance Apicil
- Meal Vouchers (Swile)
- Daily coffee & snacks
- Mid-annual team events