About Fusemachines
Fusemachines is a 10+ year old AI company, dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. We have a robust presence in four countries and a dedicated team of over 400 full-time employees, committed to fostering AI transformation journeys for businesses worldwide.
Location: Remote (Full-time)
About The Role
This is a remote, contract position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization and Advanced Analytics).
We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps and cloud-based large-scale data applications with a passion for data quality, performance and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of Data products in the Aviation Industry, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff and collaboration with multi-disciplined teams to achieve project objectives.
Qualification & Experience
- Must have a full-time Bachelor's degree in Computer Science or similar
- At least 5 years of experience as a data engineer with strong expertise in Databricks, Azure, DevOps, or other hyperscalers
- 5+ years of experience with Azure DevOps, GitHub
- Proven experience delivering large scale projects and products for Data and Analytics as a data engineer, including migrations
- The following certifications:
  - Databricks Certified Associate Developer for Apache Spark
  - Databricks Certified Data Engineer Associate
  - Microsoft Certified: Azure Fundamentals
  - Microsoft Certified: Azure Data Engineer Associate
  - Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)
Required Skills / Competencies
- Strong programming skills in Python (required) and Scala, and in writing efficient, optimized code for data integration, migration, storage, processing and manipulation
- Strong understanding of and experience with SQL, including writing advanced SQL queries
- Thorough understanding of big data principles, techniques, and best practices
- Experience with scalable and distributed data processing technologies such as Spark / PySpark (Azure Databricks), DBT and Kafka
- Databricks development experience with Python, PySpark, Spark SQL, Pandas, NumPy in Azure
- Designing and implementing efficient ELT / ETL processes in Azure and Databricks; developing custom integration solutions as needed
- Data integration from APIs, databases, flat files, event streaming
- Data cleansing, transformation, and validation
- Relational databases (Oracle, SQL Server, MySQL, Postgres) and NoSQL databases (MongoDB or similar)
- Data modeling and database design principles; designing efficient schemas for data architecture
- Data warehousing, data lake and data lakehouse solutions in Azure and Databricks
- Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT)
- SDLC knowledge, Agile methodologies
- Experience with SDLC tools: Azure DevOps, GitHub; project management (Jira / Azure Boards), source control, CI / CD (GitHub Actions, Azure Pipelines), artifact management
- DevOps principles: CI / CD, IaC (Terraform, ARM), configuration management, automated testing, performance tuning, cost optimization
- Cloud computing in Microsoft Azure for data and analytics: ADF, Databricks, Synapse, Data Lake, Data Lake Storage, SQL Database, etc.
- Orchestration with Databricks workflows and Apache Airflow
- Data structures and algorithms; strong software engineering practices
- Migrating from Azure Synapse to Azure Data Lake or other technologies
- Analytical skills to identify and address issues, bottlenecks, and failures
- Debugging and troubleshooting in complex data environments
- Data quality and governance, including data quality checks and monitoring
- Experience with BI solutions (Power BI) is a plus
- Strong communication skills for cross-functional collaboration
- Documentation of processes and deployment configurations
- Security practices: network security groups, Azure Active Directory, encryption; compliance
- Ability to implement security controls in data / analytics solutions
- Mentoring and coaching of team members; willingness to stay updated with trends
- Ability to work independently in a rapidly changing environment
- Focus on architecture, observability, testing, and reliable data pipelines
Responsibilities
- Architect, design, develop, test and maintain high-performance, large-scale data architectures for data integration (batch and real-time), storage, processing, orchestration and infrastructure; ensure scalability, reliability, and performance (Databricks and Azure)
- Contribute to detailed design, architectural discussions, and customer requirements
- Participate in design, development, and testing of big data products
- Construct and optimize Apache Spark jobs and clusters within Databricks
- Migrate from Azure Synapse to Azure Data Lake or other technologies
- Design schemas and data models to support modern analytics (descriptive to prescriptive)
- Develop clear, maintainable code with automated testing
- Collaborate with cross-functional teams to understand data requirements and deliver reusable components
- Evaluate and implement new technologies to improve data integration, processing, storage and analysis
- Design, implement and maintain data governance: cataloging, lineage, data quality
- Monitor and optimize workloads and clusters for performance
- Mentor junior team members and share best practices
- Maintain documentation of solutions and configurations
- Promote best practices in data engineering, governance, and quality
- Ensure data quality and security
- Be an active Agile team member and contribute to continuous improvement
Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.