Sr. Data Engineer Azure Databricks

Fusemachines · Islamabad, Islamabad Capital Territory, Pakistan
16 days ago
Job description

About Fusemachines

Fusemachines is an AI company with more than 10 years of experience, dedicated to delivering state-of-the-art AI products and solutions across a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, the company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. We have a robust presence in four countries and a dedicated team of over 400 full-time employees committed to fostering AI transformation journeys for businesses worldwide.

Location: Remote (Full-time)

About The Role

This is a remote contract position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).

We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based large-scale data applications, and a passion for data quality, performance, and cost optimization. The ideal candidate will work in an Agile environment, contributing to the architecture, design, and implementation of data products in the aviation industry, including a migration from Synapse to Azure Data Lake. The role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.

Qualification & Experience

  • Must have a full-time Bachelor's degree in Computer Science or similar
  • At least 5 years of experience as a data engineer, with strong expertise in Databricks and DevOps on Azure or other hyperscalers
  • 5+ years of experience with Azure DevOps and GitHub
  • Proven experience delivering large scale projects and products for Data and Analytics as a data engineer, including migrations
  • The following certifications:
  • Databricks Certified Associate Developer for Apache Spark
  • Databricks Certified Data Engineer Associate
  • Microsoft Certified: Azure Fundamentals
  • Microsoft Certified: Azure Data Engineer Associate
  • Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)

Required Skills / Competencies

  • Strong programming skills in Python (required), Scala, and writing efficient and optimized code for data integration, migration, storage, processing and manipulation
  • Strong understanding and experience with SQL and writing advanced SQL queries
  • Thorough understanding of big data principles, techniques, and best practices
  • Experience with scalable and distributed data processing technologies such as Spark/PySpark (Azure Databricks), dbt, and Kafka
  • Databricks development experience with Python, PySpark, Spark SQL, Pandas, NumPy in Azure
  • Designing and implementing efficient ELT/ETL processes in Azure and Databricks; developing custom integration solutions as needed
  • Data integration from APIs, databases, flat files, event streaming
  • Data cleansing, transformation, and validation
  • Relational databases (Oracle, SQL Server, MySQL, PostgreSQL) and NoSQL databases (MongoDB or similar)
  • Data modeling and database design principles; design efficient schemas for data architecture
  • Data warehousing, data lake and data lake house solutions in Azure and Databricks
  • Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT)
  • SDLC knowledge, Agile methodologies
  • Experience with SDLC tools: Azure DevOps, GitHub; project management (Jira/Azure Boards), source control, CI/CD (GitHub Actions, Azure Pipelines), artifact management
  • DevOps principles: CI/CD, IaC (Terraform, ARM), configuration management, automated testing, performance tuning, cost optimization
  • Cloud computing in Microsoft Azure for data and analytics: ADF, Databricks, Synapse, Azure Data Lake Storage, SQL Database, etc.
  • Orchestration with Databricks workflows and Apache Airflow
  • Data structures and algorithms; strong software engineering practices
  • Migrating from Azure Synapse to Azure Data Lake or other technologies
  • Analytical skills to identify and address issues, bottlenecks, and failures
  • Debugging and troubleshooting in complex data environments
  • Data quality and governance, including data quality checks and monitoring
  • Experience with BI solutions (Power BI) is a plus
  • Strong communication skills for cross-functional collaboration
  • Documentation of processes and deployment configurations
  • Security practices: network security groups, Azure Active Directory, encryption; compliance
  • Ability to implement security controls in data / analytics solutions
  • Mentoring and coaching of team members; willingness to stay updated with trends
  • Ability to work independently in a rapidly changing environment
  • Focus on architecture, observability, testing, and reliable data pipelines

Responsibilities

  • Architect, design, develop, test and maintain high-performance, large-scale data architectures for data integration (batch and real-time), storage, processing, orchestration and infrastructure; ensure scalability, reliability, and performance (Databricks and Azure)
  • Contribute to detailed design, architectural discussions, and customer requirements
  • Participate in design, development, and testing of big data products
  • Construct and optimize Apache Spark jobs and clusters within Databricks
  • Migrate from Azure Synapse to Azure Data Lake or other technologies
  • Design schemas and data models to support modern analytics (descriptive to prescriptive)
  • Develop clear, maintainable code with automated testing
  • Collaborate with cross-functional teams to understand data requirements and deliver reusable components
  • Evaluate and implement new technologies to improve data integration, processing, storage and analysis
  • Design, implement and maintain data governance : cataloging, lineage, data quality
  • Monitor and optimize workloads and clusters for performance
  • Mentor junior team members and share best practices
  • Maintain documentation of solutions and configurations
  • Promote best practices in data engineering, governance, and quality
  • Ensure data quality and security
  • Be an active Agile team member and contribute to continuous improvement

Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.
