About Fusemachines
Fusemachines is a 10+ year-old AI company dedicated to delivering state-of-the-art AI products and solutions to a diverse range of industries. Founded by Sameer Maskey, Ph.D., an Adjunct Associate Professor at Columbia University, our company is on a steadfast mission to democratize AI and harness the power of global AI talent from underserved communities. We have a robust presence in four countries and a dedicated team of over 400 full-time employees, committed to fostering AI transformation journeys for businesses worldwide.
Location: Remote (Full-time)
About The Role
This is a remote, contract position responsible for designing, building, and maintaining the infrastructure required for data integration, storage, processing, and analytics (BI, visualization, and advanced analytics).
We are looking for a skilled Senior Data Engineer with a strong background in Python, SQL, PySpark, Azure, Databricks, Synapse, Azure Data Lake, DevOps, and cloud-based large-scale data applications, and with a passion for data quality, performance, and cost optimization. The ideal candidate will develop in an Agile environment, contributing to the architecture, design, and implementation of data products in the aviation industry, including migration from Synapse to Azure Data Lake. This role involves hands-on coding, mentoring junior staff, and collaborating with multidisciplinary teams to achieve project objectives.
Qualification & Experience
Must have a full-time Bachelor's degree in Computer Science or a similar field
At least 5 years of experience as a data engineer with strong expertise in Databricks, Azure, and DevOps, or equivalent experience on other hyperscalers
5+ years of experience with Azure DevOps and GitHub
Proven experience delivering large scale projects and products for Data and Analytics as a data engineer, including migrations
The following certifications:
Databricks Certified Associate Developer for Apache Spark
Databricks Certified Data Engineer Associate
Microsoft Certified: Azure Fundamentals
Microsoft Certified: Azure Data Engineer Associate
Microsoft Exam: Designing and Implementing Microsoft DevOps Solutions (nice to have)
Required Skills / Competencies
Strong programming skills in Python (required) and Scala, including writing efficient, optimized code for data integration, migration, storage, processing, and manipulation
Strong understanding and experience with SQL and writing advanced SQL queries
Thorough understanding of big data principles, techniques, and best practices
Experience with scalable and distributed data processing technologies such as Spark / PySpark (Azure Databricks), dbt, and Kafka
Databricks development experience with Python, PySpark, Spark SQL, Pandas, NumPy in Azure
Designing and implementing efficient ELT / ETL processes in Azure and Databricks; develop custom integration solutions as needed
Data integration from APIs, databases, flat files, event streaming
Data cleansing, transformation, and validation
Relational databases (Oracle, SQL Server, MySQL, Postgres) and NoSQL databases (MongoDB or similar)
Data modeling and database design principles; design efficient schemas for data architecture
Data warehousing, data lake, and data lakehouse solutions in Azure and Databricks
Delta Lake, Unity Catalog, Delta Sharing, Delta Live Tables (DLT)
SDLC knowledge, Agile methodologies
Experience with SDLC tools: Azure DevOps, GitHub; project management (Jira / Azure Boards), source control, CI/CD (GitHub Actions, Azure Pipelines), artifact management
DevOps principles: CI/CD, IaC (Terraform, ARM), configuration management, automated testing, performance tuning, cost optimization
Cloud computing in Microsoft Azure for data and analytics: ADF, Databricks, Synapse, Data Lake, Data Lake Storage, SQL Database, etc.
Orchestration with Databricks workflows and Apache Airflow
Data structures and algorithms; strong software engineering practices
Migrating from Azure Synapse to Azure Data Lake or other technologies
Analytical skills to identify and address issues, bottlenecks, and failures
Debugging and troubleshooting in complex data environments
Data quality and governance, including data quality checks and monitoring
Experience with BI solutions (Power BI) is a plus
Strong communication skills for cross-functional collaboration
Documentation of processes and deployment configurations
Security practices: network security groups, Azure Active Directory, encryption; compliance
Ability to implement security controls in data / analytics solutions
Mentoring and coaching of team members; willingness to stay updated with trends
Ability to work independently in a rapidly changing environment
Focus on architecture, observability, testing, and reliable data pipelines
Responsibilities
Architect, design, develop, test and maintain high-performance, large-scale data architectures for data integration (batch and real-time), storage, processing, orchestration and infrastructure; ensure scalability, reliability, and performance (Databricks and Azure)
Contribute to detailed design, architectural discussions, and customer requirements
Participate in design, development, and testing of big data products
Construct and optimize Apache Spark jobs and clusters within Databricks
Migrate from Azure Synapse to Azure Data Lake or other technologies
Design schemas and data models to support modern analytics (descriptive to prescriptive)
Develop clear, maintainable code with automated testing
Collaborate with cross-functional teams to understand data requirements and deliver reusable components
Evaluate and implement new technologies to improve data integration, processing, storage and analysis
Design, implement, and maintain data governance: cataloging, lineage, data quality
Monitor and optimize workloads and clusters for performance
Mentor junior team members and share best practices
Maintain documentation of solutions and configurations
Promote best practices in data engineering, governance, and quality
Ensure data quality and security
Be an active Agile team member and contribute to continuous improvement
Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.
Data Engineer • Rawalpindi Cantonment, Pakistan