Overview
We are seeking a Data Scientist with hands-on Python experience and a proven ability to support software activities in an Agile software development lifecycle. The role calls for a well-rounded developer to lead a cloud-based big data application using a variety of technologies. The ideal candidate will possess strong technical, analytical, and interpersonal skills, and will lead developers on the team to achieve architecture and design objectives as agreed with stakeholders.

Responsibilities
- Work with developers on the team to meet product deliverables
- Work independently and collaboratively on a multi-disciplined project team in an Agile development environment
- Contribute to detailed design and architectural discussions, as well as customer requirements sessions, to support the implementation of code and procedures for our big data product
- Design and develop clear and maintainable code with automated open-source test functions
- Identify areas for code and design optimization and implement the improvements
- Learn and integrate with a variety of systems, APIs, and platforms
- Interact with a multi-disciplined team to clarify, analyze, and assess requirements
- Be actively involved in design, development, and testing activities in big data applications

Key Responsibilities
Data Engineering & Processing
- Develop scalable data pipelines using PySpark for processing large datasets
- Work extensively in Databricks for collaborative data science workflows and model deployment
- Handle messy, unstructured, and semi-structured data, performing thorough Exploratory Data Analysis (EDA)
- Apply appropriate statistical measures and hypothesis testing to derive insights and validate assumptions

Data Analysis & Modeling
- Write complex SQL queries for data extraction, transformation, and analysis
- Build and validate predictive models using techniques such as GBMs (XGBoost, LightGBM) and GLMs (logistic / Poisson)
- Apply unsupervised learning techniques like clustering (K-Means, DBSCAN), PCA, and anomaly detection

Automation & Optimization
- Automate data workflows and model training pipelines using scheduling tools (e.g., Airflow, Databricks Jobs)
- Optimize model performance and data processing efficiency

Cloud & Deployment
- Basic experience with Azure or other cloud platforms (AWS, GCP) for data storage, compute, and model deployment
- Familiarity with cloud-native tools like Azure Data Lake, Azure ML, or equivalent

Required Skills
- Programming Languages: Python (with PySpark), SQL
- Tools & Platforms: Databricks, Azure (or other cloud platforms), Git
- Libraries & Frameworks: scikit-learn, pandas, numpy, matplotlib / seaborn, XGBoost / LightGBM
- Statistical Knowledge: Hypothesis testing, confidence intervals, correlation analysis
- Machine Learning: Supervised and unsupervised learning, model evaluation metrics
- Data Handling: EDA, feature engineering, dealing with missing / outlier data
- Automation: Experience with job scheduling and pipeline automation

Required Experience
- Minimum of 5 years in Data Science or related fields
- Hands-on experience with Databricks
- Experience with data cleansing, transformation, and validation
- Proven technical leadership on prior development projects
- Hands-on experience with versioning tools such as GitHub, Azure DevOps, Bitbucket, etc.
- Hands-on experience building pipelines in GitHub (or Azure DevOps, etc.)
- Hands-on experience using relational databases such as Oracle, SQL Server, MySQL, Postgres, or similar
- Experience using Markdown to document code in repositories, or automated documentation tools like PyDoc
- Strong written and verbal communication skills

Preferred Qualifications
- Experience with data visualization tools such as Power BI or Tableau
- Experience with MLOps and DevOps CI/CD tools and automation processes (e.g., Azure DevOps, GitHub, Bitbucket)
- Experience with containers and their environments (Docker, Podman, Docker Compose, Kubernetes, Minikube, Kind, etc.)
- Experience working in cross-functional teams and communicating insights to stakeholders

Education
- Master of Science / B.Tech degree from an accredited university

Equal Opportunity
Fusemachines is an Equal Opportunities Employer, committed to diversity and inclusion. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or any other characteristic protected by applicable federal, state, or local laws.

Job Details
- Seniority level: Mid-Senior level
- Employment type: Contract
- Job function: Engineering and Information Technology
- Industry: Internet Publishing
Data Scientist • Islamabad, Pakistan