We’re hiring a hands-on AI Learning Engineer who can build and fine-tune generative AI models (diffusion and LLMs), vision-language models (VLMs), and classical and deep learning models from scratch, and productionize them end-to-end.
This role blends modeling (you’ll train and fine-tune models) with production systems (MLOps, LLMOps, model optimization, serving, and API/backend work).
You will not only use pre-trained models; you will also design, train, optimize, and serve custom models for production use (GenAI, Stable Diffusion, OCR, theft detection, recommenders, etc.).
Responsibilities
- Develop production inference stacks: convert and optimize models (Torch → ONNX → TensorRT when appropriate), quantize/prune, profile FLOPs and latency, and deliver low-latency GPU inference with minimal accuracy loss.
- Create robust model-serving infrastructure: FastAPI/gRPC services for inference, streaming outputs (token-level streaming for LLMs, frame/segment streaming for CV), model versioning and routing, autoscaling, model rollback, and A/B testing.
- Build CV solutions from scratch: object detection, theft-detection pipelines, OCR (document parsing, structured extraction), and surveillance analytics; integrate and fine-tune Hugging Face pretrained models when beneficial.
- Fine-tune Stable Diffusion and other generative image models for brand- and style-consistent image generation and downstream tasks.
- Train and fine-tune VLMs (vision-language models) for multimodal tasks (captioning, visual QA, multimodal retrieval), using both from-scratch training and transfer learning from HF checkpoints.
- Design, train, and fine-tune GenAI models (LLMs) for use cases such as conversational agents, summarization, retrieval-augmented generation (RAG), and domain adaptation.
- MLOps/LLMOps/AIOps: CI/CD for training and deployment, dataset versioning, experiment tracking, model registry, monitoring (latency, throughput, model drift, data drift), alerting, and automated retraining pipelines.
- Data acquisition and pipeline work: build scrapers/collectors and scalable ingestion pipelines; implement proxy pools, rate-limit handling, and rotation for reliability (with compliance and respect for target sites’ terms).
- Third-party model integration: call and compose third-party inference APIs (Hugging Face, OpenAI, other vendors), and build fallback and hybrid inference strategies that combine local and cloud models.
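To give a flavor of the fallback/hybrid inference strategy mentioned above, here is a minimal sketch in plain Python. All names (`hybrid_infer`, `run_local`, `call_cloud`, the confidence threshold) are hypothetical stand-ins, not a prescribed design; a real version would wrap actual local and vendor inference calls.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class InferenceResult:
    text: str
    confidence: float
    backend: str

def hybrid_infer(
    prompt: str,
    run_local: Callable[[str], InferenceResult],
    call_cloud: Callable[[str], InferenceResult],
    min_confidence: float = 0.7,
) -> InferenceResult:
    """Try the local model first; fall back to a cloud API when the
    local call fails or its confidence is below the threshold."""
    try:
        result = run_local(prompt)
        if result.confidence >= min_confidence:
            return result
    except Exception:
        pass  # local backend unavailable -> fall through to cloud
    return call_cloud(prompt)

# Hypothetical stand-ins for real backends:
def fake_local(prompt: str) -> InferenceResult:
    return InferenceResult(text="local answer", confidence=0.4, backend="local")

def fake_cloud(prompt: str) -> InferenceResult:
    return InferenceResult(text="cloud answer", confidence=0.95, backend="cloud")

# Low local confidence (0.4 < 0.7), so the cloud backend answers:
print(hybrid_infer("hello", fake_local, fake_cloud).backend)  # -> "cloud"
```

The same shape extends naturally to retries, per-request routing, or cost-aware policies.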
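The latency-profiling responsibility above can be sketched with nothing but the standard library. `profile_latency` and `model_fn` are illustrative names (any inference callable works); real GPU profiling would additionally use tools like Nsight, but the warm-up-then-percentile pattern is the same.

```python
import time

def profile_latency(model_fn, inputs, warmup=3):
    """Time each call to model_fn and report p50/p95 latency in ms.
    Warm-up runs (e.g. kernel/cache initialization) are excluded."""
    for x in inputs[:warmup]:
        model_fn(x)
    samples = []
    for x in inputs:
        t0 = time.perf_counter()
        model_fn(x)
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    p50 = samples[len(samples) // 2]
    p95 = samples[int(len(samples) * 0.95) - 1]
    return {"p50_ms": p50, "p95_ms": p95, "n": len(samples)}

# A toy CPU-bound "model" standing in for real inference:
stats = profile_latency(lambda x: sum(i * i for i in range(x)), [10_000] * 50)
print(f"p50={stats['p50_ms']:.2f}ms p95={stats['p95_ms']:.2f}ms over {stats['n']} runs")
```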
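For the token-level streaming mentioned above, a rough sketch of the data shape (assuming Server-Sent Events framing, which a FastAPI `StreamingResponse` can emit): `fake_llm_tokens` is a placeholder for a real model's incremental decode loop.

```python
from typing import Iterator

def fake_llm_tokens(prompt: str) -> Iterator[str]:
    # Placeholder for a real model's incremental decoding.
    for tok in ("Hello", ",", " world", "!"):
        yield tok

def sse_stream(prompt: str) -> Iterator[str]:
    """Wrap model tokens in SSE framing: each event is 'data: ...'
    followed by a blank line, with a sentinel event at the end."""
    for tok in fake_llm_tokens(prompt):
        yield f"data: {tok}\n\n"
    yield "data: [DONE]\n\n"

chunks = list(sse_stream("hi"))
print(chunks[0])           # "data: Hello\n\n"
print(chunks[-1].strip())  # "data: [DONE]"
```

A real endpoint would return this generator from the web framework instead of materializing it into a list.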
 
Required qualifications:
- Strong experience with computer vision: object detection, segmentation, OCR pipelines (training from scratch and transfer learning).
- Deep knowledge of model optimization: quantization, pruning, distillation, FLOPs analysis, CUDA profiling, mixed precision (AMP), and inference-time tradeoffs.
- Demonstrated ability to design and implement models from scratch (not only using pretrained checkpoints): architecture design, loss selection, training loops, evaluation metrics.
- Practical experience training and fine-tuning LLMs (transformers) and generative image models (Stable Diffusion or other diffusion frameworks).
- Experience exporting and running models with ONNX, TensorRT, and TorchScript, plus familiarity with Triton, TorchServe, or ONNX Runtime for production serving.
- Hands-on experience with GPU infrastructure and CUDA (profiling with nvprof/Nsight, memory management, multi-GPU training).
- Solid backend engineering skills: Python, FastAPI (or Flask), asynchronous programming, WebSockets/SSE, REST design.
- Containerization and orchestration: Docker, Kubernetes, Helm, and experience deploying GPU workloads to AWS/GCP/Azure or on-prem.
- Good understanding of classical ML (scikit-learn): regression, classification, clustering; able to design experiments and baselines.
- Strong software engineering practices: unit tests, CI/CD, code reviews, reproducibility.
- Excellent communication skills; able to explain ML tradeoffs to product and frontend teams.

Preferred / Nice-to-have:
- Knowledge of privacy-preserving ML (DP, federated learning) or regulatory constraints for data handling.
- Experience with logging and observability: Prometheus, Grafana, Sentry, OpenTelemetry.