Full-Time
On-Site

Machine Learning Platform Engineer

Description

RBC Borealis is seeking an experienced Machine Learning Platform Engineer to design and implement machine learning infrastructure and automation tools (MLOps and DevOps). This role involves deploying and operating GenAI platforms on Kubernetes/OpenShift, managing large language model deployments on GPU infrastructure, monitoring performance, implementing observability stacks, and building scalable on-premise systems for ML.

What We're Looking For

Deploying and operating the GenAI platform across OpenShift/Kubernetes.
Managing large language model deployments (Cohere Command, Llama, Mistral) on GPU infrastructure (NVIDIA A100/H100).
Configuring RAG pipelines with serving frameworks like vLLM, NVIDIA NIM, and TensorRT-LLM.
Monitoring GPU utilization, model performance metrics, and resource allocation.
Implementing observability stacks (Prometheus, Grafana, Pushgateway, structured logging pipelines) for platform health and security.
Designing and implementing best practices and standards for data and machine learning pipelines.
Supporting platform users and cross-functional teams through infrastructure design guidance, documentation, and collaboration.
Building highly scalable, resilient on-premise systems for hosting machine learning systems.

Ideal Candidate

Strong experience designing and operating distributed/ML systems.
Deep Kubernetes/OpenShift knowledge (Helm, operators, custom resources, RBAC, troubleshooting).
Proven history building DevOps/CI/CD pipelines (GitHub Actions), multi-stage Docker images, registry mirroring, and infrastructure automation in restricted enterprise environments.
In-depth knowledge of the stages of the machine learning application deployment process.
Proficiency with programming languages such as Python, Bash, or Rust.
Solid grasp of software engineering best practices (testing, coding standards, code reviews, source control, production monitoring, alerting).
Hands-on experience building and deploying in hybrid and on-premises enterprise environments.
Familiarity with Large Language Model (LLM) inference and serving (e.g., vLLM).

Hard Skills

Big Data Management
Data Mining
Data Science
Deep Learning
DevOps
Machine Learning (ML)
Machine Learning Operations
Programming Languages
Kubernetes
OpenShift
Helm
Docker
GenAI Platform
GPU Infrastructure
NVIDIA A100/H100
RAG pipelines
vLLM
NVIDIA NIM
TensorRT-LLM
Prometheus
Grafana
Pushgateway
Python
Bash
Rust
LLM inference and serving

Soft Skills

Collaboration
Problem-solving (implied by troubleshooting and building resilient systems)
Communication (implied by supporting users and documentation)
Attention to detail (coding standards, monitoring, alerting)

Work Hours

37.5 hours/week

Benefits

Total Rewards Program (bonuses, flexible benefits, competitive compensation, commissions, stock options)
Leaders supporting development through coaching and managing opportunities
Opportunity to make a difference and lasting impact

About the Company

Royal Bank of Canada

Royal Bank of Canada is a global financial institution with a purpose-driven, principles-led approach to delivering leading performance. As Canada's largest bank, it provides personal and commercial banking, wealth management, and capital markets services to over 17 million clients worldwide.

Purpose-driven
Inclusive
Innovative
Collaborative
Professional