Azure Cloud Engineer AI

Themesoft Inc.

Toronto, Canada

$3 - $5 Posted: 19 hours ago

Job Description

<p><b>Themesoft Inc.</b> is a global IT solutions provider and a Woman-Owned Minority Business Enterprise headquartered in Dallas, TX. With a strong presence across the US, Canada, India, Singapore, and Brazil, we specialize in digital transformation, consulting, and workforce solutions across diverse industries.</p>
<p><br></p>
<p><b>We are currently looking for a tech-savvy and results-driven professional for one of our leading clients.</b> If you're passionate about technology and looking to grow in a dynamic, fast-paced environment, this could be the perfect fit for you!</p>
<p><br></p>
<p><b>Role: Azure Cloud Engineer AI</b></p>
<p><b>Location: Toronto, Canada - Hybrid (3 days in office)</b></p>
<p><b>Duration: 6+ months</b></p>
<p><br></p>
<p><b>Cloud Engineer - AI Infrastructure</b></p>
<p><b>Role Overview</b></p>
<p>As a Cloud Engineer, you will be responsible for implementing and maintaining scalable, secure, and high-performance cloud infrastructure to support AI/ML workloads. You'll work closely with platform, application, and data teams to ensure reliable operations and efficient delivery of AI services.</p>
<p><br></p>
<p><b>Key Responsibilities</b></p>
<p><b>Infrastructure & Platform Operations</b></p>
<ul>
<li>Deploy and manage cloud-native infrastructure for AI/ML workloads (GPU/CPU clusters, autoscaling, spot instances).</li>
<li>Configure and maintain networking components (Azure VNet, Private Link, peering, HA/DR setups).</li>
<li>Operate storage and database systems, including Azure Data Lake Storage, relational databases, and vector databases (FAISS, Milvus, Pinecone).</li>
<li>Implement IAM policies, secrets management (Key Vault), and encryption standards.</li>
</ul>
<p><b>Observability & Reliability</b></p>
<ul>
<li>Set up monitoring for latency, throughput, GPU utilization, and cost metrics.</li>
<li>Integrate logging and tracing tools (OpenTelemetry) and maintain SLOs/SLIs for infrastructure services.</li>
<li>Support incident response and root cause analysis using SRE principles.</li>
</ul>
<p><b>CI/CD & Infrastructure Automation</b></p>
<ul>
<li>Build and maintain CI/CD pipelines using GitHub Actions or Azure DevOps.</li>
<li>Implement GitOps workflows for infrastructure-as-code using Terraform or Bicep.</li>
<li>Create reusable IaC modules and templates for consistent deployments.</li>
</ul>
<p><b>FinOps & Cost Optimization</b></p>
<ul>
<li>Monitor and optimize GPU usage, caching strategies, and inference performance.</li>
<li>Support cost governance and reporting for AI infrastructure.</li>
</ul>
<p><b>Application Enablement</b></p>
<ul>
<li>Provide infrastructure support for APIs, microservices, and event-driven architectures.</li>
<li>Enable model serving runtimes (TensorRT-LLM, vLLM, Triton/KServe).</li>
<li>Support RAG pipelines, including embeddings, chunking, and retrieval systems.</li>
</ul>
<p><b>Security & Compliance</b></p>
<ul>
<li>Apply defense-in-depth strategies: IAM least privilege, private networking, image signing.</li>
<li>Ensure compliance with data residency, encryption, and audit requirements.</li>
</ul>
<p><br></p>
<p><b>Qualifications</b></p>
<ul>
<li>Bachelor's degree in Computer Science, Engineering, or a related field.</li>
<li>3-5 years of experience in cloud infrastructure (Azure preferred).</li>
<li>Hands-on experience with Kubernetes, Terraform/Bicep, and cloud networking.</li>
<li>Familiarity with AI/ML infrastructure components and model serving.</li>
<li>Proficiency in Python for automation; Go or TypeScript is a plus.</li>
</ul>
<p><br></p>
<p><b>Tech Stack</b></p>
<ul>
<li><b>Cloud & Infra</b>: Azure (AKS, Functions, Event Hubs, Key Vault), Terraform/Bicep, GitHub Actions</li>
<li><b>AI Infra</b>: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM</li>
<li><b>Ops</b>: Prometheus, Grafana, OpenTelemetry, ArgoCD</li>
<li><b>Data</b>: Feature stores (Feast), vector DBs (FAISS, Milvus), relational DBs</li>
<li><b>App Layer</b>: APIs, microservices, frontend/backend integration</li>
</ul>
<p><br></p>
<p><b>Success Metrics</b></p>
<ul>
<li><b>Reliability</b>: SLOs met, uptime maintained</li>
<li><b>Security</b>: No critical vulnerabilities, audit-ready infrastructure</li>
<li><b>Cost Efficiency</b>: Optimized GPU and infra spend</li>
<li><b>Velocity</b>: Fast and reliable deployments</li>
<li><b>Collaboration</b>: Effective cross-team support and documentation</li>
</ul>
<p><br></p>
<p>Regards,</p>
<p><br></p>
<p>Parthasarathy K</p>
<p>Lead Recruiter</p>
<p>Work: <b></b> Ext: 306, Direct: </p>
<p>Themesoft Inc Themesoft Jobs</p>