Platform Engineer
We are seeking an experienced engineer to support the development and delivery of Generative AI platform capabilities across hybrid infrastructure environments. This role focuses on building scalable AI/ML platforms and supporting model delivery across both on-premises infrastructure and cloud platforms including GCP Vertex AI and Azure ML.
The ideal candidate has strong experience in LLM and Generative AI development and operations, MLOps, Python, and large-scale data platforms, along with the ability to design resilient, high-performance infrastructure supporting AI and NLP workloads.
Key Responsibilities
- Participate in the development and expansion of Generative AI platform capabilities supporting enterprise AI initiatives
- Deliver and operationalize AI models to on-prem infrastructure and cloud platforms (GCP Vertex AI, Azure ML)
- Participate in daily standups and Agile development cycles supporting platform capability development
- Research industry best practices, evaluate emerging technologies, and define engineering standards and automation strategies to improve platform resiliency and reliability
- Execute technology roadmaps aligned with business and engineering strategy
- Perform hardware capacity planning, performance analysis, and forecasting to ensure scalability and high availability of AI/ML workloads
- Support infrastructure designed for high-throughput, low-latency AI and NLP workloads
- Serve as a technical subject matter expert and collaborate with engineering teams across the organization
Minimum Requirements
- 2+ years of experience with LLMs and Generative AI (development, operations, or platform engineering)
- 5+ years of Python development experience
- 5+ years of experience with big data technologies such as BigQuery or Hadoop
- 5+ years of Linux systems experience
- 3+ years of experience in AI/ML and MLOps environments
- 3+ years of PySpark experience
- 3+ years of VMware virtualization experience
- Experience working with AutoML technologies such as H2O Driverless AI, DataRobot, and Vertex AI, as well as Elastic and vector databases
- Experience designing and supporting grid computing environments with CPU and GPU resources for AI/ML and NLP workloads
- Working knowledge of high-performance storage systems and object storage
- Strong understanding of network infrastructure supporting high-throughput, low-latency compute environments
- Excellent communication skills with the ability to present technical solutions to both technical and business audiences
- Demonstrated ability to influence technical and business decisions
Preferred Skills
- Experience developing APIs on GCP, Azure, or API Gateway platforms
- Experience with Elasticsearch, vector databases, or AI model development
- Experience with enterprise data processing platforms such as Ab Initio, Informatica, or IBM DataStage
- Experience working with large data ecosystems including Hadoop, Teradata, and Elasticsearch
- Familiarity with Agile development methodologies and working within Agile teams
- Experience implementing load balancing technologies such as F5
- Experience designing high-resiliency cloud or grid computing environments supporting AI/ML workloads
- Knowledge of cloud computing, PaaS architectures, microservices, and containerized environments
