Location
Singapore
Job Type
Full-time
Posted
June 11, 2026
Job Description
We are looking for a highly skilled engineer to build, optimize, and maintain high-performance inference services for large language models (LLMs) and multimodal models. You will work closely with algorithm, systems, and product teams to deliver best-in-class performance, stability, and efficiency in production environments, ensuring low-latency, highly available AI services for tens of millions of users.
Key Responsibilities
High-Performance Computing & Kernel Optimization
- Perform deep GPU/CUDA kernel optimization, including memory access pattern tuning, instruction-level parallelism, and warp-level optimization to fully utilize hardware capabilities
- Develop and optimize custom high-performance operators using advanced DSLs or compiler frameworks such as Triton and TVM
- Identify and resolve performance bottlenecks in scenarios such as operator fusion and quantization
Ready to Apply?
Submit your application for AI Inference Engineer at HPC AI Technology Pte. Ltd.