Senior Backend Engineer - Machine Learning Platform (R&D, CTR/VTR Predictor) - Ego Team
Posted on Feb. 9, 2026 by Shopee
- Singapore, Singapore
The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve the problems at hand; we build foundations for a long-lasting future. We don't limit ourselves to what we can or can't do; we take matters into our own hands, even if that means drilling down to the bottom layer of the computing platform. Shopee's rapidly growing business scale has transformed many seemingly "innocent" problems into huge technical challenges, and there is no better place to experience this first-hand if you love technology as much as we do.
The platform covers the full lifecycle of Deep Learning, including sample generation, feature engineering, model training, deployment, online inference, and closed-loop monitoring. We have developed a robust training/inference acceleration framework, complemented by a Web UI and RESTful APIs, aiming to achieve a truly end-to-end, automated, and intelligent machine learning ecosystem.
- Responsible for the R&D and optimization of online inference services for deep learning models in large-scale sparse feature scenarios, supporting high-efficiency inference needs across Shopee’s various business lines.
- Conduct in-depth research into various inference acceleration algorithms to reduce the computational cost of model deployment.
- Collaborate across the business pipeline to tune the end-to-end online service system, ensuring high availability and stability.
- Research and implement efficient inference solutions that combine Large Language Models (LLMs) with Search, Ads, and Recommendation through Generative Recommendation (GR).
- Bachelor’s degree or above in Computer Science, Electronics, Automation, Software Engineering, or related fields, with at least 2 years of relevant work experience.
- Expertise in C++ programming with a solid foundation in low-level systems; proficient in multi-threading, lock optimization, memory pools, thread pools, template programming, GDB debugging, performance profiling, and RPC frameworks.
- Experience in online inference/serving, having either developed a proprietary inference engine or gained deep familiarity with engines such as TensorFlow + XLA, TensorRT, Triton, vLLM, or TensorRT-LLM.
- Deep practical experience in GPU optimization, including operator fusion, graph optimization, CUDA programming, kernel scheduling, the warp execution model, memory-access optimization, and GPU memory (VRAM) scheduling.
- Preferred: Candidates who have researched or implemented GR (Generative Recommendation) solutions such as HSTU, HLLM, or OneRec.
- High passion for computer technology, a proactive learning mindset, and a willingness to dive deep technically; maintains high standards for code quality and demonstrates a rigorous, detail-oriented work style.
- Strong team player with excellent continuous learning capabilities.
Advertised until: March 11, 2026