ML Performance Engineer (Full Time) – Transformer Architecture Optimization

Company Description

Apex Compute is redefining AI compute architectures. Our mission is to push the boundaries of machine learning performance by developing innovative hardware and software solutions. We foster a culture of innovation, collaboration, and excellence, and we offer company stock options so that every team member shares in our long-term success.

Role Description

We are seeking a highly skilled ML Performance Engineer to join our dynamic team full-time. This hybrid role is based in Mountain View, CA, with some flexibility for remote work. In this role, you will focus on optimizing transformer architectures and other advanced ML models to achieve breakthrough performance improvements. You will leverage your expertise in C/C++ and Python, combined with a deep understanding of compiler technologies, memory scheduling, and numeric operations. Experience with MLIR, StableHLO, and the llama.cpp repository is highly desirable. Your contributions will directly impact the efficiency and scalability of our AI solutions.

Responsibilities

Transformer Architecture Optimization:
- Analyze and optimize transformer-based models to improve inference and training performance.
- Develop and implement novel techniques that enhance throughput and reduce latency in ML workloads.
Low-Level Performance Engineering:
- Write and optimize performance-critical code in C/C++ and Python.
- Design, implement, and refine algorithms that leverage advanced vector and matrix operations for efficient numerical computation.
- Collaborate with hardware teams to align software optimizations with underlying architectural features.
Compiler and IR Enhancements:
- Apply compiler infrastructures, particularly MLIR and StableHLO, to optimize and transform ML code.
- Develop custom passes or modifications to existing compiler flows to maximize performance benefits for ML workloads.
Memory Scheduling and Quantization:
- Engineer efficient memory placement and scheduling strategies to minimize bottlenecks and improve data throughput.
- Implement and refine quantization techniques to reduce model size and computational overhead without sacrificing accuracy.
Open Source and Community Engagement:
- Contribute to and leverage open-source projects (e.g., llama.cpp) to enhance performance and stability.
- Collaborate with the broader ML community to stay abreast of emerging trends and tools.
Performance Profiling and Debugging:
- Use profiling and diagnostic tools to identify performance issues and iterate on solutions.
- Develop benchmarks and tests to measure the impact of your optimizations across various hardware platforms.

Qualifications

Educational Background:
- Bachelor’s or Master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
Technical Expertise:
- Strong proficiency in C/C++ and Python for performance-critical development.
- Solid experience with transformer architectures and deep learning model optimizations.
- Practical understanding of compiler technologies, specifically MLIR and StableHLO.
- Experience in memory placement, scheduling strategies, and performance tuning for compute-intensive applications.
- Familiarity with the llama.cpp repository or similar projects is a significant plus.
- Deep knowledge of quantization techniques, numerical operations, and computer architecture principles, including vector and matrix operations.
Analytical & Problem-Solving Skills:
- Proven track record in debugging complex systems and applying innovative solutions to optimize performance.
- Ability to translate high-level research into optimized, production-ready code.
Communication & Collaboration:
- Excellent written and verbal communication skills.
- Ability to work effectively in a collaborative, fast-paced startup environment.

Why Join Us?

You’ll have the opportunity to work on revolutionary AI hardware alongside a talented, passionate, and ambitious team. This role provides a chance to grow your technical skills, gain valuable hands-on experience, and contribute to groundbreaking innovations in AI compute hardware. You’ll also be eligible for company stock options, allowing you to share in our long-term success.

If you think you are a good fit, please send your resume to [email protected].