Embedded AI Runtime/Kernel Developer

Austin, TX • Artificial Intelligence

Job Type

Full-time

Description

Company Overview

Ambiq's mission is to enable intelligence everywhere by delivering the lowest-power semiconductor solutions. Ambiq is a pioneer and a leading provider of ultra-low-power semiconductor solutions based on our proprietary and patented sub- and near-threshold technologies. With the increasing power requirements of artificial intelligence (AI) computing, our customers increasingly rely on our solutions to deliver AI to edge environments. Our hardware and software innovations fundamentally deliver a multi-fold improvement in power consumption over traditional semiconductor designs without expensive process geometry scaling. We started in 2010 addressing the needs of battery-powered devices at the edge, where power consumption challenges were most profound. As of the beginning of 2025, we’ve shipped more than 260 million units worldwide.

Our innovative and fast-moving teams of design, research, development, production, marketing, sales, and operations are spread across several continents, including the US (Austin and San Jose), Taiwan (Hsinchu), China (Shenzhen and Shanghai), Japan (Tokyo), and Singapore. We value relentless technology innovation, a deep commitment to customer success, collaborative problem-solving, and an enthusiastic pursuit of energy efficiency. We embrace candidates who share these values. The successful candidate must be self-motivated, creative, and comfortable learning and driving exciting new technologies. We encourage and nurture an environment for growth and opportunities to work on complex, meaningful, and challenging projects that will create a lasting impact and shape the future of technology. Join us on our quest for 100 billion devices. The edge intelligence revolution starts here.

This role is in Austin, TX, or San Diego, CA.

Scope

At Ambiq, the AI Technology Group enables state-of-the-art ML and DL model development across our hardware portfolio, using sophisticated model compression and acceleration techniques to deploy previously impractical AI tasks to battery-powered environments. Our team identifies neural architectures best suited to our customers’ needs, selects the models most amenable to deployment on our platform, trains them carefully while tuning for memory, compute, and energy tradeoffs, and deploys them using AI runtimes optimized for our hardware platform. Finally, we publish and socialize our findings via conferences, workshops, and publications.

Beyond a healthy obsession with computational efficiency, the successful candidate will be comfortable operating in a ‘version zero’ environment, marshaling internal, open-source, and third-party resources to solve our customers' problems quickly and elegantly.

Specific Responsibilities

Optimize embedded AI runtimes such as TensorFlow Lite for Microcontrollers to utilize Ambiq’s hardware products efficiently.
Develop advanced inference performance profiling tools to help customers identify optimization targets and solutions.
Develop novel ahead-of-time AI model inference compilers that achieve better power, latency, and memory performance by incorporating state-of-the-art pruning and quantization techniques.
Develop training-side tools and libraries to help AI developers identify neural architectures that optimally run on Ambiq’s platforms.
Publish and maintain these tools, including documentation and other assets our customers need to bootstrap their internal AI features.
Socialize these achievements via conferences, meetups, workshops, and publications.

Requirements

Education

A bachelor’s degree in computer science or a related field and at least 2 years of relevant experience are required. A master’s degree or PhD in a related topic is highly desirable.

Required Skills/Abilities

Experience writing CPU kernels leveraging vector accelerators such as Arm Helium, Arm Neon, or Intel AVX. Past work with CUDA, OpenCL, or other low-level kernel development environments is a plus.
Experience with AI model performance profiling.
Experience with embedded C or C++.
Experience with Keras and TensorFlow (TFLite, TFLite for Microcontrollers).

Bonus Qualifications

Experience with compiler development
Experience with developing for embedded NPUs
Past TinyML/EdgeAI involvement or experience
Experience developing and optimizing for TFLite for Microcontrollers
Experience with model-to-binary compilers (IREE, MicroTVM, etc.)
Experience with ONNX, TOSA, JAX, LLVM, and/or MLIR
Experience with optimizing for heterogeneous AI compute (e.g., CPU+NPU+DSP)