The wireless audio industry is undergoing a fundamental upgrade. Demand for consumer and commercial audio equipment has shifted from basic audio playback to five core capabilities: high-fidelity sound quality, millisecond-level low latency, precise multi-device synchronization, real-time two-way voice interaction, and full compatibility across legacy and new Bluetooth ecosystems.
BLE-only chips can support cutting-edge LE Audio products but cannot connect to the large installed base of mobile phones, laptops, and in-vehicle multimedia devices. Classic-only Bluetooth chips lack the advanced features required for TWS earbuds, hearing aids, and Auracast public broadcasting. Meanwhile, legacy chips rely on software sample rate conversion and limited single-core computing power. When running acoustic algorithms such as acoustic echo cancellation (AEC) and noise suppression (NS), or decoding AAC/LC3 audio streams, they suffer from excessive CPU and memory usage, leading to stuttering, audio distortion, desynchronized playback, and shortened battery life. These engineering drawbacks extend R&D cycles and raise hardware BOM costs significantly.
Espressif’s new dual-core RISC-V wireless SoC, the ESP32-S31, addresses these industry pain points. It features enhanced dual-mode Bluetooth audio links, a built-in hardware asynchronous sample rate converter (ASRC), and a 320 MHz high-performance vector acceleration core. A single chip supports both Classic BR/EDR Bluetooth and BLE 5.4 LE Audio, enabling full-range audio terminal development on one unified hardware platform. Official standardized Bluetooth audio demos verify the end-to-end capabilities: Classic Bluetooth A2DP high-fidelity music playback, HFP high-definition hands-free calling, and bidirectional media control via AVRCP, as well as LE Audio CIS unicast, BIS broadcast distribution, and multi-device millisecond-synchronized playback. It addresses manufacturers’ core challenges, including multi-product-line development, cross-protocol compatibility, and insufficient computing power for acoustic algorithms.
I. Core Audio Technical Highlights of ESP32-S31
1. Fully Compatible Dual-Mode Bluetooth Audio Architecture Connecting Legacy & Next-Gen Audio Ecosystems
The previous ESP32-S3 supports only BLE, which creates compatibility barriers with traditional devices. In contrast, the ESP32-S31 integrates complete dual-mode Bluetooth (Classic BR/EDR plus BLE 5.4), enabling seamless interoperability between old and new audio ecosystems while greatly boosting parallel computing performance for local audio encoding and decoding.
(1) Classic Bluetooth BR/EDR: Full Coverage of Legacy Devices, Ideal for Automotive & Commercial Use
It natively supports the complete set of Classic Bluetooth audio profiles, including A2DP stereo audio streaming, bidirectional AVRCP media control, and HFP 1.7 high-definition calling. It can connect directly to billions of existing smartphones, notebooks, in-vehicle head units, traditional Bluetooth speakers, and desktop conference terminals without an external auxiliary Bluetooth chip, simplifying PCB design and cutting BOM costs.
The chip is equipped with a 320 MHz dual-core RISC-V processor. Each core integrates a 128-bit-wide data path and a SIMD vector instruction set that accelerate the mainstream SBC and AAC audio decoders. CPU utilization during decoding stays in the single digits (percent), leaving abundant computing resources to run acoustic algorithms such as AEC, NS, and voice enhancement in parallel, and delivering noticeably clearer call audio than competing chips of equivalent specifications.
(2) BLE 5.4 LE Audio: Core Carrier for Next-Generation Wireless Audio Standards
It implements the complete LE Audio standard suite, supporting CIS point-to-point unicast audio and BIS broadcast audio distribution. This caters to cutting-edge products including TWS stereo earbuds, smart hearing aids, exhibition tour guide systems, multi-room synchronized speakers, and public TV audio receivers.
It adopts the new-generation high-efficiency LC3 codec, which delivers richer, finer audio details at half the bitrate of traditional SBC. The low-bitrate characteristic drastically reduces wireless power consumption and extends device battery life. The lightweight LC3 encoding frees up substantial CPU resources, allowing the chip to load multi-channel noise reduction and microphone array algorithms simultaneously, perfectly meeting stringent voice clarity requirements for hearing aids and open-fit earbuds.
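The bitrate comparison can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the LC3 frame sizes and the SBC stereo bitrate are assumed example values commonly quoted for LE Audio and A2DP, not figures from this article, and the function name is hypothetical.

```python
def lc3_bitrate_bps(octets_per_frame: int, frame_ms: float) -> float:
    """Bitrate of one LC3 channel: payload bits per frame times frames per second."""
    return octets_per_frame * 8 * 1000 / frame_ms


# Assumed example frame sizes for 10 ms frames (not taken from this article).
ASSUMED_OCTETS_PER_FRAME = {"16 kHz voice": 40, "24 kHz voice": 60, "48 kHz music": 100}
# Assumed bitrate of a typical high-quality stereo SBC stream (not from this article).
ASSUMED_SBC_STEREO_BPS = 328_000

for label, octets in ASSUMED_OCTETS_PER_FRAME.items():
    print(f"{label}: {lc3_bitrate_bps(octets, 10) / 1000:.0f} kbps per channel")

# Two LC3 channels at the assumed 48 kHz setting, compared with the assumed SBC rate.
lc3_stereo_bps = 2 * lc3_bitrate_bps(ASSUMED_OCTETS_PER_FRAME["48 kHz music"], 10)
ratio = lc3_stereo_bps / ASSUMED_SBC_STEREO_BPS
print(f"LC3 stereo: {lc3_stereo_bps / 1000:.0f} kbps, {ratio:.0%} of the assumed SBC rate")
```

Under these assumptions the stereo LC3 stream lands at roughly half the SBC bitrate, which is consistent with the claim above. Real products may choose different frame sizes.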
2. Unified Standardized esp_bt_audio Software Component Shortens R&D Cycles Dramatically
Espressif independently developed esp_bt_audio, a unified Bluetooth audio abstraction component that hides the underlying protocol differences between Classic Bluetooth and LE Audio. Developers no longer need to debug two separate sets of Bluetooth audio drivers. The component uniformly manages Bluetooth master-slave roles, audio data streams, track switching control, volume linkage, call activation/deactivation, and device connection and disconnection across all scenarios. Standardized event callback interfaces connect directly to upper-layer UI and business logic, reducing low-level protocol debugging workload by more than 70%. Even manufacturers without a professional Bluetooth protocol stack team can bring audio products to mass production quickly.
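To illustrate the idea of one callback interface over two transports, here is a minimal conceptual model in Python. It is not the esp_bt_audio API: every class, enum, and method name below is hypothetical and exists only to show how application code can stay transport-agnostic.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List


class Transport(Enum):
    CLASSIC = auto()   # BR/EDR profiles such as A2DP, AVRCP, HFP
    LE_AUDIO = auto()  # CIS unicast or BIS broadcast


class AudioEvent(Enum):
    CONNECTED = auto()
    STREAM_STARTED = auto()
    VOLUME_CHANGED = auto()
    DISCONNECTED = auto()


@dataclass
class Event:
    kind: AudioEvent
    transport: Transport
    value: int = 0


class UnifiedAudioBus:
    """Hypothetical dispatcher: both stacks emit the same event types."""

    def __init__(self) -> None:
        self._handlers: Dict[AudioEvent, List[Callable[[Event], None]]] = {}

    def on(self, kind: AudioEvent, handler: Callable[[Event], None]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def emit(self, event: Event) -> None:
        for handler in self._handlers.get(event.kind, []):
            handler(event)


# The UI registers one handler and never checks which transport is active.
log: List[str] = []
bus = UnifiedAudioBus()
bus.on(AudioEvent.VOLUME_CHANGED,
       lambda e: log.append(f"volume={e.value} via {e.transport.name}"))

bus.emit(Event(AudioEvent.VOLUME_CHANGED, Transport.CLASSIC, 64))
bus.emit(Event(AudioEvent.VOLUME_CHANGED, Transport.LE_AUDIO, 80))
print(log)
```

The point of the design is that the handler is registered once per event type, so adding a second transport does not change the application layer.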
3. Exclusive Hardware ASRC Sample Rate Converter Outperforms Software SRC by a Wide Margin
Audio development frequently encounters mismatched sampling rates across audio sources, and streams switch between them often: 44.1 kHz from mobile phones, 48 kHz for local audio, 16 kHz for microphones, and 24 kHz for LC3 LE Audio streams. Traditional solutions depend on software sample rate conversion (SRC), which consumes substantial CPU and memory resources and easily triggers audio stuttering, dropouts, and synchronization drift.
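To make the cost of software SRC concrete, the sketch below resamples a buffer by linear interpolation. This is a deliberately simple stand-in, not the algorithm used on either chip: production converters typically use filtered (for example polyphase) designs that cost considerably more per sample. The function name is hypothetical.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    out_len = (len(samples) * dst_rate) // src_rate
    out: List[float] = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate        # output index mapped to input position
        i = int(pos)                         # left neighbour
        frac = pos - i                       # distance past the left neighbour
        nxt = samples[min(i + 1, len(samples) - 1)]  # clamp at the buffer end
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


# 10 ms of a ramp at 44.1 kHz converted to 48 kHz.
ramp = [float(i) for i in range(441)]
converted = resample_linear(ramp, 44_100, 48_000)
print(len(ramp), "->", len(converted), "samples")
```

Even this minimal version runs a multiply-and-add loop for every output sample on the CPU, and any filtered design needs additional working buffers.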
ESP32-S31 integrates a dedicated hardware ASRC unit paired with DMA data transfer channels, offloading sample rate conversion from the CPU entirely. Official test data across hundreds of working conditions shows a clear performance advantage:
| Performance Metric | ESP32-S31 with Hardware ASRC | ESP32-S3 with Software SRC | ESP32-S31 Relative to ESP32-S3 |
|---|---|---|---|
| Average CPU load | 0.79% | 2.35% | About one third |
| Peak CPU load | 1.86% | 14.17% | About 13% |
| Average memory consumption | 539 bytes | 19,400 bytes | About 2.8% |
| Peak memory consumption | 672 bytes | 87,000 bytes | Under 1% |
In 100 groups of mixed sampling rate and concurrent multi-audio-stream tests, the peak CPU load of ESP32-S31 stays below 2% with peak memory usage under 1KB. The freed CPU and memory resources can be allocated to high-load tasks including screen rendering, AI voice wake-up, multi-channel noise reduction and timing calibration for multi-device synchronization, delivering more stable operation and lower overall power consumption without auxiliary external MCUs.
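The relative figures follow directly from the measurements in the table. The short calculation below reproduces them, with no assumptions beyond the table values themselves.

```python
# (ESP32-S31 with hardware ASRC, ESP32-S3 with software SRC), copied from the table.
MEASUREMENTS = {
    "avg_cpu_percent": (0.79, 2.35),
    "peak_cpu_percent": (1.86, 14.17),
    "avg_mem_bytes": (539, 19400),
    "peak_mem_bytes": (672, 87000),
}

# ESP32-S31 value expressed as a percentage of the ESP32-S3 value.
ratios = {name: s31 / s3 * 100 for name, (s31, s3) in MEASUREMENTS.items()}

for name, pct in ratios.items():
    print(f"{name}: ESP32-S31 uses {pct:.1f}% of the ESP32-S3 figure")
```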
4. Dual Independent I2S Hardware Channels Ensure Hardware-Level Precise Audio Timing Alignment
The chip integrates two standalone I2S audio controllers that support hardware-based Bluetooth audio clock synchronization, with no software timing compensation required. This addresses industry-wide pain points such as unsynchronized multi-device playback, audio-video latency, and TWS left-right channel offset. It delivers millisecond-level synchronization accuracy with no cumulative timing drift over long continuous playback, greatly improving the audio experience of multi-room speakers, TWS earbuds, and in-vehicle multimedia systems.
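Why cumulative drift matters can be shown with a short calculation. The sketch assumes two devices whose audio clocks run free and differ by 20 ppm, an illustrative crystal tolerance that is not a figure from this article. The helper names are hypothetical.

```python
def drift_ms(ppm_offset: float, seconds: float) -> float:
    """Timing offset accumulated between two free-running clocks, in milliseconds."""
    return ppm_offset * 1e-6 * seconds * 1000.0


def drift_samples(ppm_offset: float, seconds: float, sample_rate: int) -> float:
    """The same offset expressed in audio samples at the given sample rate."""
    return ppm_offset * 1e-6 * seconds * sample_rate


ASSUMED_PPM = 20.0  # illustrative clock mismatch, not a measured value

for minutes in (1, 10, 60):
    seconds = minutes * 60
    print(f"{minutes:>2} min: {drift_ms(ASSUMED_PPM, seconds):.1f} ms, "
          f"{drift_samples(ASSUMED_PPM, seconds, 48_000):.0f} samples at 48 kHz")
```

Under this assumption, unsynchronized speakers would be tens of milliseconds apart after an hour, which is why locking the I2S clock to the Bluetooth timing reference avoids the need for periodic software correction.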
II. Two Core Application Scenarios
Scenario 1: Screen-Enabled Classic Bluetooth Multimedia Control Center
(In-vehicle infotainment / desktop conference terminal / large-screen Bluetooth speaker)
Based on the ESP32-S31, an integrated multimedia interactive terminal is built, combining dual-mode Bluetooth audio, HD calling, and screen UI graphical rendering capabilities. With a single chip, it enables end-to-end interaction including playback, music control, incoming calls, and multimedia information visualization. It is an optimal single-chip solution for in-car infotainment systems, commercial desktop conferencing devices, and home Bluetooth speakers with screens.
A2DP high-fidelity stereo reception and playback
As an A2DP audio receiver, the chip wirelessly receives music streams pushed from smartphones or computers, and outputs audio to speakers via an external I2S audio codec. Leveraging SIMD hardware-accelerated AAC decoding, it delivers distortion-free and stutter-free high-dynamic-range music playback, enabling long-term continuous playback under low CPU load.

Multimedia information visualization display
Supports synchronized reception of full media metadata from mobile devices, including song title, artist, album artwork, and scrolling lyrics. Combined with the chip’s built-in 2D graphics accelerator, it enables smooth rendering of multimedia UI interfaces, delivering a visual experience close to consumer-grade smart terminals and addressing the limitation of traditional Bluetooth devices that only support audio playback without visual information.

Bidirectional AVRCP media control
Breaks the one-way limitation of traditional Bluetooth playback by enabling bidirectional synchronization between phone and device. Touch controls on the device screen can directly trigger previous track / play-pause / next track, with commands instantly sent back to the mobile music app. Playback status, progress, and volume are synchronized in real time in both directions. When the phone adjusts media volume, the chip automatically synchronizes local output gain, ensuring a seamless interactive experience.
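Volume synchronization can be sketched as a mapping from the 7-bit AVRCP absolute volume value (0x00 to 0x7F) to a local output gain. The linear-in-dB curve and the -60 dB floor are assumptions chosen for illustration, and the function names are hypothetical. The article does not specify how the chip maps volume to gain.

```python
AVRCP_VOLUME_MAX = 0x7F  # AVRCP absolute volume is a 7-bit value


def avrcp_to_percent(volume: int) -> float:
    """Convert an AVRCP absolute volume step to a 0-100% figure for the UI."""
    if not 0 <= volume <= AVRCP_VOLUME_MAX:
        raise ValueError("AVRCP absolute volume must be in 0..127")
    return volume * 100.0 / AVRCP_VOLUME_MAX


def avrcp_to_gain_db(volume: int, min_db: float = -60.0) -> float:
    """Assumed mapping: step 0 mutes, step 127 is 0 dB, linear in dB between."""
    if not 0 <= volume <= AVRCP_VOLUME_MAX:
        raise ValueError("AVRCP absolute volume must be in 0..127")
    if volume == 0:
        return float("-inf")  # treat the lowest step as mute
    return min_db * (1.0 - volume / AVRCP_VOLUME_MAX)


for step in (0, 32, 64, 127):
    print(f"step {step:>3}: {avrcp_to_percent(step):5.1f}% -> {avrcp_to_gain_db(step):.1f} dB")
```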
HFP HD hands-free calling + local dialing
When a call is received or initiated on the phone, the audio channel switches automatically. Downlink voice is played through the speakers, while multiple MEMS microphones capture uplink voice. With its remaining processing power, the chip runs hardware-accelerated AEC and noise suppression in real time, removing speaker echo and filtering environmental wind and road noise in hands-free mode.

It also supports a digital dial-pad UI on the device side, allowing users to place calls directly from the control screen without using the phone. The chip independently manages the Bluetooth call link, making it suitable for in-vehicle Bluetooth calling systems and front-desk conferencing intercom terminals.
Scenario 2: LE Audio Multi-Device Synchronized Audio Distribution
(TWS earbuds / hearing aids / commercial broadcasting systems)
The core value of LE Audio lies in low power consumption, high audio quality, and scalable multi-device audio distribution. The ESP32-S31, with its full BLE 5.4 protocol stack, hardware time synchronization, and LC3 codec acceleration, covers both the consumer wearable and commercial broadcasting markets.
CIS unicast audio: core solution for TWS earbuds and smart hearing aids
CIS point-to-point audio links provide low-latency stereo transmission, with each earbud in a TWS pair equipped with an ESP32-S31 chip. Leveraging LE Audio standard timing mechanisms and dual I2S hardware synchronization, the left and right audio channels are tightly aligned, minimizing inter-ear latency and preserving lip-sync accuracy.
For smart hearing aid applications, the low-bitrate and high-efficiency LC3 codec significantly reduces power consumption. The chip’s spare computing power enables multi-channel audio pickup noise reduction and ambient sound transparency algorithms, balancing clarity for hearing assistance with all-day battery life.
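The left-right alignment described above can be modelled in a few lines. This is a simplified sketch of the general LE Audio idea that each receiver renders a frame at a shared synchronization reference plus an agreed presentation delay. The class, field names, clock offsets, and 40 ms delay are all hypothetical illustration values, not ESP32-S31 behavior.

```python
from dataclasses import dataclass


@dataclass
class Earbud:
    name: str
    clock_offset_us: int  # local clock minus shared link clock (assumed known)

    def local_render_time(self, sync_ref_us: int, presentation_delay_us: int) -> int:
        """Render instant on the shared timeline, converted to this device's clock."""
        return sync_ref_us + presentation_delay_us + self.clock_offset_us


left = Earbud("left", clock_offset_us=150)
right = Earbud("right", clock_offset_us=-90)

SYNC_REF_US = 1_000_000          # shared reference point for one audio frame
PRESENTATION_DELAY_US = 40_000   # assumed delay agreed for the stream

shared_times = []
for bud in (left, right):
    local = bud.local_render_time(SYNC_REF_US, PRESENTATION_DELAY_US)
    shared = local - bud.clock_offset_us  # map back to the shared timeline
    shared_times.append(shared)
    print(f"{bud.name}: renders at local {local} us (shared {shared} us)")
```

Each earbud schedules playback on its own clock, yet both land on the same instant of the shared timeline, so neither needs to exchange audio timing with the other.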

BIS Auracast broadcast audio: one-to-many commercial shared audio
Supports LE Audio BIS broadcast distribution, allowing a single audio source to broadcast synchronized audio streams to dozens of ESP32-S31 terminals. Receiving devices do not require complex Bluetooth pairing and can join the broadcast with a single action. This is suitable for exhibition guided tours, simultaneous interpretation at events, silent audio listening for public displays in shopping malls, whole-home multi-room synchronized audio systems, and campus broadcasting—significantly reducing deployment costs for commercial audio systems.
Full-chain multi-device synchronization control and playback calibration
Real-time interconnection across multiple terminals: phone-based track switching and volume adjustments are simultaneously pushed to all broadcast receivers. All devices refresh track information and volume indicators synchronously on their screens. Pulse waveform synchronization tests verify that audio output waveforms across multiple devices are perfectly aligned at startup, with no drift during long-term playback, solving the common issue of inconsistent timing and fragmented listening experience in traditional multi-speaker systems.
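A pulse alignment check of this kind can be approximated offline by cross-correlating recordings from two devices. The sketch below uses synthetic pulses and a brute-force search. It is an assumed analysis method with hypothetical function names, not the official test procedure.

```python
from typing import List, Sequence


def best_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag in samples that maximises cross-correlation; positive means `other` is late."""
    def corr(lag: int) -> float:
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=corr)


def pulse(length: int, start: int, width: int = 4) -> List[float]:
    """Synthetic rectangular test pulse standing in for a recorded waveform."""
    return [1.0 if start <= i < start + width else 0.0 for i in range(length)]


SAMPLE_RATE = 48_000
device_a = pulse(200, start=50)
device_b = pulse(200, start=53)   # simulated device that starts 3 samples late

lag = best_lag(device_a, device_b, max_lag=20)
offset_us = lag * 1e6 / SAMPLE_RATE
print(f"device B lags device A by {lag} samples ({offset_us:.1f} us)")
```

Repeating the measurement at the start and end of a long playback session gives a direct estimate of drift: if the lag is unchanged, the devices have stayed aligned.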
III. Comprehensive Deployment Advantages
Single-chip full-category coverage, reducing hardware SKU complexity
A single ESP32-S31 platform covers traditional Bluetooth speakers, in-car infotainment systems, TWS earbuds, hearing aids, and commercial broadcast terminals. There is no need to select different chips for legacy and next-gen Bluetooth products. Unified PCB and firmware development reduces supply chain inventory, R&D, and testing costs.
Hardware-level compute offloading, reducing system power consumption and BOM cost
Hardware ASRC, SIMD decoding acceleration, and dual I2S synchronization significantly reduce CPU and memory usage. No auxiliary MCU is required, reducing peripheral components and overall system power consumption. Battery life for wearable and portable devices increases by more than 30%.
Standardized software ecosystem, faster mass production and shorter time-to-market
The esp_bt_audio unified audio component, complete official demos, and mature ESP-IDF development framework encapsulate Bluetooth protocol and audio pipeline layers. Manufacturers can focus on product logic and UI design, significantly shortening development cycles and accelerating entry into the LE Audio product market.
Backward and forward compatibility across Bluetooth ecosystems
Fully compatible with both legacy Bluetooth devices and next-generation LE Audio standards. Products can serve both mainstream older phones and automotive systems while also supporting advanced TWS and broadcast features, expanding user coverage and sales channels.
Stable, commercial-grade performance for consumer and industrial use
Validated across hundreds of operational scenarios with low load, low memory usage, and high-precision multi-device synchronization. Continuous playback remains stable with no audio dropouts or synchronization drift. Strong temperature adaptability makes it suitable not only for mass-market consumer electronics but also for long-duration commercial deployments such as exhibitions, campuses, and in-vehicle systems.
Closing
As LE Audio gradually becomes the unified standard for the wireless audio industry, chip platforms that support backward/forward compatibility, low computational overhead, and multi-device synchronization will become a core competitive advantage for manufacturers. With its combination of dual-mode Bluetooth, hardware audio acceleration, and a unified software development framework, the ESP32-S31 provides audio device manufacturers with a low-cost, high-performance, full-scenario SoC solution, helping brands rapidly iterate next-generation high-fidelity, low-latency, and highly interactive smart audio products.