
Make microphone polling interval configurable#877

Open
watsaig wants to merge 5 commits into RustAudio:master from watsaig:feat/adjust-mic-polling-interval

Conversation


@watsaig watsaig commented Apr 24, 2026

The microphone functionality added recently is very useful, but for some use cases the hard-coded 5 ms poll interval is a limitation.

This makes the interval configurable, as suggested by @roderickvd in the original PR.

@roderickvd
Member

Thanks. If memory serves me well, then in #799 we even moved to a Condvar-based approach, removing the poll interval altogether. Would you port that from there to here?

@yara-blue
Member

yara-blue commented Apr 25, 2026

If memory serves me well, then in #799 we even moved to a Condvar-based approach

Oh hey, apologies I dropped the ball there, we should have landed that!

Would you port that from there to here?

@watsaig if you could help out there that would be super nice :)

@yara-blue
Member

Regarding this PR: as far as I know, there are two big latency sources for a VoIP-like application:

  • OS input buffer size (configurable through CPAL and should be exposed on the builder)
  • The polling interval seen above

Switching to the condvar should remove the polling latency entirely. I would also be open to other implementations.

The reason we have the polling interval is to convert from the OS callback pushing samples on the input side to Rodio pulling samples on the output side:

Microphone
  ----- analog cable -----> ADC
  ----- kernel polling/interrupts -----> OS audio buffer
  ----- kernel calling on "non-empty" buffer -----> Rodio microphone callback (via CPAL)


  Rodio callback <----- kernel calling on "non-full" buffer -----
  OS audio buffer <----- kernel polling/interrupts --------------
  DAC <-------------------- analog cable ------------------------
  Speaker

So we have a thread to "reverse the flow" at the microphone side: to go from the
kernel pushing things into it to Rodio's iterator-like architecture, where
(eventually) the DAC pulls samples out.

I did the initial microphone design mostly to get the feature landed. It might
very well not be optimal.

For example maybe we could do without the thread. If you want to feel free to
give it a thought. Could we use shared memory here instead?

@watsaig
Author

watsaig commented Apr 25, 2026

Thanks for the feedback! I'll take a deeper dive and see if the condvar approach works best, or whether we can do without the thread as you mentioned.

@watsaig
Author

watsaig commented May 5, 2026

Hi again,
I dug through your previous PRs and did a little exploration on a new branch.

In my opinion this is the perfect situation to use a channel: we have data produced in a callback and a consumer implementing Iterator (the popular channel crates already include an Iterator implementation, so the API is trivial to work with). There may be ways to eke out a little extra performance, but we would be reinventing the channel, so my first instinct is to leave those optimizations to the internals of the channel implementation.
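To make the shape concrete, here is a minimal std-only sketch of that idea (the name `spawn_fake_microphone` is mine, not rodio's): a spawned thread stands in for the CPAL input callback as the producer, and the consumer side is just the channel receiver, which already supports iteration.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch (not the actual rodio code): the producer thread
// plays the role of the OS input callback, and the consumer is simply
// the Receiver, which already iterates via `into_iter()`.
fn spawn_fake_microphone(samples: Vec<f32>) -> mpsc::Receiver<f32> {
    // Bounded channel so a stalled consumer cannot grow latency unboundedly.
    let (tx, rx) = mpsc::sync_channel(512);
    thread::spawn(move || {
        for s in samples {
            // In the real callback this would be fed by the OS; here we
            // just replay a canned buffer.
            if tx.send(s).is_err() {
                break; // receiver dropped
            }
        }
    });
    rx
}

fn main() {
    let rx = spawn_fake_microphone(vec![0.0, 0.25, 0.5]);
    // The Iterator-based consumer API falls out for free:
    let collected: Vec<f32> = rx.into_iter().collect();
    assert_eq!(collected, vec![0.0, 0.25, 0.5]);
}
```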

In the new branch, I added feature-gated support for std::mpsc (which has been based on crossbeam for a while, so it is already well optimized) as well as kanal, which is supposed to be even faster. I added a little profiling script that measures jitter by recording audio for 5s and keeping track of the time interval between samples.
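The core of the jitter measurement can be sketched as follows (my reconstruction of the idea, not the actual script): record a timestamp per received sample, then look at the gaps between consecutive timestamps.

```rust
use std::time::{Duration, Instant};

// Sketch (names are mine): given one timestamp per received sample,
// compute the inter-arrival intervals that the jitter histogram is
// built from.
fn intervals(timestamps: &[Instant]) -> Vec<Duration> {
    timestamps.windows(2).map(|w| w[1] - w[0]).collect()
}

fn main() {
    let t0 = Instant::now();
    // Synthetic timestamps standing in for real sample arrivals.
    let stamps = [
        t0,
        t0 + Duration::from_micros(23),
        t0 + Duration::from_micros(45),
    ];
    let gaps = intervals(&stamps);
    assert_eq!(gaps.len(), 2);
    assert_eq!(gaps[0], Duration::from_micros(23));
}
```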

Here are some results, with a buffer size of 16 (to minimize latency) vs. the default 512, run on my MacBook Pro M3 with the internal microphone:

Current implementation (rtrb)

Buffer size: 16

[histogram: intervals_rtrb_buffer16]

Buffer size: 512

[histogram: intervals_rtrb_buffer512]

std::mpsc

Buffer size: 16

[histogram: intervals_std_mpsc_buffer16]

Buffer size: 512

[histogram: intervals_std_mpsc_buffer512]

kanal

Buffer size: 16

[histogram: intervals_kanal_buffer16]

Buffer size: 512

[histogram: intervals_kanal_buffer512]

I also added a benchmark to look at channel throughput. I'm not sure it's very representative of real use but it shows kanal as the slowest, with std::mpsc being about 2x faster and rtrb another 2x faster (though this is with polling in a busy loop instead of using a thread::sleep):

     Running benches/microphone_channels.rs (target/release/deps/microphone_channels-41b9cd8b7e09a05d)
Timer precision: 41 ns
microphone_channels   fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ kanal_bounded                    │               │               │               │         │
│  ├─ 256             655.9 ns      │ 832.8 µs      │ 6.616 µs      │ 78.6 µs       │ 1000    │ 8000
│  │                  390.2 Mitem/s │ 307.3 Kitem/s │ 38.68 Mitem/s │ 3.256 Mitem/s │         │
│  ├─ 512             6.624 µs      │ 3.01 ms       │ 18.2 µs       │ 158.8 µs      │ 1000    │ 1000
│  │                  77.28 Mitem/s │ 170 Kitem/s   │ 28.11 Mitem/s │ 3.223 Mitem/s │         │
│  ╰─ 1024            10.45 µs      │ 4.328 ms      │ 37.24 µs      │ 338.5 µs      │ 1000    │ 1000
│                     97.91 Mitem/s │ 236.5 Kitem/s │ 27.49 Mitem/s │ 3.024 Mitem/s │         │
├─ mpsc_sync_channel                │               │               │               │         │
│  ├─ 256             3.582 µs      │ 148.7 µs      │ 3.665 µs      │ 3.874 µs      │ 1000    │ 1000
│  │                  71.45 Mitem/s │ 1.721 Mitem/s │ 69.83 Mitem/s │ 66.07 Mitem/s │         │
│  ├─ 512             7.207 µs      │ 162.5 µs      │ 8.207 µs      │ 9.648 µs      │ 1000    │ 1000
│  │                  71.03 Mitem/s │ 3.149 Mitem/s │ 62.38 Mitem/s │ 53.06 Mitem/s │         │
│  ╰─ 1024            9.832 µs      │ 95.7 µs       │ 11.74 µs      │ 12.45 µs      │ 1000    │ 1000
│                     104.1 Mitem/s │ 10.69 Mitem/s │ 87.15 Mitem/s │ 82.19 Mitem/s │         │
╰─ rtrb_in_spin_loop                │               │               │               │         │
   ├─ 256             1.165 µs      │ 20.95 µs      │ 1.415 µs      │ 1.388 µs      │ 1000    │ 1000
   │                  219.6 Mitem/s │ 12.21 Mitem/s │ 180.8 Mitem/s │ 184.3 Mitem/s │         │
   ├─ 512             2.416 µs      │ 15.24 µs      │ 2.541 µs      │ 2.57 µs       │ 1000    │ 1000
   │                  211.8 Mitem/s │ 33.57 Mitem/s │ 201.4 Mitem/s │ 199.1 Mitem/s │         │
   ╰─ 1024            4.374 µs      │ 35.83 µs      │ 4.583 µs      │ 4.867 µs      │ 1000    │ 1000
                      234 Mitem/s   │ 28.57 Mitem/s │ 223.3 Mitem/s │ 210.3 Mitem/s │         │

Overall std::mpsc looks like both the simplest and best-performing implementation. This doesn't account for a few things though:

  • Latency from sample acquisition in the OS to when the consumer receives it via rodio (I considered adding a round-trip latency test that would produce a tone through the speakers and monitor the microphone to identify when the sound is received, but that felt a bit too high-effort at this stage)
  • CPU usage; should be correlated with the channel throughput measured above but that doesn't indicate what typical CPU load would be

Hope that helps, let me know if you have any suggestions for other things to look into!

@yara-blue
Member

I added a little profiling script that measures jitter by recording audio for 5s and keeping track of the time interval between samples.

Amazing! I've been planning to add a tool like this forever. You went above and beyond here :)

Overall std::mpsc looks like both the simplest and best-performing implementation.

Completely agreed!

Latency from sample acquisition in the OS to when the consumer receives it via rodio

OS <-> Rodio callback latency should be constant.

CPU usage; should be correlated with the channel throughput measured above

I'm surprised rtrb_in_spin_loop seems to have more than twice as much throughput. That is probably because the benchmark does not account for the normal time between samples.


Anyway, given the data you so clearly presented, I'd be happy to merge a PR switching to the mpsc implementation you have in that branch.

I do wonder (this is an idea, follow up on this only if you'd like to!) if we could lower CPU utilization by sending groups of 16 samples. Maybe even eight (OS audio output & input uses double buffering with a period of half the buffer size as far as I know, so theoretically eight could give a lower total latency).

@watsaig
Author

watsaig commented May 5, 2026

Thanks, sounds good! I'll cherry-pick the mpsc parts here.

Latency from sample acquisition in the OS to when the consumer receives it via rodio

OS <-> Rodio callback latency should be constant.

Agreed, I was unclear here: I meant that there could be different delays all the way to the consumer e.g. if different channel implementations have a different way of signaling to the receiver that new data is available. But that might not make a big difference anyway in the grand scheme of things.

I do wonder (this is an idea, follow up on this only if you'd like to!) if we could lower CPU utilization by sending groups of 16 samples. Maybe even eight (OS audio output & input uses double buffering with a period of half the buffer size as far as I know, so theoretically eight could give a lower total latency).

I don't quite follow, sorry. Do you mean batching in the callback, so that the channel type becomes [Sample; 16] (or Vec<Sample> for dynamic sizes)? Or batching on the receiver side with something like Itertools::chunks?

@roderickvd
Member

Very nice!

Did you also get a chance to measure the Condvar based approach? I’m surprised that sending samples down a channel works so well. My experience so far has been that it's slower and more jittery. Happy to be proven wrong though!

@yara-blue
Member

yara-blue commented May 6, 2026

I don't quite follow, sorry. Do you mean batching in the callback, so that the channel type becomes [Sample; 16] (or Vec for dynamic sizes)? Or batching on the receiver side with something like Itertools::chunks?

Sorry I meant send chunks as in [Sample; 16]. That should lower the overhead of send/recv etc.
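A sketch of what chunked transport could look like (assuming `std::mpsc` and a hypothetical `send_in_chunks` helper; not the branch's actual code): the callback batches samples into fixed-size arrays, so there is one send per 16 samples, and the receiver flattens them back into per-sample iteration.

```rust
use std::sync::mpsc;
use std::thread;

const CHUNK: usize = 16;

// Hypothetical sketch: batch samples into [f32; 16] arrays on the
// producer side, flatten back to individual samples on the consumer side.
fn send_in_chunks(samples: Vec<f32>) -> Vec<f32> {
    let (tx, rx) = mpsc::sync_channel::<[f32; CHUNK]>(4);
    let producer = thread::spawn(move || {
        for chunk in samples.chunks_exact(CHUNK) {
            let mut buf = [0.0f32; CHUNK];
            buf.copy_from_slice(chunk);
            // One send per 16 samples instead of per sample.
            let _ = tx.send(buf);
        }
        // A real implementation would also flush the remainder
        // (zero-padded to stay frame-aligned); omitted for brevity.
    });
    // The consumer still sees a per-sample iterator.
    let out: Vec<f32> = rx.into_iter().flat_map(|a| a.to_vec()).collect();
    producer.join().unwrap();
    out
}

fn main() {
    let input: Vec<f32> = (0..32).map(|i| i as f32).collect();
    assert_eq!(send_in_chunks(input.clone()), input);
}
```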

Did you also get a chance to measure the Condvar based approach?

Would be interesting to see! Give that a shot if you've got time, Craig!

@watsaig
Author

watsaig commented May 6, 2026

I didn't try the condvar approach. Correct me if I'm misunderstanding the idea, but a ring buffer + condvar sounds like a channel, which is what I was referring to when I mentioned reinventing the channel. In fact, that is more or less the first example that both of these intros to channels build: Jon Gjengset - Crust of Rust: Channels; Mara Bos - Rust Atomics and Locks. That design is then heavily optimized in mpsc and others to minimize locking and waiting.

As far as I understand, the sender side in #799 is less efficient (it notifies the receiver on every push, even if the receiver is already awake), and there is a potential race: if the sender notifies the condvar while the receiver is locking the mutex, the receiver would then have to wait until the next sample is sent to wake up. Again, please correct me if I'm wrong, but I believe this is exactly the kind of thing that channels abstract away to provide a more efficient (and simpler) way to send data between threads.
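For reference, the textbook shape of such a hand-rolled handoff (my own minimal sketch, not the #799 code) is a `Mutex<VecDeque>` plus a `Condvar`. Re-checking the predicate in a loop while holding the mutex is what guards against the lost-wakeup scenario; it is also roughly what mpsc abstracts away and then optimizes.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Minimal hand-rolled "channel": Mutex<VecDeque> + Condvar.
struct Shared {
    queue: Mutex<VecDeque<f32>>,
    ready: Condvar,
}

fn push(shared: &Shared, sample: f32) {
    shared.queue.lock().unwrap().push_back(sample);
    // Notifying on every push, even when the receiver is awake, is the
    // inefficiency discussed above.
    shared.ready.notify_one();
}

fn pop(shared: &Shared) -> f32 {
    let mut q = shared.queue.lock().unwrap();
    // Waiting in a loop under the mutex makes this safe against lost
    // wakeups: a notify cannot slip past a waiter that has not yet
    // re-checked the predicate.
    while q.is_empty() {
        q = shared.ready.wait(q).unwrap();
    }
    q.pop_front().unwrap()
}

fn main() {
    let shared = Arc::new(Shared {
        queue: Mutex::new(VecDeque::new()),
        ready: Condvar::new(),
    });
    let producer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            for i in 0..4 {
                push(&shared, i as f32);
            }
        })
    };
    let received: Vec<f32> = (0..4).map(|_| pop(&shared)).collect();
    producer.join().unwrap();
    assert_eq!(received, vec![0.0, 1.0, 2.0, 3.0]);
}
```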


@yara-blue

I don't quite follow, sorry. Do you mean batching in the callback, so that the channel type becomes [Sample; 16] (or Vec for dynamic sizes)? Or batching on the receiver side with something like Itertools::chunks?

Sorry I meant send chunks as in [Sample; 16]. That should lower the overhead of send/recv etc.

Yes, I agree that batching data should be a little more efficient. Would you advise an array (with microphone being generic over N; or the size being hardcoded)? Or a vec with the length set to the value (or half the value) of the buffer size? And should microphone::next return a batch of data? Or cache it internally and return samples one-by-one to preserve the current API?


On another note, I forgot to ask this earlier: my current mpsc implementation uses send, which will block if the channel is full, but the current rtrb implementation instead silently drops the sample. Should I switch to try_send to keep that behavior? We could also potentially log a warning if the tracing feature is enabled.

@roderickvd
Member

I didn't try the condvar approach. Correct me if I'm misunderstanding the idea but a ring buffer + condvar sounds like a channel which is what I was referring to when I mentioned reinventing the channel.

Sure, that's what channels do under the hood. I'm not saying you should copy the #799 approach directly (we closed it because we don't need the non-Send workaround anymore) but I don't immediately buy that generic crates or types are more optimized. I've got real experience with ring buffers being faster than piping samples down a channel.

Yes, I agree that batching data should be a little more efficient. Would you advise an array (with microphone being generic over N; or the size being hardcoded)? Or a vec with the length set to the value (or half the value) of the buffer size? And should microphone::next return a batch of data? Or cache it internally and return samples one-by-one to preserve the current API?

@yara-blue will be able to provide more guidance for compatibility with the future direction of Rodio. I just wanted to add that working with chunk_by or better yet chunks_exact will enable SIMD auto-vectorization.
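As an illustration of the `chunks_exact` point (a hypothetical gain function, not rodio code): giving the compiler a fixed chunk length makes the inner loop a good candidate for unrolling and auto-vectorization, with the remainder handled separately.

```rust
// Illustrative only: applying a gain over fixed-size chunks. The fixed
// chunk length lets the compiler unroll and vectorize the inner loop.
fn apply_gain(samples: &mut [f32], gain: f32) {
    let mut chunks = samples.chunks_exact_mut(8);
    for chunk in &mut chunks {
        // Fixed-length body: SIMD-friendly.
        for s in chunk {
            *s *= gain;
        }
    }
    // Handle the leftover samples that did not fill a full chunk.
    for s in chunks.into_remainder() {
        *s *= gain;
    }
}

fn main() {
    let mut buf = vec![1.0f32; 10];
    apply_gain(&mut buf, 0.5);
    assert!(buf.iter().all(|&s| s == 0.5));
}
```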

Rodio's architecture is all about iterating over single samples (although "producers" should guarantee that they are always frame-aligned; i.e. zero-pad when a mid-stream interruption occurs). Keeping internal buffers with full chunks as kind of a double-buffer scheme could work, but likely extends well beyond the scope of this PR. It would be really nice to get support on such a topic though.

On another note, I forgot to ask this earlier: my current mpsc implementation uses send, which will block if the channel is full, but the current rtrb implementation instead silently drops the sample. Should I switch to try_send to keep that behavior? We could also potentially log a warning if the tracing feature is enabled.

I propose that it silently drops. This will facilitate the use case where the sink is stalled e.g. due to heavy CPU load, and when it recovers, it immediately gets "live" data again.
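A sketch of that drop-on-full policy with `std::mpsc` (the helper name `push_sample` is hypothetical): `try_send` never blocks the audio callback, and samples that arrive while the buffer is full are simply discarded.

```rust
use std::sync::mpsc;

// Hypothetical helper: attempt to push a sample without ever blocking
// the audio callback; returns whether the sample was delivered.
fn push_sample(tx: &mpsc::SyncSender<f32>, sample: f32) -> bool {
    match tx.try_send(sample) {
        Ok(()) => true,
        // Channel full (or disconnected): silently drop rather than block.
        Err(_) => false,
    }
}

fn main() {
    let (tx, rx) = mpsc::sync_channel(4);
    // Six samples arrive while nothing is consuming: only four fit.
    let delivered = (0..6).filter(|&i| push_sample(&tx, i as f32)).count();
    assert_eq!(delivered, 4);
    drop(tx);
    assert_eq!(rx.into_iter().count(), 4);
}
```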

@yara-blue
Member

Correct me if I'm misunderstanding the idea but a ring buffer + condvar sounds
like a channel which is what I was referring to when I mentioned reinventing
the channel

Pretty much, though what we ideally have is an SPSC channel. Rust's mpsc channel
has to account for multiple producers, something we will never need. Whether
that opens a large avenue for optimizations? No idea. I can imagine the condvar
+ ringbuffer being faster, but it might also not be.

It is more complex, so if we can get 90% of the way there with a simple channel
I'm more than happy to go that way. It could still be worth having a look at,
but only if you want to and have the time! The mpsc is already a great
improvement right now.

Yes, I agree that batching data should be a little more efficient. Would you
advise an array (with microphone being generic over N; or the size being
hardcoded)?

should microphone::next return a batch of data? Or cache it internally and
return samples one-by-one to preserve the current API?

I'm gonna answer these out of order.

For the foreseeable future Rodio will keep extending Iterator<Item = Sample>,
in other words returning samples one by one. Theoretically that should still
allow vectorization, though that does not reliably happen - if at all - right
now. I might see if I can dive into the compiler side and address that, though
I'm wildly unequipped for that right now; still, that is how you learn. If that
fails and we lose hope of the compiler auto-vectorizing iter.map().map().map()
(which is what Rodio kinda is), we could change Source to return an array. I
think there is much value in the simplicity of the current approach, and even
without auto-vectorization it is sufficient for nearly all use cases. If it is
at all possible to tackle it from the compiler side I think we should. Let me
know what you think though!

I think an array of 16 seems like a reasonable minimum. At 44.1 kHz that's
about 0.36 ms of latency induced by the chunking. Though even that latency can
fall away if later parts of the audio pipeline had to chunk anyway. We could
use a heapless::Vec; those are stack-backed vecs, so the 16 would be a maximum
instead. On the other hand, [Sample; 16] gives more freedom to the optimizer:
for example, loop unrolling and SIMD may be easier.

Should I switch to try_send to keep that behavior?

Good question! As usual I agree with Roderick: we should definitely silently
drop. Otherwise the latency could grow over time as more and more samples get
stuck in the channel. For the mpsc implementation you can use sync_channel,
which is bounded. A capacity of 1 would work well with the [Sample; 16] arrays
if that is the path you choose to go.

We could also log a warning if the tracing feature is enabled, potentially

Mhm.. not sure. ALSA (the Linux audio API) force-prints on under-run (an
under-run is the opposite of this, when we do not have enough samples ready on
time), which is really annoying. If we want to report anything it should
probably go through an optional callback. The default implementation of that
callback could eprintln! or tracing::warn. I'd say it's fine to postpone any
call on error reporting till later; no need for that to hold up all this.
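One possible shape for such an opt-in callback (entirely hypothetical API; the names `MicrophoneBuilder`, `on_overrun`, and `report_overrun` are mine): nothing is printed by default, and users who want reporting register a closure.

```rust
// Hypothetical API sketch: an optional callback for dropped-sample
// reporting, instead of forced printing the way ALSA does on under-run.
type OverrunCallback = Box<dyn Fn(usize) + Send>;

struct MicrophoneBuilder {
    on_overrun: Option<OverrunCallback>,
}

impl MicrophoneBuilder {
    fn new() -> Self {
        Self { on_overrun: None }
    }

    // Users opt in to reporting; by default nothing is printed.
    fn on_overrun(mut self, cb: impl Fn(usize) + Send + 'static) -> Self {
        self.on_overrun = Some(Box::new(cb));
        self
    }

    // Called from the input path when `dropped` samples did not fit.
    fn report_overrun(&self, dropped: usize) {
        if let Some(cb) = &self.on_overrun {
            cb(dropped);
        }
    }
}

fn main() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;

    let count = Arc::new(AtomicUsize::new(0));
    let c = Arc::clone(&count);
    let builder = MicrophoneBuilder::new().on_overrun(move |n| {
        c.fetch_add(n, Ordering::Relaxed);
    });
    builder.report_overrun(16);
    assert_eq!(count.load(Ordering::Relaxed), 16);
}
```

A default implementation that calls `eprintln!` or `tracing::warn` could then be offered as a convenience on top of this.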
