2024 Feb 28

Audio stream median filter

Median filtering is used in image processing to remove noise and damage. I never encountered it as an audio effect. Let’s figure it out and try it! Here are the requirements:

Compute the median of the last N values of a signal for N ∈ [3, 48].
Do it in real-time.
On a 168-MHz ARM Cortex-M4F microcontroller.

I use a ring buffer in each solution. This is standard in audio stream processing. I don’t show its implementation because you can find it elsewhere, or challenge yourself to implement it!

Algorithm #1: `qsort()`

buffer_update(b, x);
buffer_copy(b, array, length);
qsort(array, length, sizeof(array[0]), float_compare);
return middle(array, length);

Sort the values in the ring buffer using the C standard library’s qsort() function. Calculate the median from the sorted values.

We can’t sort the ring buffer directly — that would break its semantics — so we have to copy its values into an array. This takes time, but isn’t the main bottleneck: qsort() is slow for small values of N.¹ Its performance was so bad, I didn’t measure it.

This solution is quick to implement and easy to verify its correctness. The main drawback of this algorithm is that it does more work than necessary:

It copies the values from the ring buffer into an array.
The fully sorted array requires comparisons that don’t affect the value of the median.
The sorted array is computed from scratch, which is wasteful. Since only one of the values in the ring buffer changes per update to the stream, there might be a way to re-use old sorted values. This would keep the array mostly sorted, and there are efficient algorithms for that.

Algorithm #2: insertion sort

Like Algorithm #1, but with insertion sort instead of qsort():

buffer_update(b, x);
buffer_copy(b, array, length);
sort_insertion(array, length);
return middle(array, length);

This is faster because insertion sort is fast for small arrays. However, because the basic approach is the same, this algorithm keeps each of the drawbacks of Algorithm #1.

Maybe we could avoid sorting the whole array?

Algorithm #3: heapselect

buffer_update(b, x);
buffer_copy(b, array, length);
struct heap h;
heap_init(&h, length, array);
return heap_select(&h, length / 2);

Heapselect allows us to compute the median without sorting the entire array. However, implementing it was annoying, and the results did not justify the effort.

To implement heapselect, you need a binary heap.

I found binary heaps confusing. There seem to be different APIs for heaps which have different levels of abstraction, and I found no explanation of this. Internal operations are named according to a wide range of different conventions. It was hard to find a clean reference implementation in C. Many references use 1-based array indexing, which is yucky? I also found unnecessary recursion, which is not appropriate for systems with limited stack space.

I commend the Go standard library for providing a nice implementation! Despite it not being written in C, Go is similar enough that the translation was easy.²

Despite using a decent implementation, heapselect was slow for this situation. Although it avoids sorting the entire ring buffer, it constructs and discards a new heap for each value of the audio stream.

Algorithm #4: permutation array + insertion sort

A permutation array of length N contains a permutation of the integers [0, N-1]. These values can be used as indexes into another array, which allows us to do really cool things, like create a sorted view of unsorted data!

When we apply a permutation array to a ring buffer, we get several benefits:

Better separation of concerns: The permutation array contains information about order; it does not contain values. This avoids the overhead of copying the ring buffer!
The permutation array is reused across updates. When the ring buffer is updated, the sorted view of the ring buffer stays mostly sorted. This allows us to make good use of insertion sort, which is fast for small, mostly sorted arrays!
The god of data-oriented design will smile upon us: The median filter data structure is a struct of arrays. Since the ring buffer is small, the index values can fit in a uint8_t, saving space.

For this algorithm, I chose to merge the ring buffer, permutation array, and insertion sort into one data structure.

#define MEDIAN_SIZE_MAX 48

struct median {
	float buffer[MEDIAN_SIZE_MAX];
	uint8_t permutation[MEDIAN_SIZE_MAX];
	uint8_t length;
	uint8_t index_oldest;
};

void median_init(struct median *m, uint8_t length);
void median_update(struct median *m, float value);
float median_get(const struct median *m);

The main trick to getting this data structure working is to customize insertion sort:

Compare against the view created by the permutation array, not the values in it.
Swap with respect to the permutation array.

Here’s the sorting algorithm:

static void
median_sort(struct median *m)
{
	for (uint8_t i = 1; i < m->length; i++) {
		const uint8_t x = m->permutation[i];
		uint8_t j = i;

		for (; 0 < j && m->buffer[x] < m->buffer[m->permutation[j - 1]]; j--) {
			m->permutation[j] = m->permutation[j - 1];
		}

		m->permutation[j] = x;
	}
}

For the sake of completeness, here’s the rest of the code:

void
median_init(struct median *m, uint8_t length)
{
	assert(length <= MEDIAN_SIZE_MAX);
	assert(0 < length);
	m->length = length;
	m->index_oldest = 0;

	for (uint8_t i = 0; i < m->length; i++) {
		m->buffer[i] = 0;
		m->permutation[i] = i;
	}
}

void
median_update(struct median *m, float value)
{
	// update ring buffer
	m->buffer[m->index_oldest] = value;
	m->index_oldest = (m->index_oldest + 1 < m->length) ? m->index_oldest + 1 : 0;
	median_sort(m);
}

float
median_get(const struct median *m)
{
	assert(0 < m->length);
	return middle(m->permutation, m->length);
}

To verify this code, I wrote an assertion that compared its output to one of the preceding algorithms. This is the parallel approach! It’s amazing to realize that the two algorithms create the same audio stream, but with very different performance characteristics.

CPU time measurements

I measured the performance of these algorithms by toggling a GPIO and measuring the duration of the pulse. The GPIO is set when the audio engine starts working, and is unset when it stops. The pulse width measures the performance of the entire audio engine, which includes components other than the median filter. The time taken by the other components is small, so the measurements reflect well the performance of the median filter.

The audio engine computes the stream in 1-ms chunks, so it has to finish its work in under 1 ms. Figures are omitted for algorithms that failed to meet the deadline for a given buffer size.

All measurements are in microseconds. These are maximum values observed over a period of several dozen seconds.

buffer size	4	8	16	20	24
heapselect	181	355	744	964
insertion		215	644	952
permutation	81	136	243	297	351

The permutation array algorithm is about three times faster than heapselect.

The sound

I like the sound of it. It reminds me of downsampling, but without the aliasing artifacts! A unique lo-fi effect.

It has a notchy timbre that becomes more overt as the buffer size increases. In fact, the median filter has zero gain for frequencies whose period is a multiple of the buffer length. For such frequencies, the median doesn’t change from one sample to the next — the newest sample has the same value as the sample that went out — creating a gain of zero.

Some waveforms, such as square and sawtooth, are barely modified. High frequency information is attenuated, but it sounds different than a linear lowpass filter.

Lessons

In a realtime application, what matters is worst-case execution time for a known input size, not asymptotic time complexity. “Faster” algorithms were not. Simpler algorithms were faster for small inputs. A custom data structure played a role in a faster solution.

The C standard library merely requires qsort() to sort. The standard says nothing about its implementation, performance, or asymptotic complexity. If sorting is the bottleneck in your application, you probably need to use something else. ↩︎
This is the second time I found algorithms in the Go ecosystem that could be easily translated to C. Check out yourbasic.org’s graph package, which has a nice implementation of topological sorting. ↩︎

Feedback

Discuss this page by emailing my public inbox. Please note the etiquette guidelines.