Golden Eyes: Experiments with Audio - Part 6 (Nyquist said it first!)

[ Catch up: part 1, part 2, part 3, part 4, part 5. ]

Hi again,

In this part, let's analyze (or eyenalyze?) another kind of degradation: the sample rate. So, consider the following:

Track G

This is a single snare hit from our sample collection. If you compare track G with the original, it is quite obvious that some fine details are missing, though the overall shape is there. So some process must have stripped the high frequencies (the fine details of the waveform). Let's look at the difference as usual:

It lost all the details!

The difference looks to be devoid of bass frequencies. Let's look closer:

It is like a summary of the original

Yes, the details are simply not there! The difference track only consists of higher frequencies. Now we overlay them:

Overlaid version

The blue processed line cannot keep up with the red original one. We can read off the highest frequency visible in this part of the sample by taking note of the time information in the top bar. The minimum distance between two positive peaks in the processed track looks like about 1/4 of a millisecond, which corresponds to 4 kHz for a full wave period. Since at least two sampling points per period are needed to represent a sinusoidal wave, this processed waveform must be sampled at 8 kHz or higher. Similarly, the peaks of the red original waveform are about half as far apart as those of the processed one (about 1/8 of a millisecond, or 8 kHz). Therefore that one must be sampled at 16 kHz or higher.
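The arithmetic in that paragraph can be written down as a small Python sketch. The function names are made up for illustration, and the two periods are the rough values read off the plot, not measured data.

```python
def frequency_from_period(period_seconds):
    """Frequency in Hz of a wave whose full period lasts period_seconds."""
    return 1.0 / period_seconds


def minimum_sample_rate(frequency_hz):
    """Lower bound on the sample rate: two sampling points per period."""
    return 2.0 * frequency_hz


# Rough peak-to-peak distances eyeballed from the overlaid plot (assumptions).
processed_period = 0.25e-3               # about 1/4 ms in the processed track
original_period = processed_period / 2   # the original's peaks are about half as far apart

for name, period in [("processed", processed_period), ("original", original_period)]:
    freq = frequency_from_period(period)
    rate = minimum_sample_rate(freq)
    print(f"{name}: about {freq / 1000:.0f} kHz, needs at least {rate / 1000:.0f} kHz sampling")
```

This only gives a lower bound: the real sample rate can be much higher if the sound simply has nothing to say up there.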

Sample rate differences

In fact, the original has a 48 kHz sample rate. But the snare sample has no components that would need a sample rate above 16 kHz (that is, no audio frequencies above about 8 kHz), so we only see the maximum that is actually in the data. The modified track was down-sampled to 8 kHz, which allows only frequencies of 4 kHz and lower to pass. Down-sampling therefore acts as a low-pass filter: it is practically (but not exactly) the same thing as using a low-pass EQ filter.
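To make the "down-sampling is effectively a low-pass filter" idea concrete, here is a minimal Python sketch of one common way to do it: a windowed-sinc low-pass filter followed by keeping every 6th sample. This is an assumption for illustration, not necessarily what the audio editor used for track G does. The two-tone test signal, the 3.5 kHz cutoff, the 101-tap kernel and all function names are made up for the example.

```python
import math

SOURCE_RATE = 48_000                  # sample rate of the original, per the post
TARGET_RATE = 8_000                   # sample rate of the processed track, per the post
FACTOR = SOURCE_RATE // TARGET_RATE   # keep every 6th sample


def lowpass_kernel(cutoff_hz, sample_rate, taps=101):
    """Windowed-sinc low-pass kernel (Hamming window), scaled to unity gain at 0 Hz."""
    fc = cutoff_hz / sample_rate      # cutoff as a fraction of the sample rate
    mid = (taps - 1) / 2
    kernel = []
    for n in range(taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    return [k / total for k in kernel]


def fir_filter(samples, kernel):
    """Apply the kernel, returning only the fully overlapped part of the result."""
    span = len(kernel)
    return [
        sum(k * samples[i + j] for j, k in enumerate(kernel))
        for i in range(len(samples) - span + 1)
    ]


def tone_amplitude(samples, freq_hz, sample_rate):
    """Estimate the amplitude of one frequency by correlating with sine and cosine."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)


def two_tone_signal(count):
    """Hypothetical test input: a 1 kHz tone plus a 10 kHz tone, both at amplitude 1."""
    return [
        math.sin(2 * math.pi * 1_000 * i / SOURCE_RATE)
        + math.sin(2 * math.pi * 10_000 * i / SOURCE_RATE)
        for i in range(count)
    ]


signal = two_tone_signal(4900)
kernel = lowpass_kernel(3_500, SOURCE_RATE)   # cutoff a bit below the 4 kHz limit
filtered = fir_filter(signal, kernel)         # still at 48 kHz, high tone removed
downsampled = filtered[::FACTOR]              # now at 8 kHz

print("10 kHz left after filtering:", round(tone_amplitude(filtered, 10_000, SOURCE_RATE), 3))
print("1 kHz after down-sampling:  ", round(tone_amplitude(downsampled, 1_000, TARGET_RATE), 3))
```

The 1 kHz tone should survive almost untouched while the 10 kHz tone is almost entirely gone, which is the same "summary of the original" effect seen in track G.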

The samples of the down-sampled version usually will not coincide exactly with the sampling points of the higher-rate original. The reason is that many down-sampling algorithms employ specific averaging and anti-aliasing techniques for a more accurate representation. When the original data have more fine details (higher frequencies) than the sample rate can capture, we may get an aliasing or moiré pattern effect. This is very interesting for our purposes and I will investigate it in a future part.
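As a small preview of that aliasing effect, here is a hedged Python sketch. It samples a 5 kHz sine at 8 kHz with no anti-aliasing step at all and compares the result with a 3 kHz sine. The tone frequencies and the helper names are assumptions picked for the example.

```python
import math


def sample_tone(freq_hz, sample_rate, count):
    """Sample a unit sine wave directly, with no anti-aliasing filter."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def alias_frequency(freq_hz, sample_rate):
    """Frequency a tone appears at after sampling, folded into 0..sample_rate/2."""
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)


rate = 8_000
too_high = sample_tone(5_000, rate, 16)   # above the 4 kHz limit of an 8 kHz sample rate
alias = sample_tone(3_000, rate, 16)      # where the 5 kHz tone folds back to

print("5 kHz sampled at 8 kHz shows up at", alias_frequency(5_000, rate), "Hz")

# The two sample lists are identical apart from a flipped sign (inverted phase),
# so their sum is zero up to floating-point rounding.
print("largest mismatch:", max(abs(a + b) for a, b in zip(too_high, alias)))
```

Once the samples are taken, nothing distinguishes the 5 kHz tone from a phase-inverted 3 kHz tone, which is why the high frequencies have to be filtered out before down-sampling rather than after.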

Using a 44.1 kHz sample rate, we can represent frequencies up to 22.05 kHz, which is around the upper limit of human hearing. Sometimes it is advised to use higher sampling rates such as 48 kHz, 88.2 kHz (or even 192 kHz!) because of some concerns about digital processing artifacts in the very high frequency ranges. But theoretically (see Nyquist-Shannon Sampling Theorem) two samples per period are enough to represent a specific sinusoidal frequency (strictly, the sample rate must be more than twice that frequency).
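For reference, the limit for each of these sample rates is just half the rate. The short Python sketch below lists them and also pokes at the edge case of exactly two samples per period, where a sine can land on its zero crossings every time; this is why the theorem strictly asks for a sample rate above twice the frequency. The function name and the chosen phase are illustrative assumptions.

```python
import math


def nyquist_frequency(sample_rate):
    """Upper bound on the frequencies a given sample rate can represent."""
    return sample_rate / 2


for rate in (8_000, 44_100, 48_000, 88_200, 192_000):
    print(f"{rate / 1000:g} kHz sampling -> up to {nyquist_frequency(rate) / 1000:g} kHz audio")

# Edge case: a 4 kHz sine sampled at exactly 8 kHz, starting at phase zero.
# Every sample falls on a zero crossing, so the tone vanishes from the data.
rate = 8_000
at_limit = [math.sin(2 * math.pi * 4_000 * n / rate) for n in range(8)]
print("largest sample at the limit:", max(abs(s) for s in at_limit))

# Slightly below the limit, the samples are no longer all zero.
below_limit = [math.sin(2 * math.pi * 3_900 * n / rate) for n in range(8)]
print("largest sample just below it:", max(abs(s) for s in below_limit))
```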

That's all for this part. In the next one, I want to delve into audio compression magic!


Continue with part 7.
