2015-03-26

Golden Eyes: Experiments with Audio - Part 7 (Squeezing Those Waves)

[ Catch up: part 1, part 2, part 3, part 4, part 5, part 6 ]

I am back with more eye candy! This time we will squeeze and squash our waveforms for fun and profit; in other words, we will compress them. This compression should not be confused with audio data compression for reducing file size, as in the MP3 format.

The kick sample is a good candidate to demonstrate this effect. We have four tracks with different variations of compression applied to them. The final amplitudes have been roughly matched for easy comparison (using make-up gain):

Compression at work

Let's see all the pairwise comparisons with our original:

A1

A2

A3

A4

You can enlarge the images by clicking on them. If you looked first for high- or low-frequency loss or addition, you will have seen that there isn't any: all the frequencies are accounted for. The only difference seems to be in the amplitude, or volume, of some parts. Let's also look at the comparison between A2 and A3 for the heck of it:

A2 & A3

The amplitudes at the beginning seem to be different, though the final parts of the tracks look the same in amplitude. Notice also the timeline: the whole part takes about 100 ms, with the first low-frequency wave spanning about 10 ms and the longest sinusoidal wave spanning 20 ms (which means the frequencies are 100 Hz and 50 Hz, respectively). The amplitude comparison becomes much easier if we overlay them like this:

A1 (black) & Original (red)
A1 looks almost the same, except for being somewhat quieter at the beginning.

A2 (black) & Original (red)

A2 looks similar, but the second low-frequency peak in the waveform seems quieter than the rest.

A3 (black) & Original (red)

When we come to A3, the differences become noticeable: it looks quieter overall, with larger attenuation for the louder parts.

A4 (black) & Original (red)

A4 looks the same as A3 at the beginning; however, the attenuation is greater in the following section. Let's look at A2 vs. A3 vs. the original for a better understanding.

A2 (blue) & A3 (black) & Original (red)

The difference lies at the beginning, where the amplitudes are greater. Is it a time-based effect, an amplitude-based effect, or both? Let's add A4:

A2 (blue) & A3 (black) & A4 (green) & Original (red)

A4 is the same as A3 at the beginning, but then it becomes quieter after some time. Finally, A1 is added:

A1 (orange) & A2 (blue) & A3 (black) & A4 (green) & Original (red)

A1, as we noticed before, only differs at the beginning and is about the same in the following parts. So, what is happening here?

The answer is compression! It is both an amplitude-based and a time-based effect. The settings are as follows:

  • A1: Maximizer (short attack, short release, practically infinite ratio)
  • A2: Compressor with 10:1 ratio, 50 ms attack, 3 ms release (long attack, short release)
  • A3: Compressor with 10:1 ratio, 5 ms attack, 3 ms release (short attack, short release)
  • A4: Compressor with 10:1 ratio, 5 ms attack, 50 ms release (short attack, long release)

The threshold is the same for all. In short, a maximizer tries to cut all the peaks above a certain threshold immediately, to allow an increase in overall volume, while compressors do the same but gradually, using a time component for activation and deactivation. This is why A2 (with its slower attack) does not start compressing immediately at the beginning but eases into compression over 50 milliseconds, while A3 starts compressing the offending peaks almost immediately. Meanwhile, A4 does not want to let go for a while, even after the offending peaks have passed :).
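The behaviour described above can be sketched in a few lines of Python. This is a minimal, illustrative feed-forward compressor, not what Cubase or any particular plugin actually does; the parameter names and the one-pole envelope smoothing are my own simplifications:

```python
import numpy as np

def compress(signal, sr, threshold=0.5, ratio=10.0, attack_ms=5.0, release_ms=3.0):
    """Sketch of a feed-forward compressor, assuming a mono signal in [-1, 1]."""
    # One-pole smoothing coefficients derived from the attack/release times.
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    out = np.empty_like(signal)
    for i, x in enumerate(signal):
        level = abs(x)
        # Envelope follower: rises with the attack time, falls with the release.
        coeff = a_att if level > env else a_rel
        env = coeff * env + (1.0 - coeff) * level
        if env > threshold:
            # Above threshold: only 1/ratio of the excess passes through.
            target = threshold + (env - threshold) / ratio
            gain = target / env
        else:
            gain = 1.0
        out[i] = x * gain
    return out
```

Fed a constant 0.9 signal with these A3-style settings, the output settles around 0.54: the 0.4 of excess above the 0.5 threshold is squeezed down to 0.04 by the 10:1 ratio, while a signal that stays under the threshold passes through untouched.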

The comparison is easier on repeating, faster drum hits. Notice the timeline in the following example: every labelled tick is 100 ms. Here the compression threshold is also lower.

Drum part compression comparison

As we can see, the maximizer (A1) tries to remove (gently) all the offending peaks to allow for a louder sound (after adding make-up gain) while removing transients (making the sound less 'punchy'). The compressor with a slower attack (A2) passes more of the transient 'attack' part of the beat, while the compressors with fast attacks (A3 & A4) leave only the very beginning of the 'attack' sound. This makes the difference between a 'fatter' kick sound and a 'punchier' kick sound. A very fast (or immediate) attack would make the sound even less punchy and perhaps 'duller'. Adequate use of compressors is a long topic and maybe best left for another article.

Let's compare them one by one (this time on black):

A1-maximizer (white) & Original (red)

Yep, the maximizer removes all of our precious transients (see loudness wars!).

A2 [long attack, short release] (white) & Original (red)

A long attack gently carves out the transients.

A3 [short attack, short release] (white) & Original (red)

A short attack leaves out only the immediate transients carving out the excess fat :).

A4 [short attack, long release] (white) & Original (red)

A long release means we also squeeze the quiet parts (the sustain and release portions of the drum hit). If above-threshold parts are encountered during the release phase of a compressor, they will be compressed immediately regardless of the attack setting, since the compressor is already activated at that time. Therefore, it is not a good idea to set the release longer than the interval between two consecutive hits of our sample.

A2 [long attack] (white) & A3 [short attack] (blue) & Original (red)

Long attack vs. short attack in plain sight above. Let's see the 'difference' tracks too! These difference tracks show exactly what was carved out by the compression.

Differences with the original for the tracks A1, A2, A3, and A4

Many variations of compression can also be easily heard by an experienced ear. In the final parts I plan to provide audio examples for everything covered since the beginning of this series, as an audio guide to these visual cues.

Until next time....
Cagil

2015-03-13

Golden Eyes: Experiments with Audio - Part 6 (Nyquist said it first!)

[ Catch up: part 1, part 2, part 3, part 4, part 5. ]

Hi again,

In this part, let's analyze (or eyenalyze?) another kind of degradation: the sample rate. So, consider the following:

Track G

This is a single snare hit from our sample collection. If you compare track G with the original, it is quite obvious that some fine details are missing, though the overall shape is there. Therefore there must be a process that strips the high frequencies (the fine details of the waveform). Let's look at the difference as usual:

It lost all the details!

The difference looks to be devoid of bass frequencies. Let's look closer:

It is like a summary of the original

Yes, the details are simply not there! The difference track only consists of higher frequencies. Now we overlay them:

Overlaid version

The blue processed line cannot keep up with the red original one. We can clearly see the highest frequency visible in this part of the sample by taking note of the time information in the top bar. The minimum distance between two positive peaks in the processed track looks to be about 1/4 of a millisecond, which corresponds to 4 kHz for a full wave period. Since at least two sampling points are needed to accurately represent a sinusoidal wave, this processed waveform must have been sampled at 8 kHz or higher. Similarly, the shortest wave in the red original waveform is about half the length of the processed one's, so that one must be at 16 kHz or higher.
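The peak-spacing arithmetic above fits in a tiny sketch (the function name is mine, just for illustration):

```python
# A peak-to-peak distance of 0.25 ms means a 4 kHz wave, which needs at
# least an 8 kHz sample rate to be represented (two samples per period).
def min_sample_rate(peak_spacing_ms):
    freq_hz = 1000.0 / peak_spacing_ms   # full wave period -> frequency
    return 2 * freq_hz                   # Nyquist: at least twice the frequency

print(min_sample_rate(0.25))    # processed track: 8000.0 Hz
print(min_sample_rate(0.125))   # original track: 16000.0 Hz
```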

Sample rate differences


In fact, the original is at a 48 kHz sample rate, but this snare sample simply has no components above 8 kHz (which would require a sample rate above 16 kHz), so we only see the maximum that is present in the data. The modified track was down-sampled to 8 kHz, allowing only frequencies of 4 kHz and lower to pass (effectively a low-pass filter). It is practically (but not exactly) the same as using a low-pass EQ filter.

The down-sampled version usually will not line up with the exact sampling points of the higher-rate original. The reason is that many down-sampling algorithms employ specific averaging and anti-aliasing techniques for a more accurate representation. When the original data has finer details (higher frequencies) than can be captured by the new sample rate, we may get an aliasing or moiré-pattern effect. This is very interesting for our purposes and I will investigate it in a future part.

Using a 44.1 kHz sample rate we can accurately represent frequencies up to 22.05 kHz, which is at the upper limit of human hearing. Sometimes it is advised to use higher sampling rates such as 48 kHz or 88.2 kHz (or even 192 kHz!) because of concerns about digital processing artifacts in the very high frequency ranges. But theoretically (see the Nyquist-Shannon sampling theorem) two samples per period are enough to represent a given sinusoidal frequency.

That's all for this part. In the next one, I want to delve into audio compression magic!

Cagil

Continue with part 7.

2015-03-07

Golden Eyes: Experiments with Audio - Part 5 (a 'bit' of degradation)

[ Catch up: part 1, part 2, part 3, part 4. ]

Welcome back!

Alright, just look at this mysterious process 'H' in comparison with the original track for a minute. High frequencies might help here, so I chose a single cymbal hit part of the track.

Sneaky process H

Can you see a difference (you can click all the images to enlarge them)? Probably not. The amplitudes and overall form look the same. Let's see the difference between them:

Silence?

Ok, looks completely silent! Maybe they are identical. What if we listen to the tracks?

{...a quick listening later... well, this is Golden Eyes, no eavesdropping! ;D }

Crazy! There is definitely something big lurking in the audio. The cymbal sound is much, much more 'hissy', and there are also some loud digital/robotic distortions. It seems that the quieter the audio, the more pronounced these robotic effects are. Since the tail (release) part of musical audio is usually a gradual amplitude decline, the effect is really audible in those tail parts. For example, during the synth solo we hear some distortion along with loud hissing; but the tail of the solo is a long, gradual decline in volume, and there it is really glitchy, almost like a science-fiction sound effect. I will show that part in a minute.

Turning back to the cymbal sound, here is the amped-up (amplitude exaggerated) version of it:

What is there?!

Now we see that there is some data in the difference. Let's zoom into the beginning of that cymbal attack:

Nope...

The applied process is now somewhat visible to the trained eye, but let's make it clearer (enhance!). The amping-up is done using the vertical slider in Cubase, visible in the upper right of the image below. We both zoomed in and amped up here; the waveform in the window spans about 2 milliseconds:

If I reaally squint my eyesss...

Can you spot the processing yet? What if we overlay them:

OK, red one is the lazy one.

The red one is the processed track. It looks like it skips some values and rounds them up or down (hint hint!). I think we should look at the quieter tail part of this cymbal sample and reeeeally amp it up to see this rounding in action. Here is the tail part of the cymbal waveform:

Cymbal tail comparison

Bingo! Track H looks like a poor copy of the original, trying to imitate it but failing miserably :P. It is now clear that the values are restricted and rounded in the processed track. It also looks like the minimum time between value changes is longer (as if the sample rate were lower), but that is just an illusion created when the value does not change from one sample point to the next because of the rounding restriction. Just look at the quick changes of values in the left channel of H, visible in several places. So, the sample rate is the same, but the number of distinct values the data can store has been reduced!

This is called the bit depth of the audio sample. The original track's bit depth is 24 bits, which means there are 2^24 (about 16.7 million!) distinct values to store the current amplitude of the sound at each sample point. In the modified track the bit depth has been reduced to only 8 bits (think Atari), which corresponds to 2^8, or 256, distinct values. So, for every two adjacent possible values in the modified track, there are 65,536 (2^16) possible distinct values in the original; the 8-bit version had to round each of these to the nearest of its 256 values.
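The rounding described above can be sketched in a few lines. This is an idealized quantizer, assuming samples normalized to [-1.0, 1.0]; real bit-depth converters typically add dither, which is skipped here:

```python
import numpy as np

def quantize(samples, bits):
    """Round samples to the grid of a given bit depth (no dither)."""
    levels = 2 ** (bits - 1)             # signed audio: 2^(bits-1) steps per polarity
    return np.round(samples * levels) / levels

x = np.sin(2 * np.pi * np.linspace(0, 1, 1000))
x8 = quantize(x, 8)                      # coarse: roughly 256 distinct values
x24 = quantize(x, 24)                    # fine: ~16.7 million distinct values

print(np.max(np.abs(x - x8)))            # sizeable rounding error
print(np.max(np.abs(x - x24)))           # vanishingly small error
```

The maximum rounding error is half a step, so the 8-bit version is off by up to about 0.004 of full scale, which is exactly the jagged staircase visible in the overlays.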

This illustration shows the difference as the bit depth doubles (the original audio is a sinusoidal wave):

Let's see an overlay of our example to get a better look:

Bit-depth difference is in the house!

Since the changes in the audio are now more rectangular and less sinusoidal, the sound produced by the speakers becomes distorted. It almost sounds like the chiptunes or 8-bit music of the past. However, there are some differences, since Cubase extrapolates this to the project bit depth (24 bits) and our audio interface is also set to 24 bits. Also, our sample rate is still 48 kHz; way higher than the tunes of the digital past :).

If you want to export audio with a different bit depth, you can choose the desired bit depth in the export dialog as shown below. However, it will be up-bit-depthed (is that a thing?) to the project settings when you import it again.

Cubase export dialog

Finally, as promised, the tail of the synth solo where it sounds almost like an android (notice the amp-up factor):

I, robot!


Till next time,

Cagil

Continue with part 6.

2015-03-06

Golden Eyes: Experiments with Audio - Part 4 (I think I saw some noise!)

note: you can read part 1, part 2, and part 3.

Hi again,

In this part I will show another processing on our audio. Let's look at a kick drum waveform for the processed track D and compare it with our original:

Operation D

What do you see? Can you spot some difference? Perhaps if we could see the difference:

Whispering?

The difference is really minimal, but there is definitely something there! This kick audio has some high frequencies in it, and that makes it difficult to analyze. Below is a zoomed-in part of the bass track and its processed version.

..or you have shaky hands!

Now it is easy to see that some low-amplitude, high-frequency, random-looking data has been added in the processing. Here they are overlaid for comparison (click the image for a bigger version). The smooth red line is the original waveform, while the blue jagged line riding on top of it is the modified one.

Overlay of process D

This type of effect is most probably some type of high-frequency noise (or hiss) riding underneath our audio. Let's really amplify the difference track:

The portrait of a noise

Yes, it really looks like high-passed white noise here. In fact, I had inserted a high-passed noise generator plugin to get this effect.
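If you wanted to reproduce this effect without a plugin, a minimal sketch would be white noise pushed through a first-order high-pass filter. The filter here is a textbook one-pole RC high-pass, not whatever the actual plugin uses, and the cutoff and amplitude values are just illustrative:

```python
import numpy as np

def high_passed_noise(n, sr, cutoff_hz, amplitude=0.01, seed=0):
    """Generate n samples of quiet, high-passed white noise."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-1.0, 1.0, n)
    alpha = 1.0 / (1.0 + 2 * np.pi * cutoff_hz / sr)   # one-pole coefficient
    out = np.empty(n)
    prev_in = prev_out = 0.0
    for i, x in enumerate(noise):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])  -- standard RC high-pass
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out[i] = prev_out
    return amplitude * out
```

Adding the result to a track (e.g. `track + high_passed_noise(len(track), 48000, 4000)`) gives the same kind of low-amplitude hiss riding on the waveform as in the overlays above.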

What about when we listen? Well, the noise is clearly audible and easily identified as such, especially when compared with the original audio. Tape hiss is often deliberately added to tracks to get a more organic or vintage feel, and it can be quite pleasant when used correctly. In fact, many analog-mimicking VST plugins do just that for the analog effect.

Hope you enjoyed that. In the next part, let's degrade the sound and see what happens!

Cagil

Continue with part 5.


2015-03-04

Golden Eyes: Experiments with Audio - Part 3 (phase inversion, inverse phasination)

(It is recommended to read part 1 and part 2 beforehand.)

OK, let's subtract the clipped track from the original track to see only the differences. How do you subtract one audio signal from another? Well, you invert one of them and then 'sum' them!

Here is a small portion of our original vocal part:

 "eieieieieiiii" in about 50 ms

And here with the phase inverted version:

Like looking in the mirror!

In Cubase you can select the audio clip and apply Audio->Process->Phase Reverse to get the inverted version. Notice how it is mirrored around the 0 dB axis. Now, what if we play both at the same time? As you can guess, the answer is 'complete silence', since the waves superimpose and cancel each other completely! Let's bounce the summation result:

Silence...because, science! :)

So, in a way, if A' is the phase-inverted version of A, we confirm A + A' = 0. What about our clipped track from the previous part? Here is another part of the vocal from the clipped track, compared with the original track:

Looks like digital clipping to me

Remember the 'shaving'? Now, let's see only the clipped parts by taking the difference (summing with the inversion of the original):

Ahh, so this is where all that clipped data was going after all!

Now we can clearly see what exactly is being clipped in track C. Notice that the weird bounce-back dips we talked about in the previous part are also clearly present in the difference. Pretty neat, don't you think? In the next part, I will investigate an interesting 'process' applied to the original.
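The whole invert-and-sum trick fits in a few lines of numpy. The 440 Hz test tone and the 0.7 clipping level here are made up for illustration; the point is that the sum with the inversion isolates exactly the shaved-off peaks:

```python
import numpy as np

original = np.sin(2 * np.pi * 440 * np.arange(1000) / 48000)
inverted = -original                        # phase inversion: flip around the 0 axis

print(np.max(np.abs(original + inverted)))  # 0.0 -- complete silence

clipped = np.clip(original, -0.7, 0.7)      # a crudely clipped copy of the track
difference = clipped + inverted             # only the clipped-off parts remain
print(np.max(np.abs(difference)))           # ~0.3: the height of the shaved peaks
```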

Continue with part 4.

Cagil


2015-03-01

Golden Eyes: Experiments with Audio - Part 2 (are you shaved?)

Continuing from Part 1, let's look at the waveform for track C:

Track C

If you are familiar with waveforms, you may recognize the apparent shaved-ness of the data here. I should point out that all of the processes/artifacts have been applied to each instrument part separately, and the parts were then glued together. Because of that, the flatness appears at a different amplitude for each part, as can be seen above. If we zoom into the orchestra part, it is obvious that the flat shaving is amplitude dependent and the quiet parts are not affected:

Orchestra part

Let's compare it with the original (select both audio clips in Cubase, choose 'Events to Part' from the Audio menu, then press Enter):

Comparison of the orchestra part

Now the effect is clear! Remember to compare left channels with left channels and right channels with right channels. There are two main ways to produce this effect on audio: 1) clipping, 2) limiting/maximizing. Of course, clipping can also be considered a crude type of limiting/maximizing. In clipping, the digital audio signal has risen above the maximum representation capacity of the digital audio format and is cut off abruptly. In limiting/maximizing, the audio is similarly prevented from going above some limit, but the transition to the cut-off is more gentle, to prevent audio artifacts and degradation. To make it clear: clipping is 'bad' and you don't want it to happen unless you are going for that specific effect. On the other hand, limiting (which is a form of compression) and maximization are very common processes in the audio world, but their specific usage is outside the scope of this post.
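The contrast between the two can be sketched numerically. Hard clipping cuts off abruptly, while a soft limiter rounds the transition; the tanh curve below is just one of many possible soft shapes, picked for illustration:

```python
import numpy as np

def hard_clip(x, ceiling=0.5):
    """Abrupt cut-off: everything beyond the ceiling becomes perfectly flat."""
    return np.clip(x, -ceiling, ceiling)

def soft_limit(x, ceiling=0.5):
    """Gentle cut-off: a smooth curve that approaches but never hits the ceiling."""
    return ceiling * np.tanh(x / ceiling)

x = np.linspace(-1, 1, 5)
print(hard_clip(x))    # the ends are cut flat at exactly +/-0.5
print(soft_limit(x))   # the ends bend toward 0.5 but never go perfectly flat
```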

So, which has been applied here? We need to zoom in further:

Zoomed in to orchestra part


...and further:

I see waves itself!


..and further:

Shaving in plain sight

Now we see the individual waves, and it looks more and more like digital clipping. If we zoom in further in the audio part editor in Cubase, the display changes to lines:

Really zoomed in

Here, the cut-off parts look flat, with some exceptions. For a sound wave, flat parts are usually audible as pops and clicks if they are separated in time; but if they occur really frequently in a small time period (as here; look at the timecode), the result sounds cracked and unpleasantly distorted (clipping distortion). (Remember that sound is essentially a continuous, smooth change of air pressure as a function of time; the musical content has essentially been removed from these flat parts.)
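This distortion is easy to verify numerically: clipping a pure sine adds harmonics that were not there before. A sketch with a made-up 100 Hz tone (symmetric clipping only adds odd harmonics, so 300 Hz appears while 200 Hz stays at zero):

```python
import numpy as np

sr = 48000
t = np.arange(sr) / sr                       # exactly one second: no FFT leakage
pure = np.sin(2 * np.pi * 100 * t)
clipped = np.clip(pure, -0.5, 0.5)           # flatten the tops and bottoms

spec = np.abs(np.fft.rfft(clipped)) / sr     # for a 1 s signal, bin index == Hz
# The 3rd harmonic (300 Hz) shows up strongly; the 2nd (200 Hz) stays near zero.
print(spec[300] > 0.01, spec[200] < 1e-6)    # True True
```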

Even further...

I produced this effect by turning up the volume fader of the main output in Cubase, allowing it to clip, and then bouncing the result into the project. Let's overlay the two tracks to show the clipping line:

Overlay of the effect

It is interesting that the clipped parts are not completely flat as expected; instead, they almost dip or bounce back as the clipped data goes higher. It is really weird, and if anyone knows why this is happening, please share it in the comments. I think it may be a result of how Cubase or my audio interface handles the clipping data.

As another example, let's look at the clipping of the drum kick. Here the longer-wavelength, low-frequency part of the kick drives the data over the clipping point. Since the higher frequencies 'live' on top of the lower-frequency waveform, they are also gone forever (apart from that interesting dipping effect).

Overlay of the effect in the drum kick

This will serve as a reference point when we compare the results with the gentler maximizing and compression effects. In the next part I will look into phase inversion and taking the difference of two tracks, which will come in really handy for our analysis.

Take care until then,
Cagil

Continue with Part 3.