Digital processing can cause a lot of confusion. In this tutorial we'll shed some light on the link between the worlds of analog and digital.
Analog devices were used exclusively in the early years of sound processing, simply because there was no other choice. Computers were the size of a small house and were programmed by feeding them paper rolls of data, and no sound processing software existed. The only storage device available was magnetic tape, which does have some positive features. It saturates the audio, for example, and most engineers agree that this kind of saturation sounds pleasing. However, it also brings some disadvantages, not least of which is a low resolution. This poor resolution meant that the majority of tracks were recorded with the whole band live on stage and then mixed, with the complete result recorded onto the tape.
Analog devices work in much the same way as our ears and brain. Digital sound, however, is a little harder to understand, and this may explain why many people are not willing to switch completely to the digital realm. In the analog domain, you typically have a physical quantity (in our case voltage), and you use a device (monitors or headphones) to convert it into air motion, which our ears detect as 'sound'. The more the voltage varies, the more the air pressure changes and the louder the resulting sound is.
If you look at a waveform in a sound editor, imagine that the voltage is changing in exactly the same way. This voltage is transmitted by electrons, and the good news is that there are 'gazillions' of them, so given the relatively low speed of our hearing system, we can theoretically achieve extremely high accuracy. The issue is how to control and measure these electrons, as there is no such thing as a continuous flow of them. It is akin to an astronaut having to count and measure the speed of meteorites while travelling through space, as they pass by every few kilometers!
We do actually have some ways to process this type of signal. For example, we are able to change the signal level (amplify or attenuate it) and to control some frequencies within it (we'll cover what frequency is a little later). Engineers have spent lots of time developing electrical circuits that can do these tasks with minimal distortion. However, considering the variety of active monitors available, we can conclude that we still have a long way to go.
The problem arises because none of the electric components are perfect. A transistor used in an amplifier does not have a linear response and actually waveshapes the result, so we need to use another circuit to 'invert' the transistor's output. To illustrate, a capacitor rated at 10nF is not actually 10nF but somewhere between 8nF and 12nF, and if another transistor located just 1cm from it heats up to 30 degrees Celsius, it also heats the nearby air and electrical connections, turning the 10nF capacitor into a 7nF one. Engineers have to deal with this scenario for every piece of hardware. One of the reasons high-end analog devices are so expensive is that every piece must be tested and measured (well, they are supposed to be; who knows if they actually are, but I hope so). In any case, this also explains why analog devices are almost always extremely simple compared to digital software.
The biggest problem is usually the non-linearity of the active components - transistors and tubes - although there is a positive side: most of these components are fairly linear within certain ranges. What you input will therefore be very similar to what is output (with just a change in level, for example) until you enter the nonlinear stage, where you get the so-called "saturation". This effect increases as we get closer to the maximum voltage.
We can demonstrate this by imagining a flood where the water has completely filled the riverbed. It then starts flooding the lowlands around the bed, but there are too many obstacles for the water to move effectively. The transistor, like the flood water, starts passing fewer electrons until they either just cannot get through the transistor, or there are no more electrons available. When this occurs, you have just clipped your audio. Guitarists use this effect all the time, and many audio engineers will even say that they prefer to record most instruments through tube-based amplifiers because the character of the distortion sounds pleasing.
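The difference between gradual saturation and outright clipping can be sketched in a few lines. This is only an illustration: the tanh curve is an assumed stand-in for a saturating component, not a model of any real transistor, tube or tape, and the function names `soft_saturate` and `hard_clip` are made up for this example.

```python
import math

def soft_saturate(x, limit=1.0):
    """Almost linear for small inputs, then bends smoothly toward +/-limit.
    The tanh curve is an assumption chosen for illustration only."""
    return limit * math.tanh(x / limit)

def hard_clip(x, limit=1.0):
    """Pass the input unchanged until it hits +/-limit, then flatten it."""
    return max(-limit, min(limit, x))

# Compare both curves as the input level rises past the limit.
for level in (0.1, 0.5, 1.0, 2.0):
    print(level, round(soft_saturate(level), 3), hard_clip(level))
```

Small inputs pass through nearly untouched in both cases, which corresponds to the linear range described above. Only the larger inputs get reshaped.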
The scenario is similar with magnetic tape. The audio is recorded by magnetizing tiny particles. Simply put, the higher the voltage, the more of them must be magnetized. However, as the voltage keeps increasing, fewer and fewer of them remain available, until eventually we run out completely.
In the analog world we are limited only by physics. Unfortunately, our capability to control and measure physical quantities is still very low. We can, however, find advantages in some of these imperfections.
In the digital domain we are able to benefit from almost infinite accuracy of every kind, provided that you are willing to wait years to process your recording! We also encounter problems when converting between our analog and digital realms, but more on this later.
If you didn't actually want to listen to the audio but just generate it instead, (i.e. no recording, everything is 'created inside'), then if you want a 1GHz sampling rate, you can have it! If you want 4096 bits sample precision, no problem! There probably won't be any device capable of actually playing this audio file for centuries, but it certainly is possible to produce. Even if we assume such a device to exist, we are very far away from having enough computing power to run it. The point is, the limitations of digital signal processing are purely mathematical, as opposed to physical in our analog world.
Let's get back to reality. We often use a 44.1kHz sampling rate (commonly rounded to 44kHz), and it's a reasonable number, since the so-called 'Nyquist theorem' states that we can represent all frequencies up to half the sampling rate, about 22kHz, this way. Since humans are rarely able to hear anything above 20kHz, this is sufficient.
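As a quick check of this rule, the sketch below computes the highest representable frequency for a few common sampling rates. It uses the exact CD rate of 44100 samples per second, and the 20kHz hearing limit is the rough figure from the text. The function names are made up for this example.

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency a sampling rate can represent: half the rate."""
    return sample_rate_hz / 2.0

def covers_human_hearing(sample_rate_hz, hearing_limit_hz=20000.0):
    """True if the rate can represent everything up to the hearing limit."""
    return nyquist_limit_hz(sample_rate_hz) >= hearing_limit_hz

for rate in (44100, 96000, 192000):
    print(rate, "Hz sampling ->", nyquist_limit_hz(rate), "Hz limit")
```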
So what is sampling? Remember how the voltage was going up and down in our analog example? Well, if we were to ask a device called an 'analog-digital (AD) converter' about "the amount and rate of the electrons at a specific moment", it might say "yeah, many of them and moving fast, so this is obviously 0.87645" :). The result of this sampling process is a series of numbers measured at (very nearly) regular intervals. A single sample means nothing on its own, but together they closely follow the signal.
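To make this concrete, here is a minimal sketch that "samples" an ideal sine wave. It assumes a perfect converter that measures at exactly regular intervals. The function name `sample_sine` and the 1kHz test tone are arbitrary choices for illustration.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Return num_samples measurements of a sine wave, one taken every
    1/sample_rate_hz seconds, starting at time zero."""
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# Eight consecutive samples of a 1kHz tone at the CD sampling rate.
samples = sample_sine(1000.0, 44100, 8)
print([round(s, 5) for s in samples])
```

Each number on its own says little, but the list as a whole traces the start of the wave.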
With actual samples, simple actions such as amplifying the audio become extremely easy, and you can have any precision you want, if you are willing to wait for the result. You can also look back in time, since past samples can simply be stored, and by delaying the output slightly you can effectively look forward in time as well. Implementing delays and reverbs is therefore only a matter of quality, whereas in the analog world it is virtually impossible. You can analyze whole blocks of audio, which allows you to detect and manipulate spectral content and much more. Everything is achieved with incredible accuracy, provided by mathematics.
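A rough sketch of how little code these basic operations need follows. The helper names are invented for this example, samples are assumed to be plain floating point numbers, and the echo is the simplest possible kind (a single delayed copy mixed in), not a reverb.

```python
def apply_gain_db(samples, gain_db):
    """Amplify (positive dB) or attenuate (negative dB) every sample."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

def add_echo(samples, delay_samples, echo_gain):
    """Mix in one copy of the signal from delay_samples earlier.
    'Looking back in time' is just indexing into stored samples."""
    out = []
    for n, s in enumerate(samples):
        past = samples[n - delay_samples] if n >= delay_samples else 0.0
        out.append(s + echo_gain * past)
    return out

impulse = [1.0, 0.0, 0.0, 0.0]
print(apply_gain_db(impulse, 20.0))  # ten times louder
print(add_echo(impulse, 2, 0.5))     # the click repeats two samples later
```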
There are some drawbacks as well, however. Firstly, although basic features like amplifying and mixing are very simple, the more complicated tasks can often become a huge mathematical mess. If an engineer developing DSP (digital signal processing) software does not completely understand this, the result could be a piece of software that is very poor yet is marketed as top end. This makes software selection much more difficult: with analog you can be more or less certain that expensive means good, whereas with digital the situation is much more complicated.
Secondly, digital audio is too perfect. The imperfections of analog circuits are hard to simulate. Although they are usually rather subtle, these analog imperfections have often proven to be useful, so we strive to recreate them. There are many analog device simulations that usually don't achieve anything other than creating some nonlinearities, but in reality the physics is much more complex. One of the most important things is the random character of analog processing, which is actually more of a problem in the analog world, yet would be beneficial in this instance. Unfortunately, in the digital realm, random usually means noise, which is not the effect we want. The goal is of course to simulate only the good things about analog, and we are still struggling with this.
Is analog quality really the ultimate? Some people seem to believe that the audio character of analog devices is basically unbeatable and that the digital world should focus on recreating these qualities. But this actually makes no sense at all. Analog device engineers did the best they could with what they had, with all the problems and limitations. But stating that this is the top of the mountain is just plain brainwashing. It's important to understand that, despite the existing qualities of analog hardware, the results can always be even better! And if you want an improvement, you are probably going to find it in the digital world.
The digital domain itself provides much more power, is easier to use and is cheap to produce. There are problems with conversion from and to analog, however, and analog processing also has a few advantages over digital which we cannot currently simulate accurately.
Conversion between analog and digital
Dealing with analog is inevitable. Whenever you record something, it is originally analog, and to play it, you have to convert it back to analog again. After all, we live in a physical world. Maybe in the future it will be possible to live as a piece of software inside a computer, enjoying audio in any accuracy we want ;).
Analog -> digital
Let's say you are recording something with a microphone. The microphone either generates some small electrical current (dynamic) or modifies the voltage you put into it (condenser). This signal is fed into a preamplifier (because the voltages produced by the microphones are just too small), and then into a device called an analog-digital (AD) converter, which converts the voltage to a binary number (the sample). Both the preamp and the AD converter are very delicate devices and are the most important and expensive parts of your analog chain.
The best AD converters currently provide 24-bit precision and 192kHz sampling rates, but note that the fact they claim to have such parameters does NOT mean they actually do. For example, the converter may have 22 bits of real precision, with the 2 remaining bits more or less just noise.
Why is it such a problem? Let's say the preamp is extremely good and creates about 1 Volt (this is similar to what AA batteries typically produce) from the original millivolts with almost no distortion. We then have the AD converter generating 24 bits, where each bit halves the step of the previous one: the first bit resolves 0.5V, the second 0.25V, the third 0.125V and so on. The 16th bit detects about 0.015mV and the 24th about 0.06uV (that's 0.00000006V!). From the audio perspective this is about -144dB, thus much less than any human can hear. It's like standing next to a jet plane and trying to hear someone whispering a few meters from you!
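The figures for the 16th and 24th bit can be reproduced with a little arithmetic. The sketch below assumes a 1V full scale and that bit number 1 (the most significant bit) resolves half of that, with every further bit halving the step. The function names are made up for this example.

```python
import math

def bit_step_volts(bit_number, full_scale_volts=1.0):
    """Voltage step resolved by a given bit (1 = most significant),
    assuming each bit halves the step of the one before it."""
    return full_scale_volts / (2 ** bit_number)

def step_level_db(bit_number):
    """Level of that step relative to full scale, in decibels."""
    return 20.0 * math.log10(1.0 / (2 ** bit_number))

for bit in (1, 2, 3, 16, 24):
    print(bit, bit_step_volts(bit), "V", round(step_level_db(bit), 1), "dB")
```

Each bit is worth roughly 6dB, which is where the figure of about -144dB for 24 bits comes from.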
And with a 192kHz sampling rate, the AD converter has to perform the measurement 192000 times per second! You could argue that computers can do billions of operations per second, but this is physically very different as computers are dealing with just 2 scenarios - 1 and 0, there is a voltage, or there is not.
Generally, the need for sampling rates higher than 96kHz is questionable. 96kHz can represent all frequencies up to 48kHz, which is more than an octave above our hearing limit. However, if you study what happens to a nice sine wave close to 48kHz when you sample it, you'll notice that the sampled waveform doesn't look much like a sine wave anymore, even if it still sounds like one (which after all is what matters).
Comment: Personally I'd like to know what some animals with better hearing feel when listening to our music. If we take a 20kHz sine sampled at 44.1kHz as an example, then the raw samples no longer trace a sine, but something closer to a triangle, which taken at face value would contain several higher harmonics. That would sound like a very sharp distortion, but fortunately the converter's output filter and our monitors do not reproduce these, otherwise I would never let my dog listen to my music :).
The 24-bit resolution provides a range from 0dB to -144dB, which is more than enough, but there's a catch. The limits are fixed, so when the input exceeds 1V (creating a signal above 0dB), it will be clipped back to 0dB, and when the input is below -144dB, it becomes silence. Therefore the only way to actually use all 24 bits of available resolution is to adjust the preamp so that in the loudest parts of the waveform the audio gets close to, but never exceeds, 0dB. Since we can never be sure that the drummer won't play louder this time, because he's in a bad mood and needs to get it out of his system, engineers always leave some headroom so that the audio never gets clipped. Because of this, however, you often won't actually use the highest bit (the most significant one), so you have just lost 6dB and there are only 23 bits left.
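Both fixed limits are easy to see in a toy quantizer. This sketch assumes an idealized converter with signed integer output and a 1V full scale. Real converters differ in many details, and the function name `quantize` is made up for this example.

```python
def quantize(voltage, bits=24, full_scale_volts=1.0):
    """Convert a voltage to a signed integer code, clipping anything
    outside the fixed range of an idealized converter."""
    steps = 2 ** (bits - 1)          # codes available per polarity
    code = round(voltage / full_scale_volts * steps)
    return max(-steps, min(steps - 1, code))

print(quantize(0.5))     # comfortably inside the range
print(quantize(2.0))     # too loud: stuck at the largest code (clipped)
print(quantize(1e-9))    # too quiet: rounds to zero (silence)
```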
You may now end up with about 20 reliable bits, which corresponds to a dynamic range of 120dB. That is still more than you need, but not that generous anymore.
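The relation between reliable bits, headroom and dynamic range is simple enough to compute directly. The sketch below relies only on the fact that one bit corresponds to about 6.02dB. The function names are made up for this example.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # roughly 6.02dB for every bit

def dynamic_range_from_bits(reliable_bits):
    """Dynamic range in dB offered by a number of reliable bits."""
    return reliable_bits * DB_PER_BIT

def bits_lost_to_headroom(headroom_db):
    """How many bits of resolution a given headroom costs."""
    return headroom_db / DB_PER_BIT

print(round(dynamic_range_from_bits(24), 1))  # nominal 24-bit range
print(round(dynamic_range_from_bits(20), 1))  # about 20 reliable bits
print(round(bits_lost_to_headroom(6.0), 2))   # cost of 6dB of headroom
```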
Digital -> analog
After you process your recordings, you have to convert them into analog (or rather the listener has to). Let's say you distribute your music in a lossless format, on a CD for example. This gives us a sampling rate of 44.1kHz and 16-bit resolution. We then need a digital to analog (DA) converter and an amplifier, because the output of DA converters usually isn't high enough to power even a pair of headphones.
DA converters do the exact opposite of AD converters. Some circuit feeds them 44100 times per second with a 16-bit number, and their task is to generate an output according to these numbers. This means that once they receive number 1, they start generating say 1V output. When it is 0.66, it would be 0.66V. The accuracy of these devices is usually quite good, although not perfect.
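In code, the idealized mapping from a 16-bit number back to a voltage could look like the sketch below. It assumes signed samples (as used on CDs) and a 1V full-scale output, and it ignores everything a real converter has to do afterwards, such as smoothing the output between samples. The function name is made up for this example.

```python
def dac_output_volts(code, bits=16, full_scale_volts=1.0):
    """Map a signed integer sample to the voltage an ideal DA converter
    would hold until the next sample arrives."""
    return code / (2 ** (bits - 1)) * full_scale_volts

SAMPLE_PERIOD_S = 1.0 / 44100  # time between two consecutive samples

print(dac_output_volts(16384))    # half of full scale
print(dac_output_volts(-32768))   # most negative 16-bit sample
print(SAMPLE_PERIOD_S)            # how long each value is held
```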
After the DA converter we have our amp, which suffers from similar problems to any other amp or preamp. There are nonlinearities in shape, in the frequency domain, just about everywhere.
Next we have our monitors or headphones or whatever we use, and here we have even bigger problems, because designing a loudspeaker with a linear output is an art on its own. Generally no loudspeaker has a flat response, which means that it plays different frequencies at different levels. Therefore engineers need to measure the response and try to invert it to make it flat. This can be done by special circuits (essentially equalizers inside the monitor), by the shape and material of the case and so on.
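The idea of inverting a measured response can be sketched as follows. The measurement values are invented purely for illustration, and a real correction has to deal with far more than a handful of bands. The function name is made up for this example.

```python
def correction_gains_db(measured_response_db, target_db=0.0):
    """For each frequency band, return the gain (in dB) that would bring
    the measured level back to the flat target."""
    return {
        freq_hz: target_db - level_db
        for freq_hz, level_db in measured_response_db.items()
    }

# Hypothetical measurement: weak bass, flat mids, slightly hot treble.
measured = {100: -3.0, 1000: 0.0, 10000: 2.5}
print(correction_gains_db(measured))
```

A band that plays 3dB too quietly gets boosted by 3dB, and one that plays 2.5dB too loudly gets cut by the same amount.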
Finally, there is your room and your ears. The room equalizes the audio and produces echoes and resonances. Ears do the same and are even adjustable, changing the equalization shape so that you hear as much as possible. For example, if you play your song and attenuate (reduce) the high frequencies with an equalizer, after a while your ears will adjust and let you hear the high frequencies better. You can check this by simply stopping the playback: the world will immediately sound "brighter", because your hearing is still tuned to the bass-heavy playback and keeps attenuating the bass (your brain turned it down because of its high volume).
Digital audio can do marvelous things, and analog audio has some advantages as well, but the conversion between them is extremely tricky. So if you are wondering which analog equipment you should get right and spend your money on, make sure it is the preamp, the audio interface (AD and DA converters), the active monitors and the room. In any case, we have to deal with analog equipment to actually enjoy the music, so the small "digital excursion" doesn't seem like such a big problem. And then you should go and train your ears ;).