By John Shirley
Last time, we explored the basic principles of digital audio, including bit depth, quantization, clipping, dither, signal-to-noise ratios, sample rates, and aliasing. Part 2 builds on this foundation, exploring converter designs, buffers, latency, hard drives, data compression, file formats, and connecting digital audio gear together. Knowledge of these topics is invaluable when choosing equipment, getting the most from your studio, and learning how best to integrate your digital audio gear.
Quality conversion, from analog to digital and back again, is essential to accurate, high-fidelity recording. Unfortunately, converters themselves are an often overlooked and misunderstood part of the signal chain. The technology and terminology surrounding this process may seem straightforward on the surface, but this can be misleading. While factors such as bit depth, sample rate, and theoretical signal-to-noise ratios do describe important aspects of a converter's basic function, they only begin to describe its real quality. The fact that seemingly similar units can cost from as little as $20 to close to $800 per channel gives some indication that there is much more going on here.
So what’s the difference?
The quality of the analog components of any digital audio converter is actually among the converter's most important and expensive elements. Connectors, shielding, grounding, capacitors, and transformers all affect how well a converter works. As discussed in TCRM #3, A/D and D/A converters require analog filters to remove all high frequency sounds above half the sample rate (according to the Nyquist Theorem). A 44.1 kHz sample rate converter must remove anything above 22.05 kHz (the Nyquist limit). Any frequencies higher than that would be misinterpreted as false lower frequencies in a process known as aliasing. (As you may also remember from TCRM #3, 44.1 kHz is the common CD's sample rate.)
These low-pass, anti-aliasing filters need to be quite severe, causing phase misalignments and distortion to occur while still allowing some low-level (quiet) aliasing to sneak through. The quest for higher quality audio requires better designs and more expensive components for these analog filters.
Oversampling is another way to help alleviate aliasing, while taking away some of the expense and severity associated with analog filter designs. The process of oversampling temporarily increases the sample rate and, therefore, the frequency range, in order to allow simpler, and cheaper, analog filters to be used.
Since the high frequency range is increased, these filters can have gentler cutoff slopes. Because of this, they don't alter audio phase relationships as much or cause as much distortion as filters that have to do all their work right around 20 kHz. Digital filtering is often also used with oversampling, as it's both inexpensive and accurate. Common oversampling rates range from 2 to 512 times the base sample rate.
A second, less widely known, benefit to oversampling is a reduction in quantization noise. This is due to the fact that although the noise level stays the same, it is spread over a wider range. More of this range is filtered out as the amount of oversampling increases. Mathematically, this equals around 3 dB less noise with every doubling of the rate. There’s 3 dB less noise at a rate of 2x than without oversampling, and 6 dB less at 4x.
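This "3 dB per doubling" rule is easy to check with a little arithmetic. A minimal sketch (the function name here is my own, not from any audio library): the in-band noise reduction in dB for a given oversampling factor is 10·log10(factor).

```python
import math

def oversampling_noise_reduction_db(factor):
    """In-band quantization noise drops by about 3 dB per doubling of the
    oversampling factor -- 10*log10(factor) dB in general."""
    return 10 * math.log10(factor)

for factor in (2, 4, 8):
    print(f"{factor}x oversampling: "
          f"{oversampling_noise_reduction_db(factor):.1f} dB less noise")
```

Running it confirms the figures in the text: 3 dB less noise at 2x, 6 dB at 4x.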
Single Bit Conversion
The Single Bit Conversion approach to converter design can certainly be confusing. It sounds a bit like a football term…. But if 16-bits are better than 8, and 24 are better than 16… why would anyone in their right mind want a 1-bit converter?
Despite what their name may imply, single bit converters do still use 16- and 24-bit words; they just process them one bit at a time. For example, a conventional 16-bit converter converts all 16 bits simultaneously (in parallel). This takes 16 separate circuits, each of which will add a small amount of variance (error). It's actually both cheaper and more accurate to use one high-quality circuit to convert all 16 bits. Of course, to do the job right, the converter must work 16 times as fast as the parallel designs to keep up with the base sample rate.
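A toy sketch of the serial idea (the helper names are my own, and this is an illustration of bit-serial processing, not real converter code): a 16-bit word is pushed down a single path one bit at a time, MSB first, and reassembled on the other side.

```python
def serialize_word(word, bits=16):
    """Send one 16-bit sample down a single path, one bit at a time, MSB first."""
    return [(word >> i) & 1 for i in range(bits - 1, -1, -1)]

def deserialize_bits(stream):
    """Reassemble the serial bit stream back into the original word."""
    word = 0
    for bit in stream:
        word = (word << 1) | bit
    return word

sample = 0b1010110010101101          # an arbitrary 16-bit sample
assert deserialize_bits(serialize_word(sample)) == sample
```

The catch, as the text notes, is speed: this single path must be clocked 16 times faster than the sample rate to keep up.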
Got the Jitters?
To play back a 24-bit audio file at 48 kHz would require a single-bit DAC to convert over 69 million bits each minute. That's more than one bit every one-millionth of a second! At that rate, small changes in the speed of the converter's internal clock can be significant to the data stream. Even the tiniest variance in clocking (known as jitter) can cause both noise and distortion. Stability in the rate of conversion is vital.
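The arithmetic behind those figures:

```python
bit_depth = 24                                # bits per sample
sample_rate = 48_000                          # samples per second

bits_per_second = bit_depth * sample_rate     # 1,152,000 bits/sec
bits_per_minute = bits_per_second * 60        # 69,120,000 bits/min
seconds_per_bit = 1 / bits_per_second         # well under a microsecond

print(bits_per_minute, seconds_per_bit)
```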
For two or more devices to exchange digital audio in real-time, they must process samples at the same rate to avoid jitter and other clock-related problems. In addition, they must also be synchronized, advancing each bit at the same moment. If they do not, the side effects can include noise, distortion, audible clicking, dropout, or even the complete inability of a device to function.
As odd as it may sound, I actually prefer when gear refuses to play when there are problems, as I have seen many good takes and mixes ruined by clock issues that went unnoticed (at least at first). Clocking problems are, in fact, some of the most common and insidious that plague the modern studio.
Digital Connections and Clocking
There are a number of popular, real-time, data transfer protocols used to move digital audio between devices. These connections include S/P-DIF, AES/EBU, T-DIF, LightPipe, and R-BUS (more on these in a minute). To facilitate proper timing, each of these carries a clock signal embedded with the audio. When recording from a digital source, many recorders will look to the selected digital input for their clocking information. The internal clock of the recorder is then ignored in an effort to alleviate errors. In this situation, the playback machine acts as the master clock (and the recorder is said to be the "slave").
When there are multiple digital inputs present, or the supplied signal contains too much jitter or error, a recording device can be either fooled or confused into using a different sample rate than the incoming data actually calls for. When the audio is later played back at the original (correct) rate, it is at the wrong speed! For example, if a DAW mistakenly records a 44.1 kHz signal at 48 kHz and then plays it back at the appropriate 44.1 kHz session rate, the audio is both slow and pitch-shifted down by about a semitone and a half!
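The amount of shift follows directly from the ratio of the two rates:

```python
import math

recorded_rate = 48_000      # the rate the file was (wrongly) stamped with
playback_rate = 44_100      # the rate it is actually played back at

speed_ratio = playback_rate / recorded_rate      # < 1, so playback is slower
semitones = 12 * math.log2(speed_ratio)          # about -1.47 semitones

print(f"speed x{speed_ratio:.3f}, pitch shift {semitones:.2f} semitones")
```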
As more digital audio devices are added to a recording setup, clocking issues become more complicated. When there are more than two digital components involved, all parts of the system have to synchronize to a single master clock signal, called word clock. Word clock does not contain audio data or any information about session or song location. It is simply a timing reference, making sure each piece of gear moves the data stream ahead at the same speed. To simplify and ensure stability, many studios connect all digital gear directly to a special word clock generator with multiple outputs.
To reduce cost, many manufacturers have chosen not to include full word clock compatibility on their project studio gear. In many situations it is necessary to use a combination of word clock and digital audio clocking. It's easier to ensure a proper setup in these situations by making a flow chart/diagram of the clock information. All gear should trace its clock back to a single source, even if indirectly (though as directly as possible).
Digital Connection Types
- S/P-DIF (Sony/Philips Digital Interface Format) – A 2-channel (stereo) format originally intended for consumer gear, but which has become quite common in professional studios. It can be found with either an unbalanced RCA or an optical connection.
- AES/EBU (Audio Engineering Society/European Broadcasting Union) – Also a 2-channel (stereo) format, but specifically designed for professional applications. Balanced connections are made using XLR jacks. Often called simply "AES" today.
- T-DIF (Tascam Digital Interface Format) – An 8-channel format originally designed for the Tascam DA-88. Uses a 25-pin D-sub cable.
- LightPipe – An 8-channel format originally designed for the Alesis ADAT. Uses an optical connection. Sometimes you’ll see the word Toslink used in connection with Lightpipe audio; technically that word describes the type of connector and cable used, not the audio format. Toslink, developed by Toshiba, is also often used for optical S/P-DIF.
- R-BUS – An 8-channel format used by Roland. Uses a 25-pin D-sub cable but is not interchangeable with T-DIF.
Buffers
A buffer is a place in RAM (random access memory – the volatile memory that holds active data while a computer is powered on) where digital audio information is temporarily stored. Buffers aid in the smooth recording or playback of discrete chunks of data (samples). For example, when eight tracks are played back from a hard disk workstation, small sections must be read from each track consecutively and the data stored in RAM. Only then can the program mix the tracks together and output the results. This is because the hard drive can read only one thing at a time. It's the buffer that makes it seem as if all tracks are being read simultaneously.
Portable CD players also rely heavily on buffers, but for skip protection rather than track count. When the user presses play, the CD player begins loading audio into the buffer. It does this at a faster rate than it plays audio out of the buffer. The CD player is actually reading ahead of what it’s playing. Now, when the player is bumped and cannot read properly from the CD, it has until the buffer is empty to stabilize and start reading again. As long as it can do this, and the buffer is never emptied completely, there will be no interruption in the audio output.
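A toy simulation of this read-ahead scheme (the function name and numbers are made up for illustration) shows why playback survives a bump exactly as long as the buffer never runs dry:

```python
def simulate_playback(read_rate, play_rate, buffer_size, bump_at, bump_len, total):
    """Toy model of a CD player's read-ahead buffer. The laser reads faster
    than audio plays out; during a 'bump' it reads nothing at all.
    Returns True if playback never underruns (no audible skip)."""
    buffered = 0.0
    for t in range(total):
        reading = not (bump_at <= t < bump_at + bump_len)
        if reading:
            buffered = min(buffer_size, buffered + read_rate)
        if buffered < play_rate:
            return False            # buffer emptied completely: dropout
        buffered -= play_rate
    return True

# Reads twice as fast as it plays; survives a short bump...
print(simulate_playback(read_rate=2, play_rate=1, buffer_size=10,
                        bump_at=20, bump_len=3, total=100))   # True
# ...but not a bump longer than the buffer can cover.
print(simulate_playback(read_rate=2, play_rate=1, buffer_size=10,
                        bump_at=20, bump_len=15, total=100))  # False
```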
While buffers are necessary in any digital recording studio, and larger buffers can increase stability, processing power, and track count, there is one major drawback: delay. Buffers, by their very nature, cause a delay in the recording or playback of the audio. The larger the buffer, the longer the delay. Many software recorders allow the buffer size to be adjusted, letting the user find the balance between power and speed that works best for them and their studio system.
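The delay is simple to estimate: one buffer's worth of samples divided by the sample rate. A quick sketch (the helper name is my own):

```python
def buffer_latency_ms(buffer_samples, sample_rate):
    """Delay added by one buffer of audio, in milliseconds."""
    return buffer_samples / sample_rate * 1000

for size in (64, 256, 1024):
    print(f"{size:5d} samples @ 44.1 kHz -> "
          f"{buffer_latency_ms(size, 44100):.1f} ms")
```

A 64-sample buffer adds under 1.5 ms, while a 1024-sample buffer adds over 23 ms, which is clearly audible when monitoring.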
Latency
Latency (a fancy word for lateness) is the time difference between input and output of a digital audio system, signal pathway, or process. It's caused by both algorithmic/mathematical issues and purely mechanical/physical procedures, and occurs most commonly in software, A/D/A converters, and hard drives. These time delays can create such problems as dull or uneven high frequency response, vocals that exhibit a sonically "pinched" or "nasal" quality, or a persistent awkwardness on overdubs where musicians can't seem to get the right rhythmic feel (a.k.a. groove). The latter issue occurs when performers monitor the delayed signal through headphones, or when a mix of input and output is monitored during recording. These situations can cause poor performances or even make accurate playing impossible.
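That "pinched" tonal quality comes from comb filtering: when the dry input is mixed with a slightly delayed copy of itself, cancellation notches appear at regular frequency intervals. A small sketch (the helper name is my own) of where those notches land:

```python
def comb_null_frequencies(delay_ms, count=4):
    """When a dry signal is mixed equally with a copy delayed by delay_ms,
    cancellation notches fall at odd multiples of 1/(2 * delay)."""
    delay_s = delay_ms / 1000
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

# A mere 1 ms of latency notches at roughly 500 Hz, 1.5 kHz, 2.5 kHz, 3.5 kHz...
print(comb_null_frequencies(1.0))
```

Shorter delays push the first notch higher, which is why small latencies tend to dull the high end while longer ones hollow out the mids.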
Fortunately, many DAW recording platforms now offer "zero latency monitoring," which simply means that the signal sent to the outputs and/or headphones during recording is split from the input signal before digital conversion. Often, this means that any effects, EQ, or dynamics in the software will not be available during record monitoring. Also, since some recording systems allow both the latent and non-latent signals to be heard simultaneously, check the software and interface manuals to make sure this does not happen.
Some of the more professional and powerful recording production systems are built to work so fast as to eliminate these performance and monitoring issues. They are often stand-alone systems, which do not rely on a host computer for processing power, or systems which require the addition of extra chips to the computer by way of cards added to the PCI bus slots or an external FireWire box.
Hard Drives
The hard drive is the heart of most digital workstations, yet its performance is often overlooked. While a drive should have the capacity to store large amounts of data, it should also be fast enough to handle high track counts, bit depths, and sample rates. To ensure this, the following things should be considered:
Interface type and speed –
A hard drive can only be as fast as the interface it uses to connect to the rest of the system. Suitable interfaces include Ultra IDE, Wide Ultra SCSI, Ultra2 SCSI, IEEE-1394, USB 2, Wide Ultra2 SCSI, Ultra3 SCSI, and Wide Ultra3 SCSI. All of these can work at greater than 30 MB/sec. Note that SCSI 1 and USB 1 are definitely not fast enough for multitracking; while regular IDE can work in some situations, Ultra IDE is much better suited for the studio.
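To see why interface speed matters, consider the sustained data rate uncompressed multitrack audio demands (the helper name here is my own; real-world drives also need headroom for seeks and overhead beyond this raw figure):

```python
def audio_data_rate_mb_s(tracks, bit_depth, sample_rate):
    """Sustained throughput needed to stream `tracks` of uncompressed audio."""
    bytes_per_sec = tracks * (bit_depth // 8) * sample_rate
    return bytes_per_sec / 1_000_000

# 24 tracks of 24-bit / 96 kHz audio:
print(f"{audio_data_rate_mb_s(24, 24, 96000):.1f} MB/s")
```

That works out to roughly 6.9 MB/s of steady streaming, comfortably inside the 30 MB/sec interfaces listed above but well beyond what SCSI 1 or USB 1 can sustain.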
Rotational speed (RPMs) –
This is a particularly important factor in how quickly a drive can access information. While the disk rotates, the head that reads and writes the data can only move across it on a single axis. This means that the information must come around and pass beneath the head. It also means that the drive cannot read any faster than the rate at which the data moves past the head. 7200 RPM is the minimum acceptable speed for multitrack audio.
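Rotational speed translates directly into waiting time: on average, the data you want is half a revolution away from the head. A quick sketch (my own helper name):

```python
def avg_rotational_latency_ms(rpm):
    """On average, the target data is half a revolution away from the head."""
    seconds_per_rev = 60 / rpm
    return seconds_per_rev / 2 * 1000

print(f"5400 RPM: {avg_rotational_latency_ms(5400):.2f} ms average wait")
print(f"7200 RPM: {avg_rotational_latency_ms(7200):.2f} ms average wait")
```

Going from 5400 to 7200 RPM trims the average wait from about 5.6 ms to about 4.2 ms on every seek, which adds up fast across many small audio reads.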
Access time –
This is a published measurement of how quickly a drive can be expected to locate and begin reading a file (on average). Fast drives usually measure under 9 ms (milliseconds).
Buffer (cache) –
As mentioned in the above sections on buffers and latency, audio drives may need to read from multiple files in multiple sectors, which the system will play as if they are happening simultaneously. This can be a tall order for the physical mechanism of the drive, which can only go so fast. For situations such as this, hard drives have built-in buffers that range from 128 KB to 2 MB. While the delay trade-off is the same as with the buffers discussed earlier, a 512 KB or 1 MB cache can be a suitable compromise.
Many stand-alone workstations come with a drive pre-installed and optimized for use in that machine. In this case, it is not necessary to worry about the aforementioned aspects of the drive, as the manufacturer has already taken these things into account. Despite the manufacturers’ best efforts, however, drive fragmentation may still be a concern.
Due to the constant addition, editing, and erasing of data on a hard drive, pieces of a single sound file may actually be stored in several different locations. This is known as fragmentation, and it causes extra lag time as the drive must work harder to read the data.
Drives should be defragmented periodically to keep them working at peak efficiency. On a computer, this can be done using any of a number of utility programs. On many stand-alone workstations, defragmenting can be accomplished by backing up the files to another location, formatting the drive, and then copying the files back. Some standalone recorders have built-in disk optimization routines that throw away unused audio and free up space on the disk, but this usually isn't the same as a full defragmentation pass.
A word to the wise: make backup copies of all your data to another drive before defragmenting. In fact, it is highly recommended that you back up all of your files on a regular basis (weekly, or after each recording session).
File Formats
Since audio is now written to a hard drive as separate files, through some sort of software application, there are numerous formats it can be stored in depending on operating system, software manufacturer, and delivery method (web vs. CD vs. video clip). The most common high quality ones are WAV, AIFF, and SDII. WAV is generally for systems running Windows, while SDII is for the Pro Tools platform. AIFF is often used as a generic format that can be read by a majority of systems. Unfortunately, it is not always the best format for some of those same systems. Read the software manuals to learn which format a particular application favors.
Web-based formats (such as MP3) should not be used as the original capture or storage format. Each compromises quality to some degree in order to reduce file sizes through data compression. If web delivery is ultimately desired, create a second copy in the appropriate format. Always keep the high quality original WAV, AIFF, or SDII files archived and unchanged.
Artifacts
Every process in audio has some kind of secondary effect; analog filtering causes phase shifting, and all changes made to audio in the digital domain add quantization noise. There's a special term for the audible side effects of digital signal processing: artifacts. Most often, these are caused by the use of extreme effects. This catchall term is used to describe just about anything that's not caused directly by quantization or aliasing.
My favorite example of artifacts is their intentional use as the watery, high-frequency swirling on the first track of Nine Inch Nails' The Downward Spiral, entitled "Mr. Self Destruct." Check it out.
Next time we’ll explore topics such as mixing consoles, signal flow, mic preamps, bussing, inserts, aux sends, and groups.
John Shirley is a recording engineer, composer, programmer and producer. He's also on faculty in the Sound Recording Technology Program at the University of Massachusetts Lowell. Check out his wacky electronic music CD, Sonic Ninjutsu, on iTunes.
Supplemental Media Examples
ANALOG VERSUS DIGITAL
The first nine TCRM4 soundfiles compare analog and digital recording at their highest levels. Three performances (each of a different instrument) were recorded using a 24-bit/88.2 kHz digital system as well as a 2-inch analog tape machine, once with noise reduction and once without (making three samples of each performance).
The digital system utilized an Apogee AD-16x with a Lynx AES-16 PCI card into Nuendo. The analog 2-inch was RMG 900 tape recorded on a Studer A827, once with Dolby SR and once without. The recordings done to tape without SR were made at a higher level to compensate for the higher noise floor. Of course, this does increase the saturation (distortion) on the analog recordings without SR. The microphone preamps were all Millennia HV-3D units.
These demonstration recordings were made at University of Massachusetts Lowell by graduate students Gavin Paddock and Tim Brault.
First, a 12-string guitar is recorded in digital: TCRM4_1a.wav
Next, the 12-string guitar as recorded to 2-inch tape with noise reduction: TCRM4_1b.wav
Finally, the 12-string guitar as recorded to 2-inch tape without noise reduction: TCRM4_1c.wav
Now a drum kit as recorded in digital: TCRM4_2a.wav
Next, the drums as recorded to 2-inch tape with noise reduction: TCRM4_2b.wav
Finally, the drums as recorded to 2-inch tape without noise reduction: TCRM4_2c.wav
Now a xylophone is recorded in digital: TCRM4_3a.wav
Next, the xylophone as recorded to 2-inch tape with noise reduction: TCRM4_3b.wav
Finally, the xylophone as recorded to 2-inch tape without noise reduction: TCRM4_3c.wav
In digital audio systems, all changes made to the audio not only add further noise through quantization error, but can also exhibit other unwanted side-effects called “artifacts.”
First, let’s listen to a sample of a song (Witch Doctor by Deuce) as originally recorded by Bernie Mack of Flashpoint: TCRM4_4a.wav
Now, the same song in mp3 format. Listen to those lovely, swirly and synthy artifacts: TCRM4_4b.wav
A similar sound can occur if a noise reduction plug-in is used either incorrectly or too extravagantly…. This sound has actually been found on numerous commercial recordings (sometimes intentionally, but often not): TCRM4_4c.wav
In TCRM3 we heard how quantization and base sample rate affect audio quality. Now, let's explore how the consistency of the sample rate can also affect quality….
Jitter is the fluctuation of a sample rate, but is often also used to describe the effects it can have on audio.
To demonstrate some of these, an acoustic guitar recording is used:
First, without any extra jitter artifacts added: TCRM4_5a.wav
Now, the same sample is subjected to timing fluctuations that cause clocking problems between two pieces of digital audio gear. Note the “clicks” that this creates: TCRM4_5b.wav
Next, the sample is subjected to timing fluctuations that cause slight frequency shifts: TCRM4_5c.wav
As TCRM4_5c.wav, but more extreme…. TCRM4_5d.wav
Now, TCRM4_5c.wav is differenced (flipped in phase and mixed) with the original sample (TCRM4_5a.wav) to make the effects even more obvious: TCRM4_5e.wav
Finally, TCRM4_5d.wav is differenced with the original sample: TCRM4_5f.wav
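Differencing is easy to reproduce yourself: flip the polarity of one file and mix it with the other, sample by sample; whatever doesn't cancel is exactly what the processing changed. A minimal sketch (the function name and sample values are made up for illustration):

```python
def difference(a, b):
    """Flip the polarity of b and mix it with a, sample by sample.
    Identical signals cancel to silence; anything left over is the change."""
    return [x - y for x, y in zip(a, b)]

original  = [0, 5, 10, 5, 0, -5, -10]
processed = [0, 5,  9, 5, 1, -5, -10]   # hypothetical slightly altered copy

print(difference(original, original))    # all zeros: nothing changed
print(difference(original, processed))   # only the altered samples remain
```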
The following is a program to show the effects of latency and comb filtering. It is offered for both Mac OS and Windows. As it is freeware, there’s no support or warranty of any kind (either stated or implied). That said… enjoy!