Last time this column looked at MIDI sequencing. Many of today’s computer-based DAWs began life as sequencers and derive their basic working methods and user interfaces from them. Building on those concepts, we will spend the next three installments delving deeper into the technology and usage of DAWs. You are most likely already using a DAW; thoroughly understanding DAWs and their underlying technology will help you make higher-quality recordings.
A DAW is a device that can record, edit, process and play digital audio files. It can be either a stand-alone unit (such as the Boss BR-1600CD, the TASCAM 2488 or Korg D3200), or built around a personal computer (such as the ubiquitous ProTools rig). While digital tape recorders – Alesis ADAT, TASCAM DA-series, etc. – were all the rage in the early ’90s, tape has given way to hard disks and other non-linear data storage, like CompactFlash, SD, or micro SD memory cards.
Those digital tape recorders stored audio as bits and bytes, just as the hard drives in computer-based systems do. The huge practical difference is that hard drives allow the system to jump back and forth seamlessly between non-adjacent audio files, something that was not possible using linear tape formats (whether analog or digital). It is precisely this ability that is the foundation of the DAW’s power, and when computer prices came down and capabilities went up, DAWs saw a meteoric rise, to the point of becoming the preeminent recording platform.
Editing: Basic Modes and Views
Though DAWs differ in their graphic interface styles and the names by which they call their different functions, their basic principles and functionality are really very similar. Below are some of the windows, controls and editing modes common to most current DAW designs.
Main Edit Window
The main edit window is where a multitrack overview of basic region placement, editing, and automation information is displayed over time (usually referred to as the timeline). Parts of audio and MIDI files (regions) can be edited like text in a word processor, with basic functions like Copy, Cut, Paste, and Duplicate.
Of the common editing modes (modes in which regions of audio and/or MIDI are moved around with respect to one another within a timeline), Slip mode is the most basic and intuitive. Regions are moved or placed in the edit window with the cursor (usually controlled by the mouse) and are manually dropped on a specific track at a specific location in time.
Grid mode (sometimes called Snap or Quantize mode) allows the user to set up a timing resolution appropriate to the type of work at hand. For instance, the grid can be set to a musical time scale such as eighth notes at 112 beats per minute. In this way, all edits will fall exactly on the eighth notes (a concept called quantization in MIDI sequencing, remember?). Grids can also be set to absolute time or SMPTE time for film or video applications.
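The arithmetic behind this kind of grid snapping can be sketched in a few lines of Python. The `quantize()` helper below is purely illustrative (it is not any particular DAW’s API): it snaps a position in seconds to the nearest eighth-note line at 112 BPM.

```python
def quantize(time_sec: float, bpm: float = 112.0, division: int = 8) -> float:
    """Snap a time (in seconds) to the nearest grid line.

    division=8 means eighth notes; a quarter note lasts 60/bpm seconds,
    so the grid spacing is (60/bpm) * (4/division) seconds.
    """
    grid = (60.0 / bpm) * (4.0 / division)
    return round(time_sec / grid) * grid

# A region dropped at 1.0 s lands on the nearest eighth-note boundary,
# which at 112 BPM is about 1.0714 s (the fourth eighth-note line).
snapped = quantize(1.0)
```

Setting the grid to absolute or SMPTE time is the same idea with a different `grid` spacing.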
Shuffle mode (as originally defined by Digidesign for ProTools) allows regions to quickly and accurately be moved so that one starts just after the prior one ends (or vice versa). In a Slip-style mode it can be difficult to get regions to start so precisely in relation to one another. In order to accomplish this, you have to zoom in to the sample level (where it can sometimes be hard to see the forest for the trees), make the shift, and zoom back out again.
Be very careful with Shuffle mode when inserting new regions into the timeline. In a number of instances, the audio that comes after the new region will be offset further in time by the length of the new region. This time offset is quite similar to what would have happened had the edit been done by splicing analog tape. If this side effect cannot be toggled off in a menu or preference somewhere, the best way to avoid such problems is to insert the region first in a Slip mode (as close to where you need it as you can) and then change to Shuffle mode for the final touches.
A final common mode, originally intended for work with film and video, is called Spot. When a region is either added or moved, a pop-up screen appears requesting the exact time at which you’d like the region to start. This is especially handy when you have a rather extensive video cue list and effects to match up. Rough edits can be placed using numbers rather than by performing a lot of tedious scrolling, zooming, and dragging.
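Under the hood, spotting a region is just a matter of converting the timecode you type into an absolute position on the timeline. A minimal, hypothetical converter (assuming non-drop-frame counting and a known session frame rate) might look like this:

```python
def smpte_to_seconds(timecode: str, fps: float = 30.0) -> float:
    """Convert an "hh:mm:ss:ff" SMPTE timecode string to seconds.

    Assumes non-drop-frame counting; fps is the session frame rate.
    """
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return hh * 3600 + mm * 60 + ss + ff / fps

# A region spotted to one minute and 15 frames at 30 fps
# starts 60.5 seconds into the session.
start = smpte_to_seconds("00:01:00:15")
```

Real-world drop-frame timecode (29.97 fps) complicates the math, which is exactly why letting the DAW do this conversion beats scrolling and dragging.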
In some workstations, edit subwindows are available (often through double or right-clicking) to focus in on the information specific to individual tracks or regions. Where available, these usually offer some more precise editing features as well as an enlarged view of the waveform. More expansive crossfade functions may be available here as well, or these may be in yet another edit subwindow.
Due to the particular nature of the MIDI language and the intensive nature of editing MIDI data, most DAWs incorporate MIDI editing subwindows. These may allow editing of MIDI data as text, musical notation, or across an X/Y graph (pitch/time) similar to a player piano roll. (See the previous TCRM articles on MIDI sequencing)
The Mix Window offers a view which typically emulates a traditional mixing console. It allows control of levels and panning as well as audio routing (signal flow) and effects (including eq, dynamics, and reverb). It is also the best place to monitor signal levels on individual channels, subgroups, effects returns and the master bus. In many DAWs this is also where you’ll find virtual instruments. While a few of the above functions are shared with the edit window, it’s the particular combination and layout of these that makes the mix window indispensable.
It’s also important to note that effects added in the mix window are usually non-destructive, while those added in the edit window are destructive (they either rewrite the audio file or create a new one). This makes the effects in the mix window more processor-intensive, but easier to edit and automate. Edit window effects tend to use up a lot of extra drive space, and sometimes it’s more difficult to change their settings later.
The transport may be in a separate window, be a part of the main edit window, or both. It contains playback and record functions (such as play, rewind, fast forward, record, and return to zero). It may also contain information and/or controls for sample rate, bit depth, song location, clocking, sync, tempo and other aspects of the session.
I/O, Playback Tracks and Virtual Tracks
On stand-alone workstations, the number and type of inputs and outputs (I/O) is often fixed. In addition, the number of inputs may be the same as the number of simultaneous playback tracks. Neither of these is generally true with computer-based DAWs.
The configuration of systems built around a personal computer is very flexible. To add more inputs, another interface is added… or the old one is swapped out for a more expansive one. In most cases, this has little to do with the playable track count, which is either fixed by the software or left open-ended (limited only by the configuration and power of the system). In the latter case, the number of tracks a DAW can play simultaneously is influenced by factors such as the drives, system bus, RAM (amount and type), CPU computational speed, and operating system, as well as the session bit depth and sample rate (more on these below).
Most DAWs, both hardware and computer-based, take advantage of hard drive capacities by allowing audio tracks to be stored (and edited) in excess of the playable number of tracks. These are called virtual tracks, and they are an indispensable part of the DAW’s charm. Tracks beyond the playback limit do not need to be deleted and can still be recalled when desired, usually by making a currently active track inactive.
This is great for recording take after take to get it right, or to edit together multiple takes to create a single comp track. If you’re finding that you need more tracks than your playback limit allows, you can submix your 12 drum tracks to a stereo track, and turn off all of the originals, freeing up 10 tracks for further vocal overdubs. If you discover that you need to remix the drums later, the original 12 tracks are still there to work with.
Audio Interfaces and Conversion
Audio interfaces provide the actual inputs and outputs (I/O) to a DAW. Analog I/O requires the interface to convert analog signals to digital on input and back to analog on output. The interface is a vital part of the system since it is the DAW’s link to the outside world.
When choosing what sort of interface to use with your DAW, it is important to know exactly what sorts of I/O capabilities you need and the detailed specs of the various interfaces. Don’t rely on the quick overviews often given in the ads or sales literature. These can be particularly misleading when it comes to I/O.
I have known many a poor soul who has purchased an interface only to find out later that it didn’t actually do what they needed. For instance, an interface with “ten inputs” may have two XLR mic inputs with preamps, four line-level inputs, a stereo S/P-DIF digital input, and a stereo optical digital input. This will not suit the needs of someone who needs ten analog inputs with mic preamps! To make matters worse, the interface may really toggle between inputs – for example, let’s say only one of the digital inputs is active at a time, and when the mic pres are used, two of the line inputs are turned off. That would really make this box a six-input device in this configuration: four analog plus two digital input channels. You’re given ten inputs, but they don’t all work at once.
The same situation is true of outputs, especially when it comes to the digital outputs, which often simply mirror what is sent to the first two analog outputs. It’s very common to add an ADAT optical output to an interface and claim it has eight more outs, even though some of those may be duplicates of the analog or stereo digital outs. Be careful to look deeper into the specs and try out the gear whenever possible, so you know what you’re really getting.
As stated several times already in this series, bit depth determines amplitude resolution and the inherent signal-to-noise ratio. Greater depths, such as 24 or 32 bits, are better at capturing and maintaining higher fidelity sound. Be aware that there are often several points in the signal path of most DAWs where bit depths may be different. There are the A/D converters, the DAW mixer, the written audio files, the effects processing, and the D/A converters, to name a few. A DAW that writes and processes 24-bit audio, but has a 16-bit A/D front end, will be limited to 16-bit audio quality at best.
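The link between bit depth and the noise floor can be made concrete with the standard quantization-noise formula: an ideal converter yields roughly 6.02 dB of dynamic range per bit, plus 1.76 dB for a full-scale sine wave. (This textbook formula is background to the article, not something specific to any DAW.)

```python
import math

def dynamic_range_db(bits: int) -> float:
    # Theoretical SNR of an ideal quantizer with a full-scale sine input:
    # 20*log10(2^bits) + 1.76, i.e. about 6.02 dB per bit plus 1.76 dB.
    return 20 * math.log10(2 ** bits) + 1.76

# 16-bit: ~98 dB of dynamic range; 24-bit: ~146 dB.
dr16 = dynamic_range_db(16)
dr24 = dynamic_range_db(24)
```

So the 16-bit front end in the example above caps the whole chain at roughly 98 dB, no matter how the file is stored or processed downstream.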
This is not to say that there is no benefit to processing and/or storing audio at bit depths well above the A/D conversion though! In a digital system, every change made to a signal (conversion, level, EQ, reverb, effects…) has a negative impact on the signal-to-noise ratio. Multiplying or adding numbers in a digital system always introduces tiny errors from rounding off, and those errors translate into noise. However, all is not lost! The damage done is in inverse proportion to the bit depth at which it’s accomplished. That means the more digits you have for your math, the less the errors and added noise can be heard. It’s arguable whether we’ll ever need more than 24 bits of A/D or D/A conversion. But a signal that’s stored as 24-bit audio on disk, processed with 32 or more bits of resolution, and sent out through a 16-bit converter, will almost always sound much better than a signal that’s been converted, stored, and processed entirely at 16-bits.
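The accumulation of rounding noise can be demonstrated with a toy simulation. The quantizer below is a deliberate simplification (real DAWs use dithering and more careful rounding), but it shows why doing the same chain of gain changes at a longer word length leaves the result much closer to the mathematically exact value.

```python
def quantize_to_bits(x: float, bits: int) -> float:
    # Round x to the nearest step of a signed fixed-point grid.
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

# Apply 100 successive small gain changes, rounding the result to
# 16 bits and to 32 bits after every step.
exact = x16 = x32 = 0.5
for _ in range(100):
    exact *= 1.01
    x16 = quantize_to_bits(x16 * 1.01, 16)
    x32 = quantize_to_bits(x32 * 1.01, 32)

# The 16-bit path drifts much further from the exact result:
# its rounding errors are tens of thousands of times larger per step.
err16, err32 = abs(x16 - exact), abs(x32 - exact)
```

This is the whole argument for wide internal mix buses in a few lines: the same edits, the same math, but the error you pile up shrinks as the word length grows.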
This brings us to another aspect of DAW word length implementation, which is fixed vs. floating point data and processing. This describes how numbers are stored and processed: as long strings of digits with no decimal point (integers), or as numbers with a decimal point that can move around to provide shifting resolution. We won’t get into this distinction here; certain operations benefit from fixed point math and others from floating point, with respect to accuracy and speed on different computers. More bits in your data path will almost always help you; fixed vs. floating point may or may not.
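The “shifting resolution” of floating point can be seen directly in Python: a fixed-point format has the same step size everywhere in its range, while a floating-point format’s step size scales with the value, so quiet signals keep their relative precision. (`math.ulp` reports the spacing between adjacent double-precision floats near a given value; the 24-bit fixed-point step here assumes a signed ±1.0 full-scale range.)

```python
import math

FIXED_STEP_24 = 2 ** -23          # 24-bit signed fixed point: one step, everywhere

step_loud = math.ulp(1.0)         # float spacing near full scale
step_quiet = math.ulp(2.0 ** -20) # float spacing near a very quiet signal

# Near full scale, doubles are far finer than 24-bit fixed point,
# and near silence they are finer still.
```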
The sample rate of an audio system is the number of digital words processed per second. It determines the range of frequencies the system can adequately resolve. The Nyquist theorem states that an ideal digital recording system can accurately record frequencies up to half its sample rate – for example, according to the theorem, a rate of 44.1 kHz can record up to a 22.05 kHz signal. Because of how digital math works, any sound beyond that limit will be misrepresented as a different, lower frequency, usually harmonically unrelated to the source sound. This is known as aliasing.
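You can see aliasing in the numbers themselves. Sampled at 44.1 kHz, a 30 kHz tone (above the 22.05 kHz Nyquist limit) produces exactly the same sample values as a phase-inverted 14.1 kHz tone – once sampled, the two are indistinguishable. The frequencies here are just a worked example of the folding rule (alias = sample rate minus input frequency).

```python
import math

fs = 44100.0            # sample rate
f_high = 30000.0        # a tone above the 22050 Hz Nyquist limit
f_alias = fs - f_high   # 14100 Hz: the frequency it folds down to

# Sample both tones at 44.1 kHz; the sample values coincide.
high = [math.sin(2 * math.pi * f_high * k / fs) for k in range(64)]
alias = [-math.sin(2 * math.pi * f_alias * k / fs) for k in range(64)]
```

And 14.1 kHz has no harmonic relationship to 30 kHz, which is why aliased content sounds like crud rather than like overtones.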
Well, a 22 kHz range would seem to be enough, considering the limits of human hearing… so why the recent push for rates up to 192 kHz?
One reason revolves around the design of filters known as anti-aliasing filters. They are lowpass filters used to remove frequencies above the Nyquist limit, so the process of conversion doesn’t mix a lot of aliased crud into the signal we want to hear. These filters must be quite severe: nothing above the Nyquist limit can get through, or you’ll have aliasing. The problem with filters of this type is that they often cause high-frequency phase shifting and other unwanted artifacts just under the Nyquist limit.
Now, at 44.1 kHz, those artifacts are down in the range of human hearing, and can be pretty obtrusive. But if we raise the sample rate to 96 or 192 kHz, the Nyquist frequency goes way up beyond the audible range… and so do the artifacts. Even better, we can use a filter with a gentler slope starting at a lower (but still supersonic) frequency, which reduces the artifacts to begin with. This is also the most compelling reason for the use of oversampling, where the sample rate is momentarily increased to allow better filtering algorithms.
While the use of higher rates is generally a good thing, there are a few drawbacks. Compared to traditional 44.1 and 48 kHz schemes, the 88.2 and 96 kHz rates take up twice the disk space and require twice the overall system speed and processing power. The 176.4 and 192 rates require four times the space, speed and power. If your bit depth goes up from 16 to 24 bits for data storage, disk usage goes up by 50%, and doing higher-resolution math requires more speed and power as well.
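The storage arithmetic is easy to check for yourself. Uncompressed audio consumes sample rate × (bit depth ÷ 8) bytes per second per track, so each doubling of the rate doubles the space, and going from 16 to 24 bits adds 50%:

```python
def mb_per_track_minute(sample_rate: int, bit_depth: int) -> float:
    # Uncompressed mono audio: sample_rate * (bit_depth / 8) bytes per second.
    return sample_rate * (bit_depth / 8) * 60 / 1_000_000

# 16-bit / 44.1 kHz: about 5.3 MB per track-minute.
# 24-bit / 96 kHz:   about 17.3 MB per track-minute
#                    (1.5x the depth times ~2x the rate).
```

Multiply by your track count and song length and the appeal of a big, fast drive becomes obvious.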
For these reasons, there’s usually a tradeoff of some sort. When choosing bit depths and sample rates you must balance system capabilities (such as track counts and available instances of plug-ins) with sonic accuracy. When deciding how to best balance these issues it is wise to note that there’s a much more obvious sonic advantage to choosing 24 bits than there is to choosing a higher sample rate – that’s why the default bit depth and sample rate for many hardware DAWs is 24 bits at 44.1 kHz. That being said, I would do all my work in 32-bit/192 kHz if I could….
In the current audio marketplace things are more confusing than ever. There’s an enormous amount of gear out there and the prices and features range dramatically. But what makes interfaces with the same basic features vary so widely in price? We’d all like to think that the bargain box is just as good as the high-end model and that we are wise (not just thrifty) for seeing past the hype. Alas, most of the time this is not the case.
Manufacturing tolerances, consistency, and quality control all play a part in how well a device will perform and for how long. Many inexpensive pieces of gear simply fail more often, break more readily, or die an early death. One of the most influential factors regarding the quality of audio interfaces is often overshadowed by sexy digital terminology and technospeak: namely, the analog components. This includes connectors, filters, power supplies, and balanced vs. unbalanced operation. Support and implementation of the drivers or control software is also important. There are good values for your money out there, but in general you do get what you pay for.
In TCRM 27 we’ll look further into making your DAW run at its full potential and begin to explore some good working methods.
John Shirley is a recording engineer, composer, programmer and producer. He holds a PhD in music composition from the University of Chicago and is a Professor in the Sound Recording Technology program at the University of Massachusetts Lowell where he serves as chairman of their music department. You can check out some of his more wacky tunes on his Sonic Ninjutsu CD at http://www.cdbaby.com/cd/jshirley.