My Take on Music Recording with Doug Fearn

Audiophiles Guide to Music Recording - Part 2

February 01, 2024 Doug Fearn Season 1 Episode 88

This is part 2 of the Audiophiles Guide to Music Recording. You can listen to part 1 at
Or you can access part 1 wherever you listen to podcasts.

My Take on Music Recording is primarily aimed at people in the professional recording world, but there are a significant number of listeners who are music lovers and audiophiles. This episode provides an overview of the recording process for them. However, I think even people in our profession might enjoy how I attempt to explain the recording studio process in layman’s terms.

This reflects my experience and how I work as a producer and engineer. I tend to carry over the tools and techniques that I have learned over the last five decades. They work best for me and my style of recording. I know that there are other approaches, and I try to acknowledge and explain those, too. But my focus is on what I do, which isn’t always mainstream.

Your feedback on these episodes is especially interesting to me. Tell me what you think.

As always, thanks for listening, commenting, and subscribing. I can always be reached at


88   Audiophiles Guide to Music Recording    -   Part 2

 I’m Doug Fearn and this is My Take on Music Recording

 This is part 2 of Audiophiles Guide to Music Recording. If you missed the first part, this discussion will make more sense if you listen to part 1 first.

 In the previous episode I talked about the history of music recording, the technical requirements of a recording studio, and microphones and how they are used. This is a continuation of that discussion, covering what actually happens in a typical recording session. I will talk about different techniques to create a finished recording, and look at the mixing and mastering process. I also will provide an overview of subjective loudness.

 The Recording Session Begins

Most pop recording these days starts with a click track. This is simply a constant tempo, which sounds like a metronome. The tempo is chosen by the artist and producer and put down as the first track of the recording. Everyone plays to that precise rhythm.

This simplifies editing, since every part of every take will be at exactly the same tempo. It also makes it easier to add new parts on top of the basic recording. It is possible to have the computer automatically move every note to line up precisely with the click track. This eliminates any sloppiness on the part of the players, but at what cost? The result may sound mechanical.

Playing to a click track works well for many types of music that benefit from an unchanging, steady beat. But other types of music benefit from some changes in the tempo in various parts of the song. If you listen to any of the classic Motown songs, you will find that they all start at a slower tempo than they finish. The song continually speeds up throughout. The studio players were certainly capable of playing at a constant tempo, but the music was enhanced by speeding up through the song.

Personally, I have never recorded to a click track. I like the ebb and flow of the tempo as the musicians feel it.

Once all the players are in place and ready to record, the engineer has to adjust the gain of each mic channel for optimum recording. In the days of tape, that was as close to maximum level as possible, to minimize the inherent background noise of the tape. Tape has about a 60dB dynamic range, which can be extended to about 85dB with the use of complex noise-reduction schemes implemented in external hardware devices.

Digital has a different restriction. The recording level can never exceed the digital maximum, defined as digital zero level. More on that in a minute. In analog audio, the nominal operating level is also defined, rather confusingly, as zero level. That is the level measured by a calibrated VU meter, at the boundary between the black and red markings on the meter scale.

Tape recording also had to consider the pre-emphasis curve, which is similar to the RIAA curve for vinyl records. The highs are boosted about 20dB, and attenuated by the same amount on playback. This reduces noise. The same technique is also used on analog FM radio.

Any audio that had a lot of high-frequency content, like the percussive attack of a piano note or cymbals, would saturate the tape. Piano recording to tape had to be done at about 10dB or more below maximum level to keep the piano sound pristine. Digital audio does not use pre-emphasis, so no compensation has to be made for high-frequency content.
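The boost-then-cut idea can be sketched in a few lines of Python. This is not the actual tape or FM broadcast curve (those are standardized shelving curves); it is just a first-order filter pair with an illustrative coefficient, showing that whatever the record side boosts, the playback side removes exactly:

```python
def pre_emphasis(x, a=0.95):
    """Boost highs before recording: y[n] = x[n] - a*x[n-1]."""
    y, prev = [], 0.0
    for s in x:
        y.append(s - a * prev)
        prev = s
    return y

def de_emphasis(y, a=0.95):
    """Exact inverse on playback: x[n] = y[n] + a*x[n-1]."""
    x, prev = [], 0.0
    for s in y:
        cur = s + a * prev
        x.append(cur)
        prev = cur
    return x
```

Run any signal through both in sequence and the original samples come back, while noise added between the two stages has its high frequencies cut, which is the whole point of the scheme.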

The old standard, the VU meter, was a mechanical device that responded to the average level, not the peak level. It corresponded closely to perceived loudness. The inability of a VU meter to show the true peak level was not particularly important in the days of recording to tape, since tape does not have a clearly defined maximum level in the real world. Engineers would take into account the peak-to-average ratio they knew from experience and set the levels accordingly.

Today, those VU meters are largely replaced by digital meters that can respond to the peak level. On a computer screen, those peaks can be shown with reasonable accuracy.
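The difference between the two metering styles is easy to illustrate. In this sketch (the function names are mine, not any real metering standard), the peak is the largest single sample, the average (RMS) is roughly what a VU-style meter tracks, and the gap between them is the peak-to-average ratio the engineer had to allow for:

```python
import math

def peak_and_average(samples):
    """Return (peak, rms) of a block of samples, in linear amplitude."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak, rms

def crest_factor_db(samples):
    """Peak-to-average ratio in dB: how far a peak meter reads
    above a VU-style average meter for the same material."""
    peak, rms = peak_and_average(samples)
    return 20 * math.log10(peak / rms)
```

A pure sine wave has a crest factor of about 3 dB; percussive sounds like a piano attack or a snare hit can be 15 to 20 dB, which is why an average-reading meter alone could not protect against peaks.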


An experienced engineer can pre-set the levels quite accurately, based on the instrument, player, and microphone used. It is important to set the levels quickly and precisely because studio time, and often musicians’ time, is expensive. And it is important to minimize adjustments when the players are ready to start recording. The engineering requirements must be transparent to the performers. They do not need that distraction.

Generally, every instrument and voice will be assigned to a separate, isolated recording, called a “track.” Some instruments benefit from multiple microphones, each going to a separate track. This is especially true on something like drums, where many engineers use anywhere from two mics to dozens of mics. The many-mics approach is used on most sessions.

Since there will always be differences in the arrival time of any one drum’s sound at the various microphones, multiple mics can smear the sound because of the resulting phase differences.

I prefer to use just one stereo mic to pick up all the drums and cymbals. This requires a drummer who plays with good balance, since the levels of the individual drums cannot be changed later. My technique is used by only a minority of engineers.

Some instruments benefit from recording with a stereo pair of mics. I usually use a stereo pair on a piano, and often on other instruments. I have done sessions where almost every instrument is recorded in stereo, but, again, I am in a minority.

Multiple mics and stereo mic’ing can use up a lot of tracks. Digital recording imposes no practical restriction on the number of tracks until the count gets into the hundreds. Tape machines have a finite number of tracks, usually 24.

Once everyone starts playing, the producer and engineer need to determine if what they are hearing in the control room conforms to the goals for the recording. Often, changes in the microphone placement will be required. Sometimes the microphone will be changed to better complement the instrument. Occasionally, it will be necessary to move players around to satisfy the multiple requirements of the recording.

It is important to minimize the disruption and distraction of changing things once everyone is playing. Making changes will use up more time, and probably affect the mood of the session. Consequently, it is important to get it right from the start, which is always a challenge. The conflict between the technical requirements and the quality of the musical performance is ever present. A good engineer knows how to minimize this distraction.


Usually several individual performances, called “takes,” will be recorded. Often everyone involved will come to the control room to listen to an early take. That is a time when the players can get a better sense of how their parts fit together. It is also an opportunity to discuss the parts without the awkwardness of talking through headphones, since, except for the vocalists, microphones are not usually placed to pick up the players’ voices.

It is necessary to minimize the number of takes because often the more times a piece is played, the less enthusiastic the players become. I find that in most sessions, the third to fifth take is usually the best one, although in some situations there may be 20 or more takes before the optimum point is reached. It is the producer’s job, often along with others involved, to decide when the best performance has been captured. This is frequently not the most precise take, but the average listener will respond to the emotion of the performance, with little awareness of the details.

This is not to say that a sloppy take will be used. Everyone involved wants their name to go on an excellent performance, so this is a tricky balancing act.

Often, a mistake or less-than-inspired performance by a particular player can be fixed later. The producer might fix a wrong note by replicating the same note or passage from another take that was better. In digital recording, this can be done by copying and pasting, like you might do with a word processor. Of course, the replaced part must be precisely placed. Even a few milliseconds off can affect the feel of the music in some genres. This also requires that the tempo is consistent from take to take and within a take. That is not a problem with experienced players, but it could be with others.

Sometimes the entire performance of one instrument will be replaced later with a better rendition.

The producer and engineer have to keep track of any imperfections and decide whether a problem is minor enough to ignore, acceptable as-is, or possible to fix later. Or it might be serious enough that another take is the best option. A mistake by one player may influence the playing of the others. Those decisions have to be made quickly.

Usually, a serious mistake will cause the players to stop altogether. Perhaps the take was excellent in all regards up to that point. The engineer will keep that incomplete take because it might be useful later to assemble a complete, perfect take.

Another common technique in recording is called “punching in.” If only one musical part is defective in some way, it can be replaced with a new version, with just that instrument or voice performing.

In the days of tape, this was a precision operation: the tape would be playing back and, at exactly the right spot, the engineer would “punch” the record button. This wiped out what was previously recorded at that point on the track and replaced it with the new performance. It could not be undone. And it was further complicated by the mechanical nature of the tape machine, which had to initiate erasing before the new recording could be made. The time delay was short – just the distance between the erase head and the record head, in combination with the speed of the tape. But it did require a slight anticipation of the punch.
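The anticipation the engineer needed is simple arithmetic: the erase-to-record head spacing divided by the tape speed. The figures in this sketch are illustrative, since head spacing varies from machine to machine:

```python
def punch_anticipation_ms(head_gap_inches, tape_speed_ips):
    """Time for a spot on the tape to travel from the erase head
    to the record head, in milliseconds."""
    return head_gap_inches / tape_speed_ips * 1000.0

# A hypothetical 1.5-inch erase-to-record gap at 15 inches per second:
# 1.5 / 15 = 0.1 s, so the engineer punches about 100 ms "early."
```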

Those restrictions are eliminated in digital recording, where the delay time is negligible, and the punch-in is always perfectly clean if you time it properly. If you mess it up, you can simply hit the “undo” button to return to the previous state. Of course, an engineer’s error could still prevent a perfect performance from being captured. It is important to stay fully engaged with the punch-in process, both technically and aesthetically.

It is also vital that the punch-in occurs on the right verse or chorus. With digital recording, the engineer can set markers to show the structure of the song. But that implies there is time to add those markers – not always possible during a busy session. Interrupting the artistic flow can be devastating to the capture of the best possible performance.

Sometimes the punching process can proceed linearly, from the beginning to the end of the song. Each section, line, or word can be recorded over and over until the artist and producer are satisfied. That can get tedious and does not always produce the best result.

Often, the punch-in can even be modified after the fact, to set the precise point in time when the recording goes from the original to the new performance.

I prefer to avoid punching-in. I like to capture the cohesiveness of a continuous performance whenever possible.


Once the main musical instruments are recorded, it is common to add additional parts. Sometimes just those basic first tracks are all that are needed. But more commonly there will be additions, called “overdubs.” The additional parts may be needed by the artist or producer to achieve their vision for the song. That could be multiple vocal parts, either lead vocals or background vocals. Perhaps a solo instrument needs to be added. A solo is often deferred until later so that the musician can take chances and create something at the limit of their ability – something inadvisable with a room full of other players who would have to play their parts over and over as the soloist perfects the solo.

Other overdubs might be a horn or string session, or additional versions of the original instruments to make their parts more dramatic, or adding things like percussion instruments. Every song is different, so this aspect of recording is highly variable.

It is often necessary to add parts later because the player cannot make it to the original session. And many times, having the basic tracks to listen to will help the musician refine his or her part, resulting in a better performance than if they played with everyone else.

Vocals are a special case. In most songs, the vocal is the central part of the song. It is the thing that most listeners will focus on.

In most recordings, the vocals are done later, usually at a separate session. This gives the singer the opportunity to concentrate on just singing, without concern for playing their instrument, or what the other players are doing. Some singers prefer to do this with as few other people involved as possible, while other singers prefer to have a small audience in the control room. A lot depends on the type of music.

Recording is an unnatural situation for many performers, who are accustomed to the feedback and energy from an audience. The studio environment is quite sterile compared to a performance space. The lack of an audience is disconcerting for many artists. The artist needs some experience in the studio before they can deliver their best performance.

On the other hand, the studio is a place to push one’s performance to the limit. If something they try doesn’t work, they can keep trying, or switch to another approach. That makes the studio a better place for experimentation.

It is not uncommon that the vocalist’s approach to singing the song will change after hearing the creative input of the other musicians. Ideally, the vocal will be recorded after everything else is completed, but this is not always possible.

Modern practice with digital recording permits keeping every vocal performance, good, bad, or indifferent. That is not possible when recording to tape. In digital, a perfect performance can be constructed from all of the various takes.

I prefer to get one good performance from beginning to end. I think the song is more cohesive that way and the singer can inhabit their song and put their energy into conveying the emotion of the song. Once again, I am in a minority with my approach.

Overdubs are often recorded at a different studio, using players that could not make the trip to the original recording studio. The studios doing these overdubs could be anywhere in the world.

 Sampled Instruments

Another recording technique that has become popular is to dispense with the musicians altogether and use what are called “sampled” instruments. To create these “samples,” excellent players of any instrument record every note in the range of the instrument. They will probably record multiple performances of those notes at various intensities and styles. The thousands of samples are compiled into a format that can be played on a keyboard.

In the hands of a talented player, a single musician can create an entire production by himself. In fact, a large proportion of music for video is recorded this way, even for feature films. Big budget movies will probably still use a real orchestra for the film score, but often the “incidental” music in the soundtrack will be created from sampled instruments.

The results can be stunningly impressive. Unless you know what to listen for, most people will find the performance using sampled instruments to be entirely credible.

Entire pop songs can be created using only sampled instruments, except for the main vocal performance.

Many songs use “loops,” which are simply a repeating pattern, either created from scratch or lifted from another song. In the old days, we did this with literal loops of tape. But today this is all done in the computer.

Synthesizers are another way of creating sounds, bridging the gap between “real” instruments and sampled instruments. Sounds that could not exist in the real world can be created.


Once all the parts are recorded, which could take hours to months to complete, the final recording is simply a collection of individual tracks of instruments and voices. It is now time to combine them to make the finished recording.

This is called the “mix.” Much of the time it is a mysterious process, handled by specialists who do nothing but mix songs that others have recorded. They are often very good at what they do, with many hit records to their credit.

The job of the mix engineer has expanded recently with the need to re-release thousands of old recordings in Atmos and other immersive formats.


One argument says that someone who has not heard the song hundreds of times is better able to objectively combine the tracks into an effective final product. They may also have the best perspective on how to make a hit record.

Another point of view is, who knows the song better than the producer or engineer who worked on it from day one? The producer should have a plan from the start about what the song is ultimately going to sound like. The producer, and often the engineer, have an overview of the goals of a song and they may know best how the various parts should be blended to achieve their vision.

Either way, the goal is to make an effective recording for the public.

What “effective” is, is highly subjective. There are millions of ways the tracks could be blended and modified. Only a few of these are viable. It is a creative process, but one that is always dictated by the original intent of the artist. Or at least it should be.

Often, the artist is involved in the mixing process, but that is not universal, especially if a third-party mixer is involved. No matter who is doing the mix, it is a creative process, an opportunity to try different approaches. Other mixes are formulaic: the artist had a hit record and it is necessary to maintain that signature sound. Or the goal is to imitate another hit record. For an Atmos mix of a classic song, the original stereo mix must be constantly referenced to make sure the new mix preserves the sound that fans know well.


Ideally, every song deserves its own approach. It is a time for experimentation. Mixing sessions are boring to someone not involved. It can take time to analyze what is happening and try various interpretations.

If the original producer is doing the mix, chances are he or she has lived with the song from inception and knows the parts intimately. Over the course of the recording, it is usually necessary to make intermediate mixes, which might be sent to someone who will be adding a part, or to an arranger. And also to the artist, so that they can hear what has been done. Some artists are heavily involved in every step of the recording process. Others have confidence in the people they are working with and trust their artistry.

Because they have made numerous mixes throughout the creation of the song, the producer probably has a very good idea of how the parts should fit together. That allows them to proceed with the mix without a lot of time spent figuring out what is on each musical track.

A typical pop recording will probably have somewhere between a few and one hundred tracks, depending on the music.

Classical recording is more straightforward in many aspects. Overdubbing is rarely used. However, the approach to classical recording has evolved over the years to be more like pop recording, with dozens of microphones on individual instruments and sections, instead of just a few for an overall stereo recording.


In the pop genres, in addition to adjusting the balance between all the instruments, the mixer can choose to use equalization or compression on an individual part. Unfortunately, this is often a corrective measure, not a creative one. If the tracks are performed and recorded properly from the beginning, little correction should be needed.

There are other tools available, such as digital software to correct an off-pitch note. That works remarkably well in most situations, but nothing is better than playing or singing in tune to begin with. There are always artifacts from the pitch-correction process. In recent years, these artifacts have become a sound that is used creatively in some songs.

If a studio has an appropriate reverberation time for the music, there may be no need to add artificial reverb. But the majority of recordings, even classical, utilize artificial reverb.

This was provided by a variety of acoustic and electromechanical devices in the past, each with its own distinctive sound. But in the past few decades digital reverb is most often used. This gives the mixer hundreds of different acoustical environments to choose from. Often, the reverberation is derived from a real acoustic space, usually one famous for its beautiful characteristics. A mixer might have dozens of different concert halls from around the world to consider.

Other programs emulate the sound of classic reverb devices, like the EMT plate developed in the 1950s. Many major studios had actual echo chambers. A loudspeaker would introduce a version of the recording into a highly reverberant room designed to have a particular sound. The echoes in the room were picked up by one or more microphones, which were combined with the original sound. Software emulations are available for many of these classic echo chambers.


There are many other effects and tools available to the mixer, some of which are bizarre and of limited use for most music, but maybe perfect for a particular song. One common example is called “delay.” This is simply a distinct echo of the original sound. Used with taste, it can add a sense of space to a recording. It is especially useful on voices, or on an electric guitar part.
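A single-tap delay is one of the simplest effects to sketch. This is an illustrative fragment, not any particular plug-in: one delayed copy of the signal, mixed back in at a reduced (“wet”) level:

```python
def delay_effect(dry, delay_samples, wet=0.3):
    """Mix a single delayed copy (one distinct echo) of the dry
    signal back in with the original."""
    out = list(dry)
    for n in range(delay_samples, len(dry)):
        out[n] += wet * dry[n - delay_samples]
    return out
```

At a 48 kHz sample rate, `delay_samples = 12000` gives a 250 ms slap echo; feeding the output back into the input (not shown) would produce a train of repeating echoes instead of just one.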

Through the mix, it may be necessary to adjust the levels of individual tracks. Sometimes this is needed to provide emphasis or drama. Other times, it is needed to correct a problem. This could be as simple as muting a vocal track, for example, if the singer coughed.

Another reason to modify levels is to eliminate extraneous sounds on a track, like turning a page of music, or the sound of something dropped.

In the analog days, this was done by physically moving the faders as required. A complex mix might need several people to handle the various faders. Every mix was a performance.

Today, this is easily handled by a computer, which records the various changes for permanent recall. One person can create all the necessary changes and modify them until they are perfect.

Other things can be automated, such as reverb levels, panning, or equalization.

Often the mix will be re-done multiple times before everyone is satisfied. The mixer will listen to the mix in several different environments, to see how the mix translates in each of them. They might listen on a typical home system of any size, and on their smartphone, and in a car. They will listen on headphones and ear buds. The goal of any mix is to make it sound good for every situation. It is an impossible goal, but experienced mixers find the best compromises. With years of experience, many mixers know exactly how their mix will translate into the various listening situations. They no longer need to actually listen in multiple ways.

A mix might be monitored at a very low volume, to reveal any balance problems between instruments. Conversely, mixes that were done at a very high monitor level generally do not translate well for the average listening environment. Our hearing is very non-linear in many respects, and the absolute volume changes the perception of the instrument balance, especially at loud levels. I find it best to mix at a volume similar to how the ultimate listener will hear it. Whether we like it or not, some people enjoy their music as background, and their needs have to be accommodated, as well as those who listen at hearing-damaging levels.

The mixer has to take into account how the music will be used. The requirements for CD, vinyl, streaming, download, broadcast, and film are quite different, both in the technical specification and in what makes the best reproduction of the music in each medium. A good mix should translate well for all media, but the formats must be changed for the intended use. Sometimes several entirely different mixes are needed to achieve that.

The mixing software can record all the settings, which can be recalled perfectly later. This has led some people to continue to refine a mix, perhaps indefinitely. It is not unusual for a mixer to hear something later that bothers them. If the song has already been released, there is little recourse. But if it hasn’t been made public, it may be desirable to go back and do yet another mix to address those issues.

My general approach to mixing has always been to turn up the things I like and turn down the things that annoy me. That is terribly over-simplified, but you get the idea. A mixer has many conflicting things to consider, but ultimately, the goal is to provide the listener with the best possible interpretation of the artist’s intent.


In the days when vinyl records were the only way that a listener would hear any recorded music, the final step was making the lacquer master disc that would be used to create the metal parts used to press the records. This was a challenging art, since the disc process is burdened by dozens of imperfections that need to be addressed. The specialist making this work was called the mastering engineer. 

They were expert at translating the recording into something that not only sounded good on the vinyl disc, but also accommodated the myriad of turntables and phono cartridges, some of which might not handle the extremes in level or frequency content without skipping, distortion, or just sounding bad.

This art faded with the decline in vinyl record sales over the past 40 years. Fortunately, it has not disappeared entirely, so we can still get quality sound from LP records.

As in every other aspect of recording, compromises are necessary. Managing those compromises is the special expertise of the mastering engineer.

When the CD was introduced, many of us wondered if there was still a role for mastering engineers. Well, they reinvented themselves for the digital age, mainly specializing in making digital recordings as loud as possible. They also used other audio processing tools to correct or enhance the sound of the mix.


Since the invention of the audio compressor in the 1930s, there has been a never-ending “loudness war.” Most producers, engineers, artists, and record companies want to have the loudest record possible. The pursuit of extreme loudness extends to radio stations, too, often resulting in audio so dense that it is unlistenable.

This ignores the fact that most music, even the most pop of pop music, has some dynamic range inherent in it. By eliminating the dynamic range, the sound is certainly loud, but it is also annoying and fatiguing to many people. It can actually sound quieter and less compelling.

One invention that drove the current loudness war was the digital compressor. This software had an advantage over hardware compressors in that it could “look ahead” at what the audio a few milliseconds later was doing before it got to the compression stage. This allowed the compressor to anticipate all the peaks in signal level and bring them down as necessary to make a uniformly dense and loud recording.
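The look-ahead idea can be sketched as follows. This is a deliberately naive illustration – a real limiter also delays its output and smooths the gain curve so the correction is inaudible – and the window length here is an arbitrary assumption:

```python
def lookahead_limit(x, ceiling=1.0, lookahead=64):
    """Naive look-ahead peak limiter: for each sample, find the
    loudest peak in the upcoming window and reduce the gain so
    that peak never exceeds the ceiling."""
    out = []
    for n in range(len(x)):
        window = x[n:n + lookahead]          # the "future" the limiter can see
        peak = max(abs(s) for s in window)
        gain = min(1.0, ceiling / peak) if peak > 0 else 1.0
        out.append(x[n] * gain)
    return out
```

Because each sample's own value is inside the window it examines, no output sample can ever exceed the ceiling; a hardware compressor without look-ahead can only react after the peak has already passed through.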

Analog audio usually has quite a lot of headroom designed in, to accommodate sudden peaks in loudness and handle them with grace. At least professional gear is designed that way, as are high-end consumer audio products.

That works because the tape recording medium and the disc medium have no clear-cut maximum level. The distortion may increase, but a small amount of that is acceptable. But digital does have a distinct limit. When you run out of numbers to describe the audio waveform, the result is extreme distortion of the most unpleasant type.

Therefore, it is necessary to be conservative in the amount of audio level you are recording. For example, I aim to keep the level on all instrument and vocal tracks below -6dB, referenced to digital zero level, the absolute maximum. I set the levels according to the type of sound, which usually results in an average level of -12dB.
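Those dB figures map to linear amplitude by the standard relation 10^(dB/20). A quick sketch:

```python
def db_to_amplitude(db):
    """Convert a dBFS level to a linear amplitude ratio,
    where 1.0 is digital zero (absolute maximum)."""
    return 10 ** (db / 20)

# -6 dB is roughly half of full scale; -12 dB roughly a quarter.
```

So a -6 dB peak target leaves about half the numeric range unused, which is exactly the safety margin against hitting digital zero.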

But releasing a recording with a peak level of -6dB would not be competitive today. It would be considered too “quiet.”

The usual answer is to set the absolute maximum level to 0dB, regardless of the material. A digital limiter will do that perfectly. So most recordings have every peak hitting precisely at that 0 level.

The problem is that most consumer electronics will distort at 0dB. I master all my releases to minus 2dB, which is a compromise between maximum loudness and digital distortion. The actual peak level, interpolated between samples, is around -1dB. This works well for me. But I am in the minority on this issue.


Loudness perception is a complex topic, and measuring it has always been challenging. Analyzing the music in the digital realm has come closer to providing a loudness measurement that corresponds well with what most people perceive. Of course, a lot depends on the actual sound level the listener is exposed to. Listening to music at very loud levels invokes the inherent level limiting action of our ears, and thus imposes another form of audio compression on what we hear.

In recent years, some of the streaming services have established loudness guidelines for the music they provide to listeners. Spotify, for example, specifies -14 LUFS (Loudness Units relative to Full Scale). This is based on loudness research done in Europe for TV and radio broadcasting. We use the same measurement methodology to determine the loudness of recorded music.

Note that this is a loudness standard, not a peak level description. Zero is still the peak level. The -14 is the perceived loudness.

Interestingly, most records made in the analog era have a loudness comparable to this loudness standard.

If a record company or independent artist submits a song to Spotify, for example, that audio file is analyzed for loudness. If it exceeds -14dB LUFS, the actual level of the entire song is brought down to meet that standard. If the music is below -14dB LUFS, nothing is done to it.
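That turn-down-only rule is simple to express. In this sketch, the function name is hypothetical and the default target is the -14 LUFS figure mentioned above:

```python
def streaming_gain_db(measured_lufs, target_lufs=-14.0):
    """Turn-down-only loudness normalization: songs louder than the
    target are attenuated down to it; quieter songs are left alone."""
    if measured_lufs > target_lufs:
        return target_lufs - measured_lufs  # negative value = attenuation
    return 0.0
```

A song mastered to -9 LUFS would be turned down 5 dB on playback, so squeezing out that extra loudness in mastering buys nothing on such a service.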

That makes the levels between different songs much more consistent. It also takes away some of the incentive to make a recording as loud as possible. I think that is a positive thing.

But what about music that has very loud and very quiet sections, like most classical pieces? Since the criterion is loudness computed from the entire recording, it is possible to have sections where the loudness level is -25dB for example, and loud sections could be -10dB. As long as the average comes out to -14, it is considered within specification.

But music at -10dB LUFS can sound really bad, so the constraint is to keep the performance sounding good throughout, without excessive level on the loud parts. This is challenging in some music, and the recording world is still struggling to find the best compromises. It can be done, but it may take considerable level manipulation during the mixing process.

The discussion of subjective loudness is a topic for another time. It is complex and the standards adopted by various streaming services and broadcasting are inconsistent and still evolving.


Recording has never been about perfect reproduction of the original music. That is not a realistic goal, given the limitations of all recording and reproducing systems. The lower the quality of the reproduction chain, the more manipulation has to be done to accommodate that lowest common denominator.

And often a perfect reproduction of the original sound would not be an effective recording.

The goal should always be to provide a good experience for the consumer. I often tell people the job of recording is to help make the emotional connection between the music creator and the listener as perfect as possible. And that means doing as little damage to the sound as possible.

But by necessity, we have to deviate from our goal of perfectly rendering the music. We need to make the necessary manipulations to help the artist achieve their musical goals. And we have to provide the listener with the best possible version of the music, using whatever it takes to achieve that, and under a wide range of listening situations and equipment quality.


This has been a brief overview of a very complex topic. If you would like to hear more on any of the things I covered, please let me know.

Your comments, suggestions, and questions are always appreciated. Please keep sending me your questions. When enough of them are received to make a Q&A episode, I will do so. My email is

And thanks for listening and subscribing.


This is My Take on Music Recording. I’m Doug Fearn. See you next time.