Producing a podcast: recording, editing, mixing, mastering

In this thread we can discuss all things related to recording, editing, mixing and mastering audio for podcasts or radio productions.

Since September I've been producing a podcast for the art platform I run. I do the recording and editing; a friend does the mixing and mastering for now, but in time I want to learn to do that myself.

I use the DAW I knew best before starting this podcast, Ableton Live. I have a very basic question which I can't find the answer to after searching the internet. Can you 'glue' two or more audio tracks together, so that the audio doesn't shift out of phase relative to the other tracks when you edit? This would make editing so much simpler!

Other, more general questions:

  • which DAW do you use for editing your podcast?
  • which mics do you use? (I use Shure SM58s)
  • can you dissect your editing process? I'm very curious how other people go about doing this. Do you have any tips and tricks?
  • do you mix and master the audio yourself? if so, how did you learn to do this? what are some good learning resources?

I’m really looking forward to any feedback and answers!


Not really a podcast, but I used to do a DJ radio show where I spoke between tracks etc. The best setup I had was a Shure SM7B into a TG2 pre for the spoken word bits - lovely sound. I would use the TDR Kotelnikov compressor and Slick EQ to get it sounding nice and balanced. For the music tracks, I would adjust them all to -20 LUFS integrated before doing the mix in REAPER, which ensured they were all at a similar volume. Then it was really just balancing the spoken word with the music for a natural sound. I'd then pseudo-master it to a level of around -16 LUFS integrated for the entire show (the recommended level for podcasts etc.). I'd occasionally have to limit it a little to get there, and I'd sometimes run it through the analogue mastering chain for "vibe" if it was a really special show.
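
For anyone scripting their leveling outside a DAW: once you have a measured integrated loudness (from a meter plugin or a library such as pyloudnorm), moving to a target like -20 or -16 LUFS is just a dB offset. A minimal sketch of that gain step in Python (the sample values here are made up for illustration):

```python
def loudness_gain(measured_lufs: float, target_lufs: float) -> float:
    """Linear gain factor that moves audio from a measured to a target loudness."""
    gain_db = target_lufs - measured_lufs
    return 10 ** (gain_db / 20.0)

# A track measured at -24 LUFS, targeting -20 LUFS, needs +4 dB of gain:
gain = loudness_gain(-24.0, -20.0)
samples = [0.1, -0.2, 0.05]          # hypothetical audio samples
leveled = [s * gain for s in samples]
```

This only covers the static gain; the actual LUFS measurement (K-weighting and gating per BS.1770) is the hard part, which is why a meter or library does it for you.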

Don’t think any of those are online now, but there are a load of the audio only ones still up on my Mixcloud.


i really need to write some kind of tutorial on this (for my users at work). The real difficulty is that it is mostly informal knowledge, easily explained in a hands-on session but kind of convoluted to write about.

imho the SM58 is difficult to use for spoken voice. The proximity effect is too pronounced, so the timbre of the voice will shift dramatically with every slight change of position from the guests. Some options:

  1. if the recording is made in a good-sounding room (which for voice basically means dead: no echoes and a very short reverb time), use cardioid condensers further away (20-30 cm from each speaker's mouth). Or why not try a figure-of-eight ribbon between the interviewee and the interviewer; they did it this way in the old days.

  2. Use microphones that have less proximity effect and that pick up a more consistent sound around them. Any dynamic omni can do that very well (i like the Beyer M58 very much on voice). The Electrovoice RE20 (a dynamic cardioid) is also very forgiving on placement.

You want a decent preamp for each microphone: either in a 19" rack (something like the cheap but ok-ish dbx 286) or the one in your audio interface if it is a Motu/RME.
To give an idea, i consider the mic preamps on the Focusrite Scarlett 2i2 not good enough.

i use Harrison Mixbus just because it is based on Ardour, and Ardour lets you edit without stopping playback (and in 2006, that was quite a novelty!) (splice audio by pointing the mouse and pressing a key, do it twice, select a zone and delete it, etc.), which is great for faster editing.
On top of Ardour, Mixbus provides inline EQ + comp + "tape simulation", which means i can do most stuff without even opening a plugin dialog. Again, this is fast and convenient. I insist on speed because…
the process of editing
is always longer and more tedious than one wishes. Given the chance, just record things in a “live-to-air” setting, so there is minimal editing to be done.
Also, i tend to record only the output of the mixer, not isolated tracks for each mic. It's way easier to edit (ie. no need to "glue" tracks together. The only DAW i've tried where this feels doable is Reaper in "ripple all tracks" mode.)
You might still want iso tracks to save the day if you are not sure that your room+microphones placement will not lead to phase issues.
So, editing: basically it's listening to the entire tape, building a mental map of the content, and making decisions to make this content comply with what you want out of it. So it helps to know beforehand what you want, and to take notes while recording (or drop markers in the DAW if you record directly to the computer).
Since Mixbus (like any other linux DAW) doesn't do source/destination editing, i do it manually: i put my source material on one track, then cut each part i want to keep and move it to a "destination" track, without ever moving anything in time on the source track (ie. my source track ends up with only the bits i did not use, scattered across time at their original places).
Levels and mastering
Like @Gregg said. I use slightly different LUFS values, but basically you want to align all “externally produced and mastered” material (ie. music) around -20 (-23 for me) LUFS, so that you have headroom for the voices. You want to compress voices a bit, but not too much (unless you want to sound distorted like top 50 radios).
This depends a lot on your original recording conditions and the feeling you want to convey (your microphone choice is also determinant in this).
You can always slightly "compress" the entire mix to end up around -16 LUFS after everything is done (most likely this will leave the music untouched and only shave off any leftover peaks from the voices). This is where i love Mixbus, because the included leveler/compressor/tape sim/limiter on each track/bus/master makes it easy to chain multiple compressions.
Voice color
that's really a matter of personal and cultural preference, but imho we don't need unnatural bass in spoken word. Different languages need different actions on the spectrum and need/tolerate more excess in certain bands. English is especially weird in that regard, so take the internet literature on "how to get a great sounding voice" with some critical distance ^^'

I’d really focus on the start of the chain (location and microphones) to get great sound. Anything done later won’t fix it (or at the expense of great pain and time!) if there are plosives, excessive sibilances, HVAC noise, etc.
Damn, sorry for the wall of text.


Oh snap, it’s my time to shine. I’ve spent more time editing podcasts than I care to think about…

DAW For any post audio or podcast work I've always used Adobe Audition. (Music stuff is all done in Reaper.) If you have the Creative Suite, you have Audition, and its workflow has worked really well for me over the years. Its built-in fix-it / cleanup tools are great, but I also use iZotope RX.

Any good DAW will work great, though.

Microphones anything besides a headset mic is usually fine. More important is microphone technique and having a pop filter. A pop filter is mandatory; a shock mount is nice. Most of the people I've worked with use a Blue Yeti, since it means they don't need a preamp or interface. That works well. The Snowball is not as good. An RE20 is really, really good, but more money than even most "professional" podcasters are willing to spend. SM58s always surprise me with how good they sound.

Plugins I've built up a channel strip of plugins that goes on each panelist's channel: a high-pass EQ, noise gate, de-esser, a shaping EQ, and then a compressor. Always high-pass to get rid of desk thumps, keyboard rumble, and general low-frequency nonsense that you don't need; I usually set the cutoff at 40 Hz. All of them are saved in templates.
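
As a rough illustration of what that high-pass does, here is a one-pole high-pass filter in plain Python. A real plugin uses a much steeper slope; this just shows the idea of rolling off content below 40 Hz:

```python
import math

def one_pole_highpass(samples, sample_rate=48000, cutoff_hz=40.0):
    """Attenuate content below cutoff_hz (rumble, desk thumps); gentle 6 dB/oct slope."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # Classic one-pole recurrence: passes fast changes, bleeds off slow ones.
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

# A constant (0 Hz) offset decays toward zero after a second of audio:
filtered = one_pole_highpass([1.0] * 48000)
```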

Some of my most-used plugins for podcasts are:

  • Toneboosters Evoke (great channel strip with built-in high pass, noise gate, and de-esser. I don't use the compressor, though.)
  • Klanghelm DC83 compressor
  • Fabfilter Pro-Q.
  • All of the iZotope RX plugins; they are black magic and have more than paid for themselves over the years.
  • Toneboosters Barricade on the master bus.

@gregg is correct about loudness metering; that's pretty crucial. I use ToneBoosters' loudness plugin and monitor everything against R128, shooting for -17. The goal of loudness monitoring (and level management in spoken word) is that the listener should never, ever have to reach for the volume control on their end.

Editing When editing out “uhms”, “ahs”, and pauses, get friendly with your DAW’s “ripple delete” command, which deletes the selection and shuffles everything else forward by that amount.
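
For the curious, ripple delete is conceptually simple: remove a time span from every track, then shift everything that came later earlier by the span's length so the tracks stay in sync. A toy model in Python (clips as (start, end) pairs in seconds, purely illustrative):

```python
def ripple_delete(tracks, cut_start, cut_end):
    """Remove [cut_start, cut_end) from every track and close the gap."""
    shift = cut_end - cut_start
    new_tracks = []
    for clips in tracks:
        new_clips = []
        for start, end in clips:
            if end <= cut_start:            # entirely before the cut: keep as-is
                new_clips.append((start, end))
            elif start >= cut_end:          # entirely after: shift left
                new_clips.append((start - shift, end - shift))
            else:                           # overlaps the cut: trim both sides
                left = (start, cut_start) if start < cut_start else None
                right = (cut_start, end - shift) if end > cut_end else None
                for part in (left, right):
                    if part and part[1] > part[0]:
                        new_clips.append(part)
        new_tracks.append(new_clips)
    return new_tracks

# Cut a 2-second "uhm" at 10-12s; both tracks stay aligned:
tracks = [[(0, 30)], [(5, 30)]]
print(ripple_delete(tracks, 10, 12))  # [[(0, 10), (10, 28)], [(5, 10), (10, 28)]]
```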

Oh yeah, templates are key. They save a lot of set up time and keep things consistent.

Most shows I've worked on have multiple panelists in different locations, so we always have a Skype reference track. The host records the Skype call while everyone does a local recording. Then I get sent all the files and use the Skype call to time-align them. Because the files are so long, they drift out of sync periodically, so I have to split them and line them up again.
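
The time alignment against the reference track can even be automated: slide each local recording against the Skype recording and keep the offset where they correlate best. A brute-force sketch in Python (assumes both signals share a sample rate; because of the drift problem above, you would rerun this per split segment):

```python
def best_lag(reference, local, max_lag=1000):
    """Lag (in samples) that maximizes the correlation of local against reference.
    A positive lag means the local recording starts later than the reference."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(local):
                score += r * local[j]
        if score > best_score:
            best, best_score = lag, score
    return best

# A local track that is a 3-sample-delayed copy of the reference:
ref = [0, 0, 1, 0, -1, 2, 0, 0]
loc = [0, 0, 0, 0, 0, 1, 0, -1, 2, 0]
print(best_lag(ref, loc, max_lag=5))  # 3
```

On real hour-long files you would do this with an FFT-based cross-correlation (e.g. NumPy/SciPy) rather than this O(n·lags) loop, but the principle is the same.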

This is a personal preference, but I never, ever pan panelists left or right, I keep them centered. Music tracks are kept in stereo and not sent through the “vocal bus”.

The biggest challenges are usually in the mundane stuff. The tough shows are the ones with new guests every week, so every week you have a new challenge in terms of:

  • I wasn’t using headphones I was listening through speakers, is that a problem lol (yes)
  • I was recording near an airport / hospital / helicopter launchpad, is that a problem
  • There were guys cutting down a tree in my front yard but I didn’t close my window, is that okay
  • I’m eating throughout the whole show, you can take that out, right?
  • etc

Easy mode is when you have the same two hosts and they know what they’re doing. Plop the files in the templates, go through the show and edit as needed, done.

I've edited close to a thousand podcast episodes over the last 10 years, most of them over on the Idle Thumbs network. I've scaled way back because of free-time constraints and my day job growing in responsibility. I have plenty more thoughts on this, but those are some initial ones.


If you end up in a situation where you have separate left and right mono WAVs instead of a stereo WAV (or a stereo WAV with two different instruments or voices panned hard left and right), then this free software by Sound Devices is great for splitting and combining files. I use it all the time on poly WAVs, where all tracks are recorded to a single polyWAV file that can later be split up as one sees fit.
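
For the simple two-channel case, the split itself is easy to script. A minimal sketch with Python's standard-library `wave` module (this assumes 16-bit PCM and just illustrates the de-interleave step; it is not a substitute for the Sound Devices tool):

```python
import wave

def split_stereo(path, left_path, right_path):
    """De-interleave a 16-bit stereo WAV into two mono WAVs."""
    with wave.open(path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # Each stereo frame is 4 bytes: [L0 L1 R0 R1]; slice the channels apart.
    left, right = bytearray(), bytearray()
    for i in range(0, len(frames), 4):
        left += frames[i:i + 2]
        right += frames[i + 2:i + 4]
    for out_path, data in ((left_path, left), (right_path, right)):
        with wave.open(out_path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(framerate)
            dst.writeframes(bytes(data))
```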


i used to use ableton live, but then i started spending a lot more time in my video editing software, so i switched over to that (final cut pro x). i like making a compound clip with all the audio files synced up and some basic levels done, and then i do the edit, only going back inside the compound clip when a distracting noise occurs in a mic not being used for the current speaker. then when the edit is done i'll make another compound clip and throw a compressor, some eq, and de-essing on the edited conversation. i don't know much about mixing/mastering, i just try to make all the voices and music sit at similar levels. lastly i do the intro and outro and put in the music. i always export in mono to reduce the file size.

i tend to edit a lot (about 30 minutes out of an hour and a half conversation) which makes it more interesting throughout, but sometimes can make it tricky to make everything flow and feel natural. i usually do a double time playback for the first pass through and eliminate sections that feel boring or irrelevant. then i go back through and cut out ums uhs and likes and while doing that try to streamline the whole conversation and cut out shorter sections that are inessential. then i’ll listen to it all on a walk or drive and take notes on it and finish it up.

i use a zoom h5 to record, and i use the built-in mic w/ foam pop filter so i can adjust levels while keeping an earbud in one ear. for my one-on-one interview podcast i use an audio technica at2020 with a foam pop filter for the guest's mic, plugged into my h5. the at2020 has xlr and usb connections, so i usually use it for recording my intros/outros directly into my computer, too. for my new podcast i have three hosts/panelists and one guest, so i use the second mic input on the h5 with a handheld condenser that i've had for a while and like on vocals, so far just with a beer koozie for the pop filter. for the guest's mic i tried a tascam lav with its own recorder, and it has actually worked really well: i can just have the guest clip it on their shirt, and then they can relax and neither of us has to worry about how they're holding the mic. i've been thinking about trying lav mics for everyone, since the podcast is very relaxed and i think it lends itself to forgetting the microphone is there, which could help make the conversation even more natural.


yes, this is what I experienced as well, thanks so much for the advice!

I have an RME Fireface UCX, so I've got decent preamps. Although it seems it has only two mic preamps (?), which isn't enough if I want to use 3 or more mics.

i will look into this, sounds really interesting.

I’ll have a look at Reaper to see how the “ripple all tracks” works for me.

this is really helpful, thank you so much!!

thanks for your responses @Gregg @hermbot @fourhexagons @cedarskies
this is all very helpful information!


Doing it for a living more than music now, I had to let go of a lot of my “artiste” habits from music production in the name of expediency and hitting deadlines. It’s refreshing and liberating actually.

-I use Ableton to compose but it would be my last choice for editing speech.
-Highly recommend Hindenburg - it’s designed and streamlined for simple editing and mixing tasks. Worth it alone for the auto-leveling (gets your audio in the target LUFS ballpark) and ease of drawing in fades and levels.
-The majority of my work is in Pro Tools - that and Audition are industry standard for most jobs, with Hindenburg becoming popular amongst more journalist/NPR types who are less technically inclined. Other DAWs are fine if you’re the only one who touches it, but you might as well learn a more standard one so you can get more work. They may be more expensive but they pay for themselves with one day of work.
-Like others, for me Izotope RX is a must-have for fixing problematic audio. I use the whole RX Post Production Suite heavily (Neutron for mixing, Ozone for mastering, Insight for metering), but it’s overkill for most unless you clearly know it will pay for itself.
-I also use the stock Pro Tools plugins a ton, especially if I know the session is going to pass through a lot of hands, and they get the job done totally fine too.
-Before I got the iZotope PPS I used Waves WLM Plus for LUFS metering and L3 Multiband for leveling master volume; both can often be had for $29.
-LUFS: everywhere is slightly different, but usually whoever hires you will give you a target number. Check out the articles on Transom for some great in-depth explanations. (BTW, the whole Transom website is an unbelievable resource; I went a step further and attended their residential workshop, which I highly recommend if you are interested in producing & reporting as well - basically a crash course in audio journalism.)

-ALWAYS print stems, so that you can extend/shorten sections smoothly to fit the speech.
-Often, anything more than 4 tracks gets distracting and steps on the speech. Be careful with sounds above 2 kHz (esp. synths without much filtering applied); they can interfere with speech intelligibility.
-A common move is to have the score enter at full volume after a thought you want to accentuate, and then automate it down underneath. Letting things breathe a bit with score after something heavy or deep gives the audience a chance to think about what they just heard.
-Automating huge reverb throws is a great transition trick (Gimlet shows do this a ton), I use Valhalla for this.
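
"Automate it down underneath" is essentially sidechain ducking: follow the voice's level and pull the score down while the voice is active. A toy sketch in plain Python (the frame size, threshold, and duck amount are made-up values, and a real mix would ramp the gain smoothly instead of switching per frame):

```python
def duck_music(music, voice, frame=512, duck_db=-12.0, threshold=0.05):
    """Attenuate music by duck_db whenever the voice frame's RMS exceeds threshold."""
    duck_gain = 10 ** (duck_db / 20.0)
    out = []
    for i in range(0, len(music), frame):
        v = voice[i:i + frame]
        # Frame-level RMS as a crude envelope follower for the voice.
        rms = (sum(s * s for s in v) / max(len(v), 1)) ** 0.5
        gain = duck_gain if rms > threshold else 1.0
        out.extend(s * gain for s in music[i:i + frame])
    return out
```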

Editing speech:
-when you’re editing around a breath, try both the one to the “left” and to the “right” of the edit, often one is more natural than the other.
-You can cut down meandering/redundant sentences by listening for the "comma"s in someone’s thought.
-You’d be surprised at the mid-word edits you can get away with by cutting on a consonant, they tend to be more consistent than vowels.
-If someone is talking extraordinarily slowly, I will shamelessly time-stretch them 2-4% faster.


Ha, yes, I’ve definitely gone both slower and faster as a courtesy to the listener. Just enough to make it easier on the ears but not enough that someone would think it was sped up or slowed down.

So much great info in your post, thank you!!

Wanted to check in with a game-changing mixing addition for me, the TC Clarity M:

Connects via USB, with a plugin that talks to it.

I use this to manually automate to my target loudness of -16 (I'm using a limiter as well, obviously), and then I do an AudioSuite pass of RX Loudness Control for a final leveling.

Loudness metering removes a huge part of the subjective cognitive burden of mixing, and having something dedicated makes me watch the meters significantly more than before when I was just periodically checking.
(Obviously you could do the same with a small second display and the loudness meter plugin of your choice. edit: I have been doing this with an iPad in Sidecar mode for when I am working on my couch or kitchen table, and the benefits are the same, though for my desk I still really do like having the dedicated hardware)

I find my mixes have radically improved because the meter focuses the brain in a different way than a waveform. The improvements have translated to mixing music as well.


hello lines community!

i have a very basic editing question that is bugging me and that potentially can save me a lot of time. I edit in Ableton live (I know not the best editor for podcasts, but it’s the one I know best).

I have 3 tracks, one for each speaker. I'm cutting out the 'uuhhs' and other empty moments and non-essential stuff. But every time I cut something out, I have to select all the tracks, all the way to the end of the conversation, and slide everything left in the timeline to fill in the silence*… Is there an easier/faster way to do this? I hope someone can help me out with this!

*otherwise the tracks go out of sync with each other


The keyword here is “ripple” edit/mode. I don’t use Live but this tutorial seems to show how to achieve this in it.

edit: another possibility is to apply eq/comp/gate/mute to each voice as needed, then mix down to stereo and edit the stereo mixdown (i like to keep it simple).


oh thank you so much! both very helpful solutions :slight_smile: you made my day

Yes, ripple edit is very important. I have it set to a single hotkey (q) and use it more than any other command while doing podcast work.
