I do a lot of spoken-word production. This includes stuff like audio books, instructional videos, and voice-overs. In this type of work it’s very common for the narrator (sometimes called “the reader”) to have to read long passages of text. For example, let’s say you’re recording a self-guided audio tour for a historical site. There might be 20 stops on the tour, each one with a couple minutes of audio. For most narrators it’s going to be impossible to read all those words without making a mistake somewhere. Obviously you want all the words to be pronounced correctly, with no accidentally skipped words, and you want the pacing and tone of the narration to be right.
There are multiple approaches to correcting mistakes. In the first approach, whenever the narrator realizes that they’ve made a mistake, they simply pause, back up to a point before the mistake, and restart. The editor later goes back and deletes the mistakes, closing up the holes. For example, if the narrator is trying to say “Jack and Jill went up the hill to fetch a pail of water,” but they accidentally say “Jack and Jill went up the hill to catch a pail of water,” it might end up sounding like this: “Jack and Jill went up the hill to catch a… oops… went up the hill to fetch a pail of water.” Then the editor would delete “went up the hill to catch a… oops…” and close up the hole, resulting in the correct “Jack and Jill went up the hill to fetch a pail of water.” The advantage of this approach is that the narrator stays “in the groove,” which can be important if the text is difficult. The disadvantage is that it takes a surprising amount of skill to do this well. Without that skill, when the mistake is deleted, it ends up sounding like, well, like a mistake was deleted.
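For readers who like to see the idea spelled out, here is a minimal sketch of “delete the mistake and close up the hole” on raw audio samples. It is only an illustration, not how any particular editor does it: the function name `cut_and_close`, the `fade_len` parameter, and the choice of a short linear crossfade at the join are all my assumptions, and the samples are just a plain list of numbers.

```python
def cut_and_close(samples, cut_start, cut_end, fade_len=0):
    """Remove samples[cut_start:cut_end] and join the two sides.

    Hypothetical helper for illustration. With fade_len > 0, the last
    fade_len samples before the cut are linearly crossfaded with the
    first fade_len samples after it, so the join is less likely to click.
    The crossfade overlaps the two sides, so the result is fade_len
    samples shorter than a plain cut.
    """
    if not (0 <= cut_start <= cut_end <= len(samples)):
        raise ValueError("cut region is out of range")

    head = samples[:cut_start]   # audio before the mistake
    tail = samples[cut_end:]     # audio after the mistake

    # Never fade over more audio than exists on either side.
    fade_len = min(fade_len, len(head), len(tail))
    if fade_len == 0:
        return head + tail       # plain butt edit

    blended = []
    for i in range(fade_len):
        w = (i + 1) / (fade_len + 1)              # weight of incoming audio
        outgoing = head[len(head) - fade_len + i]
        incoming = tail[i]
        blended.append((1 - w) * outgoing + w * incoming)

    return head[:len(head) - fade_len] + blended + tail[fade_len:]


# Toy example: 10 good samples, 5 "mistake" samples, 10 good samples.
take = [1.0] * 10 + [9.0] * 5 + [1.0] * 10
fixed = cut_and_close(take, 10, 15, fade_len=4)
```

The code is the easy part. What it cannot do is make the narrator’s restart match the tone and pacing of what came before, and that is where the skill comes in.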
The second approach is to correct the mistakes while recording. So when the narrator says “to catch a… oops,” the engineer stops the recording, sets a punch-in point with pre-roll, and restarts the recording. The obvious advantage of this approach is that there is no editing to be done afterward. The disadvantage is that it requires good communication between the narrator and the engineer; otherwise a lot of time can be wasted on conversations like “Where do you want to take it from?” “How about ‘went up’?” “No, there’s not enough space there. Let’s take it from ‘Jack.’” Then the engineer needs to select the right amount of pre-roll so that the narrator can settle into a matching tone, loudness, and pacing. This is not as hard as it might sound; I’ve found that after working with the same narrator on multiple projects, we know each other well enough that we can do the whole process without even talking to each other.
In actual practice, it might not be possible to correct all mistakes while recording. Even if you think you’re doing that, you might discover upon listening to the recording that there are a few mistakes that need to be corrected. In that case you’ll need to mark the regions that need to be replaced, then punch each of them. This can be challenging if multiple days have gone by; you’ll need to recreate the exact same mic setup, and the narrator will have to match their voice from a previous day. Still though, it can be done. It does require skill, but after all, that’s why a narrator gets paid to narrate, and a recording engineer gets paid to record. 🙂
Here’s a tip that I’d like to leave you with. When punching in, I find that it’s best to start the pre-roll at the beginning of a sentence. For example, if you’re punching in at the word “fetch,” it’s better to pre-roll from “Jack” (the start of the sentence) rather than “went.” That’s because a pre-roll starting mid-sentence can throw the narrator off balance momentarily, and they may not be fully “recovered” at the punch-in point. This is a subtle effect, but I’ve noticed that punches sound more natural when the pre-roll starts at a sentence boundary rather than mid-sentence. By the way, in case you don’t know it, in the Pro Tools Edit window, Option-click to set the pre-roll point. After you’ve done this kind of work for a while, the keyboard and mouse gestures become automatic; you just do them without thinking, the way you speak without thinking of the words.
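If it helps to see the pre-roll tip as arithmetic, here is a rough sketch. It assumes you already have a list of times (in seconds) where each sentence begins, which is not something a DAW hands you automatically. The function name `preroll_from_sentence_start` and the optional `min_preroll` parameter are made up for illustration; the idea of backing up one more sentence when the pre-roll would be too short is my own assumption, not a rule.

```python
from bisect import bisect_right


def preroll_from_sentence_start(sentence_starts, punch_in, min_preroll=0.0):
    """Return (playback_start, preroll_seconds) for a punch at punch_in.

    sentence_starts: sorted list of times, in seconds, where sentences begin.
    min_preroll: hypothetical lower bound; if the sentence containing the
    punch gives less pre-roll than this, back up to an earlier sentence.
    """
    # Index of the last sentence that starts at or before the punch-in point.
    idx = bisect_right(sentence_starts, punch_in) - 1
    if idx < 0:
        raise ValueError("punch-in point is before the first sentence")

    # Back up whole sentences until there is enough pre-roll (if possible).
    while idx > 0 and punch_in - sentence_starts[idx] < min_preroll:
        idx -= 1

    start = sentence_starts[idx]
    return start, punch_in - start


# Sentences begin at 0.0 s, 4.5 s, and 9.0 s; the flubbed word is at 6.5 s.
starts = [0.0, 4.5, 9.0]
start, preroll = preroll_from_sentence_start(starts, 6.5)
```

In this example playback would start at 4.5 seconds, giving 2 seconds of pre-roll before the punch at 6.5 seconds. In practice you do this by ear and by eye, not with a script.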