FM 101, Part Two: The Basics

By: Howard Massey

Check out Part two of our FM101 Series by Howard Massey.

Digital FM: two words that strike fear in the hearts of many synthesists.

Too complicated, some say. Too cold-sounding. And anyway, it’s only good for things like bells and electric pianos.

Well, let’s face facts. It is a little complicated — certainly more so than standard analog subtractive synthesis or additive techniques — but by no means impossible to understand. (And no, you don’t need a degree in mathematics to use it, either!)

On the other two counts: Wrong, and wrong again. Sounds created with digital FM don’t have to be cold or edgy — in fact, there are lots of tools that let you add warmth and movement. And FM sounds are by no means limited to bells and electric piano (though admittedly it’s a technique that does both well) — you can actually craft a broad palette of tonalities such as brass, wood tones, even a decent approximation of the human voice.

In this series of articles, we’ll cover all the basics and tell you what you need to know in order to program FM sounds. While we’ll be referencing two specific Yamaha synthesizers that have digital FM capabilities (MONTAGE and MODX), the information will be applicable to other FM synths too; just check your instrument’s owner’s manual for the equivalent button-pushes.

Now let’s get started … right at the very beginning!

What Is A Sound?

At first glance, this may seem too basic, but the question of what makes up a sound is actually a lot deeper than it may appear. At its essence, a sound is anything we hear as a result of vibrations in the air. These vibrations, in turn, cause our eardrums to vibrate in a similar fashion. The back-and-forth movements of the eardrum are passed along by tiny bones in the middle ear to the inner ear, where they are converted into electrical signals that travel up nerves into our brain and are finally perceived by us as a sound.

Obviously, there’s an enormous gamut of sounds in existence — everything from a violin to a jackhammer, the wings of a butterfly softly beating to a crack of thunder. What is it that differentiates sounds from one another?

Actually, there are three factors:

The sound’s degree of loudness (in technical terms, this is known as amplitude).
Its pitch or lack of pitch (in technical terms, this is known as frequency).
Its quality (in technical terms, this is known as timbre).

Interestingly, all sounds can be described in terms of these three aspects, and these three only. What’s more, all sounds exhibit all three aspects. Think about it: Can there be a sound that has no loudness? (If so, it’s not a sound.) Can there be a sound that has no particular quality? Not possible. And while there can be (and are) plenty of sounds that have no pitch, they can then be described in terms of their lack of pitch.

Let’s take a look at each of these factors in turn.

Amplitude

The amplitude (loudness) of a sound is easily measurable in a number of different ways. There are, for example, hardware devices and software apps called sound pressure level (SPL) meters that measure amplitude in real time. Or you can record a sound into your favorite DAW (Digital Audio Workstation) and see the resulting waveform on your screen afterwards. Such a display might look something like this:

The height of the waveform shows its amplitude: the louder the sound, the higher the wave. (The “dB” at the top and bottom of the vertical scale are units called decibels — dB for short.) This makes perfect sense, since it’s a reflection of the displacement of air (the backwards and forwards vibrations) created when the sound occurred.

Here’s the same sound a little softer:

… and a little louder:

Obviously, every sound always changes amplitude during its existence (eventually dropping to and remaining at 0 dB) since there is no such thing as a perpetual sound that lasts forever. We’ll be talking a lot more about amplitude in future installments, but for the purposes of this article, let’s stop here and move on to …

Frequency

Just as you can determine the amplitude of a sound by the height of its wave, so too can you determine its pitch by the number of waves that occur in a given period of time — in other words, how frequently those waves occur (hence the name “frequency”). The lower the pitch, the fewer the number of waves in any given time period; the higher the pitch, the more waves occur.

The unit of measurement for frequency is the Hertz, or Hz for short. (There’s also a unit called the kiloHertz, or kHz, that represents a thousand Hertz.) This describes the number of waves that occur in one second of time, so a sound with a frequency of 1 Hz generates one wave (one backwards and forwards movement of air) per second; a sound with a frequency of 100 Hz generates a hundred waves per second; a sound with a frequency of 1 kHz generates a thousand waves per second, and so on.

We humans can only perceive pitches between roughly 20 Hz and 20 kHz (that is, 20,000 Hz), though unfortunately the high end of this range gets reduced somewhat as we get older (and the rate of deterioration is accelerated when you’re exposed to lots of loud sounds, so turn those speakers down!). This is known as the audible range. Sounds do exist both above and below this 20 Hz – 20 kHz range; those below 20 Hz are termed infrasonic, or more loosely subsonic (they’re the rumbling “feel” frequencies that make your chest pound and the dance floor vibrate), and sounds above 20 kHz are termed ultrasonic (they’re the sounds we can’t hear that make your dog’s ears perk up!).

Frequency has a lot to do with whether or not we perceive a sound as being musical. For the most part, if you can determine a clear pitch (as, for example, in a violin note), the sound is considered musical. If you can’t (as in a jackhammer), it’s non-musical. But there’s a big gray area in between. Can you clearly make out the pitch of a bass drum or a snare drum? Most people can’t, but drummers spend a lot of time tuning their drums, so maybe they can perceive something that the rest of us are missing. The same goes for other percussive instruments, like cymbals, shakers and tambourines. On the flip side of the coin, some individuals are able to perceive pitches in what most of us would consider non-musical sounds, like a breeze blowing or the hum of an engine. The bottom line is that it’s somewhat subjective, though most of us can agree that most (but not necessarily all) musical instruments have a clear pitch component.

Every time a musical sound goes up an octave, its frequency doubles. For example, here’s the waveform of the A above middle C played on a piano:

And here it is played an octave higher:

… and an octave lower:

The Western musical system divides each octave into 12 equal components (more or less) called semitones, and the distance between any two notes is called an interval. The interval known as the fifth (actually seven semitones higher than the starting, or root, note) is the frequency midpoint. In other words, the fifth above the A note (the E) has 1 1/2 times its frequency, which is why it’s so pleasing to our ears. In standard tuning, the frequency of A is 440 Hz, so the E above it has a frequency of roughly 660 Hz (it’s “roughly” and not “exactly” because pianos use an equal temperament tuning system — a subject that’s well beyond the scope of this article, but Google it if you’re curious). These kinds of mathematical relationships will have a great deal of relevance when creating FM sounds, so it’s important to have a good grasp of them.
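If you’d like to see those numbers for yourself, here’s a minimal Python sketch of the arithmetic. It assumes the usual equal temperament formula, in which each semitone multiplies the frequency by the twelfth root of two; the function and variable names are just illustrative.

```python
A4_HZ = 440.0  # standard tuning for the A above middle C


def equal_tempered(semitones_from_a4):
    """Frequency of the note a given number of semitones above (or below) A 440."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


octave_up = equal_tempered(12)      # twelve semitones up doubles the frequency
octave_down = equal_tempered(-12)   # twelve semitones down halves it
fifth_tempered = equal_tempered(7)  # the E as an equal-tempered piano plays it
fifth_pure = A4_HZ * 3 / 2          # the "exact" 1 1/2 times relationship

print(octave_up, octave_down)
print(round(fifth_tempered, 2), fifth_pure)
```

The tempered fifth lands a fraction of a Hertz below 660, which is the “roughly” mentioned above.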

Now let’s move on to …

Timbre

At first glance, timbre may seem to be the toughest concept to grasp. Sure, we can call a sound “soft” or “warm” or “harsh” or “bright,” but those are all such indistinct, subjective terms, no?

Well, it may surprise you to learn that timbre, like every aspect of music, can actually be described in pure mathematical terms — one reason why computers are so good at synthesizing sounds. To understand this, let’s return to our example of an A440 piano note. When you depress that key, what happens is that, via a series of mechanisms, a hammer strikes a string (actually two or more strings, but that’s not especially relevant to this discussion just yet), which begins vibrating at a rate of — you guessed it — 440 times per second.

But that only happens for a split second, because as soon as those vibrations reach the back of the piano’s sounding board, the string begins rebounding back towards the front of the instrument, again at a rate of 440 times per second. But the original vibration continues, which means that the two vibrations criss-cross … resulting in a vibration of both 440 times per second and twice that amount (880 times per second):

So now our string is vibrating at both 440 Hz and 880 Hz, although the original 440 Hz movement (known as the fundamental frequency) will be stronger and therefore louder. But it doesn’t stop there, because once that 880 Hz vibration reaches the front of the instrument, it begins rebounding back again, causing a third vibration (of 1320 Hz — three times the amount of the original 440 Hz). This process continues over and over again until the string loses energy and the sound dies down to nothingness (in the case of an undamped piano note, this can take 30 seconds or longer, depending upon the length of the string and how hard the hammer strikes it).

So what we end up with is a string vibrating at many different rates, all at the same time, with each of these vibrations a mathematical multiple of the starting (fundamental) frequency. These extra vibrations are called harmonic overtones or partials. The overtone that’s twice that of the fundamental is called the second harmonic; the one that’s three times the fundamental is called the third harmonic, and so on.

But the story doesn’t end there, because no string or musical instrument is perfectly constructed, nor is there any such thing as a perfect acoustic space (don’t forget, the air surrounding the string is set into vibration too, and that in turn affects the string to a small degree). For these reasons, a number of extra vibrations always occur that are not mathematical multiples of the fundamental frequency — say, 441.2 Hz or 967.8 Hz. These “outliers” (which have only random mathematical relationships to the fundamental) are known as inharmonic overtones (or inharmonic partials).
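To make the distinction concrete, here’s a small Python sketch that lists the harmonics of a fundamental and checks whether a given frequency is (near enough) a whole-number multiple of it. The function names and the tolerance value are assumptions chosen purely for illustration.

```python
def harmonic_series(fundamental_hz, count):
    """The first `count` harmonics: 1x (the fundamental), 2x, 3x and so on."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def is_harmonic(freq_hz, fundamental_hz, tolerance=0.001):
    """True if freq_hz sits within `tolerance` of a whole-number multiple.

    The tolerance is an arbitrary illustrative value, not a standard.
    """
    ratio = freq_hz / fundamental_hz
    nearest = round(ratio)
    return nearest >= 1 and abs(ratio - nearest) < tolerance


print(harmonic_series(440.0, 4))
for freq in (1320.0, 441.2, 967.8):
    kind = "harmonic" if is_harmonic(freq, 440.0) else "inharmonic"
    print(freq, kind)
```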

This is not specific to piano (which we used here just for the sake of illustration), or to any particular musical instrument, for that matter; it’s actually a feature of all sounds in existence. And it’s the number and type of overtones, and their relative strength to one another, that determines a sound’s timbre. Musical sounds (piano, flute, birdsong, etc.) will tend to have more harmonic than inharmonic overtones, while non-musical ones (cymbal, tambourine, jackhammer) will have the reverse. But all naturally occurring sounds consist of a particular blend of overtones, and that’s why we can tell the difference between a piano and a flute and a cymbal and a tambourine, even if they’re all played at the same loudness level, and even if we’re in a room with all the lights out.

One of the inexplicable wonders of nature is that the timbre of a sound is reflected in its waveform in an almost poetic manner. The smooth sound of a flute, for example, is displayed as a gentle, rounded waveshape that looks like this:

Yamaha Synth · Part 2 Audio Clip 1 (Flute)

… while the brighter tone of a twanged guitar string has a waveform that’s more jagged:

Yamaha Synth · Part 2 Audio Clip 2 (Guitar)

In contrast, the sizzle of a cymbal is thoroughly spiky and irregular:

Yamaha Synth · Part 2 Audio Clip 3 (Cymbal)

In general, the gentler and smoother the sound (the result of fewer overtones), the more rounded and regular the waveform; the brighter and buzzier the sound (the result of more overtones), the more jagged and irregular the waveform.

If you want to learn more about the relationship between music and mathematics, check out my Yamaha blog “The Numbers Game.” For now, though, let’s start our exploration of digital FM — a great synthesis technique precisely because it gives us extremely fine control over all three aspects of sound: amplitude, frequency and timbre.

The Operator

The basic building block of digital FM synthesis is called an operator. It’s actually quite a simple software device (the fact that it’s “software” means that it doesn’t exist physically, just as a series of numbers — but no need to concern yourself with that), consisting of just three components: an oscillator (“OSC” for short), an amplifier (“AMP” for short) and an envelope generator (“EG” for short). Here’s the way they’re interconnected:

As you can see, the signal starts with the oscillator (same as it does in analog synthesis methods); it’s then sent into an amplifier, under the control of an envelope generator, which enables its amplitude to be varied over time (more about this in Part 4). The signal then leaves the operator, to be routed either to the output of your instrument so you can hear it … or, more intriguingly, to the input of another operator. (We’ll unravel this particular mystery in Part 3.) For now, let’s focus on an operator whose output can be heard directly, either through your synthesizer’s main outputs or its headphone output.
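For the programmers in the audience, the signal flow just described can be sketched in a few lines of Python. This is only a rough illustration and not how any Yamaha instrument is actually implemented: the sample rate, the function names and the simple straight-line fade used for the envelope are all assumptions.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (an assumed value)


def oscillator(freq_hz, n_samples):
    """OSC: generate a sine wave at the requested frequency."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def envelope(n_samples):
    """EG: a hypothetical envelope that fades in a straight line from 1 to 0."""
    return [1.0 - i / n_samples for i in range(n_samples)]


def operator(freq_hz, n_samples):
    """AMP: scale each oscillator sample by the envelope value at that moment."""
    osc = oscillator(freq_hz, n_samples)
    eg = envelope(n_samples)
    return [sample * gain for sample, gain in zip(osc, eg)]


# One tenth of a second of a 440 Hz tone that fades away.
output = operator(440.0, SAMPLE_RATE // 10)
print(len(output), max(output))
```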

Such an operator is called a carrier, and up until the introduction of FM-X synthesis (available today in MONTAGE and MODX), most digital FM carriers were only capable of producing one kind of waveform — the simplest one known to man. This is the humble sine wave, which contains no overtones at all.

Wait, you say! Didn’t I tell you just a few paragraphs back that all sounds consist of a particular blend of overtones? Well, check the fine print, because what I actually said was that all naturally occurring sounds have overtones. (Sneaky, I know.) And sine waves do not exist in nature; in fact, they can only be generated by electrical circuits (such as the ones in analog synthesizers) and in digital emulations of electrical circuits (such as the ones in digital synthesizers).

We’ve seen that the timbre of a sound is reflected in its waveform, with gentler sounds having more rounded shapes, so you might expect that the waveform of a sine wave would look something like that of a flute, only even more rounded … and you’d be absolutely right. Here’s what a sine wave looks like:

Ready to listen to a sine wave on your MONTAGE or MODX? It’s super-easy, since both instruments offer a simple procedure for calling up basic “initialized” sounds, including one for FM. Simply perform the following steps:

Press [CATEGORY SEARCH]
Touch Init

Select “Init Normal (FM-X)”

You’re nearly there, but in order to be able to more clearly hear the various exercises we’ll be giving you, you’ll want to turn off the reverb. This is also a very simple process, but if you’d rather skip the button-pushes, just go to Soundmondo (our way-cool social website that allows you to discover new synth sounds, as well as organize and share your own sounds) and click here to download the Performance named “Part 2_01.” (This is simply “Init Normal (FM-X),” saved without reverb.) For more information about Soundmondo, check out this blog article.

For you do-it-yourselfers, here’s the step-by-step procedure:

Press [PERFORMANCE (HOME)]
Press [EDIT]
Press [COMMON]
Touch Effect
Touch ON (green box) next to Rev (circled in red in the illustration below) so that it turns OFF (gray)

Here’s what it sounds like — the smooth, gentle tone of a pure sine wave, without any reverb:

Yamaha Synth · Part 2 Audio Clip 4 (Sine)

Whether you downloaded this from Soundmondo or created it with button-pushes, press the Store button to store it in your MONTAGE/MODX. (If you created it with button-pushes, rename it “Part 2_01” — the name under which it appears in Soundmondo).

Now we’re ready to move on to another basic concept, called …

Ratio

To begin this discussion, let’s go back to the illustration of an FM operator we presented earlier:

Obviously, this graphic is a bit lacking since an operator doesn’t decide on its own when to start playing, nor does it decide what pitch to play — those decisions are made by you, via control and pitch inputs. These two signals are generated by your synth’s keyboard when you play a note (or notes), or by an external MIDI device (such as a MIDI sequencer or external MIDI controller), or both. So here’s a more accurate picture of an operator:

The pitch that an operator plays is actually determined by three factors: the note you play, any transposition being done by your instrument (in MONTAGE or MODX, by pressing either of the front-panel TRANSPOSE buttons), and a crucial setting called Ratio. The Ratio number is used as a multiplier, so that, for example, if it is set to 1.00 and you play A above middle C (assuming no keyboard transposition), the operator multiplies 440 (that note’s frequency, as we saw earlier) by 1.00 and the resulting output will have a frequency of 440 Hz. If, on the other hand, you set the Ratio to 2.00 and play A above middle C, the frequency of the operator’s output will be 880 Hz. You can also insert fractional Ratio values; if an operator’s Ratio is set to 1.5 and you play A above middle C, you’ll hear a frequency of 660 Hz (440 times 1.5).
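The multiplication described above is simple enough to write out as a short Python sketch. The function name is hypothetical, and treating transposition as a shift in equal-tempered semitones is an assumption made for the sake of the example.

```python
def operator_frequency(note_hz, ratio, transpose_semitones=0):
    """Output frequency = (transposed) note frequency multiplied by the Ratio."""
    transposed_hz = note_hz * 2 ** (transpose_semitones / 12)
    return transposed_hz * ratio


# A above middle C (440 Hz) with the three Ratio settings discussed above.
for ratio in (1.00, 2.00, 1.5):
    print(ratio, operator_frequency(440.0, ratio))
```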

It will of course be easier to understand this concept if you can hear it, so call up the “Part 2_01” Performance you downloaded or created earlier and do the following:

Press [PERFORMANCE (HOME)]
Press [EDIT]
Press [PART SELECT 1/1]

This calls up the Edit – Part 1 – Common screen, which looks like this:

Next, either touch the OP1 tab at the bottom of the screen or press the [MOTION SEQ SELECT 1] button (this button calls up Operator 1 when working with an FM-X part). This will bring up the Edit – Part1 – Operator1 screen, which looks like this:


Play a few notes, then touch the Coarse box (circled in red).

Use the INC/YES button or data dial to change the Coarse setting to 2. The value in the Ratio box (circled in red) changes to 2.00:

Play the same notes again, and notice that they are now transposed up an octave. That’s because the operator is applying a Ratio of 2.00 (in other words, double) to each note being played. Entering in a Ratio of 3.00 multiplies each frequency played by 3 … which means you’ll hear not the note you’re playing on your keyboard, but that note an octave and a fifth higher. (In other words, if you play A above middle C, you’ll hear a sine wave of not 440 Hz, but 1,320 Hz [1.32 kHz], which is not an A but an E.) Try entering in different Ratio values — you’ll soon get the hang of how it works.
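If you’re curious how far a given Ratio moves the pitch in musical terms, here’s a minimal Python sketch that converts a Ratio into a distance in equal-tempered semitones (12 per octave). The function name is just for illustration.

```python
import math


def ratio_to_semitones(ratio):
    """Musical distance, in equal-tempered semitones, produced by a Ratio."""
    return 12 * math.log2(ratio)


for ratio in (1.00, 2.00, 3.00):
    print(ratio, round(ratio_to_semitones(ratio), 2))
```

A Ratio of 2.00 comes out at exactly 12 semitones (one octave), while 3.00 comes out a hair above 19 semitones, which is the octave (12) plus a fifth (7).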

The Fine box next to the Coarse box allows you to enter fractional Ratio values in 1/100th (.01) increments, from .01 to .99. To see how this works, return the Coarse value to 1, then touch the Fine box and move the data dial clockwise. Set the Fine value to 27, for example, and you’ll see (and hear) a Ratio of 1.27, which will result in some in-between microtuning notes not found in the Western scale:

Sharp-eyed readers will notice that there are two different “Freq Mode” (Frequency Mode) options — Ratio and Fixed — on the bottom row. We’ll be discussing the difference between these modes in the next installment.

So far, we’ve just been listening to a single Operator – Operator 1. But FM synthesizers always offer multiple operators; in the case of modern instruments like the MONTAGE and MODX, there are eight of them, all functionally identical, labeled OP1, OP2, OP3, etc. (Even early FM instruments like the original Yamaha DX7 offered six operators.) Let’s see what happens sonically when we bring in these extra operators.

Begin by again calling up the “Part 2_01” Performance (this will remove any changes you made and restore the sound the way it was) and then do the following:

Press [PERFORMANCE (HOME)]
Press [EDIT]
Press [PART SELECT 1/1]
Touch the OP1 tab at the bottom of the screen or press the [MOTION SEQ SELECT 1] button to once again bring up the Edit – Part1 – Operator1 screen.
Touch the Level tab on the left-hand side of the screen to bring up the OP1 level screen, then touch the Level box on the bottom row (circled in red).

Note that the level of OP1 is currently set at 99 (maximum). Use the DEC/NO button or the data dial to lower its level, playing some notes on the keyboard and listening as you do so. Obviously we are only hearing Operator 1 at the moment.
Restore the OP1 Level to 99, then touch the OP2 tab at the bottom of the screen or press the [MOTION SEQ SELECT 2] button to switch to Operator 2. Note that its level is currently at 0 — this explains why we haven’t been hearing it (until now, that is).
Use the INC/YES button or the data dial to increase the level of Operator 2, playing some notes on the keyboard and listening as you do so. Other than being a bit louder, it doesn’t sound any different, does it? Touch the Form/Freq tab on the left-hand side of the screen to discover the reason why:

Aha! Operator 2’s Ratio is 1.00, same as Operator 1. Touch the Coarse box and use the INC/YES button or data dial to change the Coarse setting to 2, then play a few notes and listen. You’ll hear two notes an octave apart.
Experiment by changing the Ratio of Operator 2 to various Coarse values and listen. When you set its Ratio to 3.00, for example, you’ll hear two notes an octave and a fifth apart, which sounds a lot like an organ. Setting it to 4.00 means that you’ll hear two notes two octaves apart, etc.
Restore the Coarse value to 1 and then experiment with changing the Fine value, playing the keyboard as you do so. With each new setting, you’ll hear the note you’re playing plus a second note, which, most of the time, will be completely unrelated and discordant. The exceptions occur when you set Ratio values of 1.25 (which will result in your hearing the root note plus an interval of a major third, which is very pleasing to the ear) and 1.50 (which will result in the root note plus an interval of a fifth, which is even more pleasing to the ear).
Continue your experimentation by changing various Coarse and Fine values for Operator 2, keeping Operator 1 at its root note Ratio setting of 1.00. When you’re done, restore the Ratio for Operator 2 to 2.00.
Now it’s time to bring in the other Operators; simply touch the OP3/OP4/OP5/OP6/OP7/OP8 tabs at the bottom of the screen or press the equivalent [MOTION SEQ SELECT] buttons to access each in turn. Set the Ratios for Operators 3 – 8 so that each has a Ratio one integer higher than the preceding one (i.e., Operator 3 should have a Ratio of 3.00, Operator 4 = 4.00, Operator 5 = 5.00, etc.).
Here comes the fun part — listening to them all together. But rather than having to laboriously go to each Edit screen in turn and use the data dial or INC/DEC buttons to alter the level of each Operator in turn, the MONTAGE/MODX offers a wonderful shortcut: the Control Sliders on the left-hand side of the instrument. Whenever you are working with an FM-X Part, these sliders are preset to control the levels of Operators 1 – 8 respectively. And because the settings you’ve entered in this case will result in a sound that’s very organ-like, the Control Sliders will essentially act like organ drawbars. Try it! If you go to the level screens of any of the operators (accessed by touching the Level tab on the left-hand side of the screen), you’ll notice that the Level value for each Operator changes as you move its associated Control Slider, and vice versa (changing the Level value in the screen alters the LEDs to the right of each slider). Note that you may have to move each Control Slider past its “zero” point to activate it — if a slider doesn’t seem to be working at first, just jiggle it up and down for a moment, and it will quickly become operational.
When you’re done experimenting with adjusting the relative levels of each Operator, set them all to a level of 78 and then store the Performance, renaming it “Part 2_02.” (Alternatively, you can find this Performance on Soundmondo by clicking here.)
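The “drawbar” sound built in the steps above is really just eight sine waves added together, and that can be sketched in Python as follows. Treat this as a rough illustration only: the function name is hypothetical, and mapping a Level of 0 to 99 onto a straight-line gain of 0 to 1 is an assumption (the instrument’s actual level curve may well be different).

```python
import math


def organ_sample(note_hz, t_seconds, levels):
    """Sum one sine wave per operator; operator N uses a Ratio of N.

    `levels` holds one 0-99 Level value per operator. Dividing by 99 to get
    a gain is an illustrative assumption, not the instrument's real curve.
    """
    total = 0.0
    for index, level in enumerate(levels):
        ratio = index + 1   # Operator 1 = 1.00, Operator 2 = 2.00, etc.
        gain = level / 99   # assumed straight-line mapping
        total += gain * math.sin(2 * math.pi * note_hz * ratio * t_seconds)
    return total / len(levels)  # keep the mix within -1..1


levels = [78] * 8  # all eight operators at a Level of 78
print(organ_sample(440.0, 0.0005, levels))
```

Changing individual entries in `levels` is the code equivalent of moving the Control Sliders.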

Finally, let’s talk about the subtle pitch changes you can make with the Fine control. It’s easy to pick out the sound of two notes played an octave apart, or two completely different notes (say, an A and an E), but whenever you play two sounds that are close in frequency to one another (such as, for example, the two or three strings for each piano key), the human ear can no longer perceive them as being separate. Instead, we hear one sound, and if the frequencies are even slightly different, the sound may have a pulsation (a “beating”) occurring at a rate equivalent to the difference between the two frequencies. (This, by the way, is the technique used by piano tuners as well as many guitarists and bassists — at least in the days before electronic tuners and software tuning apps!)

To demonstrate this phenomenon, load the “Part 2_01” Performance and then do the following:

Use the Fine control to set Operator 2 to a Ratio of 1.01, then store the Performance and rename it “Part 2_03.” (You can also find this Performance on Soundmondo by clicking here.)
Play a single note on the keyboard and raise Control Slider 2 so that you can hear Operator 2 along with Operator 1. Because they are so close in pitch, you’ll hear a single sine wave with a gentle pulsation — and this beating is something you can even see in the waveform:

Yamaha Synth · Part 2 Audio Clip 5 (Sine Single)

Now play the octave above that same note. The movement is more rapid now; in fact, it’s occurring twice as fast. This happens because the Ratio value is used as a multiplier, so with these settings, if you play A above middle C, for example, you hear a sine wave with a frequency of 440 Hz (Operator 1), plus another sine wave at 444.4 Hz (Operator 2); the difference between the two is 4.4 Hz, which is the rate of the movement you’re hearing. But if you play that same note an octave higher, you hear a sine wave at 880 Hz (Operator 1), plus another sine wave at 888.8 Hz (Operator 2), for a difference of 8.8 Hz, which is double the rate of movement you heard when playing the note an octave lower.
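The beat-rate arithmetic above can be checked with a few lines of Python. This is just a sketch of the calculation, with a hypothetical function name; the default Ratios are the 1.00 and 1.01 settings used in this exercise.

```python
def beat_rate(note_hz, ratio_a=1.00, ratio_b=1.01):
    """Beats per second = difference between the two operators' frequencies."""
    return abs(note_hz * ratio_b - note_hz * ratio_a)


for note_hz in (440.0, 880.0):
    print(note_hz, round(beat_rate(note_hz), 2))
```

Because both operators are multiplied by the same note frequency, the beat rate always scales with the note you play: the higher the note, the faster the pulsation.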

Yamaha Synth · Part 2 Audio Clip 6 (Sine Octave)

These multiple movements in the sound make for a very pleasing effect. For example, try playing a C major triad (middle C and the E and G above it), which sounds (and looks) like this:

Yamaha Synth · Part 2 Audio Clip 7 (Sine Chord)

Last but not least, let’s talk about a similar function in that same screen, called Detune, shown here circled in red:

This works similarly to the Fine parameter, except that it’s even finer … plus it allows you to alter the frequency of operators negatively (that is, to Ratios below the Coarse setting) as well as positively. You’ll find that a Detune value of 13 (plus or minus) is roughly equivalent to a Fine value of .01.

You can, of course, add in the other operators and set each to different Fine and/or Detune values to create intricate movements within your sound … even if that sound just consists of simple sine waves. Now imagine the possibilities with complex waves (that is, waves that have overtones). Tune into Part Three (“The Magic of Modulation”) to see just how that works … and a whole lot more!

If you missed the first part of the series and want to know more about the history of FM synthesis check out Part One: Discovering Digital FM…John Chowning Remembers.
