[Preface: There is no argument for an objective superior. Steaks, hamburger and sloppy joes are all great. But to not know what you’re eating is only letting yourself down ….]
Any avid music fan has probably had the argument with a friend (or foe) about which format is the best way to listen to music. Since Napster shattered the customs of the music world in the late ’90s, mp3s have become synonymous with contemporary music. The iPod has since come along and informed us we no longer needed shelves for our music collection, just a pocket. These developments are currently pushing the CD format closer and closer to its inevitable extinction. Yet ironically, as the CD slowly dies, vinyl records are storming back into popularity. So it appears that while the mp3 has unquestionably made music more portable and “share-able” (it is truly awesome to be able to bring your entire music collection on a plane ride!), it doesn’t seem to have what it takes to wipe out other formats completely.
So let’s take a look at the science behind music formats and how we hear in general. An educated listener is a better listener indeed, and you may be surprised by what you didn’t know. We must start by examining sound in general.
All right, let’s get some simple things straight about the way sound works for us humans and our brains. In general the human ear picks up frequencies between 20 hertz (Hz) and 20,000 Hz; hertz meaning the number of vibrations per second (“sound” is simply our brains perceiving minuscule air pressure changes, or vibrations). Yet the truth is most adults are only capable of hearing up to around 16k Hz (a little higher for females, you lucky ladies) because we lose the ability to perceive higher frequencies as we age. Sounds do indeed exist below 20 Hz (think of when you feel deep bass without actually hearing it) and upwards well beyond 20k Hz (think of a dog whistle; we don’t hear it but the pups sure do). So while we can pick up the most important swath of the sound-spectrum, there exists a great deal of sonic information we just never hear because of the limits of our ears & brain. [Note: this phenomenon also exists with our eyes; we only see a tiny portion of the electromagnetic spectrum, which we call light & color]
So why care about these sounds our brains cannot even perceive? What the heck does that have to do with musical formats and listening to your tunes? Again, we have to look at some science basics (bear with me!). Sound is mathematical. Let’s say you play an A note on an instrument. The fundamental frequency of that A is 440 Hz, so that will be the most present frequency we hear, yet it will not be the only one. Here is the math: that A note will also create and sound out its harmonics (or “overtones”), which are always whole-number multiples of the fundamental. This means that a 440 Hz A note will create another “harmonic” at 880 Hz (440 x 2), another at 1320 Hz (440 x 3), another at 1760 Hz (440 x 4), and it goes on and on. Harmonics are a large part of what makes notes played by instruments interesting to our ears. Different instruments (or vocal cords for that matter) inherently create different harmonic relations to the fundamental frequency, and this is the reason there is a difference in sound from instrument to instrument, even when they play the same mathematically identical musical note. This difference is referred to as an instrument’s “timbre”. Think of a computer-generated “pure tone”, one with no harmonics; it’s a shrill and sterile sound. So, consider this question: if the chords and notes that make up our music all create harmonics that are out of our hearing range, do those sounds have any effect upon what we do hear? Hold onto that thought; we can now begin our discussion of music formats.
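To make the harmonic arithmetic concrete, here is a small Python sketch. The 440 Hz fundamental and the 20,000 Hz hearing limit come from the text; the function and variable names are made up for illustration, and treating every multiple as present is a simplification (real instruments emphasize some harmonics and barely produce others).

```python
# Harmonics of a note are whole-number multiples of its fundamental.
FUNDAMENTAL_HZ = 440.0        # the A note used in the text
HEARING_LIMIT_HZ = 20_000.0   # nominal upper limit of human hearing


def harmonics(fundamental_hz, count):
    """Return the first `count` harmonics, starting with the fundamental itself."""
    return [fundamental_hz * n for n in range(1, count + 1)]


first_four = harmonics(FUNDAMENTAL_HZ, 4)
print(first_four)  # [440.0, 880.0, 1320.0, 1760.0]

# How many multiples of 440 Hz fit at or below the nominal hearing limit?
audible_count = int(HEARING_LIMIT_HZ // FUNDAMENTAL_HZ)
print(audible_count)  # 45 (45 x 440 = 19,800 Hz; 46 x 440 = 20,240 Hz)
```

Every multiple from the 46th onward lands above 20k Hz, which is exactly the territory the question above is asking about.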
Let’s start with the coasters. A CD, due to its sample rate, can only reproduce sounds between 20 Hz – 22.05k Hz. The sample rate of a CD is 44.1k Hz, which means a CD plays 44,100 digital samples of sound a second. And due to a mathematical principle known as the Nyquist theorem, the highest frequency that a digital audio source can reproduce is equal to half of the sample rate (it takes at least 2 samples to recreate a frequency), hence a CD can only recreate up to 22,050 Hz (44.1k/2 = 22.05k). Yet as one might infer, the more samples per second of a sound source, the higher the fidelity (sound quality) of the resulting sound. So on a CD the higher frequencies, while being present, are of a lower fidelity than the lower frequencies. So in short: even though the original recorded music contains sonic information above 22k Hz, that information is forever lost. But at least all of the sonic information between 20 Hz and 22k Hz is present and not altered by a compression algorithm. However, the lack of compression is also why an album on a CD may take up 700+ MB. What does compressed mean? Well, meet our friend the mp3.
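The "half the samples per second" rule is easy to check with a few lines of Python. This is only a sketch of the arithmetic: the 44,100 figure is from the text, while the function names and the 96,000 comparison value are hypothetical additions for illustration.

```python
CD_SAMPLES_PER_SECOND = 44_100  # figure given in the text


def highest_frequency_hz(samples_per_second):
    """Highest frequency a sampled signal can represent: half the samples per second."""
    return samples_per_second / 2


def samples_per_cycle(samples_per_second, frequency_hz):
    """How many samples land on one full cycle of a tone at `frequency_hz`."""
    return samples_per_second / frequency_hz


cd_limit = highest_frequency_hz(CD_SAMPLES_PER_SECOND)
print(cd_limit)  # 22050.0

# Hypothetical comparison: a system taking 96,000 samples per second.
print(highest_frequency_hz(96_000))  # 48000.0

# One cycle of a 440 Hz tone gets about 100 samples; a 20,000 Hz tone gets about 2.
print(round(samples_per_cycle(CD_SAMPLES_PER_SECOND, 440), 1))     # 100.2
print(round(samples_per_cycle(CD_SAMPLES_PER_SECOND, 20_000), 1))  # 2.2
```

The last two lines show why the top of the range sits so close to the limit: at 20k Hz there are barely more than the two samples per cycle that the rule requires.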
Mp3s are great because they don’t take up as much memory as a CD, making it possible to collect an amazing amount of music on one’s hard drive. But how is the necessary memory reduced? It is done by taking out information from the original CD recording, both sonic information and mathematically redundant information. In the end about 91% of the sonic information is removed. The mathematical algorithms that change WAV files (the CD’s format) into mp3s rely on certain tactics to do what they do. Basically the algorithms take into account some of the principles of sound and human hearing we have been discussing, a field known as psychoacoustics.
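For a rough sense of what "about 91% removed" means in storage terms, here is a back-of-the-envelope Python sketch. The 44,100 samples per second and the 91% figure come from the text; the 16 bits per sample and two channels are assumptions based on standard CD audio, and the 60-minute album length is purely hypothetical.

```python
SAMPLES_PER_SECOND = 44_100  # from the text
BITS_PER_SAMPLE = 16         # assumption: standard CD audio
CHANNELS = 2                 # assumption: stereo
FRACTION_REMOVED = 0.91      # "about 91%" from the text
ALBUM_MINUTES = 60           # hypothetical album length

# Uncompressed CD audio data rate, and what is left after removing 91% of it.
cd_bits_per_second = SAMPLES_PER_SECOND * BITS_PER_SAMPLE * CHANNELS
mp3_bits_per_second = cd_bits_per_second * (1 - FRACTION_REMOVED)


def megabytes(bits_per_second, minutes):
    """Storage needed for `minutes` of audio at the given data rate, in MB."""
    return bits_per_second * minutes * 60 / 8 / 1_000_000


print(cd_bits_per_second)                                    # 1411200
print(round(mp3_bits_per_second / 1000))                     # 127 (kilobits per second)
print(round(megabytes(cd_bits_per_second, ALBUM_MINUTES)))   # 635
print(round(megabytes(mp3_bits_per_second, ALBUM_MINUTES)))  # 57
```

Under these assumptions the leftover 9% works out to roughly 127 kilobits per second, close to the common 128 kbps mp3 setting, and an hour of music shrinks from about 635 MB to about 57 MB.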
Here are some of the basics of mp3’s psychoacoustic compression techniques:
#1; They compress the sound information in a “lossy” way, which means that it can never be uncompressed back into the original file again. They are essentially taking samples of the samples from the CD version. Sonic information is simply lost forever, hence “lossy” compression.
IMPORTANT NOTE: This means when you burn a CD from an mp3, you are not getting CD quality sound. Also, the bit-rate of the mp3 greatly affects the fidelity of the music. The mp3 algorithm is essentially taking samples of information from a CD, which is itself just samples of sound. That means there are fewer samples of the sound per second, so sounds which have lots of information in a short amount of time (known as transients) will not be reproduced as well. What kinds of sounds have these “transients”? For one, percussive ones; the attack of a snare drum is more difficult for an mp3 to replicate than an acoustic guitar. This should be kept in mind.
(Side note: There are plenty of lossless compression techniques available that don’t use psychoacoustic data removal techniques, such as Free Lossless Audio Codec (FLAC), Apple’s Apple Lossless, MPEG-4 ALS, Monkey’s Audio, and TTA.)
#2; It is known that the human ear hears certain sounds better than others (for survival purposes our brains learned to value frequencies in which voice occurs over others), so all frequencies above 15.5k Hz are permanently removed from mp3s. Also, much of the very low end is flattened (20-80 Hz), the thinking being that few stereos or headphones will accurately reproduce them anyway. They slightly boost the frequencies between 1-4k Hz as well (which are those the human ear is most sensitive to).
IMPORTANT NOTE: This means sonic information from the original recording is being thrown away, and altered, before being presented to your ears and brain.
#3; It is known that the human ear is better at picking up which direction a sound is coming from the higher the frequency the sound is (think of an ambulance siren vs a car with loud subs, which one can you pinpoint the location of better?). So to take advantage of this fact and save memory the mp3 algorithm reduces the stereo information of non-high frequency sounds into mono information.
IMPORTANT NOTE: This means there is just less stereo information. Music from an MP3 is just more “mono’ish”.
#4; It is known that the human ear has trouble hearing certain sounds beneath other, louder sounds. Usually when one sound is 6 dB louder than another (roughly four times the acoustic power), the human ear doesn’t initially pick up the quieter one; it is “masked.” Mp3s notice when there are sounds more than 6 dB quieter than the loudest sound being played, and will reduce that sonic information to focus on the main sounds you can easily hear.
IMPORTANT NOTE: This means the quiet subtle sounds are being cut out on mp3s, and everything must be about as loud as everything else to be heard. This will reduce the dynamic range that can be heard in an mp3, and reduce the ability to have quiet and loud sounds presented at the same time.
#5; It is known that the human ear can have trouble perceiving sounds that are closely related in time. So while you will notice if two sounds occur (think of two drum taps) 10 milliseconds apart, you will not notice them 2 milliseconds apart (you will perceive them as the same sound). Mp3s take advantage of this by scanning for similar sounds that occur very close together in time, and then removing that sonic information.
IMPORTANT NOTE: This has an implication for sounds that have pre-echo, and echo (commonly known as reverb), as well as for tightly timed choruses (as when someone sings over their own singing). This means that reverb or cymbal sounds (which have a pre-echo type sound) will be reproduced differently for an mp3 than for a CD.
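Since point #4 leans on a 6 dB figure, it may help to see what a decibel gap means as a ratio. The Python sketch below uses the standard decibel definitions (10 x log10 for power, 20 x log10 for amplitude); the function names are made up for illustration, and how loud a gap actually feels is a separate, perceptual question (a common rule of thumb puts "twice as loud" nearer 10 dB).

```python
import math


def power_ratio(db):
    """Ratio of acoustic power corresponding to a level difference in dB."""
    return 10 ** (db / 10)


def amplitude_ratio(db):
    """Ratio of sound-pressure amplitude corresponding to a level difference in dB."""
    return 10 ** (db / 20)


def db_from_power_ratio(ratio):
    """Level difference in dB corresponding to a ratio of acoustic power."""
    return 10 * math.log10(ratio)


print(round(power_ratio(6), 2))          # 3.98 -> about four times the power
print(round(amplitude_ratio(6), 2))      # 2.0  -> about twice the amplitude
print(round(db_from_power_ratio(4), 2))  # 6.02
```

So a sound 6 dB below the loudest one carries about a quarter of its power, which is the gap the masking rule in point #4 is built around.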
So what is all this fuss, you ask? In the end it seems as if the mp3 is just a customized & streamlined way to listen to audio; indeed, the mp3 algorithm is an amazing invention! It is truly amazing that so much fidelity can be conserved while removing 91% of the sonic information, and the invention of the mp3 should not be lamented. It is wondrously magnificent to be able to carry your entire music collection in your pocket when traveling or commuting, do not doubt it! But knowing that these psychoacoustic techniques of the mp3 algorithm take out sounds we usually cannot consciously perceive, the question arises – does the mp3 affect the quality, and correspondingly the level of enjoyment, of our listening to our music?
To assist in addressing that question let’s take a look at two Japanese studies done on sounds that people cannot perceive. The first study (which can be found at http://jn.physiology.org/cgi/content-nw/full/83/6/3548) was done by examining people’s EEG, or the location and intensity of electric signals in the brain, when exposed to the same recording played with different inherent frequency information. Take a look at the pictures below. Baseline = our brains without any sound. LCS = our brains when just sound information above 20k Hz is played (which means they are inaudible sounds). HCS = our brains when sound is played which exists only within the normal hearing range (as with a CD). FRS = our brains when all sounds are played, the normal range plus everything above that we are unable to perceive.
One can clearly see that it is when the recording with the entire range of frequencies (even with those we can’t hear on their own) is played that our brains get the most excited! To quote the study’s results, “When the conditions with audible sounds (i.e., FRS or HCS) were compared with those without audible sounds (i.e., LCS or baseline), the bilateral temporal cortex, presumably the primary and secondary auditory cortex, always showed significantly increased rCBF [activity] as expected. More importantly, when FRS was compared with HCS, deep-lying structures in the brain were significantly more activated during the presentation of FRS than during that of HCS.”
The second study, also done in Japan (http://www.jstage.jst.go.jp/article/ast/24/4/197/_pdf), took a different approach. They exposed people to music twice. Each exposure consisted of the volume being set for a first run, then on a second run the participants were allowed to set the volume level themselves to whatever they deemed a comfortable listening level. On one exposure they were listening to music existing in the HCS range (as with a CD), on the other they were exposed to the FRS (same music with the additional frequencies we “can’t hear”). On average (the experiment was done many times on many people) when people were listening to the FRS music they considered their comfortable listening level to be close to 1 dB higher. While this may seem like very little change to us, scientifically if we cannot hear a difference, there shouldn’t be any at all in the human response to it. To quote the study, “The averaged comfortable listening level of the sounds containing HFC above 22 kHz was significantly higher than that of the sounds from which HFC above 22 kHz have been removed.”
In short (maybe actually in long), you don’t have to take my word for it; these inaudible frequencies being present do change the experience of the listener, whether they are directly aware of it or not. This is stated not to suggest we all should go out and buy exorbitant audio systems that can reproduce sounds up to 100k Hz and use only fancy Super-Audio CDs; it is just being argued that sonic information that lies beyond the normal hearing range still affects the end product, our music. Even though we can’t consciously perceive those frequencies alone, they seem to have at least a subconscious effect when played along with the frequencies we can hear (remember our harmonics talk?). Is this actually surprising? Isn’t music a subconscious experience in many ways? What genres we inherently like, or what memories or thoughts music brings up, is all decided by the subconscious! We should not confuse subconscious with unimportant.
So now what about that mp3… Do you feel any different about taking out frequencies that actually lie within the realm we can hear, when we see that taking out frequencies we cannot hear has an effect? Yes, maybe we can do a psychoacoustic removal technique here and there without degrading the fidelity much, but do you really now think that we can take out that much sonic information (remember our 5 points from above) and not create an effect on the music, and therefore on the listening experience?
Again, we should still support using mp3s, just as long as you know what you are listening to. We have to be honest with ourselves about what the mp3 is. It is a flattener of music, there is little to no information in the very low and high end. It is a mono’izer of music, there is less stereo information and a reduction of the stereo image. It is less dynamic, quiet sounds or similar sounding sounds are thrown out. This means you will have to “bass boost”, you won’t hear the two different acoustic guitars as well, reverb will be reproduced very poorly, drums will sound a little more bland, and whether the lead singer whispers or screams you’ll probably hear it at the same volume.
But it’s not just that the mp3 changes how music is digitally stored; it’s that it now changes the way it is recorded. Why have a great stereo image? Why have complicated delicate sounds? Why have dynamics, where a song may be quiet at one point and louder later? Why use delicate or complex reverbs? If the public is just going to take in the music on an mp3, then why get complicatedly creative, when it’s going to sound simple? HERE is the crux: when the public eats mp3s and doesn’t even realize what they are taking in, then we are feeding them burgers and they think they are getting steak! Again, I love burgers; you can get em anywhere, you can eat them in your car or on a plane, they are cheap and plentiful. But when I am home and have time and I really want a good meal, I want some filet mignon! Imagine not even knowing that steak was an option… this is what I fear our listening culture is moving towards; having the simpler option become the only option.
There is another music format we have yet to touch upon… wax. Vinyl records can reproduce frequencies well above 20k Hz. Depending on the quality of the system they can reach 50k Hz, and some say even higher. And because vinyl is a literal copy of a sound wave, and is in no way comprised of “samples”, it reproduces all frequencies across the sound-spectrum with greater quality and equality than CDs or mp3s. This means the high frequency information is replicated with as much fidelity as the low frequency information. And personally, with a decent audio system, there is nothing like listening to sound from a needle vibrating in a wax groove. Music is comprised of complex vibrations and is “analog” by nature, therefore sounds originating from analog sources will have a certain “realness” that gets lost in converting sound into 0s and 1s, and then back again.
But one must realize this doesn’t mean “records win every time”. Vinyl records have to be taken care of, you have to flip them to listen to an album, and you need a decent sound system/set up to truly appreciate them. You can’t play them in your car, they need physical storage space, and you should even store them a certain way. The sound can be altered by repeated playback. There are undoubtedly certain drawbacks one cannot get around with vinyl, and it is not a realistic way for everyone to listen to music. But vinyl LPs do make you care about the quality of your sound, something that seems to be lost on many people these days.
So what have we learned from all this bickering about audio format fidelity? Most importantly: the more musical information there is, the more our brains enjoy it. Vinyl records have more information than CDs, and CDs much more than mp3s. Mp3s are compressed and derived from CDs, so the two are not equals, yet that’s okay (but know that when you decide to purchase one or the other). However, one must remember that this also implies that no matter what format you use to listen to a recording of a guitar playing, they all pale in comparison to the amount of sonic information – and therefore enjoyment – you can derive from actually being in the guitar’s presence; i.e. no music format is of higher quality than live music!
But the real crux of this discussion is that it is a great shame when people eat only hamburgers and sloppy-joes without ever knowing steak exists! If you love music, then learn about how you are listening to it, and you will love it all the more!
Written by Sean Poynton Brna
OurVinyl | Editor
In case you want to check some of the stats and where I got some of my information. Although this was also written with information I have gained from my schooling and career.