The world beyond 20kHz
by David E. Blackmer
Using a study of the human hearing mechanism as his foundation, Earthworks' founder
David E. Blackmer presents his arguments for, and his vision of, high-definition audio.
THERE IS MUCH controversy about how we might move forward towards higher quality reproduction
of sound. The compact-disc standard assumes that there is no useful information beyond 20kHz and
therefore includes a brick-wall filter just above 20kHz. Many listeners hear a great difference when 20kHz
band-limited audio signals are compared with wideband signals. A number of digital systems have been
proposed that sample audio signals at 96kHz and above, with up to 24 bits of quantisation.
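As a point of reference, the short Python sketch below compares the theoretical limits of the CD standard with such a 96kHz, 24-bit format, using two standard engineering rules of thumb: the Nyquist limit of half the sample rate, and the ideal dynamic range of roughly 6.02N + 1.76dB for an N-bit quantiser. These are ceilings for ideal converters, not measurements of any real system.

    # Back-of-envelope comparison of the CD standard with 96kHz/24-bit.
    # Nyquist limit = fs/2; ideal dynamic range of an N-bit quantiser
    # ~= 6.02*N + 1.76 dB (sine-wave SNR). Theoretical ceilings only.

    def nyquist_khz(sample_rate_hz):
        """Highest representable frequency, in kHz, for a sample rate."""
        return sample_rate_hz / 2.0 / 1000.0

    def ideal_dynamic_range_db(bits):
        """Ideal quantisation dynamic range of an N-bit converter."""
        return 6.02 * bits + 1.76

    for name, fs, bits in [("CD (44.1k/16)", 44_100, 16),
                           ("96k/24-bit", 96_000, 24)]:
        print(f"{name:14s} Nyquist {nyquist_khz(fs):6.2f} kHz, "
              f"ideal range {ideal_dynamic_range_db(bits):6.2f} dB")

    # CD (44.1k/16)  Nyquist  22.05 kHz, ideal range  98.08 dB
    # 96k/24-bit     Nyquist  48.00 kHz, ideal range 146.24 dB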
Many engineers have been trained to believe that human hearing receives no meaningful input from
frequency components above 20kHz. I have read many irate letters from such engineers insisting that
information above 20kHz is clearly useless, that any attempt to include such information in audio signals is
deceptive, wasteful and foolish, and that any right-minded audio engineer should realize that this 20kHz
limitation has been known to be an absolute limitation for many decades. Those of us who are convinced
that there is critically important audio information to at least 40kHz are viewed as misguided.
We must look at the mechanisms involved in hearing, and attempt to understand them. Through that
understanding we can develop a model of the capabilities of the transduction and analysis systems in human
audition and work toward new and better standards for audio system design.
What got me started in my quest to understand the capabilities of human hearing beyond 20kHz was an
incident in the late eighties. I had just acquired an MLSSA system and was comparing the sound and response of a group of high-quality dome tweeters. The best of these had virtually identical frequency responses to 20kHz, yet they sounded very different.
When I looked closely at their response beyond 20kHz they were visibly quite different. The metal-dome
tweeters had an irregular picket fence of peaks and valleys in their amplitude response above 20kHz. The
silk-dome tweeters exhibited a smooth fall-off above 20kHz. The metal domes sounded harsh compared to
the silk domes. How could this be? I cannot hear tones even up to 20kHz, and yet the difference was audible
and really quite drastic. Rather than denying what I clearly heard, I started looking for other explanations.
WHEN VIEWED FROM an evolutionary standpoint, human hearing has become what it is because it is a
survival tool. The human auditory sense is very effective at extracting every possible detail from the world
around us so that we and our ancestors might avoid danger, find food, communicate, enjoy the sounds of
nature, and appreciate the beauty of what we call music. Human hearing is, I believe, generally misunderstood as being primarily a frequency analysis system. The prevalent model of human hearing presumes that
auditory perception is based on the brain's interpretation of the outputs of a frequency analysis system
which is essentially a wide dynamic range comb filter, wherein the intensity of each frequency component
is transmitted to the brain. This comb filter is certainly an important part of our sound analysis system, and
what an amazing filter it is. Each frequency zone is tuned sharply with a negative mechanical resistance
system. Furthermore, the tuning Q of each filter element is adjusted in accordance with commands sent
back to the cochlea by a series of pre-analysis centers (the cochlear nuclei) near the brain stem. A number
of very fast transmission-rate nerve fibers connect the output of each hair cell to these cochlear nuclei. The
human ability to interpret frequency information is amazing. Clearly, however, something is going on that
cannot be explained entirely in terms of our ability to hear tones.
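For readers who think in code, the sketch below caricatures this adjustable-Q filter-bank idea as a bank of second-order resonators whose tuning sharpness can be commanded up or down. It is an illustration only, not a physiological model; the band count, centre frequencies and Q values are arbitrary choices of mine.

    # A caricature, in code, of an adjustable-Q filter bank. Purely
    # illustrative: the cochlea is far more complex than this.
    import numpy as np
    from scipy.signal import iirpeak, lfilter

    fs = 96_000                                # sample rate, Hz
    centres = np.geomspace(100, 20_000, 24)    # 24 log-spaced bands

    def filter_bank(signal, q):
        """Pass the signal through a resonator at each centre frequency."""
        return [lfilter(*iirpeak(f0, q, fs=fs), signal) for f0 in centres]

    # A click excites every band. High Q rings longer and resolves
    # frequency more sharply; low Q smears less in time.
    click = np.zeros(1024)
    click[0] = 1.0
    sharp = filter_bank(click, q=30.0)   # analogous to a 'sharpen' command
    broad = filter_bank(click, q=3.0)    # broader tuning, better timing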
The inner ear is a complex device with incredible details in its construction. Acoustical pressure waves are
converted into nerve pulses in the inner ear, specifically in the cochlea, which is a liquid-filled spiral tube.
The acoustic signal is received by the tympanic membrane, where it is converted to mechanical forces
which are transmitted to the oval window, then into the cochlea, where the pressure waves pass along the
basilar membrane. This basilar membrane is an acoustically active transmission device. Along the basilar
membrane are rows of two different types of hair cells, usually referred to as inner and outer.
The inner hair cells clearly relate to the frequency analysis system described above. Only about 3,000 of the
15,000 hair cells on the basilar membrane are involved in transducing frequency information using the
outputs of this travelling wave filter. The outer hair cells clearly do something else, but what?
There are about 12,000 'outer' hair cells arranged in three or four rows, four times as many outer hair
cells as inner hair cells(!). However, only about 20% of the total available nerve paths connect them to the
the brain. The outer hair cells are interconnected by nerve fibers in a distributed network. This array seems
to act as a waveform analyzer, a low-frequency transducer, and a command center for the super-fast
muscle fibers (actin) which amplify and sharpen the travelling waves passing along the basilar membrane,
thereby producing the comb filter. It also has the ability to extract information and transmit it to the
analysis centers in the olivary complex, and then on to the cortex of the brain where conscious awareness
of sonic patterns takes place. The information from the outer hair cells, which seems to be more related to
waveform than frequency, is certainly correlated with the frequency domain and other information in the
brain to produce the auditory sense.
Our auditory analysis system is extraordinarily sensitive to boundaries (any significant initial or final event
or point of change). One result of this boundary detection process is the much greater awareness of the
initial sound in a complex series of sounds such as a reverberant sound field. This initial sound component
is responsible for most of our sense of content, meaning, and frequency balance in a complex signal. The
human auditory system is evidently sensitive to impulse information embedded in the tones. My suspicion
is that this sense is behind what is commonly referred to as 'air' in the high-end literature. It probably also
relates to what we think of as 'texture' and 'timbre' - that which gives each sound its distinctive individual
character. Whatever we call it, I suggest that impulse information is an important part of how humans hear.
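The principle is easy to demonstrate. The following minimal Python sketch flags boundaries as points where short-term energy jumps sharply, which is one crude way to mimic onset sensitivity; the window length and threshold are arbitrary illustrative values, not parameters of the auditory system.

    # Minimal boundary (onset) detection: flag points where short-term
    # energy jumps sharply. The 5ms window and the x4 threshold are
    # arbitrary illustrative values.
    import numpy as np

    def detect_boundaries(signal, fs, win_ms=5.0, jump_ratio=4.0):
        """Return sample indices where windowed energy rises by more
        than jump_ratio relative to the preceding window."""
        win = max(1, int(fs * win_ms / 1000.0))
        frames = signal[: len(signal) // win * win].reshape(-1, win)
        energy = (frames ** 2).mean(axis=1) + 1e-12  # avoid divide-by-zero
        jumps = np.flatnonzero(energy[1:] > jump_ratio * energy[:-1]) + 1
        return jumps * win                           # frame -> sample index

    # Example: 100ms of silence, then a noise burst; the onset is flagged.
    fs = 48_000
    x = np.concatenate([np.zeros(fs // 10), np.random.randn(fs // 10)])
    print(detect_boundaries(x, fs))   # -> [4800], the burst onset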
All the output signals from the cochlea are transmitted on nerve fibers as pulse rate and pulse position
modulated signals. These signals are used to transduce information about frequency, intensity, waveform, rate
of change and time. The lower frequencies are transduced to nerve impulses in the auditory system in a
surprising way. Hair-cell outputs for the lower frequencies are transmitted primarily as groups of pulses
which correspond strongly to the positive half of the acoustic pressure wave with few if any pulses being
transmitted during the negative half of the pressure wave. Effectively, these nerve fibers transmit on the
positive half-wave only. This situation exists up to somewhat above 1kHz, with discernible half-wave peaks
riding on top of the auditory nerve signal remaining clearly visible to at least 5kHz. There is a sharp boundary at
the beginning and end of each positive pressure pulse group, approximately at the central axis of the
pressure wave. This pulse-group transduction with sharp boundaries at the axis is one of the important
mechanisms accounting for the time resolution of the human ear. In 1929 von Békésy published a
measurement of human sound-position acuity which translates to a time resolution of better than 10µs
between the ears. Nordmark, in a 1976 article, concluded that the interaural resolution is better than 2µs;
interaural time resolution at 250Hz is said to be about 10µs, which translates to better than 1° of phase at
this frequency.
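Those figures are straightforward to verify. The sketch below checks the 250Hz arithmetic (10µs is indeed just under 1° of a 4ms period) and adds a common rise-time approximation, bandwidth of roughly 0.35 divided by rise time, which is a standard engineering rule of thumb rather than a figure from this article.

    # Verify the timing arithmetic above; the rise-time rule of thumb
    # (BW ~= 0.35 / t_rise) is my addition, not a figure from the article.

    def phase_degrees(delta_t_s, freq_hz):
        """Phase angle subtended by a time offset at a given frequency."""
        return 360.0 * delta_t_s * freq_hz

    def bandwidth_for_rise_time_hz(t_rise_s):
        """First-order estimate of the bandwidth needed for a rise time."""
        return 0.35 / t_rise_s

    print(phase_degrees(10e-6, 250))               # 0.9 degrees at 250Hz
    print(bandwidth_for_rise_time_hz(5e-6) / 1e3)  # 70.0 kHz for 5us detail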
The human hearing system uses waveform as well as frequency to analyze signals. It is therefore important to maintain waveform accuracy up to the highest frequencies, with details reproduced accurately down to
the 5µs to 10µs range. The accuracy of low-frequency detail is equally important. We find many low frequency