2.04 September 10, 2002 Add support for 24 bit samples
2.01 August 2, 1997 Add specification for Modulators and standard NRPN implementation
2.00b May 2, 1997 Change nomenclature from layer/split to zone. See glossary
Fix a few typos
2.00a October 18, 1995 First publicly released draft
0.2 Disclaimers
THIS SPECIFICATION IS PROVIDED “AS IS” WITH NO WARRANTIES WHATSOEVER INCLUDING ANY
WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY
OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION, OR SAMPLE.
A LICENSE IS HEREBY GRANTED TO COPY, REPRODUCE, AND DISTRIBUTE THIS SPECIFICATION FOR
INTERNAL USE ONLY. NO OTHER LICENSE EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
OTHER INTELLECTUAL PROPERTY RIGHTS IS GRANTED OR INTENDED HEREBY.
AUTHORS OF THIS SPECIFICATION DISCLAIM ALL LIABILITY, INCLUDING LIABILITY FOR
INFRINGEMENT OF PROPRIETARY RIGHTS, RELATING TO IMPLEMENTATION OF INFORMATION IN THIS
SPECIFICATION. AUTHORS OF THIS SPECIFICATION ALSO DO NOT WARRANT OR REPRESENT THAT
SUCH IMPLEMENTATION(S) WILL NOT INFRINGE ON SUCH RIGHTS.
This preliminary document is being distributed solely for the purpose of review and solicitation of comments. It will be
updated periodically. No products should rely on the content of this version of the document.
SoundFont® and the SoundFont logo are registered trademarks of E-mu Systems, Inc. E-mu Systems licenses a “SoundFont
Compatibility” logo for a nominal fee; please contact E-mu’s SoundFont administrator by FAX at (408) 439-0392 for more
information. Users of the information contained herein should refer to files conforming to the specification as “SoundFont
Compatible,” with appropriate acknowledgment of trademark ownership.
0.3 Updates and Comments
Please visit http://www.soundfont.com for specification updates, and please send comments via e-mail to
soundfont@emu.com.
0 ABOUT THIS DOCUMENT
0.3 UPDATES AND COMMENTS
1.5 FUTURE ENHANCEMENTS TO THE SOUNDFONT 2 STANDARD
2 TERMS AND ABBREVIATIONS
3.2 THE SOUNDFONT 2 CHUNKS AND SUB-CHUNKS
3.3 REDUNDANCY AND ERROR HANDLING IN THE RIFF STRUCTURE
4 SOUNDFONT 2 RIFF FILE FORMAT
4.1 SOUNDFONT 2 RIFF FILE FORMAT LEVEL 0
4.2 SOUNDFONT 2 RIFF FILE FORMAT LEVEL 1
4.3 SOUNDFONT 2 RIFF FILE FORMAT LEVEL 2
4.4 SOUNDFONT 2 RIFF FILE FORMAT LEVEL 3
4.5 SOUNDFONT 2 RIFF FILE FORMAT TYPE DEFINITIONS
5 THE INFO-LIST CHUNK
6 THE SDTA-LIST CHUNK
6.1 SAMPLE DATA FORMAT IN THE SMPL SUB-CHUNK
6.2 SAMPLE DATA FORMAT IN THE SM24 SUB-CHUNK
6.3 SAMPLE DATA LOOPING RULES
7 THE PDTA-LIST CHUNK
THE PGEN SUB-CHUNK
8.4.1 MIDI Note-On Velocity to Initial Attenuation
8.4.2 MIDI Note-On Velocity to Filter Cutoff
8.4.3 MIDI Channel Pressure to Vibrato LFO Pitch Depth
8.4.4 MIDI Continuous Controller 1 to Vibrato LFO Pitch Depth
8.4.5 MIDI Continuous Controller 7 to Initial Attenuation
8.4.6 MIDI Continuous Controller 10 to Pan Position
8.4.7 MIDI Continuous Controller 11 to Initial Attenuation
8.4.8 MIDI Continuous Controller 91 to Reverb Effects Send
8.4.9 MIDI Continuous Controller 93 to Chorus Effects Send
8.4.10 MIDI Pitch Wheel to Initial Pitch Controlled by MIDI Pitch Wheel Sensitivity
8.5 PRECEDENCE AND ABSOLUTE AND RELATIVE VALUES
9 PARAMETERS AND SYNTHESIS MODEL
9.1.4 Final Gain Amplifier
9.1.6 Low Frequency Oscillators
9.4 THE SOUNDFONT GENERATOR MODEL
9.5.1 Controller Model Theory of Operation
9.5.2 Pictorial Examples of Source Types
9.5.3 Mappings of Modulator Sources to the SoundFont Controller Input Domain
9.6.1 The NRPN Message
9.6.2 The NRPN Select Values
9.6.3 The Default Data Entry Ranges
FIGURE 2: GENERATOR BASED MODULATION STRUCTURE
FIGURE 3: SOUNDFONT MODULATOR BUILDING BLOCK
FIGURE 4: DETAILED SOUNDFONT MODULATOR BUILDING BLOCK
FIGURE 5: POSITIVE UNIPOLAR LINEAR PLOT
FIGURE 6: POSITIVE BIPOLAR LINEAR PLOT
This document is the definitive source for the SoundFont 2 standard. This document should provide complete and accurate
information to allow any user to correctly construct and interpret SoundFont 2 compatible banks. This document is not
intended to provide any information on the design or implementation of music synthesizers.
1.2 Document Organization
This document is organized such that sections 1 and 2 give introductory information about the SoundFont 2 standard. Both
new and seasoned musical engineers will get value from the review of terminology provided in section 2. Sections 3
through 8 provide increasingly detailed descriptions of the SoundFont 2 standard data structures. The sections will
ultimately serve as reference, but can be scanned in order to provide sufficient detail for any level of understanding. Section
9 deals with the Synthesis model supported by the SoundFont standard, and will be of interest to anyone involved with the
synthesis engine or bank creation. Section 10 specifies error handling when dealing with SoundFont compatible banks, and
will be of interest primarily to programmers using the SoundFont standard. The alphabetical glossary in section 11 can be
used as a reference for any unfamiliar or confusing terminology.
1.3 SoundFont 2 Objectives
The SoundFont 2 standard is intended to provide an extensible, portable, universal interchange format for wavetable
synthesizer “samples” and articulation data. The standard is made extensible largely by the use of enumerated “generators”
and “modulators” so that additional function units can be added as requirements dictate. The standard is made portable and
universal by the use of precisely defined and hardware independent parameters, as well as by specific practices designed to
provide support to a broad range of technologies.
1.4 SoundFont 1.x
The SoundFont standard was originally released in its 1.0 embodiment with the Creative Labs AWE32 product using the
EMU8000 music synthesis chip. This proprietary format proved very successful, but experience brought a number of
refinements. These initially were performed in an upward compatible manner to revision 1.5.
However, due to increasing demand for a public downloadable sound interchange format, Creative Technology determined
that a public disclosure of the SoundFont format would be in its best interest. Because there were still more improvements
required, many of which could not be supported in a completely compatible manner, Creative decided to combine public
disclosure with the step to a revised format. The result is the SoundFont 2 standard.
There are several key enhancements contained in the SoundFont 2 standard. The first is the use of relative parameters in the
Preset level. This allows instruments to be adjusted without altering their self-consistency, providing easy and effective
user editing of instruments. The second is an improvement in the data structures associated with the samples themselves,
again providing key information which will allow the sound designer to re-use samples with a minimum of difficulty. An
increased specificity in the rules for sample data produces enhanced portability across various sound engines. Finally, the
addition of modulators produces a robust structure which can express all the typical functions in current and future wavetable
synthesizers.
1.5 Future Enhancements to the SoundFont 2 Standard
The SoundFont 2 standard is designed to allow for enhancements based on future wavetable synthesis technology
capabilities by additional enumerations of generators and modulators. This will be done as required in an upwardly
compatible manner. Suggestions for additions can be made via e-mail to soundfont@emu.com. In general, our policy for
updating the specification will be based on consumer need, rather than technological idealism.
It is our expectation to maintain bi-directional compatibility within the SoundFont 2 standard for some years.
2 Terms and Abbreviations
The following sections introduce terms used within this specification in a logical order. They are provided both as an
introduction to readers unfamiliar with wavetable synthesis implementation details, as well as a review and reference for the
expert. These and other terms and abbreviations can also be found arranged alphabetically for reference in the glossary at
the end of this specification.
2.1 Data Structure Terminology
bag - A SoundFont data structure element containing a list of preset zones or instrument zones.
big endian - Refers to the organization in memory of bytes within a word such that the most significant byte occurs at the
lowest address. Contrast “little endian.”
byte - A data structure element of eight bits without definition of meaning to those bits.
BYTE - A data structure element of eight bits which contains an unsigned value from 0 to 255.
case-insensitive - Indicates that an ASCII character or string treats alphabetic characters of upper or lower case as identical.
Contrast “case-sensitive.”
case-sensitive - Indicates that an ASCII character or string treats alphabetic characters of upper or lower case as distinct.
Contrast “case-insensitive.”
CHAR - A data structure of eight bits which contains a signed value from -128 to +127.
chunk - The top-level division of a RIFF file.
doubleword - A data structure element of 32 bits without definition of meaning to those bits.
DWORD - A data structure of 32 bits which contains an unsigned value from zero to 4,294,967,295.
enumerated - Said of a data element whose symbols correspond to particular assigned functions.
global - Refers to parameters which affect all associated structures. See “global zone.”
global zone - A zone whose generators and modulators affect all other zones within the object.
header - A data structure element which describes several aspects of a SoundFont element.
hydra - A. A nine-headed mythical beast. B. The nine “pdta” sub-chunks which make up the SoundFont articulation data.
instrument - In the SoundFont standard, a collection of zones which represents the sound of a single musical instrument or
sound effect set.
instrument zone - A subset of an instrument containing a sample reference and associated articulation data defined to play
over certain key numbers and velocities.
layer - An obsolete SoundFont term, now called a Preset Zone.
level - In the SoundFont structure, this refers either to the preset and preset zones (the preset level) or the instrument and
instrument zones (the instrument level).
little endian - A method of ordering bytes within larger words in memory in which the least significant byte is at the lowest
address. Contrast “big endian.”
object - Either an instrument or a preset, depending on what level (preset or instrument) is being discussed.
orphan - Said of a data structure which under normal circumstances is referenced by a higher level, but in this particular
instance is no longer linked. Specifically, it is an instrument which is not referenced by any preset zone, or a sample which
is not referenced by any instrument zone.
preset - A keyboard full of sound. Typically the collection of samples and articulation data associated with a particular
MIDI preset number.
preset zone - A subset of a preset containing an instrument reference and associated articulation data defined to play over
certain key numbers and velocities.
record - A single instance of a data structure.
RIFF - Acronym for Resource Interchange File Format. The recommended form for interchange files such as SoundFont
compatible files within Microsoft operating systems.
SHORT - A data structure element of sixteen bits which contains a signed value from -32,768 to +32,767.
split - An obsolete SoundFont term, now called an Instrument Zone.
sub-chunk - A division of a RIFF file below that of the chunk.
terminator - A data structure element indicating the final element in a sequence.
WORD - A data structure of 16 bits which contains an unsigned value from zero to 65,535.
word - A data structure element of 16 bits without definition of meaning to those bits.
zone - An object and associated articulation data defined to play over certain key numbers and velocities.
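The fixed-size types above can be illustrated with a minimal C sketch. It assumes the little endian byte order used by RIFF files and maps each type onto a C99 fixed-width integer. The sf_read_* helper names are hypothetical and do not come from this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width equivalents of the types defined in this section. */
typedef uint8_t  BYTE;   /* unsigned, 0 to 255 */
typedef int8_t   CHAR;   /* signed, -128 to +127 */
typedef uint16_t WORD;   /* unsigned, 0 to 65,535 */
typedef int16_t  SHORT;  /* signed, -32,768 to +32,767 */
typedef uint32_t DWORD;  /* unsigned, 0 to 4,294,967,295 */

/* Hypothetical helper: assemble a little endian WORD, in which the
 * least significant byte sits at the lowest address. */
static WORD sf_read_word_le(const BYTE *p)
{
    assert(p != NULL);
    return (WORD)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: read a SHORT by interpreting the 16-bit
 * pattern as a two's complement value. */
static SHORT sf_read_short_le(const BYTE *p)
{
    WORD w = sf_read_word_le(p);
    return (w & 0x8000u) ? (SHORT)((int)w - 65536) : (SHORT)w;
}

/* Hypothetical helper: assemble a little endian DWORD. */
static DWORD sf_read_dword_le(const BYTE *p)
{
    assert(p != NULL);
    return (DWORD)p[0] | ((DWORD)p[1] << 8) |
           ((DWORD)p[2] << 16) | ((DWORD)p[3] << 24);
}
```

Reading byte by byte in this way gives the same result on big endian and little endian hosts, which is why the sketch avoids casting the buffer pointer to a wider type.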
2.2 Synthesizer Terminology
articulation - The process of modulation of amplitude, pitch, and timbre to produce an expressive musical note.
artifact - A (typically undesirable) sonic event which is recognizable as not being present in the original sound.
attack - That phase of an envelope or sound during which the amplitude increases from zero to a peak value.
attenuation - A decrease in volume or amplitude of a signal.
AWE32 - The original Creative Technology Sound Blaster product which contained an EMU8000 wavetable synthesizer
and supported the SoundFont standard.
balance - A form of stereo volume control in which both left and right channels are at maximum when the control is
centered, and which attenuates only the opposite channel when taken to either extreme.
bank - A collection of presets. See also MIDI bank.
chorus - An effects processing algorithm which involves cyclically shifting the pitch of a signal and remixing it with itself to
produce a time varying comb filter, giving a perception of motion and fullness to the resulting sound.
cutoff frequency - The frequency of a filter function at which the attenuation reaches a specified value.
data points - The individual values comprising a sample. Sometimes also called sample points. Contrast “sample.”
decay - The portion of an envelope or sound during which the amplitude declines from a peak to steady state value.
delay - The portion of an envelope or LFO function which elapses from a key-on event until the amplitude becomes nonzero.
DC gain - The degree of amplification or attenuation a system presents to a static, or zero frequency, signal.
digital audio - Audio represented as a sequence of quantized values spaced evenly over time. The values are called “sample
data points.”
downloadable - Said of samples which are loaded from a file into RAM, in contrast to samples which are maintained in
ROM.
dry - Refers to audio which has not received any effects processing such as reverb or chorus.
EMU8000 - A wavetable synthesizer chip designed by E-mu Systems for use in Creative Technology products.
envelope - A time varying signal which typically controls the pitch, volume, and/or filter cutoff frequency of a note, and
comprises multiple phases including attack, decay, sustain, and release.
flat - A. Said of a tone that is lower in pitch than another reference tone. B. Said of a frequency response that does not
deviate significantly from a single fixed gain over the audio range.
interpolator - A circuit or algorithm which computes intermediate points between existing sample data points. This is of
particular use in the pitch shifting operation of a wavetable synthesizer, in which these intermediate points represent the
output samples of the waveform at the desired pitch transposition.
key number - See MIDI key number.
LFO - Acronym for Low Frequency Oscillator. A slow periodic modulation source.
linear coding - The most common method of encoding amplitudes in digital audio in which each step is of equal size.
loop - In wavetable synthesis, a portion of a sample which is repeated many times to increase the duration of the resulting
sound.
loop points - The sample data points at which a loop begins and ends.
lowpass - Said of a filter which attenuates high frequencies but does not attenuate low frequencies.
MIDI - Acronym for Musical Instrument Digital Interface. The standard protocol for sending performance information to a
musical synthesizer.
MIDI bank - A group of up to 128 presets selected by a MIDI “change bank” command.
MIDI continuous controller - A construct in the MIDI protocol.
MIDI key number - A construct in the MIDI protocol which accompanies a MIDI key-on or key-off command and specifies
the key of the musical instrument keyboard to which the command refers.
MIDI pitch bend - A special MIDI construct akin to the MIDI continuous controllers which controls the real-time value of
the pitch of all notes played in a MIDI channel.
MIDI preset - A “preset” selected to be active in a particular MIDI channel by a MIDI “change preset” command.
MIDI velocity - A construct in the MIDI protocol which accompanies a MIDI key-on or key-off command and specifies the
speed with which the key was pressed or released.
mono - Short for “monophonic.” Indicates a sound comprising only one channel or waveform. Contrast with “stereo.”
oscillator - In wavetable synthesis, the wavetable interpolator is considered an oscillator.
pan - Short for “panorama.” This is the control of the apparent azimuth of a sound source over 180 degrees from left to
right. It is generally implemented by varying the volume at the left and right speakers.
pitch - The perceived value of frequency. Generally can be used interchangeably with frequency.
pitch shift - A change in pitch. Wavetable synthesis relies on interpolators to cause pitch shift in a sample to produce the
notes of the scale.
pole - A mathematical term used in filter transform analysis. Traditionally in synthesis, a pole is equated with a rolloff of
6dB per octave, and the rolloff of a filter is specified in “poles.”
preset - A keyboard full of sound. Typically the collection of samples and articulation data associated with a particular
MIDI preset number.
Q - A mathematical term used in filter transform analysis. Indicates the degree of resonance of the filter. In synthesis
terminology, it is synonymous with resonance.
release - The portion of an envelope or sound during which the amplitude declines from a steady state to zero value or
inaudibility.
resonance - Describes the aspect of a filter in which particular frequencies are given significantly more gain than others.
The resonance can be measured in dB above the DC gain.
resonant frequency - The frequency at which resonance reaches its maximum.
reverb - Short for reverberation. In synthesis, a synthetic signal processor which adds artificial spaciousness and ambience
to a sound.
sample - This term is often used both to indicate a “sample data point” and to indicate a collection of such points comprising
a digital audio waveform. The latter meaning is exclusively used in this specification.
soft - The pedal on a piano, so named because it causes the damper to be lowered in such a way as to soften the timbre and
loudness of the notes. In MIDI, continuous controller #66 which behaves in a similar manner.
sostenuto - The pedal on a piano which causes the dampers on all keys depressed to be held until the pedal is released. In
MIDI, continuous controller #67 which behaves in a similar manner.
sustain - The pedal on a piano which prevents all dampers on keys as they are depressed from being released. In MIDI,
continuous controller #64 which behaves in a similar manner.
SoundFont - A registered trademark of E-mu Systems, Inc, indicating files, data, synthesizers, hardware or software
produced by E-mu that conform to the SoundFont Technical Specification.
SoundFont Compatible - Indicates files, data, synthesizers, hardware or software that conform to the SoundFont Technical
Specification.
stereo - Literally indicating three dimensions. In this specification, the term is used to mean two channel stereophonic,
indicating that the sound is composed of two independent audio channels, dubbed left and right. Contrast monophonic.
synthesis engine - The hardware and software associated with the signal processing and modulation path for a particular
synthesizer.
synthesizer - A device capable of producing ideally arbitrary musical sound.
tremolo - A periodic change in amplitude of a sound, typically produced by applying a low frequency oscillator to the final
volume amplifier.
triangular - A waveform which ramps upward to a positive limit, then downward at the opposite slope to the symmetrically
negative limit periodically.
unpitched - Said of a sound which is not characterized by a perceived frequency. This would be true of noise-like musical
instruments and of many sound effects.
velocity - In synthesis, the speed with which a keyboard key is depressed, typically proportional to the impact delivered
by the musician. See also MIDI velocity.
vibrato - A periodic change in the pitch of a sound, typically produced by applying a low frequency oscillator to the
oscillator pitch.
volume - The loudness or amplitude of a sound, or the control of this parameter.
wavetable - A music synthesis technique wherein musical sounds are recorded or computed mathematically and stored in a
memory, then played back at a variable rate to produce the desired pitch. Additional timbre adjustments are often made to
the sound thus produced using amplifiers, filters, and effect processing such as reverb and chorus.
2.3 Parameter Terminology
absolute - Describes a parameter which gives a definitive real-world value. Contrast to relative.
additive - Describes a parameter which is to be numerically added to another parameter.
attenuation - A decrease in volume or amplitude of a signal.
bipolar - Said of a controller which has a minimum value of -1 and a maximum value of 1. Contrast “unipolar.”
cent - A unit of pitch ratio corresponding to the twelve hundredth root of two, or one hundredth of a semitone,
approximately 1.000577790.
centibel - A unit of amplitude ratio corresponding to the two hundredth root of ten, or one tenth of a decibel, approximately
1.011579454.
cutoff frequency - The frequency of a filter function at which the attenuation reaches a specified value.
decibel - A unit of amplitude ratio corresponding to the twentieth root of ten, approximately 1.122018454.
octave - A factor of two in ratio, typically applied to pitch or frequency.
pitch - The perceived value of frequency. Generally can be used interchangeably with frequency.
pitch shift - A change in pitch. Wavetable synthesis relies on interpolators to cause pitch shift in a sample to produce the
notes of the scale.
relative - Describes a parameter which merely indicates an offset from an otherwise established value. Contrast to absolute.
resonance - Describes the aspect of a filter in which particular frequencies are given significantly more gain than others.
The resonance can be measured in dB above the DC gain.
sample rate - The frequency, in Hertz, at which sample data points are taken when recording a sample.
semitone - A unit of pitch ratio corresponding to the twelfth root of two, or one twelfth of an octave, approximately
1.059463094.
sharp - Said of a tone that is higher in pitch than another reference tone.
timecent - A unit of duration ratio corresponding to the twelve hundredth root of two, or one twelve hundredth of an octave,
approximately 1.000577790.
unipolar - Said of a controller which has a minimum value of 0 and a maximum value of 1. Contrast with “bipolar.”
3 RIFF Structure
3.1 General RIFF File Structure
The RIFF (Resource Interchange File Format) is a tagged file structure developed for multimedia resource files, and is
described in some detail in the Microsoft Windows SDK Multimedia Programmer’s Reference. The tagged-file structure is
useful because it helps prevent compatibility problems which can occur as the file definition changes over time. Because
each piece of data in the file is identified by a standard header, an application that does not recognize a given data element
can skip over the unknown information.
A RIFF file is constructed from a basic building block called a “chunk.” In ‘C’ syntax, a chunk is defined:
typedef DWORD FOURCC; // Four-character code
typedef struct {
    FOURCC ckID; // A chunk ID identifies the type of data within the chunk.
    DWORD ckSize; // The size of the chunk data in bytes, excluding any pad byte.
    BYTE ckDATA[ckSize]; // The actual data plus a pad byte if req’d to word align.
};
Two types of chunks, the “RIFF” and “LIST” chunks, may contain nested chunks called sub-chunks as their data.
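As an illustration of the chunk layout just described, the sketch below parses one chunk header from a memory buffer and reports how many bytes the chunk occupies, including the pad byte that word-aligns odd-sized data. It assumes little endian size fields. The sf_chunk type and sf_parse_chunk function are hypothetical names, not part of RIFF or of this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one chunk. */
typedef struct {
    char           id[5];  /* ckID as a NUL-terminated string */
    uint32_t       size;   /* ckSize: data bytes, excluding any pad byte */
    const uint8_t *data;   /* points at ckDATA inside the caller's buffer */
} sf_chunk;

/* Parse the chunk starting at buf. Returns the number of bytes the
 * chunk occupies (8-byte header + data + optional pad byte), or 0 if
 * the chunk does not fit within len bytes. */
static size_t sf_parse_chunk(const uint8_t *buf, size_t len, sf_chunk *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return 0;                       /* header is truncated */

    memcpy(out->id, buf, 4);
    out->id[4] = '\0';
    out->size = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);

    if (out->size > len - 8)
        return 0;                       /* data runs past the buffer */

    /* Odd-sized data is followed by one pad byte for word alignment. */
    size_t padded = (size_t)out->size + (out->size & 1u);
    if (padded > len - 8)
        return 0;                       /* pad byte is missing */

    out->data = buf + 8;
    return 8 + padded;
}
```

A reader that does not recognize out->id can skip the chunk by advancing its position by the returned byte count.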
The ordering requirements of chunks and sub-chunks within a RIFF file are not well documented in the RIFF file format. In
SoundFont 2.0, the order of the sub-chunks within the INFO chunk is arbitrary, but for consistency it is recommended that
the sub-chunks be ordered as presented in this document. The order of all other chunks and sub-chunks is strictly
defined and must be maintained as presented in this document.
3.2 The SoundFont 2 Chunks and Sub-chunks
A SoundFont 2 compatible RIFF file comprises three chunks: an INFO-list chunk containing a number of required and
optional sub-chunks describing the file, its history, and its intended use, an sdta-list chunk comprising a single sub-chunk
containing any referenced digital audio samples, and a pdta-list chunk containing nine sub-chunks which define the
articulation of the digital audio data.
The SoundFont 2 standard allows that the sub-chunks within the INFO-list chunk may appear in arbitrary order. However,
the order of the three chunks, and the order of the sub-chunks within the pdta-list chunk, is fixed.
The SoundFont 2 specification requires that implementations ignore unknown sub-chunks within the INFO-list chunk.
Note, however, that until such sub-chunks become defined in the specification, inclusion of additional INFO-list sub-chunks
will preclude the file from conforming to the SoundFont standard.
A detailed description of the SoundFont 2 RIFF structure is provided in Section 4.
3.3 Redundancy and Error Handling in the RIFF structure
The RIFF file structure contains redundant information regarding the length of the file and the length of the chunks and
sub-chunks. This fact enables any reader of a SoundFont compatible file to determine if the file has been damaged by loss of
data.
If any such loss is detected, the SoundFont compatible file is termed “structurally unsound” and in general should be
rejected. SoundFont compatible software developers may produce utilities to recover data from structurally unsound files,
producing with or without user assistance a corrected and structurally sound SoundFont 2 compatible file.
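One way to use this redundancy is to confirm that the sub-chunks of a RIFF or LIST chunk exactly fill the space their parent declares. The following sketch assumes the caller passes the bytes that follow the parent's four-character form type and that size fields are little endian. The function names are hypothetical, and this is only one possible soundness test, not a procedure mandated by the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little endian 32-bit size field. */
static uint32_t sf_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns true when the sub-chunks in body[0..body_len), including
 * their pad bytes, tile the parent exactly. A false result means the
 * declared lengths disagree, so the file is structurally unsound. */
static bool sf_subchunks_fill_parent(const uint8_t *body, size_t body_len)
{
    assert(body != NULL || body_len == 0);
    size_t pos = 0;
    while (pos < body_len) {
        if (body_len - pos < 8)
            return false;               /* truncated sub-chunk header */
        uint32_t size = sf_le32(body + pos + 4);
        size_t avail = body_len - pos - 8;
        if (size > avail)
            return false;               /* sub-chunk overruns its parent */
        size_t padded = (size_t)size + (size & 1u);
        if (padded > avail)
            return false;               /* pad byte is missing */
        pos += 8 + padded;
    }
    return true;
}
```

A recovery utility of the kind mentioned above might call such a check first and attempt repair only when it fails.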
4 SoundFont 2 RIFF File Format
4.1 SoundFont 2 RIFF File Format Level 0
<SFBK-form> -> RIFF (‘sfbk’ ; RIFF form header
    {
        <INFO-list> ; Supplemental Information
        <sdta-list> ; The Sample Binary Data
        <pdta-list> ; The Preset, Instrument, and Sample Header data
    }
)
4.2 SoundFont 2 RIFF File Format Level 1
<INFO-list> -> LIST (‘INFO’
{
<ifil-ck> ; Refers to the version of the Sound Font RIFF file
<isng-ck> ; Refers to the target Sound Engine
<INAM-ck> ; Refers to the Sound Font Bank Name
[<irom-ck>] ; Refers to the Sound ROM Name
[<iver-ck>] ; Refers to the Sound ROM Version
[<ICRD-ck>] ; Refers to the Date of Creation of the Bank
[<IENG-ck>] ; Sound Designers and Engineers for the Bank
[<IPRD-ck>] ; Product for which the Bank was intended
[<ICOP-ck>] ; Contains any Copyright message
[<ICMT-ck>] ; Contains any Comments on the Bank
[<ISFT-ck>] ; The SoundFont tools used to create and alter the bank
}
)
<sdta-ck> -> LIST (‘sdta’
{
[<smpl-ck>] ; The Digital Audio Samples for the upper 16 bits
}
{
[<sm24-ck>] ; The Digital Audio Samples for the lower 8 bits
}
)
<pdta-list> -> LIST (‘pdta’
{
<phdr-ck> ; The Preset Headers
<pbag-ck> ; The Preset Index list
<pmod-ck> ; The Preset Modulator list
<pgen-ck> ; The Preset Generator list
<inst-ck> ; The Instrument Names and Indices
<ibag-ck> ; The Instrument Index list
<imod-ck> ; The Instrument Modulator list
<igen-ck> ; The Instrument Generator list
<shdr-ck> ; The Sample Headers
}
)
4.3 SoundFont 2 RIFF File Format Level 2
<ifil-ck> -> ifil(<iver-rec>) ; e.g. 2.01
<isng-ck> -> isng(szSoundEngine:ZSTR) ; e.g. “EMU8000”
<irom-ck> -> irom(szROM:ZSTR) ; e.g. “1MGM”
<iver-ck> -> iver(<iver-rec>) ; e.g. 2.08
<INAM-ck> -> INAM(szName:ZSTR) ; e.g. “General MIDI”
<ICRD-ck> -> ICRD(szDate:ZSTR) ; e.g. “July 15, 1997”
<IENG-ck> -> IENG(szName:ZSTR) ; e.g. “John Q. Sounddesigner”
<IPRD-ck> -> IPRD(szProduct:ZSTR) ; e.g. “SBAWE64 Gold”
<ICOP-ck> -> ICOP(szCopyright:ZSTR) ; e.g. “Copyright (c) 1997 E-mu Systems, Inc.”
<ICMT-ck> -> ICMT(szComment:ZSTR) ; e.g. “This is a comment”
<ISFT-ck> -> ISFT(szTools:ZSTR) ; e.g. “:Preditor 2.00a:Vienna SF Studio 2.0:”
<smpl-ck> -> smpl(<sample:SHORT>) ; 16 bit Linearly Coded Digital Audio Data
The SFSampleLink is an enumeration type which describes both the type of sample (mono, stereo left, etc.) and whether
the sample is located in RAM or ROM memory:
5 The INFO-list Chunk
The INFO-list chunk in a SoundFont 2 compatible file contains three mandatory and a variety of optional sub-chunks as
defined below. The INFO-list chunk gives basic information about the SoundFont compatible bank that is contained in the
file.
5.1 The ifil Sub-chunk
The ifil sub-chunk is a mandatory sub-chunk identifying the SoundFont specification version level to which the file
complies. It is always four bytes in length, and contains data according to the structure:
struct sfVersionTag
{
WORD wMajor;
WORD wMinor;
};
The WORD wMajor contains the value to the left of the decimal point in the SoundFont specification version, the WORD
wMinor contains the value to the right of the decimal point. For example, version 2.11 would be implied if wMajor=2 and
wMinor=11.
These values can be used by applications which read SoundFont compatible files to determine if the format of the file is
usable by the program. Within a fixed wMajor, the only changes to the format will be the addition of Generator, Source and
Transform enumerators, and additional info sub-chunks. These are all defined as being ignored if unknown to the program.
Consequently, many applications can be designed to be fully upward compatible within a given wMajor. In the case of
editors or other programs in which all enumerators should be known, the value of wMinor may be of consequence.
Generally the application program will either accept the file as usable (possibly with appropriate transparent translation),
reject the file as unusable, or warn the user that there may be uneditable data in the file.
If the ifil sub-chunk is missing, or its size is not four bytes, the file should be rejected as structurally unsound.
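A minimal sketch of how a reader might apply these rules follows. The function name, the status codes, and the policy of accepting only wMajor equal to 2 are assumptions made for illustration; knownMinor is a hypothetical parameter giving the highest wMinor the program fully understands. WORD values are decoded least significant byte first, as is usual for RIFF data.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t wMajor;
    uint16_t wMinor;
} sfVersionTag;

/* Hypothetical outcomes mirroring the text: unsound, reject, warn, accept. */
enum sf_ifil_status { SF_IFIL_UNSOUND, SF_IFIL_REJECT, SF_IFIL_WARN, SF_IFIL_OK };

/* data/size describe the ifil sub-chunk payload; data is NULL if the
 * sub-chunk is missing. */
static enum sf_ifil_status sf_check_ifil(const uint8_t *data, size_t size,
                                         uint16_t knownMinor, sfVersionTag *out)
{
    if (data == NULL || size != 4)
        return SF_IFIL_UNSOUND;            /* missing or not four bytes */

    out->wMajor = (uint16_t)(data[0] | (data[1] << 8));
    out->wMinor = (uint16_t)(data[2] | (data[3] << 8));

    if (out->wMajor != 2)
        return SF_IFIL_REJECT;             /* assumed policy: other majors unusable */
    if (out->wMinor > knownMinor)
        return SF_IFIL_WARN;               /* may contain unknown enumerators */
    return SF_IFIL_OK;
}
```

A player would typically treat SF_IFIL_WARN as acceptable, since unknown enumerators are defined to be ignored, whereas an editor might surface it to the user.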
5.2 The isng Sub-chunk
The isng sub-chunk is a mandatory sub-chunk identifying the wavetable sound engine for which the file was optimized. It
contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte
count even. The default isng field is the eight bytes representing “EMU8000” as seven ASCII characters followed by a zero
byte.
The ASCII should be treated as case-sensitive. In other words “emu8000” is not the same as “EMU8000.”
The isng string can be optionally used by chip drivers to vary their synthesis algorithms to emulate the target sound engine.
If the isng sub-chunk is missing, or is not terminated with a zero valued byte, or its contents are an unknown sound engine,
the field should be ignored and EMU8000 assumed.
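The fallback rule above might be sketched as follows. The function name and the idea of passing a table of engine names known to the driver are assumptions for illustration only; “HypotheticalEngine” in the usage note is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the sound engine name a driver should assume.
 * data/size describe the isng payload (data is NULL if the sub-chunk is
 * missing); known is the list of engine names this driver recognizes. */
static const char *sf_sound_engine(const char *data, size_t size,
                                   const char *const *known, size_t nKnown)
{
    static const char fallback[] = "EMU8000";

    /* Missing, over-long, or not terminated with a zero byte: ignore. */
    if (data == NULL || size == 0 || size > 256 || data[size - 1] != '\0')
        return fallback;

    for (size_t i = 0; i < nKnown; i++)
        if (strcmp(data, known[i]) == 0)   /* comparison is case-sensitive */
            return known[i];

    return fallback;                       /* unknown engine: assume EMU8000 */
}
```

With a table containing “EMU8000” and “HypotheticalEngine”, a payload of “emu8000” is not recognized because of the case-sensitive comparison, and the default engine is assumed.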
5.3 The INAM Sub-chunk
The INAM sub-chunk is a mandatory sub-chunk providing the name of the SoundFont compatible bank. It contains an
ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count even.
A typical INAM sub-chunk would be the fourteen bytes representing “General MIDI” as twelve ASCII characters followed
by two zero bytes.
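The "one or two terminators" rule is the same for every ZSTR field and reduces to a small calculation. The following sketch, with hypothetical function names, computes the stored size and writes a padded field, assuming the 256-byte limit that applies to most INFO strings.

```c
#include <stddef.h>
#include <string.h>

/* Stored size of a ZSTR holding nChars characters: one terminator, plus a
 * second one if needed to make the total even. */
static size_t sf_zstr_size(size_t nChars)
{
    return nChars + 1 + ((nChars + 1) & 1u);
}

/* Hypothetical helper: writes s and its terminators into dst. Returns the
 * number of bytes written, or 0 if the field would exceed 256 bytes or the
 * capacity of dst. */
static size_t sf_zstr_write(char *dst, size_t dstCap, const char *s)
{
    size_t n = strlen(s);
    size_t total = sf_zstr_size(n);

    if (total > 256 || total > dstCap)
        return 0;
    memcpy(dst, s, n);
    memset(dst + n, 0, total - n);         /* one or two zero bytes */
    return total;
}
```

For the examples in the text, sf_zstr_size(7) gives the eight bytes of “EMU8000” and sf_zstr_size(12) gives the fourteen bytes of “General MIDI”.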
The ASCII should be treated as case-sensitive. In other words “General MIDI” is not the same as “GENERAL MIDI.”
The INAM string is typically used for the identification of banks even if the file names are altered.
If the INAM sub-chunk is missing, or not terminated in a zero valued byte, the field should be ignored and the user supplied
with an appropriate error message if the name is queried. If the file is re-written, a valid name should be placed in the
INAM field.
5.4 The irom Sub-chunk
The irom sub-chunk is an optional sub-chunk identifying a particular wavetable sound data ROM to which any ROM
samples refer. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to
make the total byte count even. A typical irom field would be the six bytes representing “1MGM” as four ASCII characters
followed by two zero bytes.
The ASCII should be treated as case-sensitive. In other words “1mgm” is not the same as “1MGM.”
The irom string is used by drivers to verify that the ROM data referenced by the file is available to the sound engine.
If the irom sub-chunk is missing, not terminated in a zero valued byte, or its contents are an unknown ROM, the field should
be ignored and the file assumed to reference no ROM samples. If ROM samples are accessed, any accesses to such
instruments should be terminated and not sound. A file should not be written which attempts to access ROM samples
without both irom and iver present and valid.
5.5 The iver Sub-chunk
The iver sub-chunk is an optional sub-chunk identifying the particular wavetable sound data ROM revision to which any
ROM samples refer. It is always four bytes in length, and contains data according to the structure:
struct sfVersionTag
{
WORD wMajor;
WORD wMinor;
};
The WORD wMajor contains the value to the left of the decimal point in the ROM version. The WORD wMinor contains
the value to the right of the decimal point. For example, version 1.36 would be implied if wMajor=1 and wMinor=36.
The iver sub-chunk is used by drivers to verify that the ROM data referenced by the file is located in the exact locations
specified by the sound headers.
If the iver sub-chunk is missing, not four bytes in length, or its contents indicate an unknown or incorrect ROM, the field
should be ignored and the file assumed to reference no ROM samples. If ROM samples are accessed, any accesses to such
instruments should be terminated and not sound. Note that for ROM samples to function correctly, both iver and irom must
be present and valid. A file should not be written which attempts to access ROM samples without both irom and iver
present and valid.
5.6 The ICRD Sub-chunk
The ICRD sub-chunk is an optional sub-chunk identifying the creation date of the SoundFont compatible bank. It contains
an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make the total byte count
even. A typical ICRD field would be the twelve bytes representing “May 1, 1995” as eleven ASCII characters followed by
a zero byte.
Conventionally, the format of the string is “Month Day, Year” where Month is initially capitalized and is the conventional
full English spelling of the month, Day is the date in decimal followed by a comma, and Year is the full decimal year. Thus
the field should conventionally never be longer than 32 bytes.
The ICRD string is provided for library management purposes.
If the ICRD sub-chunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully
copied as an ASCII string, the field should be ignored and if re-written, should not be copied. If the field’s contents are not
seemingly meaningful but can be faithfully reproduced, this should be done.
5.7 The IENG Sub-chunk
The IENG sub-chunk is an optional sub-chunk identifying the names of any sound designers or engineers responsible for
the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of
value zero, so as to make the total byte count even. A typical IENG field would be the twelve bytes representing “Tim
Swartz” as ten ASCII characters followed by two zero bytes.
The IENG string is provided for library management purposes.
If the IENG sub-chunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully
copied as an ASCII string, the field should be ignored and if re-written, should not be copied. If the field’s contents are not
seemingly meaningful but can be faithfully reproduced, this should be done.
5.8 The IPRD Sub-chunk
The IPRD sub-chunk is an optional sub-chunk identifying any specific product for which the SoundFont compatible bank is
intended. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to make
the total byte count even. A typical IPRD field would be the eight bytes representing “SBAWE32” as seven ASCII
characters followed by a zero byte.
The ASCII should be treated as case-sensitive. In other words “sbawe32” is not the same as “SBAWE32.”
The IPRD string is provided for library management purposes.
If the IPRD sub-chunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully
copied as an ASCII string, the field should be ignored and if re-written, should not be copied. If the field’s contents are not
seemingly meaningful but can be faithfully reproduced, this should be done.
5.9 The ICOP Sub-chunk
The ICOP sub-chunk is an optional sub-chunk containing any copyright assertion string associated with the SoundFont
compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators of value zero, so as to
make the total byte count even. A typical ICOP field would be the 38 bytes representing “Copyright (c) 1995 E-mu
Systems, Inc.” as 37 ASCII characters followed by a zero byte.
The ICOP string is provided for intellectual property protection and management purposes.
If the ICOP sub-chunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully
copied as an ASCII string, the field should be ignored and if re-written, should not be copied. If the field’s contents are not
seemingly meaningful but can be faithfully reproduced, this should be done.
5.10 The ICMT Sub-chunk
The ICMT sub-chunk is an optional sub-chunk containing any comments associated with the SoundFont compatible bank.
It contains an ASCII string of 65,536 or fewer bytes including one or two terminators of value zero, so as to make the total
byte count even. A typical ICMT field would be the 40 bytes representing “This space unintentionally left blank.” as 38
ASCII characters followed by two zero bytes.
The ICMT string is provided for any non-scatological uses.
If the ICMT sub-chunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully
copied as an ASCII string, the field should be ignored and if re-written, should not be copied. If the field’s contents are not
seemingly meaningful but can be faithfully reproduced, this should be done.
5.11 The ISFT Sub-chunk
The ISFT sub-chunk is an optional sub-chunk identifying the SoundFont compatible tools used to create and most recently
modify the SoundFont compatible bank. It contains an ASCII string of 256 or fewer bytes including one or two terminators
of value zero, so as to make the total byte count even. A typical ISFT field would be the thirty bytes representing “Preditor
2.00a:Preditor 2.00a” as twenty-nine ASCII characters followed by a zero byte.
The ASCII should be treated as case-sensitive. In other words “Preditor” is not the same as “PREDITOR.”
Conventionally, the tool name and revision control number are included first for the creating tool and then for the most
recent modifying tool. The two strings are separated by a colon. The string should be produced by the creating program
with a null modifying tool field (e.g. “Preditor 2.00a:”), and each time a tool modifies the bank, it should replace the
modifying tool field with its own name and revision control number.
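The update convention might be sketched as follows. The function name is hypothetical, and the sketch assumes the creating tool field is everything before the first colon, which is one reading of the convention described above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the new ISFT string from the old one and the
 * name and revision of the tool now modifying the bank. The creating tool
 * field (text before the first colon) is preserved; the modifying tool
 * field is replaced. Returns 0 on success, -1 if out is too small. */
static int sf_isft_update(char *out, size_t outCap,
                          const char *oldIsft, const char *tool)
{
    const char *colon = strchr(oldIsft, ':');
    size_t creatorLen = colon ? (size_t)(colon - oldIsft) : strlen(oldIsft);

    int n = snprintf(out, outCap, "%.*s:%s", (int)creatorLen, oldIsft, tool);
    return (n < 0 || (size_t)n >= outCap) ? -1 : 0;
}
```

Starting from “Preditor 2.00a:”, a modification by a tool reporting “Vienna SF Studio 2.0” would produce “Preditor 2.00a:Vienna SF Studio 2.0”. The result must still be written with the zero terminators described above.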
The ISFT string is provided primarily for error tracing purposes.
If the ISFT sub-chunk is missing, not terminated in a zero valued byte, or for some reason incapable of being faithfully
copied as an ASCII string, the field should be ignored and if re-written, should not be copied. If the field’s contents are not
seemingly meaningful but can faithfully reproduced, this should be done.
6 The sdta-list Chunk
The sdta-list chunk in a SoundFont 2 compatible file contains a single optional smpl sub-chunk which contains all the RAM
based sound data associated with the SoundFont compatible bank. The smpl sub-chunk is of arbitrary length, and contains
an even number of bytes. The sm24 sub-chunk, if present, is exactly ½ the size of the smpl sub-chunk, plus 1 byte if
necessary to meet the RIFF 16-bit alignment specification.
6.1 Sample Data Format in the smpl Sub-chunk
The smpl sub-chunk, if present, contains one or more “samples” of digital audio information in the form of linearly coded
sixteen bit, signed, little endian (least significant byte first) words. Each sample is followed by a minimum of forty-six zero
valued sample data points. These zero valued data points are necessary to guarantee that any reasonable upward pitch shift
using any reasonable interpolator can loop on zero data at the end of the sound.
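A validating tool might confirm the zero padding with a check along the following lines. The function name is hypothetical, and sampleEnd is assumed to be the index of the first data point after the last point of the sample; the actual end position comes from the sample header in the pdta-list chunk.

```c
#include <stddef.h>
#include <stdint.h>

#define SF_MIN_ZERO_POINTS 46u

/* Hypothetical helper: returns 1 if at least 46 zero valued data points
 * follow the sample ending at index sampleEnd, 0 otherwise. pool holds the
 * decoded 16-bit data points of the smpl sub-chunk. */
static int sf_has_zero_tail(const int16_t *pool, size_t nPoints, size_t sampleEnd)
{
    if (sampleEnd > nPoints || nPoints - sampleEnd < SF_MIN_ZERO_POINTS)
        return 0;                          /* not enough data after the sample */
    for (size_t i = 0; i < SF_MIN_ZERO_POINTS; i++)
        if (pool[sampleEnd + i] != 0)
            return 0;
    return 1;
}
```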
6.2 Sample Data Format in the sm24 Sub-chunk
The sm24 sub-chunk, if present, contains the least significant byte counterparts to each sample data point contained in the
smpl chunk. Note that this means that for every two bytes in the smpl sub-chunk there is a one-byte counterpart in the sm24 sub-chunk.
These sample waveform points are to be combined with the sample waveform points in the smpl sub-chunk, to collectively
create a single sample data pool with up to 24 bits of resolution.
If the smpl sub-chunk is not present, the sm24 sub-chunk should be ignored. If the ifil version of the format is less than that
which represents 2.04, the sm24 sub-chunk should be ignored. If the size of the sm24 chunk is not exactly equal to ½ the
size of the smpl chunk (+ 1 byte in the case that ½ the size of the smpl chunk is an odd value), the sm24 sub-chunk should be
ignored.
In any and all cases where the sm24 sub-chunk is ignored, the synthesizer should render only those samples contained
within the smpl sub-chunk.
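The rules for ignoring sm24 and for combining the two sub-chunks might be sketched as follows. The function names are hypothetical. The size test follows the wording above literally, and version 2.04 is taken to mean wMajor=2 and wMinor=4. The combined value is returned as a signed 24-bit quantity held in a 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the sm24 sub-chunk should be used,
 * 0 if it should be ignored under the rules above. */
static int sf_sm24_usable(int smplPresent, uint32_t smplSize, uint32_t sm24Size,
                          uint16_t wMajor, uint16_t wMinor)
{
    if (!smplPresent)
        return 0;
    if (wMajor < 2 || (wMajor == 2 && wMinor < 4))
        return 0;                          /* ifil version earlier than 2.04 */
    uint32_t half = smplSize / 2;
    return sm24Size == half + (half & 1u); /* half, plus 1 byte if half is odd */
}

/* Hypothetical helper: returns data point i as a signed 24-bit value. smpl
 * supplies the upper 16 bits (little endian), sm24 the lower 8 bits. Pass
 * sm24 as NULL when it is ignored; the low byte is then zero. */
static int32_t sf_sample24(const uint8_t *smpl, const uint8_t *sm24, size_t i)
{
    uint32_t u = ((uint32_t)smpl[2 * i + 1] << 16) | ((uint32_t)smpl[2 * i] << 8);
    if (sm24 != NULL)
        u |= sm24[i];
    /* Sign-extend from bit 23. */
    return (int32_t)(u ^ 0x800000u) - 0x800000;
}
```

When sm24 is ignored, the values are the 16-bit data points scaled by 256, so a renderer can use a single 24-bit path for both cases.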
6.3 Sample Data Looping Rules
Within each sample, one or more loop point pairs may exist. The locations of these points are defined within the pdta-list
chunk, but the sample data points themselves must comply with certain practices in order for the loop to be compatible
across multiple platforms.
The loops are defined by “equivalent points” in the sample. This means that there are two sample data points which are
logically equivalent, and a loop occurs when these points are spliced atop one another. In concept, the loop end point is
never actually played during looping; instead the loop start point follows the point just prior to the loop end point. Because
of the bandlimited nature of digital audio sampling, an artifact free loop will exhibit virtually identical data surrounding the
equivalent points.
In actuality, because of the various interpolation algorithms used by wavetable synthesizers, the data surrounding both the
loop start and end points may affect the sound of the loop. Hence both the loop start and end points must be surrounded by
continuous audio data. For example, even if the sound is programmed to continue to loop throughout the decay, sample
data points must be provided beyond the loop end point. This data will typically be identical to the data at the start of the
loop. A minimum of eight valid data points are required to be present before the loop start and after the loop end.
The eight data points (four on each side) surrounding the two equivalent loop points should also be forced to be identical.
By forcing the data to be identical, all interpolation algorithms are guaranteed to properly reproduce an artifact-free loop.
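An editor or validation tool might test a loop against these practices roughly as follows. The function name is hypothetical, and the sketch adopts one reading of the text: at least eight data points must exist before the loop start and after the loop end, and the equivalent points together with the four points on each side of them must match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the loop defined by the equivalent
 * points start and end (indices into s, which holds n data points) follows
 * the practices above, 0 otherwise. */
static int sf_loop_points_ok(const int16_t *s, size_t n, size_t start, size_t end)
{
    if (start >= end || end >= n)
        return 0;
    if (start < 8 || n - end <= 8)
        return 0;                          /* fewer than eight points outside the loop */

    /* Compare the equivalent points and four neighbours on each side. */
    for (int k = -4; k <= 4; k++)
        if (s[(ptrdiff_t)start + k] != s[(ptrdiff_t)end + k])
            return 0;
    return 1;
}
```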
7 The pdta-list Chunk
7.1 The HYDRA Data Structure
The articulation data within a SoundFont 2 compatible file is contained in nine mandatory sub-chunks. This data is named
“hydra” after the mythical nine-headed beast. The structure has been designed for interchange purposes; it is not optimized
for either run-time synthesis or for on-the-fly editing. It is reasonable and proper for SoundFont compatible client programs
to translate to and from the hydra structure as they read and write SoundFont compatible files.
7.2 The PHDR Sub-chunk
The PHDR sub-chunk is a required sub-chunk listing all presets within the SoundFont compatible file. It is always a
multiple of thirty-eight bytes in length, and contains a minimum of two records, one record for each preset and one for a
terminal record according to the structure:
struct sfPresetHeader
{
CHAR achPresetName[20];
WORD wPreset;
WORD wBank;
WORD wPresetBagNdx;
DWORD dwLibrary;
DWORD dwGenre;
DWORD dwMorphology;
};
The ASCII character field achPresetName contains the name of the preset expressed in ASCII, with unused terminal
characters filled with zero valued bytes. Preset names are case sensitive. A unique name should always be assigned to each
preset in the SoundFont compatible bank to enable identification. However, if a bank is read containing the erroneous state
of presets with identical names, the presets should not be discarded. They should either be preserved as read or preferably
uniquely renamed.
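Note that a C compiler using natural alignment will commonly pad the structure above to forty bytes, so reading the thirty-eight byte records directly into it is not portable. The following sketch decodes a record field by field instead. The type and function names are hypothetical; the byte offsets follow from the field order and sizes in the structure, and integers are decoded least significant byte first.

```c
#include <stdint.h>
#include <string.h>

#define SF_PHDR_RECORD_SIZE 38u

/* In-memory form of one record; the name gains a guaranteed terminator. */
typedef struct {
    char     achPresetName[21];
    uint16_t wPreset;
    uint16_t wBank;
    uint16_t wPresetBagNdx;
    uint32_t dwLibrary;
    uint32_t dwGenre;
    uint32_t dwMorphology;
} sfPresetHeaderParsed;

static uint16_t sf_rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t sf_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of records in a PHDR sub-chunk of the given size, or
 * -1 if the size is not a multiple of 38 or holds fewer than two records. */
static long sf_phdr_record_count(uint32_t chunkSize)
{
    if (chunkSize % SF_PHDR_RECORD_SIZE != 0 || chunkSize / SF_PHDR_RECORD_SIZE < 2)
        return -1;
    return (long)(chunkSize / SF_PHDR_RECORD_SIZE);
}

/* Decodes one 38-byte record starting at rec. */
static void sf_parse_phdr(const uint8_t *rec, sfPresetHeaderParsed *out)
{
    memcpy(out->achPresetName, rec, 20);
    out->achPresetName[20] = '\0';
    out->wPreset       = sf_rd16(rec + 20);
    out->wBank         = sf_rd16(rec + 22);
    out->wPresetBagNdx = sf_rd16(rec + 24);
    out->dwLibrary     = sf_rd32(rec + 26);
    out->dwGenre       = sf_rd32(rec + 30);
    out->dwMorphology  = sf_rd32(rec + 34);
}
```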
The WORD wPreset contains the MIDI Preset Number and the WORD wBank contains the MIDI Bank Number which
apply to this preset. Note that the presets are not ordered within the SoundFont compatible bank. Presets should have a
unique set of wPreset and wBank numbers. However, if two presets have identical values of both wPreset and wBank, the
first occurring preset in the PHDR chunk is the active preset, but any others with the same wBank and wPreset values
should be maintained so that they can be renumbered and used at a later time. The special case of a General MIDI