[To be published in Computer Music Journal 18:4 (Winter 94)]

A Comparison of MIDI and ZIPI

Matthew Wright
Center for New Music and Audio Technologies (CNMAT)
Department of Music, University of California, Berkeley
1750 Arch St.
Berkeley, California 94720 USA
Matt@CNMAT.Berkeley.edu

A main factor in the design of ZIPI was frustration with MIDI, the well-established standard for communication among electronic musical instruments. This article lists some of those frustrations and explains how ZIPI overcomes them. Basic knowledge of both MIDI and ZIPI is assumed (e.g., IMA 1988 and the related articles on ZIPI in this issue).

Real Networks
=============

Each MIDI device has separate MIDI plugs labeled "in," "out," or "thru." A MIDI user must think carefully about which devices will be sending information to which other devices, and arrange that the "MIDI-out" of the device sending the information is connected with its own cable to the "MIDI-in" of the device receiving the information, or that the signal is properly daisy-chained via the "MIDI-thru" of intermediate devices.

Computer networks, in contrast, have the characteristic that any device connected to the network can send and receive packets to and from any other device on the network. (Tanenbaum 1989 is an excellent introduction to computer networks.) In Ethernet, for example, every device has only one plug, and a single cable connects the computer to the entire network, allowing it to talk to any device.

In this respect, ZIPI is more like a computer network than like MIDI. Each ZIPI device has only one ZIPI plug, and a single ZIPI cable connects it to a hub. Any devices connected to the same hub can send packets to each other. If multiple hubs are connected with ZIPI cables, then any device connected to any hub can talk to any other device on any hub.

This means that musicians will never have to rewire their ZIPI studios unless they add or remove devices. If you want to use your ZIPI synthesizer as a keyboard controller one day and as a timbre module the next day, no wiring has to change. You never have to worry about the number of ZIPI outputs your computer has, because connecting your computer's one ZIPI port to the hub will allow the computer to control any number of ZIPI synthesizers. You never have to build complicated wiring structures with in/out/thru, because you don't have to think about which data will flow through which wire.

This also means that all ZIPI communication can be two-way instead of one-way. A ZIPI controller can ask questions of a ZIPI synthesizer, e.g., to find out its capabilities, and the synthesizer can respond by sending a message back to the controller (Loy 1985).

Bandwidth and Efficiency
========================

MIDI has a data rate of 31.25 kBaud, which it uses 80 percent efficiently. (Each MIDI byte is transmitted as 10 bits: a start bit, 8 data bits, and a stop bit.) This is more than enough for note on and note off events; consider the extreme case of a keyboard player playing 10-voice 16th-note chords at 120 beats per minute. That is 80 notes per second. A MIDI note on and note off each take only 2 bytes to transmit (using running status), so that's 320 bytes, or 3,200 bits per second, which is just over a tenth of MIDI's bandwidth.
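To make the two-bytes-per-note figure concrete, here is a minimal C sketch (my illustration, not from the article) of how running status encodes such a chord: the note-on status byte is sent once, each subsequent note costs only its two data bytes, and the notes are turned off with the usual trick of sending note on with velocity 0 so the running status is never broken.

    /* Starting and stopping a 10-note chord with MIDI running status. */
    #include <stdio.h>

    int main(void)
    {
        int keys[10] = {48, 52, 55, 60, 64, 67, 72, 76, 79, 84};
        unsigned char buf[1 + 10 * 2 + 10 * 2];
        int i, n = 0;

        buf[n++] = 0x90;                        /* note on, ch. 1: sent once */
        for (i = 0; i < 10; i++) {              /* chord on: 2 bytes/note */
            buf[n++] = (unsigned char)keys[i];  /* key number */
            buf[n++] = 100;                     /* velocity */
        }
        for (i = 0; i < 10; i++) {              /* chord off: note on with */
            buf[n++] = (unsigned char)keys[i];  /* velocity 0, preserving */
            buf[n++] = 0;                       /* running status */
        }
        printf("%d bytes to start and stop a 10-note chord\n", n);
        return 0;
    }

At 8 such chords per second, this 41-byte stream works out to 328 bytes per second, essentially the 320 bytes per second computed above; the lone status byte amortizes away.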
MIDI cannot keep up, however, with continuous controllers. A guitar controller soon to be released by Zeta Music tracks pitch, loudness, brightness, even/odd ratio, and noise amount on each of six strings, updating each parameter every 8-10 msec. Pitch and loudness are 2-byte values; the other three are 1-byte, so to send these over MIDI we would have to use seven continuous controllers. (Even this is generous, since MIDI continuous controllers hold only 7 bits and these values are 8-bit bytes.) This results in 4,200 control updates per second:

    7 control updates * 6 strings / 10 msec = 4,200 control updates/sec

Sending continuous controllers on separate channels rules out MIDI running status, so we assume each update would take three bytes. How much bandwidth does this require?

    4,200 control updates/sec * 3 bytes/update * 10 bits/byte = 126 kBaud

This is over four times MIDI's data rate, without even considering note on and note off messages. (A short program at the end of this section recomputes this figure.)

ZIPI's data rate is variable, with no maximum, so as technology improves and data rates increase, ZIPI will never be a bottleneck. ZIPI's minimum data rate is 250 kBaud, eight times MIDI's rate, which is a comfortable speed even for this kind of continuous information. Currently available communication chips allow a maximum data rate of 20 MBaud. (ZIPI includes a mechanism for automatically picking the fastest speed that all connected devices can handle, so it's no problem to mix ZIPI devices with different data rates.)

Also, ZIPI's data format allows it to transmit high-bandwidth information more efficiently than MIDI. For example, the information produced by the guitar controller mentioned above would require 126 kBaud to transmit via MIDI continuous controllers. Via ZIPI, the same controller could transmit all the same data, with slightly higher resolution, using only 85.6 kBaud. (See the derivation in the "Examples of ZIPI Applications" article in this issue.) Thus, in addition to being faster than MIDI, ZIPI uses its bandwidth more efficiently.
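The MIDI side of this comparison can be checked mechanically. The following C sketch (mine, not the article's) recomputes the 126-kBaud figure from the numbers given above: seven controller values per string, six strings, a full update every 10 msec, and 3 bytes (30 transmitted bits) per controller message.

    /* Recomputing the MIDI bandwidth needed by the guitar controller. */
    #include <stdio.h>

    int main(void)
    {
        double updates_per_sec = 7 * 6 / 0.010;   /* 4,200 */
        double baud = updates_per_sec * 3 * 10;   /* 3 bytes, 10 bits each */

        printf("%.0f updates/sec = %.0f kBaud (MIDI offers 31.25)\n",
               updates_per_sec, baud / 1000.0);
        return 0;
    }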
Flexibility in Message Addressing
=================================

MIDI messages fall into two categories. The first category consists of the messages whose first data byte specifies a particular note number: note on, note off, and polyphonic after-touch. All other MIDI channel messages, such as pitch bend, pan, and modulation, apply to an entire channel, not to a single note.

Imagine that you are controlling a synthesizer from a guitar via MIDI. Each of the six guitar strings might be bent by the guitarist by a different amount, so to have individual pitch-bend control of six voices, you'd have to put them on six different MIDI channels; all MIDI guitars do this. That is awkward and needlessly complicated, and it uses up over a third of the MIDI channels for one instrument. In ZIPI, it is possible to address a pitch message to a single note instead of an entire channel. In fact, any message can be sent to a single note, so this entire category of problem can never arise.

MIDI has the opposite problem too. It would be nice to turn off all of the notes on a channel at once, but since note off commands cannot be sent to a channel, this is impossible; every note off message has to be sent to a single note. There is a separate "All Notes Off" message, but it has a decidedly second-class status: "In no case should [all notes off messages] be used in lieu of note off commands to turn off notes which have been previously turned on. Therefore any all notes off command (123-127) may be ignored by receiver with no possibility of notes staying on, since any note on command must have a corresponding specific note off command" (IMA 1988).

For after-touch, there are also two separate messages: polyphonic after-touch, applicable only to a single note, and channel after-touch, applicable only to an entire channel. The MIDI standard doesn't explicitly discourage either of these messages, but in practice the channel version is generally favored---few MIDI controllers send polyphonic after-touch. Again, MIDI has separate messages that mean the same thing, differing only in their addressability.

The last note-addressed MIDI message is note on. It would be nice to be able to articulate an entire chord in one message, avoiding temporal "smearing" of the onsets of the notes in the chord (Moore 1988) and saving bandwidth. This is impossible in MIDI. There isn't even a second-class channel message for note on, because MIDI has no way to specify which notes the chord would contain. In ZIPI, every message can be sent either to a single note or to a group of notes. Anything you can tell a note to do, you can also tell a group of notes to do.

Address Space
=============

In MIDI, a note's address is the same as the note's pitch. If you want to specify which note to apply after-touch to, or which note to release, you have to name that note by giving its pitch. You cannot say "note number 55" without it meaning "the note whose pitch is G below middle C."

In real life, though, a note's pitch might change over time, or there might be two notes played on the same instrument with the same pitch. Both of these situations are awkward to express in MIDI. You can't say "take the note that is G below middle C and slide it up a whole step to the A below middle C." You can send a pitch-bend message to the channel containing that note, but then when you want to release the A you still have to call it a G, because the note number is the name as well as the pitch.

Similarly, imagine a MIDI guitar controller in which the guitarist is fretting an E on the fifth fret of the B string while also letting the open high E string ring. The guitar is playing two notes at the same time, with the same pitch. But the note on the E string might be a lot quieter than the note on the B string, or the note on the B string might be bent up a half step, or one of them could end while the other keeps sounding. When you send a typical MIDI synthesizer two note-on messages with the same pitch, it plays two copies of the same note. But then it's hard to send messages to a particular one of the two notes. If you send polyphonic after-touch to MIDI note number 64 (the E being played by two strings), it might affect both sounding notes, or just one of them, but there is no way to specify which one. If you send a note off to note number 64, either note might release, even if one is much louder than the other.

It is possible to get around these problems by using a separate MIDI channel for each note. Then you could have a loud E on channel 1, and a quieter E, with after-touch, on channel 2. But this solution is inelegant and awkward, and it soon leads to running out of MIDI channels.

In ZIPI, the notions of address and pitch are separate. ZIPI note number 64 doesn't have to be the E above middle C; it is simply a number. When you want a note to sound, you pick an address, give it a pitch, loudness, etc., and tell it to start. Then, whenever you want to make changes to this note, you send the address of the note along with the note descriptors that change it.
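The C sketch below illustrates this separation with the G-to-A example from above. Everything in it is hypothetical: the three-level 6/7/7 split of ZIPI's 20-bit address space (see "Data Resolution" below) and the message-sending functions are illustrative stand-ins, not the actual MPDL formats, which are defined in the ZIPI specification articles in this issue.

    /* Hypothetical sketch: a ZIPI-style note whose address is only a
     * name. The 6-bit family / 7-bit instrument / 7-bit note layout
     * is an assumption for illustration, not the real MPDL format. */
    #include <stdio.h>

    typedef unsigned long zaddr;    /* holds a 20-bit address */

    zaddr note_addr(int family, int instrument, int note)
    {
        return ((zaddr)family << 14) | ((zaddr)instrument << 7) | (zaddr)note;
    }

    /* stand-ins for transmitting MPDL note descriptors */
    void send_pitch(zaddr a, double semitones)
    {
        printf("address %05lx: pitch %.1f\n", a, semitones);
    }
    void send_articulate(zaddr a) { printf("address %05lx: articulate\n", a); }
    void send_release(zaddr a)    { printf("address %05lx: release\n", a); }

    int main(void)
    {
        zaddr n = note_addr(1, 3, 64); /* "note 64" is a name, not a pitch */

        send_pitch(n, 55.0);           /* start as the G below middle C */
        send_articulate(n);
        send_pitch(n, 57.0);           /* slide the same note up to A */
        send_release(n);               /* release it; it is still note 64 */
        return 0;
    }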
Distinguishing Between Controller and Synthesizer Messages
==========================================================

When a musician controls a synthesizer, there are four steps:

(1) The musician performs some action, like blowing into a mouthpiece or pressing keys.

(2) These gestures are somehow measured, producing parameters such as "how fast the key was going" and "which fret was fingered."

(3) These measurements are translated into parameters to control a synthesizer. For example, key velocity might map to amplitude and brightness, and fret position would map to pitch.

(4) A synthesizer takes these control parameters and produces sound.

Figure 1 illustrates these steps. Note that there are two streams of information: one is a stream of measurements of the musician's gestures; the other is a stream of control parameters for a synthesizer.

[Figure 1 would go here if this weren't the ASCII version]

In MIDI, these two streams are conflated. There is no way to directly set the pitch of a note in MIDI. You can say which key was pressed, and what the position of the pitch bend wheel is, but those are both descriptions of what the musician's hands are doing, not measurements of pitch. In other words, MIDI's notion of pitch only goes as far as describing the gestures produced by a keyboard player, not explicitly controlling a synthesizer.

Obviously, failing to make a distinction between these two ideas does not prevent music from being made with MIDI. For example, MIDI users understand that the way to send pitch via MIDI is to pretend that a keyboard player is pressing a certain key and holding the pitch bend wheel in a certain position, even if they would rather control pitch directly. (Non-keyboard MIDI controllers start by knowing the desired pitch; then they have to go through extra steps to translate the desired pitch into a MIDI key number plus a pitch bend amount; these steps are sketched at the end of this section.) Likewise, people use the term "velocity," which is a measure of how fast a key is pressed, to mean loudness or amplitude.

ZIPI distinguishes between these two kinds of information. Standard messages, which ZIPI synthesizers expect to see, are descriptions of sounds that should be produced, not descriptions of gestures that the musician is producing. So instead of having "key number" and "velocity," ZIPI has "pitch" and "loudness." But ZIPI also has a second set of parameters explicitly for describing musicians' gestures. These include keyboard measurements like key number and velocity, but also parameters that come from other controllers, e.g., bow position, wind pressure, and striking position on a drum head.
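Here is roughly what those extra translation steps look like, as a C sketch of my own (not from the article). It assumes the synthesizer's pitch-bend range has been set to the common +/-2 semitones, which makes one semitone equal to 4,096 steps of the 14-bit bend value; a different bend range changes that scale factor.

    /* Translating a desired pitch (in fractional MIDI semitones) into
     * a key number plus a 14-bit pitch bend value centered at 8192.
     * Assumes a +/-2 semitone bend range. */
    #include <math.h>
    #include <stdio.h>

    void pitch_to_midi(double semitones, int *key, int *bend)
    {
        *key = (int)floor(semitones + 0.5);    /* nearest key number */
        *bend = 8192 + (int)((semitones - *key) * 4096.0);
    }

    int main(void)
    {
        int key, bend;

        pitch_to_midi(64.37, &key, &bend);     /* 37 cents above E */
        printf("key %d, pitch bend %d\n", key, bend);
        return 0;
    }

A controller that already knows it wants "64.37 semitones" must split that single value across two different MIDI messages; a ZIPI controller would simply send the pitch itself.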
Controlling Drum Machines
=========================

Many MIDI drum machines and drum timbre modules allow the user to pitch-shift and pan drum samples. This can be useful for creating what seems like a large number of instruments out of a single sample. But since MIDI's pitch is the same as its address, it is common for each key number to be assigned to a different sound altogether, as in "middle C is ride cymbal, C# above that is closed hi-hat..." With this scheme, it's impossible to use MIDI's pitch mechanism to specify the pitch of a drum sound. Some MIDI drum machines get around this by letting the user assign the same sample, with different pitches and pan locations, to multiple MIDI note numbers (Kawai 1986; Smith 1990), but that easily results in running out of note numbers. Furthermore, this mapping from MIDI note numbers to various drum sounds isn't standard, and can't be set via MIDI.

This makes it difficult for two drum machines to communicate via MIDI, because MIDI note number 37 might mean snare drum to one instrument and crash cymbal to another. Using different pitch and pan values for the same sound on different MIDI key numbers just makes this worse, because even if MIDI note 68 is a crash cymbal on both drum machines, it might be pitch-shifted up on one of them and down on the other.

This can even be a nuisance when sequencing drum tracks from the same drum machine that will play them back. For example, suppose your drum machine lets you specify the pitch and pan of each note as you add it to a drum pattern. Once your pattern is complete, you want to load it into your sequencer along with the keyboard parts. But on many drum machines, the MIDI note numbers chosen for outgoing MIDI data are determined only by the instrument being played, not by the pitch of that instrument. So translating a drum sequence to MIDI loses the work spent specifying the pitches.

Drums under ZIPI would be much easier, because pitch and address are separate concepts, and because each note can have its own pitch, program change, and pan. A typical configuration would be to think of a drum kit as a family, with instruments like snare drum, timpani, and cowbell. Each of these instruments could be sent a program change message selecting the appropriate percussion timbre, so there is no ambiguity about the mapping of instrument numbers to drum sounds. A percussion sound could be selected by choosing an instrument, and pitch or pan could be changed by sending a pitch or pan message to a note in that instrument.

This means that a ZIPI drum machine wouldn't have to provide so much structure for assigning sounds, pitches, and pans to each key number. Instead, all of the setup can be done over ZIPI. To get a new set of sounds, your controller or sequencer can just send program change, pan, and pitch messages to each instrument of the drum kit.

ZIPI's MPDL also has note descriptors reserved for drum-specific control parameters like position on the drum head, velocity, and acceleration. Continuous hi-hat pedal position, varying from fully depressed to fully open, would be encoded in "continuous pedal" messages. Hopefully, the next generation of drum pads and drum machines will take advantage of these parameters to give electronic drums a level of expressivity closer to that of acoustic drums.

Data Resolution
===============

Each MIDI byte begins with a bit that tells whether it is a status byte or a data byte, so each byte really has only seven user-settable bits. Seven bits is not enough resolution for a variety of applications, and it is awkward to send larger quantities. It is possible to partition a 14-bit quantity between two separate MIDI controllers (sketched below), but this is messy and rarely done. Also, even 14 bits is not enough for many applications; it would take 3 MIDI bytes (30 bits transmitted) to send a 16-bit word. ZIPI parameters can have any number of 8-bit data bytes; there is no per-byte overhead in ZIPI.

MIDI uses only four bits to encode a channel, giving 16 channels. This major weakness has given rise to kludges like multiple MIDI outputs on a computer, each with an associated letter. This would give, e.g., 32 MIDI channels, which special software could refer to as A1-A16 and B1-B16 (Roberts 1992). ZIPI addresses are 20 bits, giving over a million possible addresses.
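For concreteness, here is the 14-bit workaround mentioned above, as a C sketch of mine. The MIDI convention pairs controller number N (the most significant 7 bits, for N between 0 and 31) with controller number N+32 (the least significant 7 bits), so a single 14-bit value costs two complete control change messages.

    /* Sending a 14-bit value as a pair of 7-bit MIDI controllers. */
    #include <stdio.h>

    void send_14bit_cc(int channel, int cc, int value)
    {
        unsigned char msg[6];

        msg[0] = (unsigned char)(0xB0 | channel);  /* control change */
        msg[1] = (unsigned char)cc;                /* MSB controller */
        msg[2] = (value >> 7) & 0x7F;              /* top 7 bits */
        msg[3] = (unsigned char)(0xB0 | channel);
        msg[4] = (unsigned char)(cc + 32);         /* paired LSB controller */
        msg[5] = value & 0x7F;                     /* bottom 7 bits */

        /* ...the 6 bytes in msg[] would go to the MIDI output here... */
        printf("CC%d = %d, CC%d = %d\n", msg[1], msg[2], msg[4], msg[5]);
    }

    int main(void)
    {
        send_14bit_cc(0, 1, 12345);   /* 14-bit modulation wheel value */
        return 0;
    }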
High-Level Parameter Control
============================

Suppose you are playing something on a multi-timbral synthesizer via MIDI, and you want to turn down the synthesizer's entire output. The only way to do it is to send continuous controller 7, volume, to all 16 MIDI channels. In ZIPI, messages can be sent to any level of the address-space hierarchy, so it is possible to turn down a group of instruments all at once (and with only one network message) by sending a loudness message to the family that contains those instruments. It is even possible to send a message to all families at once. This should make it unnecessary to duplicate the same ZIPI message many times to control different notes.

MIDI also requires a large number of messages to apply a simple function to a parameter. For example, suppose you would like to decrease the volume of a MIDI channel exponentially. The only way to do this is to send a stream of volume controller messages. In ZIPI, it is possible to request that a certain function modulate a parameter. You could say, for example, "begin an exponential decay of loudness that takes 2.3 seconds to go to silence" in a single message, and the decrescendo would then happen without any further messages. ZIPI pre-defines some useful functions, and also provides a way to send your own tables over the network if you would like to make up your own functions "on the fly."
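The difference in message traffic is easy to quantify. This C sketch (my illustration; the 10-msec update interval and the exact decay curve are assumptions) counts the MIDI messages needed to fake the 2.3-second exponential decrescendo described above.

    /* The MIDI cost of one exponential decrescendo: a stream of
     * volume (controller 7) messages, one every 10 msec for 2.3 sec.
     * ZIPI would replace the whole stream with a single message. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double dur = 2.3, dt = 0.010;
        int i, steps = (int)(dur / dt) + 1;

        for (i = 0; i < steps; i++) {
            /* illustrative curve: roughly -43 dB by t = dur */
            int vol = (int)(127.0 * exp(-5.0 * i * dt / dur));
            (void)vol;  /* would transmit: 0xB0, 7, vol (3 bytes) */
        }
        printf("%d messages (%d bytes) vs. one ZIPI message\n",
               steps, steps * 3);
        return 0;
    }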
Support for Pitch Trackers
==========================

The theoretical lower bound on the time needed to find the pitch of an arbitrary signal is one period. The lowest note of a 5-string bass guitar, the B three octaves and a half step below middle C, is 30.9 Hz. One period at 30.9 Hz is 32 msec. A MIDI bass guitar can know that the musician is playing a note in well under a millisecond, just from looking at the amplitude of the signal coming from the pickup. But it can't know the pitch for at least 32 msec, probably more.

In MIDI, it is impossible to start a note without committing to the note's pitch, since pitch (i.e., key number) is part of a note-on message. The synthesized note therefore cannot start until quite a long time after the musician plays it on the bass. A 30-msec delay here is very easily detected by the ear; that is why most MIDI bass and guitar controllers feel "spongy" or unresponsive to many musicians.

What can the synthesizer do for the 30 msec between when the note starts and when the pitch tracker knows the pitch? The ear is very forgiving about exactly what it hears during those 30 msec. Many non-electronic timbres begin with lots of noise-like sound for at least 30 msec, for example, the hammer noise on a piano or the wind turbulence on a flute. The pitch can sometimes vary a great deal during the onset of a note. An examination of brass tones, for example, shows that there is often an extensive glissando during the attack, yet we hear the note as having a definite, fixed pitch (Risset and Wessel 1994). It is not that the glissando is imperceptible; it is just that the glissando is heard as part of the attack characteristic of the tone rather than as part of the pitch.

The solution, therefore, is for the bass guitar controller to send a note-on message as soon as it knows there is a note. The synthesizer can play mostly noise, or the wrong pitch, for the 30 msec or so during which the pitch tracker is waiting to find the pitch. When the pitch is determined, the controller can update the synthesizer, and from then on the synthesizer will play the right pitch. This is easy in ZIPI, since it is possible to articulate a note and then later correct that note's pitch. ZIPI also has a way to set the balance between a sound's pitched and noise portions.

References
==========

International MIDI Association (IMA). 1988. *MIDI 1.0 Detailed Specification, Document Version 4.0*. Los Angeles, California: IMA.

Kawai. 1986. *R-100 Digital Drum Machine Owner's Manual*. Tokyo, Japan: Kawai Corp.

Loy, D. G. 1985. "Musicians Make a Standard: The MIDI Phenomenon." *Computer Music Journal* 9(4): 8-26.

Moore, F. R. 1988. "The Dysfunctions of MIDI." *Computer Music Journal* 12(1): 19-28.

Risset, J. C., and D. Wessel. 1994. "Analysis-Synthesis Methods for Sound Synthesis and the Study of Timbre." In D. Deutsch, ed. *The Psychology of Music*, 2nd edition. London: Academic Press.

Roberts, A. 1992. "Devices for Increasing the Number of MIDI Channels." *Computer Music Journal* 16(4): 101-104.

Smith, R. 1990. *PROCUSSION 16 bit Percussion Sound Module Operation Manual*. Scotts Valley, California: E-Mu Systems.

Tanenbaum, A. S. 1989. *Computer Networks*. Englewood Cliffs, New Jersey: Prentice Hall.