[To be published in Computer Music Journal 18:4 (Winter 94)]

ZIPI: Origins and Motivations

Keith McMillen
Zeta Music/Gibson Western Innovation Zone
2560 9th St., Suite 212
Berkeley, California 94710 USA
McMillen@CNMAT.Berkeley.edu

The success of alternate controllers has been less than overwhelming in the history of electronic music. The predominant controller for electronic music synthesizers has been the piano or organ keyboard. Besides the widespread availability of pianos and organs and the people who play them, the very nature of the keyboard makes it an ideal choice from an implementor's point of view.

Keyboard-style instruments decouple the player from the sound-generating element. The strike of a finger on a key starts a chain of events that produces a sound. After a key is struck, the greatest creative choice left to the musician is when to release it. This series of key closures and releases is the simplest form of information that can be used to control a synthesizer.

The early commercially available synthesizers (Moog's MiniMoog, the ARP Odyssey, or the EMS Putney VCS-3) were monophonic and non-dynamic. As the technology evolved, instruments became polyphonic and capable of wide dynamic response (e.g., the Yamaha DX-7). The control information fed to the synthesis engine grew to include how fast the key was struck (the MIDI key velocity). Joysticks, modulation wheels, after-touch, and foot-pedals added the continuous element to keyboard control. Historically, this is not unfamiliar to keyboard players: pipe organs are non-dynamic, but their volume can be controlled by a foot pedal. In many ways the connection of keyboards to synthesizers resulted in very little loss of familiar control and a large gain in timbral choice.

For musicians trained on other instruments, the option of synthesis has not been attractive. Woodwinds, bowed strings, and brass instruments all place the player in direct physical contact with the vibrating element---reeds, strings, or columns of air. Instead of limited control over dozens of notes, these instruments offer subtle and intimate control over one or a few notes. Whether to "trade in" this control for a wider tonal palette is a difficult decision.

MIDI and Keyboards
==================

MIDI has been serving our interface needs for over a decade. Although many have criticized MIDI (Loy 1985; Moore 1988; Scholz 1991), no one has done much about its obvious problems. Alternate controllers have not been a major factor in the business of electronic music, and therefore have not been well accommodated by the industry. They represent a challenging problem both technically and economically. The persistence of an interface standard that makes the necessary extensions for nuance and control difficult, if not impossible, has not helped.

The connection of keyboards to sample-playback sound modules is well served by MIDI. Even the speed of MIDI (31.25 kBaud) is adequate for transmitting data generated by the event-based nature of a keyboard. A ten-note chord can be sent in 6.7 msec, a delay on the borderline of being imperceptible. The continuous controller information generated from a keyboard usually has no more than three parameters (pitch bend, modulation, and after-touch), keeping the bit count low.

Problems occur, however, when trying to connect alternate controllers to synthesizers (Muir and McMillen 1986). Polyphonic instruments such as guitar and violin can easily "flood" a MIDI channel with data.
For example, simply updating 7-bit pitch bend and volume 100 times a second for six guitar strings exceeds MIDI's bandwidth: 6 strings * (3 pitch-bend bytes + 3 volume bytes) * 10 bits per byte / 0.01 sec = 36.0 kBaud (MIDI requires 10 bits on the wire to transmit each 7-bit value).

Independent of bandwidth, MIDI also represents data in a manner that assumes the controller is a keyboard, or at least a percussive device. The MIDI "note-on" command is an indivisible integration of timing, pitch, and loudness (velocity) information. This is completely appropriate for a keyboard; every time a key is struck, the information for pitch, velocity, and the timing of the "note" is known simultaneously and is sent out over MIDI. Every modification of one of these three values is accompanied by a change, or at least a reassertion, of the other two.

For an instrument of continuous nature, such as a violin, these parameters are often decoupled. One hand generally determines timing and loudness while the other decides pitch. They can and do change independently of each other. Furthermore, the timing of a note is not as simple as the pressing of a button. Notes can come into audibility gradually, as in a crescendo. MIDI requires that an on/off decision be made at some volume threshold. When this threshold is met, the velocity value sent in a MIDI command will usually be the value of this threshold, making the velocity data useless. MIDI does provide some facility for continuous volume change (controller #7) and for pitch change without articulation. Some synthesizers respond to legato-style commands. Pitch bend can vary pitch up or down by as much as an octave, but with a resolution of only 5.1 divisions per semitone (19.6 cents).

The Good Old Days
=================

Do you remember the days before MIDI? Most available synthesizers were analog and used analog voltages to represent musical values. Articulation was separate from pitch, and all controllable parameters were on an equal footing. Bandwidth and resolution were not concerns, but good intonation was a perpetual effort---a lot like playing a violin.

The integration of the 8-bit microprocessor into synthesizers largely solved the tuning issues. Dividing the octave so that it could be easily represented with 8 bits produced a strong bias toward equal-tempered semitones. Combine this with the irresistible desire of CPUs to communicate, and you soon get MIDI.

Connecting violins and guitars to digitally controlled analog synthesizers was still possible. With some cooperation from the manufacturers (Sequential Circuits, Oberheim, and Moog), analog control signals extracted from the string and injected into control-voltage summing nodes could produce an intimate connection between the controller and the synthesizer. Pitch bends and dynamic changes were smooth and responsive. Pitch stability remained a problem, as this method bypassed some of the automatic tuning functions.

The emergence of frequency modulation (FM) and other methods of digital synthesis made voltage-controlled oscillators (VCOs) and filters seem quaint; these techniques won great popularity because of the clarity and timbral variety they offered. Unfortunately, these forms of sound generation closed off many of the control points into the synthesis process. At this time, alternate controller manufacturers were forced out of the hardware, and the only practical point of entry was through MIDI. While this simplified the connection, the loss of control was disappointing. At best, the style of playing guitar or violin was forced into the language of the keyboard.
This did not preclude processing of the audio signal coming out of the synthesizer. Several synthesizers have individual outputs for each voice. These could be mapped to a specific string on the controller and modulated in the analog domain based on information extracted from the string.

One of the most satisfying examples of this was the connection of a Zeta Mirror 6 fret-scanning guitar controller (Wait 1989) to a Yamaha TX-802 FM synthesizer operating in legato mode. Each of the six outputs from the TX-802 was routed back to the Mirror 6, where it passed through a voltage-controlled amplifier (VCA). Each VCA was controlled by an envelope follower tracking the energy of one of the strings. The six VCA outputs were summed and fed to an amplifier. Guitarists marveled at how smoothly responsive the instrument felt and at the intimacy of the control over synthesis. The simple use of continuous dynamic control returned much of the musical nuance to the interface. This did, however, limit the choice of synthesizers for the users of alternate controllers.

The emergence of sampling and its eventual monopolization of the synthesizer market created new problems for interfaces. The frequencies of FM oscillators and VCOs are continuously variable over the entire range of pitch. Samples, as the name implies, are not, and require swapping of files that cover specific pitch ranges in order to cover a wide range of pitches. Playing a trill across a sample boundary results in discontinuous spectral envelopes for many sampled sounds. Articulation for FM and VCOs comes from external envelopes that can be varied based upon input parameters from controllers. With sampling, the attack character of the sampled instrument is inherent in the wave table. Timbral changes are restricted to, at best, simple filtering or cross-fading between fixed sounds. Even something as personal as vibrato is often captured by the sample and is not under the performer's control.

The "skin-deep" beauty of sampling has left many musicians longing for a more meaningful conversation with their instruments. Nostalgia has even created a demand for older voltage-controlled synthesizers. Our group recently completed the design of the Oberheim OBMx, an analog twelve-voice subtractive synthesizer. One reviewer said of it, "it has what synthesizers have been missing---a personality" (Aiken 1994). This willingness to give up the accurate reproduction of acoustic instruments for control is understandable, but the situation demanding this choice is regrettable.

Breaking the Chain
==================

The loudest complaint about alternate controllers that extract information from traditional instruments is the time delay between the performer's gesture and the audible response from the synthesizer. However, the Zeta Mirror 6 guitar, using a combination of switched frets and pitch analysis, restricted latencies to less than 6 msec over most of its range. With the delay issues removed, continuous amplitude control became the next, and most obvious, requirement. The technique previously described, controlling the audio in the post-MIDI analog domain, met this need (a rough digital-domain sketch of this signal path appears below).

Amplitude control is essential but not sufficient; many instrumentalists can change the timbre of a note as it evolves over time. The mapping of timbral information extracted from the instrument onto the synthetic voice or voices is the next step for returning control to the performer.
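To make the envelope-follower-to-VCA routing concrete, here is a minimal sketch of that signal path recast in the digital domain. It is an illustration only, not code from the Mirror 6 or from any ZIPI component; the function names, the smoothing coefficient, and the per-frame structure are assumptions chosen for clarity.

    /* Hypothetical digital-domain sketch of the per-string envelope-follower
     * and VCA routing described above.  Names and constants are illustrative
     * assumptions, not Zeta or ZIPI product code. */

    #include <math.h>

    #define NUM_STRINGS 6

    /* One-pole envelope follower: a smoothed estimate of one string's energy. */
    static double follow(double *env, double input, double coeff)
    {
        double rectified = fabs(input);      /* full-wave rectification        */
        *env += coeff * (rectified - *env);  /* slide the estimate toward it   */
        return *env;
    }

    /* For one sample frame, scale each synthesizer voice (the "VCA") by the
     * envelope of its corresponding string, then sum the six paths. */
    double process_frame(const double string_in[NUM_STRINGS],
                         const double voice_in[NUM_STRINGS],
                         double env[NUM_STRINGS])
    {
        const double coeff = 0.01;           /* a few milliseconds at 44.1 kHz */
        double mix = 0.0;
        int i;

        for (i = 0; i < NUM_STRINGS; i++) {
            double gain = follow(&env[i], string_in[i], coeff);
            mix += voice_in[i] * gain;       /* per-string amplitude control   */
        }
        return mix;                          /* summed output to the amplifier */
    }

The point of the sketch is simply that continuous per-string amplitude control requires only a gain update per string per frame, a trivial amount of data compared with what MIDI note messages can carry.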
Timbral mapping, too, could be handled in the post-MIDI audio path, but the requirement for greater flexibility and more elaborate processing of control information is best solved in the digital domain. The need for a new music description language, and the means to transport this information over a high-speed, deterministic network, became clear to us.

The first concepts for what was to become the ZIPI musical data language started in the fall of 1989, coinciding with the start of intense collaboration between Zeta Music and the Center for New Music and Audio Technologies (CNMAT) at the University of California, Berkeley. In order to successfully improve the keyboard-MIDI-sampler path, replacements were needed for all three elements in the chain. Since that date, research has focused on the Infinity Box (a gesture-guided pitch and timbre to ZIPI converter), the ZIPI network and its specification, and Frisco (an additive resynthesis engine with a control structure that will respond to ZIPI MPDL commands).

The Status of ZIPI and Related Projects
=======================================

As of this writing (June 1994), the physical layer of ZIPI has been implemented using a ring of Intel 80386-based personal computers fitted with Zilog 8530 cards carrying ZIPI PALs and current-loop hardware. Software for the monitor and nodes has been written for interrupt, polled, and DMA access methods. Data link and basic network services are functioning. The polled approach is the only viable method for an Intel 80386-based machine, since the MS-DOS operating system's interrupt latencies are too great to allow even 250 kBaud operation. DMA requires the capture of an entire packet by the monitor before it can be parsed, thus slowing the ring.

All ZIPI code is written in the C programming language, with an emphasis on portability to other processors. By the time this article is published (December 1994), we will have posted some ZIPI-related software to the Computer Music Journal's ftp site, mitpress.mit.edu, in the directory /pub/Computer-Music-Journal/Code/ZIPI.

A stand-alone ZIPI hub is in development. It will use a Motorola MC68302 MPU as the ring monitor and will provide four ZIPI connections, two MIDI ports (in and out), and an interface to a computer bus. Additional ZIPI ports can be added with "dumb" hubs up to the 253-device limit.

The "Infinity Box" (described in the "ZIPI Examples" article) will be the first ZIPI controller, with a planned release in early 1995. Several prototypes are already in operation, yielding FFT updates for six guitar strings every 8 msec. (Infinity can be driven with one to six audio inputs, allowing it to work with most instruments, e.g., violin, saxophone, cello, or flute.) This spectral data is passed from a signal processor (Motorola DSP56002) to a general-purpose processor (Motorola MC68332), where ZIPI-formatted data is generated for use by the internal sound engine and for transmission out the ZIPI port. Additionally, a subset of the ZIPI data is extracted and sent out a MIDI port.

The "Frisco" software currently runs on Silicon Graphics, Inc. Indigo workstations. Polyphonic "sample quality" sounds are being generated in real time under a ZIPI-like control structure. Since this technique uses hundreds of oscillators to resynthesize analyzed sounds, intelligent high-level control over these oscillators yields a powerful synthesis technique that is capable of great realism as well as subtle, intuitive musical control.
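The inner loop of such an additive engine is conceptually simple; the following C fragment sketches the kind of sinusoidal oscillator bank this approach rests on. It is an illustrative assumption, not Frisco source code, and the type names, constants, and block structure are invented for the example.

    /* Minimal additive-synthesis sketch: a bank of sine oscillators whose
     * per-partial frequencies and amplitudes are set between blocks by
     * higher-level control data.  Illustrative only; not Frisco code. */

    #include <math.h>

    #define TWO_PI      6.283185307179586
    #define SAMPLE_RATE 44100.0

    typedef struct {
        double phase;   /* current phase in radians               */
        double freq;    /* frequency in Hz, set by control layer  */
        double amp;     /* linear amplitude, set by control layer */
    } Partial;

    /* Render one block of output by summing every active partial. */
    void render_block(Partial *bank, int num_partials,
                      float *out, int num_samples)
    {
        int n, k;

        for (n = 0; n < num_samples; n++) {
            double sum = 0.0;
            for (k = 0; k < num_partials; k++) {
                sum += bank[k].amp * sin(bank[k].phase);
                bank[k].phase += TWO_PI * bank[k].freq / SAMPLE_RATE;
                if (bank[k].phase >= TWO_PI)
                    bank[k].phase -= TWO_PI;
            }
            out[n] = (float)sum;
        }
    }

In a Frisco-like engine the interesting work lies not in this inner loop but in the control layer above it, which must map a small number of high-level MPDL parameters onto hundreds of per-partial frequency and amplitude trajectories between blocks.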
Presently, a Macintosh computer running the Max software communicates MPDL control information using UDP/IP packets over Ethernet. A planned self-contained rack-mount version should be available in 1995. This device will have ZIPI, MIDI, and Ethernet inputs.

Just as the RS-232 serial connection standard continues to exist alongside Ethernet, we have no illusions that ZIPI will replace MIDI. Likewise, just as MIDI has been pressed into service in areas never intended by its designers (machine control, mixer automation, lighting), we cannot fully anticipate other manufacturers' networking needs.

ZIPI was presented to the industry at the winter NAMM shows in January 1993 and 1994. The participation and suggestions of many companies have added much to the scope and practicality of what is presented here. We encourage readers to suggest details for additional application layers, such as machine control, studio automation, and sample-dump and audio standards.

We hope this series of articles provides an understanding of the basics of ZIPI and, even more so, a stimulus for comments, additions, and discussion of user concerns. A more detailed specification of the ZIPI network is available upon request. Please direct comments to the following address:

ZIPI Group
G-WIZ
2560 9th St., Suite 212
Berkeley, California 94710, USA
electronic mail: zipi@CNMAT.Berkeley.edu

Acknowledgments
===============

This work was supported in part by Grant C92-048 from the California State Department of Commerce Competitive Technologies Program to CNMAT and Zeta Music Systems, Inc. We would also like to thank the following individuals for their valued comments, sometimes very critical, regarding the ZIPI and MPDL specification: Jim Aiken, David Anderson, Marie-Dominique Baudot, Richard Bugg, Tim Canning, Chuck Carlson, Lynx Crowe, Rob Currie, Steve Curtin, Peter Desain, Kim Flint, Adrian Freed, Guy Garnett, Mark Goldstein, Henkjan Honing, Dean Jacobs, Henry Juszkiewicz, Michael Land, Carl Malone, Dana Massie, Bill Mauchly, Peter McConnell, F. Richard Moore, Chris Muir, David Oppenheim, Stephen Travis Pope, Rob Poor, Miller Puckette, John Senior, Warren Sirota, John Snell, Michael Stewart, Tovar, and David Zicarelli.

References
==========

Aiken, J. 1994. "Oberheim OBMX Keyboard Report." *Keyboard* 20(8).

Loy, D. G. 1985. "Musicians Make a Standard: The MIDI Phenomenon." *Computer Music Journal* 9(4): 8-26.

Moore, F. R. 1988. "The Dysfunctions of MIDI." *Computer Music Journal* 12(1): 19-28.

Muir, C., and K. McMillen. 1986. "What's Missing in MIDI?" *Guitar Player*, June 1986.

Scholz, C. 1991. "A Proposed Extension to the MIDI Specification Concerning Tuning." *Computer Music Journal* 15(1): 49-54.

Wait, B. 1989. *Mirror-6 MIDI Guitar Controller Owner's Manual*. Oakland, California: Zeta Music Systems, Inc.