Byte Article Page 3

Photo 4

Fine-Tuned for High Throughput

The cube's internal construction mirrors the simplicity of its exterior (see photo 4). The main unit's cubic housing is made of lightweight magnesium. Inside are four 32-bit NuBus slots, one of which holds the system's main CPU board. All the cube's system electronics reside on this densely packed CPU board, which makes heavy use of surface mount devices; the cube is essentially a single-board computer. With the exception of a bipolar array used to manage the video display and perform Manchester encoding/decoding for Ethernet communications, all the CPU board's parts are low-power CMOS components.

The power supply mounts inside the housing on two screws; the entire box is cooled by a large, quiet, low-speed fan. The nonswitching power supply can handle voltages ranging anywhere from 90 V to 260 V, and frequencies from 50 Hz to 60 Hz. This means that you can plug in the same hardware almost anywhere in the world without having to set switches. The cube should also prove resistant to the vagaries of commercial electrical power. Its power supply generates 200 watts, of which the monitor uses 50 W, and 25 W is allocated for each slot.

NeXT's design for a workstation for the nineties used four important strategies. First, when possible, high-performance components were used. The CPU board is built around the 68030 processor and 68882 floating-point unit, both running at 25 MHz. For SCSI peripherals, the NCR 53C90 SCSI interface chip provides a maximum 4-megabyte-per-second transfer rate. That's considerably faster than the 1.5-megabyte-per-second rate of the older NCR 5380 chip. For mass storage, an optional high-speed hard disk drive using the SCSI bus is available. This hard disk holds 670 megabytes of formatted data and has an average seek time of 18 milliseconds.

However, even a high-performance processor can be slowed to a crawl if it must service every I/O call, or wait on slow peripherals. (Steve Jobs put it this way: "MIPS is only one-third of the equation; sustained system throughput is the key.") So, the second part of NeXT's design strategy was to minimize the overhead of communicating with the outside world by offloading as much I/O from the CPU as possible onto smart I/O processors managing each peripheral (see figure 1). This happens to be a matter of necessity given the amount of I/O the cube is doing. Consider that the cube's synthesized digital sound is handled by a Motorola DSP56001, a 20-megahertz digital signal processing (DSP) chip. The DSP56001 provides the cube with its ability to synthesize compact disc-quality stereo sound -- no mean feat when you consider it must handle two channels of 16-bit data sampled at 44.1 kHz. Although the primary function of the DSP is to minimize system overhead while processing high-quality sound, you can program the DSP56001 to manipulate any sort of digital data, say, for signal filtering or image processing (see the text box "The Cube's Digital Signal Processor"). The DSP makes the cube an excellent machine for laboratory and experimental work.

The Cube's Digital Signal Processor

The cube comes equipped with a Motorola DSP56001, an 88-pin CMOS chip designed for data-intensive real-time signal-processing applications. At the core of the chip are three execution units -- a data arithmetic logic unit (ALU), an address-generation unit, and a program-control unit -- that operate in parallel to provide the necessary throughput.

The DSP works with 24-bit digital data, providing 144 decibels of dynamic range. Two internal 56-bit accumulators provide 336 dB of dynamic range during arithmetic operations so the precision of the intermediate results is retained during data-processing.
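The decibel figures above are consistent with the common rule of thumb of about 6 dB of dynamic range per bit. The following is a minimal sketch of that arithmetic, not anything taken from Motorola's documentation; the function name `dynamic_range_db` is made up for illustration. Using the more exact 20·log10(2) ≈ 6.02 dB per bit gives slightly higher values than the rounded figures quoted.

```python
import math

def dynamic_range_db(bits: int, db_per_bit: float = 6.0) -> float:
    """Approximate dynamic range of a binary word from a per-bit rule of thumb."""
    return bits * db_per_bit

# More exact value of one bit's worth of range: 20 * log10(2), about 6.02 dB.
EXACT_DB_PER_BIT = 20 * math.log10(2)

word_db = dynamic_range_db(24)           # 24-bit data word
accumulator_db = dynamic_range_db(56)    # 56-bit accumulator
exact_word_db = dynamic_range_db(24, EXACT_DB_PER_BIT)

print(word_db, accumulator_db, round(exact_word_db, 2))
```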

The DSP56001 is programmable, allowing it to be tailored for a specific purpose. The 16-bit address-generation unit, combined with hardware select lines for program code or data, can access three separate 64K-word external memory spaces (192K words total, where a word is 24 bits of data).

The DSP56001 has on-chip program memory composed of 512- by 24-bit-wide RAM cells, of which the bottom 64 cells are used for interrupt vectors. DSP programs can occupy the remaining memory, or if they're large, they can reside in the external program space. In the latter case, the on-chip program memory can serve as a fixed cache. Program instructions are 24 bits wide, and each bit is significant.

On the cube, the DSP56001 is clocked at 20 MHz, and instructions execute every two clock cycles to give the chip a 10-MIPS (millions of instructions per second) rating. The DSP instruction set consists of 62 mnemonics that include math, logical, bit-manipulation, loop, and program-control instructions. The math instructions encompass such operations as absolute value, add, subtract, shift left/right, shift left/right and add (useful for implementing the butterfly computation in certain fast Fourier transforms), compare, signed multiply, signed multiply and accumulate, and signed multiply accumulate and round (MACR).

All these instructions -- notably some of the math instructions just mentioned -- are not pipelined and execute in one instruction cycle (two clock cycles). For example, as the MACR instruction executes, an instruction prefetch, a 24- by 24-bit multiply, a 56-bit add with convergent rounding, two data moves, and two pointer updates are performed, all within one instruction cycle. Such powerful instructions are possible because of the parallel operation of the three execution units. These powerful arithmetic instructions, coupled with its high throughput, allow the DSP56001 to literally process data on the fly.
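To make the multiply-accumulate-and-round idea concrete, here is a simplified software model in Python. It is a sketch under stated assumptions, not a bit-exact emulation of the chip: operands are treated as signed 24-bit fractions, the accumulator is an unbounded Python integer (so 56-bit overflow and saturation are not modeled), and the helper names `to_q23`, `mac`, `round_convergent`, and `fir_sample` are invented for illustration.

```python
FRAC_BITS = 23   # signed 24-bit fraction: 1 sign bit + 23 fraction bits
LOW_BITS = 24    # low-order accumulator bits that rounding discards

def to_q23(x: float) -> int:
    """Quantize a float in [-1, 1) to a signed 24-bit fraction, clamping at the ends."""
    v = int(round(x * (1 << FRAC_BITS)))
    return max(-(1 << FRAC_BITS), min((1 << FRAC_BITS) - 1, v))

def mac(acc: int, x: int, y: int) -> int:
    """Multiply two 24-bit fractions and add the product to the accumulator.

    The raw product has 46 fraction bits; shifting left once aligns it to
    47 fraction bits (assumed accumulator format in this model).
    """
    return acc + ((x * y) << 1)

def round_convergent(acc: int) -> int:
    """Round the accumulator to a 24-bit fraction; exact ties go to the even value."""
    half = 1 << (LOW_BITS - 1)
    low = acc & ((1 << LOW_BITS) - 1)   # discarded bits, always non-negative
    high = acc >> LOW_BITS              # floor of acc / 2**24
    if low > half or (low == half and (high & 1)):
        high += 1
    return high

def fir_sample(coeffs, history):
    """One output sample of a FIR filter: a chain of MACs followed by one rounding."""
    acc = 0
    for c, s in zip(coeffs, history):
        acc = mac(acc, c, s)
    return round_convergent(acc)

# Averaging 0.25 and 0.75 with two coefficients of 0.5 should give 0.5.
result = fir_sample([to_q23(0.5), to_q23(0.5)], [to_q23(0.25), to_q23(0.75)])
print(result == to_q23(0.5))
```

On the hardware, each pass through the loop body would correspond to a single instruction cycle; in this model it is simply ordinary integer arithmetic.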

Inside the DSP56001 are four 24-bit bidirectional data buses: X, Y, program, and global. Digital data is split into X and Y components and can be treated as such in two separate 64K-word external memory spaces. On the cube, 24K bytes of static RAM provides 8K words of contiguous scalar data, or 4K words each of X and Y data. How this data is ordered in SRAM on the cube is determined by what range of addresses you write into the chip's external memory space.
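The memory sizes quoted in this text box can be checked with a few lines of arithmetic. This is only a sketch of the sums, with variable names chosen for illustration; it assumes K means 1024 and that one DSP word occupies exactly 3 bytes.

```python
K = 1024
BYTES_PER_WORD = 3          # one DSP word is 24 bits

# Static RAM fitted on the cube for the DSP.
sram_bytes = 24 * K
sram_words = sram_bytes // BYTES_PER_WORD   # contiguous scalar data
xy_words_each = sram_words // 2             # split evenly into X and Y data

# External address space: three spaces, each reachable with a 16-bit address.
external_spaces = 3                         # program, X data, Y data
words_per_space = 2 ** 16
external_words = external_spaces * words_per_space

print(sram_words // K, xy_words_each // K, external_words // K)
```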

The two 56-bit accumulators in the data ALU can operate on the X and Y data sets in parallel. Breaking the data into X and Y components provides certain advantages. For example, the data can be treated as X and Y coordinate data for image processing or graphics, or as real and imaginary components for complex math, or as coefficients and data for digital filtering. Each X and Y data bus has an on-chip memory composed of 256- by 24-bit cells that is used to improve performance. The program bus prefetches DSP program instructions into the on-chip program memory. The global bus is used for internal data routing within the DSP.

The DSP56001 has three I/O ports: A, B, and C. Port A has a 24-bit bidirectional data bus, and through it the address unit can access external memory for off-chip program code or data. Various control lines determine operations such as whether to access program or data memory, X or Y data, and whether the operation is a read or a write.

Port B handles 8-bit data to and from a host processor that could be a CPU, DMA (direct memory access) hardware, or even another DSP. Control signals for this bus permit interrupt-driven or DMA transfers of data.

Port C consists of two full-duplex serial ports. The first port is the serial communication interface (SCI) that provides standard asynchronous rates up to 312.5K bits per second, and up to 2.5 megabits per second for synchronous data transmission. Although these signal timings are RS-232C-compatible, the voltage levels range from 0 V to 5 V, so a line driver is required to produce a true RS-232C signal.

The second port is the synchronous serial interface (SSI), a programmable serial interface. You can set the number of bits per word, protocol, clock rate, and mode as required to transfer data at up to 5 megabits per second to and from a variety of peripheral devices.

An example of the DSP56001's processing capability is given by one of Motorola's application notes, in which the chip is used as a 10-band graphic equalizer for a digital stereo system. In this document, a compact-disc digital stereo signal (two channels of 16-bit data sampled at 44.1 kHz, or 88,200 16-bit digital samples a second) goes through the DSP56001's SSI on port C. Next, real-time digital filtering is performed on 20 bands (10 bands per channel), and the filtered data returns to the stereo system, again via port C's SSI. This admittedly down-to-earth example shows the processing power that the DSP56001 can bring to bear on a problem. The sampling rate of the DSP56001 depends on the amount of data processing going on at the same time, but it can reach a maximum of 1.66 megawords per second.
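A rough budget shows why this equalizer is feasible on a 10-MIPS part. The sketch below uses only the figures quoted in this text box; it ignores I/O and interrupt overhead, and it says nothing about how Motorola's application note actually structures its filters. The variable names are chosen for illustration.

```python
CLOCK_HZ = 20_000_000
CLOCKS_PER_INSTRUCTION = 2
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BITS_PER_SAMPLE = 16
BANDS_PER_CHANNEL = 10

instructions_per_second = CLOCK_HZ // CLOCKS_PER_INSTRUCTION   # the 10-MIPS rating
samples_per_second = SAMPLE_RATE_HZ * CHANNELS                  # both stereo channels
bits_per_second = samples_per_second * BITS_PER_SAMPLE          # raw audio bit rate

# Instruction cycles available for each incoming sample, and for each
# equalizer band applied to that sample.
instructions_per_sample = instructions_per_second / samples_per_second
instructions_per_band = instructions_per_sample / BANDS_PER_CHANNEL

print(samples_per_second, bits_per_second)
print(round(instructions_per_sample, 1), round(instructions_per_band, 1))
```

Under these assumptions the chip has a little over 100 instruction cycles per sample, or about 11 per band, which is workable only because a single instruction can combine a multiply, an accumulate, and data moves.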

As a computer peripheral, the chip could be used in a number of applications: speech synthesis, voice recognition, high-speed modems, image processing, two-dimensional graphics, and real-time filtering of digital data. Although the signed 24-bit resolution may seem limiting for some scientific and engineering applications, you can always use the cube's math coprocessor. But for those problems that do fall within this range, the DSP56001 will be more than adequate.

System Schematic


  • A DB-19 monitor port carries all video signals, video data, control signals, mouse movement, stereo sound, and 12 V DC power to the NeXT monitor. Both the sound I/O data and video data (1 pixel every 10 nanoseconds) are managed by dedicated DMA (direct memory access) channels.
  • A "thin" coaxial Ethernet port operates at 10 megabits per second and is driven by an AM7996 Ethernet transceiver chip.
  • A DB-9 serial printer port drives the NeXT laser printer (see the text box "The NeXT Laser Printer"). This port transfers data at 1.8 Mbps when printing at 300 dots per inch, and 3.2 Mbps when printing at 400 dpi.
  • A DB-25 SCSI port. Its signals are identical to those of the Apple Macintosh SCSI port. As mentioned earlier, the SCSI bus can transfer data to a peripheral at up to 4 megabytes per second.
  • Two serial ports that use the Macintosh mini DIN-8 serial connectors and signals. Both serial ports can handle up to 230.4K bits per second synchronously (the same as Apple's LocalTalk), and 38.4K bps asynchronously.
  • A DB-15 DSP port connects to both the asynchronous (SCI) and synchronous serial (SSI) channels on Port C of the digital signal processing chip. This port can be used to receive or output digital data.
Looking inside the case, the main CPU board has two more ports: a 20-pin connector for the optical disk drive, and a 50-pin SCSI connector for a hard disk drive. Finally, inside the cube's housing are four 32-bit NuBus slots. Each slot uses a Eurocard type C connector. NeXT has implemented a CMOS NuBus with twice the data rate of the standard NuBus for its backplane bus. The CPU board assumes the ID of the slot it occupies. Although they're not used for outside communications, each of these devices can make demands on the system.

For digital sound synthesis, there happened to be an off-the-shelf component -- the DSP56001 -- that could be assigned the job. Unfortunately, there weren't high-speed processors available that could deal with the rest of the system's I/O, and certainly none that could handle the magneto-optical drive. Two custom VLSI chips were designed to manage the cube's remaining I/O subsystems. These chips handle the SCSI interface, the magneto-optical drive (including error-correction logic), the serial ports, and Ethernet transfers.

Both these chips pack a lot of components: According to NeXT, each chip contains about 10 times the amount of logic circuitry used by an entire Mac II.

But there's still a problem lurking here, subtly related to I/O: how to manage data to and from these I/O processors. If the CPU must periodically transfer data between memory and various I/O processors, the system's performance is still degraded.

NeXT's third design strategy was to improve data throughput within the system itself by managing these transfers with custom DMA hardware. This DMA hardware is implemented in one of the same VLSI chips that help manage the system I/O. There are no fewer than 12 DMA channels on the main CPU board. They include the following:

  • two Ethernet channels (one for transmitted data, one for received data),
  • one video channel,
  • one serial channel (for both serial ports),
  • one DSP channel,
  • two disk channels (one for the magneto-optical drive, one for a SCSI hard disk drive),
  • one printer channel,
  • one memory-to-DMA register channel,
  • one DMA register-to-memory channel, and
  • two sound channels (one for input, one for output).

For the memory-to-register and register-to-memory DMA channels, "register" corresponds to a 16-byte register buffer in the DMA hardware. The contents of these registers can be copied repeatedly under DMA control to memory. An example of this would be to copy a background pattern for the video display into the DMA registers, and then use the register-to-memory DMA channel to copy the pattern into all of the video memory.
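The background-pattern example can be modeled in software as tiling a 16-byte block across a buffer. The sketch below is only an illustration of the idea: on the cube the copy is carried out by the DMA hardware without the CPU moving each byte, and the function name, the 64-byte buffer size, and the pattern bytes are all invented for this example.

```python
REGISTER_BYTES = 16   # size of the DMA register buffer described above

def register_to_memory_fill(memory: bytearray, pattern: bytes) -> None:
    """Tile a 16-byte pattern across memory, one register-sized block at a time."""
    if len(pattern) != REGISTER_BYTES:
        raise ValueError("pattern must be exactly 16 bytes")
    if len(memory) % REGISTER_BYTES != 0:
        raise ValueError("memory length must be a multiple of 16 bytes")
    for offset in range(0, len(memory), REGISTER_BYTES):
        memory[offset:offset + REGISTER_BYTES] = pattern

# Stand-in for video memory; a real frame buffer would be far larger.
video_memory = bytearray(64)
background = bytes([0xAA, 0x55] * 8)   # hypothetical 16-byte background pattern

register_to_memory_fill(video_memory, background)
print(video_memory.hex())
```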

The final aspect of NeXT's overall design strategy to improve throughput is that when the 68030 processor must access memory, it attempts to do it efficiently. The 68030's burst read cycle is used where possible, since this mode allows four long words (128 bits) to be transferred in 9 clock cycles -- roughly twice as fast as reading the same long words individually.

NeXT Laser


The NeXT Laser Printer

Let's face it: There are certain situations in your computer work where you must have printed output. NeXT's answer to this problem is a low-cost 400-dot-per-inch laser printer. There's no entry-level dot-matrix printer offered; NeXT is banking on users preferring laser-printed output. Since the cube handles screen imaging with Display PostScript, it also makes sense to take advantage of a high-resolution PostScript-compatible printer. The printer costs $1995.

The NeXT printer is built around a custom-designed laser engine based on the Canon LBP-SX laser engine. It can print eight pages per minute and uses the same toner cartridge as the Apple LaserWriter II printers. A user-selectable printing mode lets the printer produce pages at either 300 or 400 dpi. The printer has its own power cord, and a switch sets the power supply for either 110 V or 220 V.

The printing process involves imaging the page inside the cube using Display PostScript, and then bit-blasting it to the printer. This is similar to the method used by Apple's LaserWriter IISC, except that the cube uses Display PostScript, and the Mac uses QuickDraw. Since massive amounts of data must be transferred to the printer to produce a page, the printer port has its own direct-memory-access channel.
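To get a feel for how much data a bit-imaged page involves, the following sketch computes the size of a full-page bitmap at both resolutions. It assumes a US letter page (8.5 by 11 inches) imaged edge to edge at 1 bit per dot; the real printable area is smaller, and the function name `page_bits` is made up for illustration.

```python
PAGE_WIDTH_IN = 8.5     # assumption: US letter paper
PAGE_HEIGHT_IN = 11.0

def page_bits(dpi: int) -> int:
    """Bits in a full-page, 1-bit-per-dot bitmap at the given resolution."""
    return int(PAGE_WIDTH_IN * dpi) * int(PAGE_HEIGHT_IN * dpi)

bits_300 = page_bits(300)    # 2550 x 3300 dots
bits_400 = page_bits(400)    # 3400 x 4400 dots
ratio = bits_400 / bits_300  # (400 / 300) squared, i.e. 16/9

print(bits_300 // 8, bits_400 // 8, round(ratio, 2))
```

Under these assumptions a page is roughly 1.05 million bytes at 300 dpi and 1.87 million bytes at 400 dpi. The ratio between the two, about 1.78, matches the ratio of the printer port's two quoted transfer rates (1.8 and 3.2 Mbps).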

One limitation of the printer is that it will only work with the cube. Also, you cannot network it like PostScript printers that use Apple's LocalTalk, although you could use a cube with a NeXT laser printer to act as a print server on a network. The cube can print to non-NeXT PostScript printers using its serial ports and Unix printer drivers.
