## TA 6.1: A CMOS 160Mb/s Phase Modulation I/O Interface Circuit

Kazutaka Nogami\*, Abbas El Gamal

Information Systems Laboratory, Stanford University, Stanford, CA

\*On leave from Toshiba Corporation

To achieve high data-transfer rates in digital systems, designers increase both bus widths and clock rates. Wider buses require more pins, which increases power consumption and cost. Faster clocks, on the other hand, require more complex circuit and system implementations. This paper proposes an approach to data transfer that can potentially achieve high rates without requiring the ultra-fast clocks or extra-wide buses of other approaches [1]. The idea is to transfer multiple bits over each pin within each clock cycle, using modulation techniques common in communication systems. Phase modulation is simple to implement, and a test chip demonstrates a 160Mb/s peak transfer rate per pin using a 20MHz clock. This scheme may prove effective for chip-to-chip or system-bus interfaces on both printed circuit boards (PCBs) and multi-chip modules (MCMs).

The phase modulation scheme is shown in Figure 1. Data is transferred on both the rising and the falling edges of the clock. Each edge can be at one of $2^n$ possible positions and therefore carries $n$ bits, so $2n$ bits are transferred per cycle. Edge positions are detected relative to a reference cycle, defined by the $\overline{\rm REF}$ signal, which provides the reference for determining the positions of the rising and falling edges in subsequent data cycles. Using the reference cycle eliminates the effects of I/O buffer delay, interconnection delay, and waveform distortion caused by parasitics. The number of data cycles per reference cycle, which determines how close the effective data transfer rate comes to the peak, must be low enough to ensure reliable transfer. The number of bits transferred per transition, $n$, is chosen to maximize the peak data transfer rate.
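A minimal behavioral sketch may make the $2n$-bits-per-cycle arithmetic concrete. It models only the coding, not the circuit, and rests on assumptions of ours: $n = 4$ as on the test chip, the upper half of each $2n$-bit word placed on the rising edge and the lower half on the falling edge, and edge offsets measured in units of $\Delta T$ from the positions established by the reference cycle. The function names are hypothetical.

```python
N_BITS = 4  # bits per transition (n); the test chip carries 4 bits per edge


def encode_cycle(word: int, n: int = N_BITS) -> tuple[int, int]:
    """Split a 2n-bit word into (rising, falling) edge position indices, 0..2**n - 1."""
    if not 0 <= word < 1 << (2 * n):
        raise ValueError("word must fit in 2n bits")
    rising = word >> n  # assumed: upper n bits on the rising edge
    falling = word & ((1 << n) - 1)  # assumed: lower n bits on the falling edge
    return rising, falling


def edge_offsets(word: int, delta_t: float, n: int = N_BITS) -> tuple[float, float]:
    """Delay of each edge past its reference position, in the same units as delta_t."""
    rising, falling = encode_cycle(word, n)
    return rising * delta_t, falling * delta_t


def decode_cycle(rise_offset: float, fall_offset: float, delta_t: float, n: int = N_BITS) -> int:
    """Rebuild the word by snapping each measured offset to the nearest position."""
    top = (1 << n) - 1

    def snap(offset: float) -> int:
        return min(max(round(offset / delta_t), 0), top)

    return (snap(rise_offset) << n) | snap(fall_offset)


word = 0xA5
print(edge_offsets(word, 1.0))  # (10.0, 5.0): rising edge at position 10, falling at 5
print(hex(decode_cycle(10.2, 4.9, 1.0)))  # 0xa5: timing errors under delta_t/2 are absorbed
```

Snapping to the nearest position is a simplification chosen for the sketch; the chip's decoder instead compares the incoming edge against VCO phases with arbiters.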
The peak transfer rate, $TR_{peak}$, is:

$$TR_{peak} = \frac{\text{bits}}{\text{time}} = \frac{n}{(2^n - 1)\,\Delta T + T_{min}}$$

where $\Delta T$ and $T_{min}$ are the interval between adjacent phase positions and the minimum pulse width, respectively. Setting $dTR_{peak}/dn = 0$, the optimum number of bits per transition, $n_{opt}$, satisfies:

$$2^{n_{opt}}\,(n_{opt}\ln 2 - 1) = \frac{T_{min}}{\Delta T} - 1$$

For a TTL interface with $\Delta T$ = 0.3~1.0ns and $T_{min}$ = 5~10ns, $n_{opt}$ is 3~4. To optimize the effective transfer rate, the maximum number of data cycles per reference cycle must also be taken into consideration; this analysis is not presented here.

To demonstrate the feasibility of the approach, a test chip that transfers 8b/cycle ($n$ = 4) is fabricated in a 1.2µm CMOS technology. A functional block diagram of the phase modulation I/O (PMIO) circuit is given in Figure 2. Each I/O circuit consists of two identical circuits, one for generating and detecting rising-edge data and one for falling-edge data. To obtain stable transition positions, the PLL-based time-interval digitization technique of [2] is used. The VCOs provide the phase signals $\overline{\phi}0 \sim \overline{\phi}15$ and $\phi0 \sim \phi14$ to the encoders and the decoders, respectively. The control voltages for the VCOs are provided by the PLL, so the interval between adjacent phase signals is set by the input clock (CLKin). This gives all chips sharing the bus identical delays, independent of process, temperature, and supply-voltage variations. The VCO phases are reset either by an output reset signal (ORSTr/ORSTf) or by an input reset signal (IRSTr/IRSTf) generated during the reference cycle. The PLL and the REF buffer are shared by multiple I/Os.

The decoder detects which of the 16 possible phase positions is received and outputs the corresponding 4 data bits. A schematic of the decoder circuit is given in Figure 3.
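The detection principle, which the next paragraph describes in circuit terms, can be pictured with a hypothetical behavioral model. It assumes that phase $\phi k$ fires at $(k + 0.5)\Delta T$ after the reference point (midway between data positions $k$ and $k+1$), that arbiter $k$ reports true when the data edge arrives before $\phi k$, and that an edge with no active word line decodes to position 15. The paper gives none of these timing details; the sketch only shows how comparing adjacent arbiters turns a thermometer code into a single active word line.

```python
NUM_PHASES = 15  # phi0..phi14 feed the decoder's arbiters


def arbiters(t_data: float, delta_t: float = 1.0) -> list[bool]:
    """Arbiter k is True when the data edge arrives before phase phi_k.

    Assumption: phi_k fires at (k + 0.5) * delta_t, midway between positions k and k + 1.
    """
    return [t_data < (k + 0.5) * delta_t for k in range(NUM_PHASES)]


def word_lines(arb: list[bool]) -> list[bool]:
    """Selector k goes high when arbiter k saw the data first but arbiter k - 1 did not."""
    lines = [arb[0]]
    lines += [arb[k] and not arb[k - 1] for k in range(1, NUM_PHASES)]
    return lines


def decode_position(t_data: float, delta_t: float = 1.0) -> int:
    """Return the received phase position 0..15, i.e. the 4 data bits."""
    lines = word_lines(arbiters(t_data, delta_t))
    # The arbiter outputs form a thermometer code, so at most one line can be high.
    assert sum(lines) <= 1
    # Assumption: no active word line means the edge came after phi14, i.e. position 15.
    return lines.index(True) if any(lines) else NUM_PHASES


for t in (0.0, 3.2, 14.9):
    print(t, format(decode_position(t), "04b"))
```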
A data input signal (DINr/DINf) is compared with the 15 phases ($\phi0 \sim \phi14$) produced by the VCO using arbiter circuits. The word-line selectors detect the phase position of the data input signal by comparing the states of two adjacent arbiters. At most one of the 15 word-line selector outputs is pulled high at any clock edge, and the corresponding 4b data is latched.

The encoder for each edge selects one of 16 phase positions depending on the 4b data to be sent. A schematic of the encoder is given in Figure 4. The encoder circuit is a falling-edge modulation circuit with 16 delay elements; as in the decoder, the delays that generate the output phase position are determined by the VCO. The rising-edge encoder is connected in series with the falling-edge encoder so that both rising and falling edges are modulated.

The VCO is a 41-stage ring oscillator with an asynchronous reset. Of the 41 stages, 32 generate the 16 clocks and the rest guarantee a minimum pulse width for the data signal. The design of the PLL is based on [3]. Frequency drift of the PLL determines the maximum number of data cycles per reference cycle.

A micrograph of the chip is given in Figure 5. All PLL components, including the loop filter, are integrated on the test chip. The I/O measures 872×301µm² and the PLL measures 723×436µm², small enough for use in an I/O ring. The test chip operates at 20MHz and achieves a 160Mb/s peak transfer rate. Figure 6 shows 6 data cycles transferred per reference cycle, corresponding to a 137Mb/s effective transfer rate. Chip characteristics and measured performance are summarized in Table 1. The maximum operating frequency is limited not by the TTL interface but by the maximum frequency of the VCO. This suggests that higher transfer rates can be achieved over a TTL interface with a PMIO implemented in a more advanced CMOS technology.

## Acknowledgments

The authors thank B. A. Fowler and M. D.
Godfrey for support and the MOSIS service for fabrication.

## References

[1] Horowitz, M. et al., "PLL Design for a 500MB/s Interface," ISSCC Digest of Technical Papers, pp. 160-161, Feb. 1993.

[2] Loinaz, M. et al., "A BiCMOS Time Interval Digitizer for High-Energy Physics Instrumentation," CICC Digest of Technical Papers, pp. 28.6.1-28.6.4, May 1993.

[3] Jeong, D. et al., "Design of PLL-Based Clock Generation Circuits," IEEE J. Solid-State Circuits, vol. SC-22, no. 2, pp. 255-261, Apr. 1987.

Figure 2: Phase modulation I/O block diagram.

Figure 3: Decoder schematic.

Figure 4: Encoder schematic.

Figure 5: See page 318.

Figure 6: Measured waveforms.

| Parameter | Value |
|---|---|
| Technology | 1.2µm n-well CMOS |
| Chip size | 1.6×1.9mm² |
| I/O area | 872×301µm² |
| PLL area | 723×436µm² |
| Transistors | 3525 (1827 nMOS, 1698 pMOS) |
| Supply voltage | 5V |
| Operating frequency | 20MHz |
| Peak transfer rate | 160Mb/s |
| Effective transfer rate | 137Mb/s (6 data cycles/ref. cycle) |
| Power dissipation (excluding bus driving) | 12.5mW |
| I/O power | 8mW |
| PLL power | 4mW |

Table 1: Chip characteristics and performance.
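The rate figures in the text and in Table 1 can be cross-checked with a short script. It evaluates $TR_{peak}$ from the formula above and solves the $n_{opt}$ condition by bisection (a numerical method chosen here for illustration; the root is real-valued, whereas $n$ must be an integer in practice). For the effective rate it assumes that one data-free reference cycle accompanies each group of $N$ data cycles, so the effective rate is $N/(N+1)$ of the peak. That model is our assumption, though it reproduces the 137Mb/s figure for $N$ = 6. The $\Delta T$ and $T_{min}$ pairs in the usage are illustrative points within the TTL ranges quoted above.

```python
import math


def peak_rate_from_clock(n: int, f_clk_hz: float) -> float:
    """Per-pin peak rate when both clock edges carry n bits: 2n bits per cycle."""
    return 2 * n * f_clk_hz


def peak_rate(n: float, delta_t: float, t_min: float) -> float:
    """TR_peak = n / ((2**n - 1) * delta_t + t_min), bits per unit of delta_t and t_min."""
    return n / ((2 ** n - 1) * delta_t + t_min)


def n_opt(delta_t: float, t_min: float, tol: float = 1e-9) -> float:
    """Real root of 2**n * (n ln 2 - 1) = t_min/delta_t - 1, found by bisection."""
    target = t_min / delta_t - 1.0

    def g(n: float) -> float:
        return 2.0 ** n * (n * math.log(2) - 1.0) - target

    # g(0) = -t_min/delta_t < 0 and g increases for n > 0, so the root lies above 0.
    lo, hi = 0.0, 1.0
    while g(hi) < 0:
        hi *= 2.0  # widen the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def effective_rate(peak: float, data_cycles: int) -> float:
    """Assumed model: one data-free reference cycle per `data_cycles` data cycles."""
    return peak * data_cycles / (data_cycles + 1)


if __name__ == "__main__":
    peak = peak_rate_from_clock(4, 20e6)  # test chip: n = 4, 20MHz clock
    print(f"peak: {peak / 1e6:.0f} Mb/s")  # 160 Mb/s
    print(f"effective (6 data cycles/ref): {effective_rate(peak, 6) / 1e6:.1f} Mb/s")  # 137.1 Mb/s
    # Illustrative TTL values within the ranges quoted in the text (ns).
    for dt, tmin in [(1.0, 10.0), (0.3, 5.0)]:
        print(f"dT={dt}ns, Tmin={tmin}ns -> n_opt ~ {n_opt(dt, tmin):.2f}")
```

Because the derivative of $TR_{peak}$ with respect to $n$ has the opposite sign of the bisection residual `g(n)`, the root found here is the maximum of the peak rate, not merely a stationary point.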