Abstract— This paper investigates the signal integrity of the address bus in a DDR4 memory application. The fly-by topology is simulated at the highest DDR4 address switching rate of 1.6 Gbps in a configuration comprising 8 memory devices. Interconnect impedance optimization is carried out to maximize the eye opening. A new termination scheme, based on series end termination resistors, that reduces pattern-dependent jitter is described. Numerical results showing the effect of trace impedance and termination resistor location are presented.
Presented at the EMC & SI conference, 2015.
Keywords— DDR4, Fly-By topology, series termination, Signal Integrity, Eye diagram, Jitter.
DDR4 technology has enabled single-ended signaling at data rates as high as 3.2 Gbps. The two main categories of buses involved are the data bus and the address, command and control bus. The data bus comprises several byte lanes. Each byte lane includes 8 data bits (termed DQ), a data mask bit (termed DM), a data inversion bit (termed DBI) and a differential strobe (termed DQS). The differential strobe operates at a frequency of 1.6 GHz. Both the rising and falling edges of the differential DQS signal are used to latch the remaining bits in the byte lane.
The address, command and control bus comprises a differential clock and a number of address, command and control signals. The differential clock operates at a frequency of 1.6 GHz. All address, command and control signals are latched only at the rising edge of the clock, and consequently their effective rate is 1.6 Gbps.
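The relation between clock frequency, latching edges and bit rate described above can be checked with a few lines of Python. This is only an illustrative sketch; the function names are hypothetical and are not part of the paper.

```python
def bit_rate_gbps(clock_ghz: float, edges_per_cycle: int) -> float:
    """Bit rate of a signal latched on `edges_per_cycle` edges of its clock or strobe."""
    return clock_ghz * edges_per_cycle


def unit_interval_ps(rate_gbps: float) -> float:
    """Duration of one bit (unit interval) in picoseconds."""
    return 1000.0 / rate_gbps


CLOCK_GHZ = 1.6  # strobe and clock frequency quoted in the text

dq_rate = bit_rate_gbps(CLOCK_GHZ, edges_per_cycle=2)    # data: both DQS edges
addr_rate = bit_rate_gbps(CLOCK_GHZ, edges_per_cycle=1)  # address: rising clock edge only

print(f"DQ:      {dq_rate:.1f} Gbps, UI = {unit_interval_ps(dq_rate):.1f} ps")
print(f"Address: {addr_rate:.1f} Gbps, UI = {unit_interval_ps(addr_rate):.1f} ps")
```

The address unit interval obtained this way is the time window within which the eye diagrams discussed later must remain open.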
All signals are connected from a memory controller (Cont.) to each memory device (U1, U2,…). In a single-rank memory system, the data bus is a point-to-point bi-directional connection between the memory controller and each memory device. Both the controller and the memories have On Die Termination (ODT) and an output impedance that is controllable in discrete steps. Consequently, when a controller is writing data, it can be programmed to use an optimum output impedance and ODT value. This results in a nearly ideal impedance match, enabling extremely high speed data transfer.
The address bus, on the other hand, is a multipoint uni-directional connection from the controller to the memories. Therefore, unlike the data bus, the design of this interconnect can be challenging. Signal integrity is ensured by using what is termed a “fly-by” topology with a far end pull-up termination as shown in Fig. 1. The “fly-by” topology is essentially a daisy chain connection with very short stubs. A similar topology is used for the differential clock net as shown in Fig. 2. The eye diagram of the address signal at each memory device must have adequate amplitude and width for unambiguous detection. It must also be synchronous with respect to the rising edge of the clock signal. The clock signal is required to have an adequate amplitude and a monotonic rising edge. The goal of the interconnect design is to ensure that these requirements are met by a proper choice of interconnect impedance, trace length and termination values.
First, the case of an interconnect with uniform trace impedance is simulated and is used as the reference. Next, an optimized interconnect impedance case is simulated. In the last two examples, series end terminators are used to illustrate their benefit. All simulations in this paper use a linear model for the controller with an output impedance of 40 Ohms and a high input impedance model that is typical of memory input pins. The PCB thickness is 80 mils and the via stub length is 50 mils.
II. COMMON IMPLEMENTATION
A straightforward implementation consists of routing the entire interconnect with a uniform trace impedance of 50 Ohms. Eye diagrams at all the memories are shown in Fig. 3. It can be seen that the waveform integrity is usually best at the last device, which is closest to the pull-up resistor. The first and intermediate devices are more strongly affected by reflections and exhibit increased jitter and reduced amplitude.
Fig. 3. Case 1: 1.6 Gbps eye diagrams at the 8 memories (TL1 = 3500 mils, 50 Ohms impedance; TL2 = TL3 = 1000 mils, 50 Ohms impedance; STUBS = via with a long stub and 100 mils of 50 Ohms trace).
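One contribution to the reflections in this reference case can be estimated by hand. The short sketch below computes the voltage reflection coefficient seen by a wave that travels back along the trace and arrives at the controller. It assumes a purely resistive 40 Ohms source, as in the linear controller model used for the simulations; the function name is hypothetical.

```python
def reflection_coefficient(z_load_ohm: float, z_line_ohm: float) -> float:
    """Voltage reflection coefficient for a wave on z_line arriving at z_load."""
    return (z_load_ohm - z_line_ohm) / (z_load_ohm + z_line_ohm)


R_SOURCE = 40.0  # controller output impedance used in the simulations

for z_line in (50.0, 40.0):
    gamma = reflection_coefficient(R_SOURCE, z_line)
    print(f"line = {z_line:.0f} Ohms -> reflection at controller = {gamma:+.3f}")
```

With a 50 Ohms trace a fraction of every returning reflection is sent back down the line, whereas a trace matched to the 40 Ohms source absorbs it completely in this idealized model.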
III. INTERCONNECT IMPEDANCE OPTIMIZATION
The interconnect can be optimized by giving the stubs and the trace segments TL2-TL3 a high impedance while keeping the impedance of TL1 low. Practical values are 25-40 Ohms for the low impedance and 50-60 Ohms for the high impedance. Eye diagrams at all the memories are shown in Fig. 4. An improvement in eye opening, in particular in amplitude, is noticeable compared to Fig. 3.
Fig. 4. Case 2: 1.6 Gbps eye diagrams at the 8 memories (TL1 = 3500 mils, 40 Ohms impedance; TL2 = TL3 = 1000 mils, 50 Ohms impedance; STUBS = via with a long stub and 100 mils of 50 Ohms trace).
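One commonly cited rationale for this choice, which the paper itself does not spell out, is that the periodic capacitive loading of the memory pins lowers the effective impedance of the daisy-chained section. The sketch below uses the standard loaded-line approximation Z_loaded = Z0 / sqrt(1 + C_load / C_trace), which is reasonable only when the device pitch is electrically short compared with the signal rise time. The propagation delay of 170 ps/inch and the 1 pF load per device are assumed illustrative values, not data from the paper, and the function name is hypothetical.

```python
import math


def loaded_impedance(z0_ohm: float, delay_ps_per_inch: float,
                     pitch_inch: float, c_load_pf: float) -> float:
    """Effective impedance of a trace loaded by one lumped capacitor per pitch."""
    # Capacitance of the bare trace over one pitch: C = t_d / Z0 (ps / Ohm = pF).
    c_trace_pf = delay_ps_per_inch * pitch_inch / z0_ohm
    return z0_ohm / math.sqrt(1.0 + c_load_pf / c_trace_pf)


DELAY_PS_PER_INCH = 170.0  # assumed, typical of an FR-4 inner layer
PITCH_INCH = 1.0           # TL2 = TL3 = 1000 mils in the simulated cases
C_LOAD_PF = 1.0            # assumed pin plus via capacitance per device

for z0 in (50.0, 60.0):
    z_eff = loaded_impedance(z0, DELAY_PS_PER_INCH, PITCH_INCH, C_LOAD_PF)
    print(f"unloaded {z0:.0f} Ohms -> loaded about {z_eff:.1f} Ohms")
```

Under these assumptions the loaded section behaves like a line of noticeably lower impedance than its nominal value, which is consistent with routing TL2-TL3 at a higher impedance than TL1.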
IV. OPTIMIZATION USING SERIES END TERMINATIONS
A typical interconnect involving 4 memory devices is shown in Fig. 5 for display clarity. Breakout from the controller and at each memory device requires a via. A short trace segment is also invariably required at each memory pin except in situations where blind vias are used. This constitutes a stub as shown in the inset of Fig. 5. Reflections from each memory device degrade waveform integrity.
Reflections in transmission lines can be reduced or eliminated by using passive termination elements. Commonly used terminations consist of series resistors placed close to the source, and shunt terminations placed close to a receiver. In this work, the use of a series resistance placed close to a receiver is investigated. A resistance equal in value to the transmission line impedance combines with the input capacitance of the receiver and acts as an RC termination, reducing high frequency reflections.
Therefore, if a discrete resistance can be placed at the precise location of the stub, namely R1 (hypothetical), one would expect an attenuation of the reflected signal. Alternatively, the resistor can be placed at R2, which is more realistic. Simulated results for both locations with a resistance of 40 Ohms are tabulated in Table I. Only the waveforms for Case 3 are shown in Fig. 6. Those for Case 4 cannot be visually distinguished from them and are omitted.
It can be seen that the use of resistive end termination is most effective at the position R1. Placement at the position R2 also yields an improvement in the eye opening, although with slightly reduced performance. Both cases show a substantial reduction in jitter. The value of the termination resistance also plays an important role. If the value is too large, the amplitude reduces and affects the noise margin adversely, although there is a reduction in jitter. If the value is too small, both jitter and amplitude increase. A value in the 40-50 Ohms range is found to be best suited.
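The frequency dependence of such an RC termination can be illustrated with a simplified model: a single line of impedance Z0 ending in a series resistor followed by the receiver input capacitance. This sketch ignores the continuing fly-by trace and the via, and the 1 pF input capacitance is an assumed value, not one given in the paper. The function name is hypothetical.

```python
import math


def rc_reflection_magnitude(freq_hz: float, r_ohm: float,
                            c_farad: float, z0_ohm: float) -> float:
    """|Gamma| for a line of impedance z0 terminated by R in series with C."""
    reactance = 1.0 / (2.0 * math.pi * freq_hz * c_farad)
    z_term = complex(r_ohm, -reactance)  # R + 1/(j*omega*C)
    gamma = (z_term - z0_ohm) / (z_term + z0_ohm)
    return abs(gamma)


Z0 = 50.0      # line impedance
C_IN = 1e-12   # assumed receiver input capacitance (1 pF)

for f_ghz in (0.1, 0.8, 2.0, 5.0):
    bare = rc_reflection_magnitude(f_ghz * 1e9, 0.0, C_IN, Z0)
    with_r = rc_reflection_magnitude(f_ghz * 1e9, Z0, C_IN, Z0)
    print(f"{f_ghz:4.1f} GHz: no resistor {bare:.3f}, with {Z0:.0f} Ohms {with_r:.3f}")
```

In this model a purely capacitive input reflects all of the incident energy at every frequency, while adding the resistor lowers the reflection progressively as frequency rises, which is the behavior described above.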
It is also of importance to determine the effect of end terminators on the clock waveform. In simulations, the topology of the clock net is identical to that of the address net and trace lengths are also identical, i.e., DTL1 = TL1, DTL2 = TL2, and DTL3 = TL3, and DSTUB = STUB in Figures 1-2. This is required to ensure synchronism between the clock and address net. The differential transmission lines are treated as two single ended uncoupled lines. The differential impedance of DTL1-3 and DSTUB is simply twice that of TL1-3 and STUB. Figures 7 and 8 show one cycle of the differential clock waveform for cases 2 and 3. Other cycles are suppressed for clarity. It can be seen that the waveform amplitude is more attenuated at the last memory device for case 3 although the requirements are still met.
Lastly, the impact of optimization on the relative delay between the clock and address nets is analyzed. Figure 9 shows the step response of both the clock and address nets for case 2. It can be seen that the clock is delayed with respect to the address net and the delay increases as one moves away from the controller. In this case the delay difference is 106-12 = 94 ps.
Figure 10 shows the step response of both the clock and address nets for case 3. It can be seen that the clock is delayed more due to the resistive loading. In this case the delay difference is 120-32 = 88 ps, which is less than that of case 2.
In both cases, the clock can be centered by making DTL1 ~50 ps longer than TL1. There is no significant signal integrity benefit in using series end terminations on the clock net, although it would certainly help in reducing radiation.
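The two delay differences quoted above can be put in perspective by comparing them with the unit interval of the 1.6 Gbps address signal. The pairs of values are taken directly from the text; the dictionary layout and the names are illustrative only.

```python
ADDRESS_RATE_GBPS = 1.6
UI_PS = 1000.0 / ADDRESS_RATE_GBPS  # unit interval of the address signal in ps

# (larger delay, smaller delay) in ps, as quoted for each case
quoted_delays_ps = {
    "Case 2": (106.0, 12.0),
    "Case 3": (120.0, 32.0),
}

delay_difference_ps = {
    name: high - low for name, (high, low) in quoted_delays_ps.items()
}

for name, diff in delay_difference_ps.items():
    print(f"{name}: delay difference = {diff:.0f} ps "
          f"({100.0 * diff / UI_PS:.1f}% of the unit interval)")
```

Expressed this way, both cases consume a comparable fraction of the unit interval, with case 3 slightly smaller.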
TABLE I. EYE OPENING FOR THE CASES SIMULATED
Fig. 6. Case 3: 1.6 Gbps eye diagrams at the 8 memories (TL1 = 3500 mils, 40 Ohms impedance; TL2 = TL3 = 1000 mils, 50 Ohms impedance; STUBS = via with a long stub and 100 mils of 50 Ohms trace; 40 Ohms series end termination resistors placed at position R2 at each memory device).
In this work, the signal integrity of the address bus in a DDR4 memory system is investigated. It is shown that jitter can be reduced by using series end terminators at the memory devices. Implementation of such a scheme using discrete resistors would become impractical due to space constraints. This can be circumvented with the use of embedded passive resistors. The termination technique is also useful in other applications involving multi-point buses. In particular, it can ensure monotonic clock edges and also help in reducing radiation without compromising waveform integrity significantly.
Syed Bokhari of Fidus Systems
[1] DDR4 SDRAM Standard, JEDEC JESD 79-4, September 2012.
[2] Brian Young, Digital Signal Integrity, New Jersey: Prentice Hall, 2001, Chapter 2.