A flying-adder architecture of frequency and phase synthesis with scalability

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 10, NO. 5, OCTOBER 2002, p. 637

A “Flying-Adder” Architecture of Frequency and Phase Synthesis With Scalability

Liming Xiu, Member, IEEE, and Zhihong You, Member, IEEE

Manuscript received July 26, 2001; revised January 8, 2002. The authors are with Texas Instruments, Dallas, TX 75024 USA. Digital Object Identifier 10.1109/TVLSI.2002.801607

Abstract: Most of today’s digital designs, from small-scale digital block designs to system-on-chip (SoC) designs, are based on the “synchronous” design principle. Clock is the most important issue in these designs. Frequency and phase synthesis is closely related to the clock generation. A frequency and phase synthesis technique based on phase-locked loop is proposed in [1] that delivers high performance, easy integration, and high stability. However, there are problems associated with this architecture, such as: 1) its highest deliverable frequency is limited by the speed of the accumulator and 2) the phase synthesis circuitry will not work well in certain ranges (dead zone) and in certain conditions (dual stability). This paper presents an improved architecture that addresses these problems. The new frequency synthesis circuitry has scalability for higher output frequency. It also has an internal node whose frequency is twice that of the output signal. When duty cycle is not a concern, this signal can be used directly as a clock source. The new phase synthesis circuitry is free of “dead zone” and “dual stability.” The improved architecture has better performance, is simpler to implement, and is easier to understand.

Index Terms: Clock generation, CMOS circuit design, frequency synthesis, phase-locked loop, phase synthesis, spread spectrum, voltage-controlled oscillator.

I. INTRODUCTION

The technique of frequency and phase synthesis has wide application in today’s consumer electronic and telecommunication systems. The architecture proposed by Mair and Xiu [1] has many unique features.
1) The synthesized frequency can be changed instantly in the next cycle without any dynamic process found in traditional phase-locked-loop (PLL)-based techniques.
2) Any frequency within a certain range can be achieved with controllable accuracy.
3) Various phase-shifted and various duty-cycle versions of the output signal can be generated.
4) The required voltage-controlled oscillator (VCO)/PLL is running at a single fixed frequency; therefore its design is much simplified.
5) The frequency and phase control words can be modulated easily to produce a highly accurate and predictable spread spectrum clock source, etc.

This architecture has been used widely in many designs since its invention and is commonly called the “Flying Adder” architecture. In the process of implementing this architecture in these designs, it was found that the architecture can be improved in several places to make it better and faster. The improvements are focused on the following areas:
1) making the frequency synthesis circuitry simpler and faster;
2) making the architecture of frequency synthesis scalable for higher output frequency;
3) making the phase synthesis circuitry simpler and free of dead zone and dual stability.

In this paper, Section II presents the problems of the current architecture. Section III is the improvement to the current frequency synthesis circuitry. The scalability of the new frequency synthesis architecture is discussed in Section IV.
The improvement to the phase synthesis circuitry is described in Section V. Section VI is the implementation guideline. Section VII is an example of how this frequency synthesizer can be used as a digital controlled oscillator (DCO) in an all-digital PLL. Section VIII is the conclusion.

II. THE PROBLEMS OF CURRENT ARCHITECTURE

This section is devoted to the discussion of the problems associated with the current architecture.

A. The Problem of One-Path Frequency Synthesis Circuitry

Fig. 1 shows the principle idea of the Flying Adder frequency synthesis architecture. VCOOUT[31:0] is the 32 outputs from the VCO. The 10-bit adder in the figure is responsible for generating the address for the 32-to-1 MUX, which selects one out of the 32 available VCO outputs to trigger the D-flip-flop. The D-flip-flop is configured as a toggle flip-flop to generate the output frequency. This adder is responsible for both the rising and falling edges of the output. Therefore, the highest possible frequency of the output is half the speed of this adder.

Fig. 1. The principle idea of Flying-Adder architecture.

Theoretically, this mechanism is straightforward and works perfectly. In reality, there is a problem associated with the MUX. As shown in Fig. 2, for any MUX other than a 2-to-1 MUX, there is a potential glitch problem on the output when the address bits are switching. The physical reason for this glitch is that when multiple address bits change from one combination to another, there is no guarantee that all the individual bits switch at the same time. This results in some intermediate values in the process of switching. For example, in Fig. 2, the address is switching from “00000” to “11111.” The figure shows the ideal output waveform. If for some reason an intermediate value of “10101” appears during the process of switching, the actual output waveform will differ from the ideal one. This potential glitch will falsely trigger the D-flip-flop and generate the wrong signal waveform. Hence, the circuitry of Fig. 1 cannot be used directly for frequency synthesis.

Fig. 2. The glitch of MUX.

1063-8210/02$17.00 © 2002 IEEE
Authorized licensed use limited to: TEXAS INSTRUMENTS VIRTUAL LIBRARY. Downloaded on April 9, 2009 at 23:06 from IEEE Xplore. Restrictions apply.

B. The Problems of Two-Path Frequency Synthesis Circuitry

The two-path frequency synthesis circuitry was shown in [1, Fig. 8], which is repeated here as Fig. 3. Although this circuitry works very well, it can still be improved in certain areas. One of the challenges of implementing this architecture is that the working mechanism is very hard to understand due to the interlocking of the two paths, the self-clocking of the registers, and the pipeline operation. This mechanism is inherently complex and requires designers’ attention.

Fig. 3. The two-path frequency synthesizer.

1) Two-Path Interlocking: As explained in [1], this frequency synthesis circuitry is composed of two paths, which make the 32 VCO ticks look like 64 ticks. The two paths are interlocked through two AND gates and XOR/XNOR gates. The two AND gates and the feedback self-clocking of the registers ensure that at any given time, there is one path switched on (MUX output can be sensed by the CLK pin of the D-flip-flop) and one path switched off.

As addressed in Section II-A, if the output of the MUX is directly connected to the clock pin of the D-flip-flop, a potential glitch can falsely trigger the flip-flop and generate the wrong frequency. This problem is solved elegantly through the use of the interlock, as shown in Fig. 3 and demonstrated in Fig. 4. When the address bits of either of the two MUXs are switching, the output of that MUX is locked by the AND gate through the feedback. When the AND gate is in the unlocked state, the self-clocking of the registers ensures that the address bits are already stable.
Therefore, the two D-flip-flops will never see any glitch. In Fig. 4, “ANDx locked” means that the feedback input of the AND gate is zero; “ANDx unlocked” means that input is at “1.” “MUXx decoding” means that the address bits of the MUX are switching; “MUXx stable” means the address bits are in steady state.

Fig. 4. The two-path interlock.

This configuration of two D-flip-flops and XOR/XNOR gates works well. But it prevents this architecture from being expanded to four paths (or even more) to increase the highest possible output frequency.

2) Gates’ Delay: One of the speed-limiting factors of this circuitry is the delays of the AND gates, the D-flip-flops, and the XOR/XNOR gates [1]. Reducing or eliminating some of them will potentially make the output frequency higher.

3) The Speed of the Accumulator: As addressed in [1, Section V], the speed of the big accumulator is the bottleneck for higher output frequency. A differential VCO can deliver an even number of outputs (ticks) for the synthesizer. This can help to eliminate the need for the modulus 31 check and increase the speed of the accumulator significantly.

C. Problems Associated With Phase Synthesis

Fig. 5 shows the principle idea of phase synthesis. The adder in PHASE_GEN is used to add PHASE[4:0] to the current address of the MUX in FREQ_GEN to generate the proper delay signal Z_SHIFT. Since a finite time delay is associated with any physical adder, a certain phase range cannot be achieved in this configuration (dead zone). The dead zone corresponds to the low value range of PHASE[4:0]. It is the time needed for the adder to finish its addition.
One solution to this problem is to always use the second half (high value) of PHASE[4:0] and to invert Z_SHIFT when the first half is needed.

When the two-path architecture is used for phase synthesis, as shown in [1, Fig. 12], there is an additional problem of “dual stability” when PHASE[4:0] and FREQ[32:0] are in certain combinations. Under these conditions, there are two possible Z_SHIFT locations for a given PHASE[4:0] value [1, Fig. 11]. This phenomenon is called “dual stability” and is not acceptable for real applications. The design tradeoff for compensating this problem is very complicated.

Fig. 5. The principle idea of phase synthesis.

III. THE NEW FREQUENCY SYNTHESIS ARCHITECTURE

This section discusses the techniques to improve the frequency synthesis circuitry of Fig. 3.

A. VCO Design

As shown in [1, Figs. 9 and 10], the accumulator has to perform the task of addition as well as the modular 31 check. This modular check is needed because the VCO architecture of that design is an inverter ring and the number of inverter stages in the ring has to be an odd number. A differential VCO architecture as shown in Fig. 6 can be used to deliver an even number of outputs [2]–[4]. This eliminates the need for the modular 31 check and speeds up the accumulator. In this design, a crystal of 14.31818 MHz is used as the reference clock. A divider of 20 is inserted in the PLL’s feedback loop. Therefore, the VCO runs at 286.3636 MHz (3.492 ns). The VCO has 16 differential delay stages with 32 outputs. The delay between any two adjacent outputs is 3.492/32 ns ≈ 109.1 ps.

B. The New Synthesizer

As addressed in Section II-B, if we reduce or eliminate the delays of the AND gates, D-flip-flops, and XOR/XNOR gates, we can potentially make the synthesizer faster. Fig. 7 is the improved architecture.

Comparing to Fig. 3, it can be seen that the two AND gates, two D-flip-flops, and XOR/XNOR gates have been replaced by one 2-to-1 MUX and one D-flip-flop.
This modification achieves the same two-path interlock function but reduces the gate delay significantly.

Fig. 6. The VCO structure.

The 2-to-1 MUX in front of the D-flip-flop is used to select one of the 32-to-1 MUXs’ outputs from the two paths. It will never generate a glitch in its output by itself since it has only one address bit. To ensure that it does not pass possible glitches from the two 32-to-1 MUXs, its address signal has to be connected to CLK1 for the reason described below.

By studying the triggering clocks of all the registers and the waveforms of CLK1 and CLK2, it can be seen that the address bits of the 32-to-1 MUX that is currently selected by the 2-to-1 MUX are already stable before that path is selected. For example, consider the moment the top path becomes selected. The address bits of that MUX were available at the rising edge of CLK1, which is a half-cycle earlier. Thus, when this path is selected, the address bits have already had a half cycle of time to settle down and should be stable long before the falling edge arrives. If CLK2 is used instead of CLK1 as the select signal of the 2-to-1 MUX, there is a potential risk of passing a glitch to the D-flip-flop. It will also degrade the speed of this circuitry.

Physically, the two 32-to-1 MUXs and the one 2-to-1 MUX cannot be combined into one 64-to-1 MUX because the address bits of the two big MUXs are connected to different adders.

Since this modification eliminates the XOR/XNOR gates, the working mechanism is relatively easier to understand. However, the most important advantage of this new circuitry is that it reduces the number of flip-flops to one.
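The glitch argument used in Sections II-A and III-B (a multi-bit MUX address can pass through intermediate codes, while a one-bit address cannot) can be illustrated with a short enumeration. This is only a sketch under a simple assumption that is not taken from the paper's circuit: each changing address bit may flip at its own time, so any mix of old and new bit values can appear transiently. The function name `intermediate_codes` is hypothetical.

```python
from itertools import product

def intermediate_codes(old: int, new: int, width: int) -> set:
    """Transient codes a `width`-bit address may show while moving from
    `old` to `new`, ASSUMING every changing bit can flip independently."""
    # Bits whose value differs between the old and the new address.
    changing = [b for b in range(width) if ((old ^ new) >> b) & 1]
    codes = set()
    for choice in product((0, 1), repeat=len(changing)):
        code = old
        for bit, flipped in zip(changing, choice):
            if flipped:
                code ^= 1 << bit
        codes.add(code)
    # The start and end codes are not glitches, so drop them.
    return codes - {old, new}

# 5-bit address of a 32-to-1 MUX, switching from "00000" to "11111".
five_bit = intermediate_codes(0b00000, 0b11111, 5)
# 1-bit address of a 2-to-1 MUX.
one_bit = intermediate_codes(0b0, 0b1, 1)

print(len(five_bit), 0b10101 in five_bit)  # 30 True
print(one_bit)                             # set()
```

Under this assumption the “10101” value from the Fig. 2 discussion is one of 30 possible transient codes, while the one-bit address has none, which is consistent with the statement that the 2-to-1 MUX cannot generate a glitch by itself.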
The two paths share the same flip-flop to generate the waveform, which opens the door to architecture scalability and to the possibility of a better phase synthesis circuitry.

Another distinguishing feature of this modification is that the frequency of signal TRIGGER, which is the output of the 2-to-1 MUX, is twice the synthesized frequency CLK1/CLK2. Therefore, this new architecture not only improves the speed of the synthesizer but also provides a signal that is twice as fast as the synthesized output. The only disadvantage is that the duty cycle of TRIGGER is related to the FREQ word and is not controllable. In many situations, users do not care about the duty cycle of the clock signal, and for those applications, TRIGGER can be used directly as the clock.

It should be mentioned that the frequency calculation formula [1, (3)] is still valid. To the user of this frequency synthesizer, there is no difference.

C. Simulation Results

For this new design, a 0.13-µm 1.5-V CMOS technology is used. Fig. 8 is the collection of SPICE simulation results of frequency control word FREQ[32:0] versus synthesized frequency. FREQ[32:27] is the integer part; FREQ[26:0] is the fractional part. In Fig. 8, all the bits in FREQ[26:0] are set to zero and FREQ[32:27] is swept over a range of values. The data points are the simulated output frequencies, each point corresponding to a FREQ[32:27] setting. All the data points in the plot are exact frequencies, not time-average frequencies. In other words, when FREQ[26:0] is all zeros, the synthesized frequencies do not contain any theoretical cycle-to-cycle jitter. The frequencies between these data points can be achieved with the help of FREQ[26:0].

Fig. 9 shows the SPICE simulation for one FREQ[32:27] setting. The calculated output period should be 1.418 ns (705.069 MHz). The result from SPICE simulation is also 1.418 ns. Since CLK2 has to drive 42 flip-flops (no clock tree in this design), its rising/falling edge is slower than that of CLK1, which drives only five flip-flops.
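The timing figures quoted in Sections III-A and III-C can be cross-checked with a few lines of arithmetic. The sketch below takes the reference frequency, divider, tap count, and bit split of FREQ[32:0] from the text. The relation "output period = control word (in taps) times tap spacing" is an assumption here, since the paper only cites [1, (3)] without restating it; the helper names `word_to_ticks` and `period_ns` are hypothetical.

```python
F_REF_MHZ = 14.31818   # crystal reference (Section III-A)
FEEDBACK_DIV = 20      # divider in the PLL feedback loop (Section III-A)
N_TAPS = 32            # VCO outputs (Section III-A)
FRACTION_BITS = 27     # FREQ[26:0] is the fractional part (Section III-C)

f_vco_mhz = F_REF_MHZ * FEEDBACK_DIV   # VCO frequency in MHz
t_vco_ns = 1e3 / f_vco_mhz             # VCO period in ns
delta_ps = t_vco_ns * 1e3 / N_TAPS     # spacing of adjacent VCO outputs in ps

def word_to_ticks(freq_word: int) -> float:
    """Interpret a 33-bit FREQ word as 'integer.fraction' tap counts."""
    return freq_word / 2**FRACTION_BITS

def period_ns(ticks: float) -> float:
    """ASSUMED relation: output period = ticks * tap spacing."""
    return ticks * delta_ps / 1e3

# How many tap spacings fit in the 1.418 ns period quoted for Fig. 9?
ticks_for_fig9 = 1.418 / (delta_ps / 1e3)

# 64 tap spacings span two VCO periods, i.e. half the VCO frequency.
half_vco_mhz = 1e3 / period_ns(64)

print(f"VCO: {f_vco_mhz:.4f} MHz, period {t_vco_ns:.3f} ns, tap spacing {delta_ps:.1f} ps")
print(f"Fig. 9 period corresponds to about {ticks_for_fig9:.2f} taps")
print(f"64 taps -> {half_vco_mhz:.4f} MHz")
```

Under the assumed relation, the quoted 1.418 ns period corresponds to roughly 13 tap spacings, and 64 tap spacings reproduce the 143.1818 MHz (one-half VCO frequency) figure given for the disabled synthesizer. Using the rounded spacing of 109.1 ps instead of the unrounded value gives 13 × 109.1 ps ≈ 1.4183 ns, which matches the quoted 705.069 MHz.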
MUXOUT_UP and MUXOUT_LOW are the outputs of the two 32-to-1 MUXs. TRIGGER is the output of the 2-to-1 MUX, which drives the key D-flip-flop. It can be seen that the frequency of TRIGGER is twice that of CLK1/CLK2. Theoretically, it can be proved that the synthesized frequency CLK1/CLK2 will always be one-half of the VCO frequency, 143.1818 MHz in our case, when the synthesizer is disabled through EN. This can also be seen in Fig. 9.

SPICE simulation suggests that the highest CLK1/CLK2 frequency is 705 MHz for this process in weak condition. The limiting factor is the speed of the accumulator.

IV. THE SCALABILITY OF THE IMPROVED FREQUENCY SYNTHESIS ARCHITECTURE

As described in [1], the main advantage of the circuitry in Fig. 3 over the circuitry in Fig. 1 is that the highest synthesizable output frequency of Fig. 3 is twice that of Fig. 1 because of the utilization of two paths. If four paths (or even more paths) can be used for this purpose, then the highest possible output frequency will be even higher. In other words, a scalable architecture has the flexibility for expansion.

Fig. 7. The new two-path frequency synthesizer.

A. The Speed of Adders

Fig. 10 shows the relationship between the speed of the adders and the frequency of the output signal in a two-path configuration. Edges A and C complete a full cycle; edge B is the falling edge in between. Starting from edge A, adder1 is responsible for edge C; adder2 is for edge B. Therefore, the highest possible output frequency is limited by the speed of adder1. Adder1 is the big accumulator with fractional part, which is the 10-bit adder in PATH_A of Fig. 3. Adder2 is an integer adder that is the 5-bit adder in PATH_B.
Adder2 is not the limiting factor for two reasons: 1) its size is much smaller than adder1 and 2) it also has a full cycle of time to finish addition because of pipeline operation.

Fig. 11 is the idea of utilizing four paths to improve the output frequency. Edges A and C complete a full cycle; A and E complete two cycles. Edges B and D are the falling edges in between. It can be seen that adder1, which is an accumulator with fractional part, is responsible for generating edge E, not C. It has two cycles of time to finish addition. Thus, the highest possible output frequency will be twice that of the two-path architecture if all other conditions are the same. Adders 2–4 are all integer adders, responsible for edges B–D, respectively. They all have plenty of time to finish their work because of the pipeline.

B. The Four-Path Architecture

As mentioned in Section III-B, the scalability of this architecture is made possible by the improvement of Fig. 7: all paths share the same D-flip-flop to generate the waveform. Fig. 12 is the schematic of a four-path frequency synthesizer.

In this configuration, there are four adders. Adder1 is an accumulator with fractional part, 5-bit integer, and 27-bit fraction. Adders 2–4 are all 5-bit integer adders. Starting from any given rising/falling edge of the output, adder1 is responsible for the rising/falling edge two cycles downstream; adder3 is for the rising/falling edge one cycle downstream. Adder2 and adder4 are for the falling/rising edges in between. The inputs of all four 32-to-1 MUXs are connected to the 32 VCO outputs. The four outputs of these MUXs are connected to a 4-to-1 MUX whose output is used to trigger the D-flip-flop.
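The edge assignment described in Section IV (adder1 owns edge C in the two-path case and edge E in the four-path case, with the integer adders filling the edges in between) can be restated as a small bookkeeping sketch. It models only which adder is responsible for which edge and how many output cycles the accumulator has to finish; it does not model the circuit itself. The function names and the use of 1.418 ns as an accumulator delay are illustrative assumptions, the latter borrowed from the accumulator-limited period reported in Section III-C.

```python
import math

def edge_owner(edge_index: int, n_paths: int) -> int:
    """Adder (1-based) responsible for output edge `edge_index`,
    counting edges 1, 2, 3, ... after a reference edge A (index 0)."""
    return edge_index % n_paths + 1

def accumulator_budget_cycles(n_paths: int) -> float:
    """Output cycles adder1 has between two edges it is responsible for."""
    return n_paths / 2

def max_output_mhz(n_paths: int, accumulator_delay_ns: float) -> float:
    """Accumulator-limited frequency ceiling; all other limits are ignored."""
    return accumulator_budget_cycles(n_paths) / accumulator_delay_ns * 1e3

EDGES = "ABCDE"
two_path = {EDGES[i]: edge_owner(i, 2) for i in range(1, 3)}
four_path = {EDGES[i]: edge_owner(i, 4) for i in range(1, 5)}
print(two_path)   # {'B': 2, 'C': 1}
print(four_path)  # {'B': 2, 'C': 3, 'D': 4, 'E': 1}

T_ACC_NS = 1.418  # hypothetical accumulator delay, for illustration only
for n_paths in (1, 2, 4):
    print(n_paths, "path(s):", round(max_output_mhz(n_paths, T_ACC_NS), 1), "MHz ceiling")
```

The ratios are the point of the sketch rather than the absolute numbers: the one-path ceiling is half the two-path ceiling and the four-path ceiling is twice it, matching the text's statements for Fig. 1, Fig. 3, and Fig. 11 under the "all other conditions are the same" caveat.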