ADVANCED LIGHT SOURCE FPGA-BASED BUNCH CLEANING*

Abstract
At the Advanced Light Source (ALS), imperfections in the injection system, combined with electron diffusion, result in contamination of storage-ring RF buckets. A Virtex-4 FPGA generates a direct-digital-synthesized (DDS) sine wave at the vertical betatron tune frequency, which is synchronously gated on or off at the ring orbit frequency of approximately 1.52 MHz. Any on/off/invert pattern can be set across the 328 buckets, which are spaced 2 ns apart, and the pattern repeats at the ring orbit frequency. An embedded PowerPC core in the FPGA provides TCP access for control and monitoring from a remote PC running LabVIEW.

INTRODUCTION
The ALS has several fill patterns (camshaft, 2-bunch) that require a filled bucket to be surrounded by empty buckets. Keeping them empty requires active “cleaning”: the buckets that should be empty are selectively excited at the vertical tune [1]. The major components of such a system are transverse kickers, kicker amplifiers, and a signal source.

The ALS signal source uses a Xilinx FPGA demonstration board, the ML403 [2], together with a custom add-on board carrying a 12-bit, 500 MHz DAC. Because the FPGA and DAC are clocked from the 500 MHz master oscillator \( f_{RF} \), each of the 328 bunches can be set to an independent value. Bunches to be cleaned are then selectively kicked at the tune frequency, while the isolated filled bunch is left unstimulated. Additionally, because the tune shifts, the kick frequency is swept over a bandwidth of a few kHz.
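The bucket timing implied by these numbers, taking the nominal $f_{RF} = 500$ MHz, is summarized below. The symbols $T_b$ (bucket spacing), $T_{rev}$ (revolution period), and $f_{rev}$ (revolution frequency) are introduced here for convenience.

$$
T_b = \frac{1}{f_{RF}} = \frac{1}{500\ \text{MHz}} = 2\ \text{ns}, \qquad
T_{rev} = 328\,T_b = 656\ \text{ns}, \qquad
f_{rev} = \frac{1}{T_{rev}} = \frac{f_{RF}}{328} \approx 1.52\ \text{MHz}
$$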

BUNCHCLEANER HARDWARE
The BunchCleaner board in Figure 1 was designed to evaluate the LTC2242-12 ADC (12-bit, 250 MHz clock, 1.2 GHz bandwidth) and the MAX5886 DAC (12-bit, 500 MHz clock, 450 MHz bandwidth, 375 ps rise/fall times), using the ML403 for FPGA interfacing and system design. Both the ADC and the DAC use LVDS digital I/O. The DAC interface uses the double-data-rate (DDR) I/O of the Virtex-4, so the DAC can be updated at the full 500 MHz while the FPGA's internal clock runs at 250 MHz.
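A DDR output register launches one word on each clock edge, so two 12-bit words supplied per 250 MHz fabric cycle become a single 500 MS/s stream at the DAC. The C sketch below models only that data ordering, not the I/O timing; which word leaves on which edge is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a DDR output: each 250 MHz (4 ns) fabric cycle supplies two
 * 12-bit words, d1 and d2. One is driven on each clock edge, so the DAC
 * receives a new word every 2 ns (500 MS/s).
 * Assumption: d1 is launched first (rising edge), d2 second. */
void ddr_serialize(const int16_t *d1, const int16_t *d2, size_t n_cycles,
                   int16_t *out /* must hold 2 * n_cycles words */)
{
    assert(d1 != NULL && d2 != NULL && out != NULL);
    for (size_t i = 0; i < n_cycles; i++) {
        out[2 * i]     = d1[i];  /* first half of the fabric cycle  */
        out[2 * i + 1] = d2[i];  /* second half of the fabric cycle */
    }
}
```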

For bunch cleaning, only the DAC is used; the ADC was also successfully tested for transverse feedback.

ML403 Interfacing
The ML403 has two DIN-style connectors intended for user expansion that connect to FPGA pins and provide power. These connectors are not impedance controlled and have no specified maximum speed. Nevertheless, there is nearly one ground pin for every two signal pins, and acceptable signal distortion was measured when they were used for 100-ohm LVDS (the lower trace of Figure 2).

Figure 1: BunchCleaner mounted to ML403

Figure 2: Top trace is the DAC output; lower trace is the LVDS signal to the DAC. The pattern is 1, -1 (all others 0).

The 500 MHz clock enters the ML403 through two of its onboard SMA connectors, which are wired to an LVDS clock input of the FPGA. An RF transformer converts the single-ended signal to differential.

FPGA DESIGN – BUNCHCLEANER
The bunch cleaner must generate a gated sine wave whose stimulus is on only for the bunches to be kicked out. Because the Amplifier Research kicker amplifiers have a bandwidth of only 250 MHz, they cannot turn fully on or off fast enough to switch between adjacent 2 ns bunches, so a simple on/off approach does not work. Instead, the polarity of the sine wave is inverted so that a zero crossing coincides with the bunch to be kept.

*This work was supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
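A toy calculation shows why inverting the drive helps where gating does not. The roughly 1.4 MHz DDS sine wave changes very little over a few nanoseconds, so near the kept bunch the drive can be treated as locally constant: ±1 on the neighbors, with 0 assumed at the kept bunch itself. The sketch stands in for the amplifier's limited bandwidth with a hypothetical 3-tap smoothing filter; the weights are illustrative, not a measured amplifier response.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: approximate the band-limited kicker amplifier by a 3-tap
 * smoothing filter with weights [0.25, 0.5, 0.25] across adjacent 2 ns
 * samples. Returns the smoothed kick seen by bunch k. */
double blurred_kick(const double *drive, size_t n, size_t k)
{
    assert(drive != NULL && k > 0 && k + 1 < n);
    return 0.25 * drive[k - 1] + 0.5 * drive[k] + 0.25 * drive[k + 1];
}
```

With the gating drive {1, 1, 0, 1, 1}, `blurred_kick(drive, 5, 2)` gives 0.5 at the kept bunch; with the inverted drive {1, 1, 0, -1, -1} it gives 0.0, because the contributions of the two neighbors cancel.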

The main blocks of the bunch-cleaning design are the direct-digital synthesizer, a pattern generator, a multiplier, and a multiplexer-to-DDR output converter (mux-ddr). All of these blocks are clocked at $f_{RF}/2 = 250$ MHz. They are either self-contained VHDL cores or netlists generated by Xilinx design tools.

**DDS Core**

The Xilinx CoreGen utility was used to generate a direct-digital synthesis (DDS) core with a 27-bit phase accumulator and a 12-bit output. It is nominally set for an output frequency of approximately 1.4 MHz.
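With an N-bit phase accumulator clocked at $f_{clk}$, a phase increment $\Delta$ yields $f_{out} = \Delta \cdot f_{clk} / 2^N$. The C sketch below applies this with N = 27, assuming the DDS is clocked at the 250 MHz fabric rate and that the increment is written as a plain integer; the actual CoreGen core's register interface may differ.

```c
#include <assert.h>
#include <stdint.h>

#define DDS_ACC_BITS 27        /* phase-accumulator width, from the text      */
#define DDS_CLK_HZ   250.0e6   /* assumption: DDS clocked at f_RF / 2          */

/* Phase increment for a requested output frequency:
 * inc = f_out * 2^27 / f_clk, rounded to the nearest integer. */
uint32_t dds_phase_inc(double f_out_hz)
{
    assert(f_out_hz >= 0.0 && f_out_hz < DDS_CLK_HZ / 2.0);
    double scale = (double)(1UL << DDS_ACC_BITS);
    return (uint32_t)(f_out_hz * scale / DDS_CLK_HZ + 0.5);
}

/* Output frequency actually produced by a given increment. */
double dds_freq_hz(uint32_t inc)
{
    return (double)inc * DDS_CLK_HZ / (double)(1UL << DDS_ACC_BITS);
}
```

For the nominal 1.4 MHz output, `dds_phase_inc(1.4e6)` returns 751619, and the frequency resolution `dds_freq_hz(1)` is about 1.86 Hz, which is fine enough for a sweep spanning a few kHz.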

**Pattern Generator Core**

The pattern generator is essentially a 1K × 4-bit dual-port memory. One port is written under software control and is effectively write-only. Each 4-bit word holds the controls for two bunch positions, 2 bits each, with “00” meaning zero output, “01” normal output, and “10” inverted output. Hence only 328/2 = 164 address locations are actually used.

The other port is read out under hardware control: an 8-bit counter generates sequential addresses 0 to 163 at 250 MHz, so one pass through the memory takes 164 × 4 ns = 656 ns, exactly one revolution of 328 buckets at 2 ns each.
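The packing and read-out described above can be sketched in C as follows. The bit positions follow Figure 3, where the left pair of a pattern such as “0110” drives D1; assigning the even bunch of each pair to D1 is an assumption, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_BUNCHES 328
#define N_WORDS   (N_BUNCHES / 2)   /* 164 addresses actually used */

enum bunch_code { CODE_ZERO = 0x0, CODE_NORMAL = 0x1, CODE_INVERT = 0x2 };

/* Pack one 2-bit code per bunch into 4-bit words, two bunches per address.
 * Assumption: the even bunch goes in bits 3:2 (D1), the odd bunch in
 * bits 1:0 (D2). Code "11" is unused and rejected. */
void pack_pattern(const enum bunch_code codes[N_BUNCHES], uint8_t words[N_WORDS])
{
    for (int a = 0; a < N_WORDS; a++) {
        assert(codes[2 * a] <= CODE_INVERT && codes[2 * a + 1] <= CODE_INVERT);
        words[a] = (uint8_t)(((codes[2 * a] & 0x3) << 2) | (codes[2 * a + 1] & 0x3));
    }
}

/* Read-side address counter: 0..163, then wrap. One step per 4 ns clock,
 * so a full pass takes 164 * 4 ns = 656 ns, one revolution. */
uint8_t next_read_addr(uint8_t addr)
{
    return (addr == N_WORDS - 1) ? 0 : (uint8_t)(addr + 1);
}
```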

**Multiplier Core**

The multiplier is also generated with CoreGen, parameterized with two 12-bit inputs and a 12-bit output. It serves as a software-settable gain control for the stimulus to the kicker amplifier: one input is fed by the output of the DDS core, the other by a parallel I/O port.
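A behavioral sketch of the gain stage follows, under assumptions the text does not fix: the DDS sample is signed two's complement, the gain word is unsigned with 4095 meaning nearly unity, and the top 12 bits of the product are kept. The real CoreGen multiplier's bit selection and rounding may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point gain stage: signed 12-bit DDS sample times an
 * unsigned 12-bit gain word (0..4095, read as 0..~1). The product fits in
 * 24 bits; dividing by 4096 returns it to 12 bits. C division truncates
 * toward zero, which may not match the hardware's truncation. */
int16_t apply_gain(int16_t dds_sample, uint16_t gain)
{
    assert(dds_sample >= -2048 && dds_sample <= 2047);
    assert(gain <= 4095);
    int32_t product = (int32_t)dds_sample * (int32_t)gain;
    return (int16_t)(product / 4096);
}
```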

**Mux-DDR Core**

The multiplier output drives one input of the mux-ddr, and the pattern generator drives the other. The two mux-ddr outputs, D1 and D2, feed a 12-bit DDR output port on the FPGA and are selected according to Figure 3.

<table>
<thead>
<tr>
<th>Pattern</th>
<th>D1</th>
<th>D2</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000</td>
<td>Zero DAC code</td>
<td>Zero DAC code</td>
</tr>
<tr>
<td>0110</td>
<td>Multiplier +</td>
<td>Multiplier -</td>
</tr>
<tr>
<td>0101</td>
<td>Multiplier +</td>
<td>Multiplier +</td>
</tr>
<tr>
<td>1010</td>
<td>Multiplier -</td>
<td>Multiplier -</td>
</tr>
</tbody>
</table>

Figure 3: mux-ddr outputs D1 and D2.

For example, a pattern of “0110” sets D1 to Multiplier+ and D2 to Multiplier-. Because the DAC updates at $f_{RF} = 500$ MHz, the sign of the DDS stimulus can be inverted within a single 2 ns period. The upper trace in Figure 2 shows the DAC output for this pattern.

Figure 4 shows how the custom cores are connected. All cores except the GPIO run at 250 MHz.

**FPGA DESIGN – EMBEDDED SYSTEM**

The system design for the ML403 uses Xilinx-supplied cores designed for embedded processor systems, including the PowerPC 405 hard core, a UART, a DDR SDRAM controller, System-ACE, and 100BASE-T Ethernet. These cores are supported for the ML403 in a “push-button” template design provided in the Xilinx Embedded Development Kit (EDK). Similar FPGA embedded processor systems have been developed for the ALS Mini-IOC [3].

Although each of these cores comprises many lines of Verilog or VHDL, most of this complexity is hidden behind parameterized description files: the entire BunchCleaner design is described in a 500-line Microprocessor Hardware Specification (MHS) text file. Figure 5 shows the subsection of the MHS file that describes the FPGA's UART.

**Figure 5 – UART description in MHS file**

```
BEGIN opb_uartlite
PARAMETER INSTANCE = RS232_Uart
PARAMETER HW_VER = 1.00.b
PARAMETER C_BAUDRATE = 9600
PARAMETER C_DATA_BITS = 8
PARAMETER C_ODD_PARITY = 0
PARAMETER C_USE_PARITY = 0
PARAMETER C_CLK_FREQ = 100000000
PARAMETER C_BASEADDR = 0x40600000
PARAMETER C_HIGHADDR = 0x4060ffff
BUS_INTERFACE SOPB = opb
PORT OPB_Clk = sys_clk_s
PORT Interrupt = RS232_Uart_Interrupt
PORT RX = fpga_0_RS232_Uart_RX
PORT TX = fpga_0_RS232_Uart_TX
END
```

**BunchCleaner Custom Core Interfacing**

Because software setup of the BunchCleaner custom cores is not performance-critical, the cores do not need to be mapped onto the high-performance bus that connects the Xilinx cores; dedicated internal ports from Xilinx parallel I/O (GPIO) cores are sufficient to control the user settings. Software then sets the DDS frequency, loads the pattern, and adjusts the multiplier gain by manipulating individual bits on these ports.
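The C sketch below illustrates the kind of bit manipulation involved in loading one pattern-memory word through a GPIO port. The bit layout, the strobe sequence, and all names are hypothetical; the text does not specify them, and `port` merely stands in for a memory-mapped GPIO data register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GPIO bit layout for loading the pattern memory:
 *   bits  7:0  pattern address (0..163)
 *   bits 11:8  4-bit pattern word
 *   bit  12    write strobe */
#define PAT_ADDR_SHIFT 0
#define PAT_DATA_SHIFT 8
#define PAT_WE_BIT     (1u << 12)

uint32_t pattern_gpio_word(uint8_t addr, uint8_t data, int we)
{
    assert(addr < 164 && data <= 0xF);
    return ((uint32_t)addr << PAT_ADDR_SHIFT) |
           ((uint32_t)data << PAT_DATA_SHIFT) |
           (we ? PAT_WE_BIT : 0u);
}

/* Present address and data, pulse the write strobe, then release it. */
void load_pattern_word(volatile uint32_t *port, uint8_t addr, uint8_t data)
{
    *port = pattern_gpio_word(addr, data, 0);
    *port = pattern_gpio_word(addr, data, 1);
    *port = pattern_gpio_word(addr, data, 0);
}
```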

**System Software**

The system software is written in C and runs on xilkernel, a Xilinx-supplied RTOS that provides multithreading, timer interrupts, and task prioritization, together with the lightweight IP (lwIP) TCP/IP stack. A modified version of the Xilinx-supplied web-server demonstration program implements a command interpreter that uses a simple TCP buffer for reads and writes. The software occupies about 500 KB. Both the FPGA firmware and the software are stored in a single System-ACE file on the flash memory card, so the system initializes itself at power-up.
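A minimal sketch of such a command interpreter is shown below. The command names and argument formats (GAIN, SWEEP, ENABLE) are hypothetical, since the actual command set is not described; in the real system the string would come from the lwIP receive buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical text commands; not the actual BunchCleaner command set. */
enum cmd_kind { CMD_BAD, CMD_GAIN, CMD_SWEEP, CMD_ENABLE };

struct command {
    enum cmd_kind kind;
    double a, b;   /* command arguments */
};

/* Parse one NUL-terminated command taken from the TCP receive buffer. */
struct command parse_command(const char *buf)
{
    struct command c = { CMD_BAD, 0.0, 0.0 };
    assert(buf != NULL);
    if (sscanf(buf, "GAIN %lf", &c.a) == 1)
        c.kind = CMD_GAIN;
    else if (sscanf(buf, "SWEEP %lf %lf", &c.a, &c.b) == 2)
        c.kind = CMD_SWEEP;          /* lower and upper frequency, Hz */
    else if (sscanf(buf, "ENABLE %lf", &c.a) == 1)
        c.kind = CMD_ENABLE;         /* nonzero turns the stimulus on */
    return c;
}
```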

The software in the FPGA sets the pattern, the DDS frequency, and the amplitude. Because the tune to be excited varies over a range of a few kilohertz, the software sweeps the stimulus frequency between user-set lower and upper limits at a user-specified rate. Random frequency settings were also tried, but empirically we find that simply ramping the frequency up and down works best.
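The up/down ramp can be sketched as a triangle sweep. The step size and the update interval are hypothetical parameters here; together they set the sweep rate. Each returned frequency would then be converted to a DDS phase increment and written to the core.

```c
#include <assert.h>

/* Linear up/down (triangle) sweep between user limits. Each call advances
 * by step_hz; the direction reverses when a limit is reached. */
struct sweep {
    double lo_hz, hi_hz, step_hz;
    double f_hz;   /* current stimulus frequency          */
    int dir;       /* +1 ramping up, -1 ramping down      */
};

double sweep_next(struct sweep *s)
{
    assert(s->lo_hz < s->hi_hz && s->step_hz > 0.0);
    s->f_hz += s->dir * s->step_hz;
    if (s->f_hz >= s->hi_hz) {
        s->f_hz = s->hi_hz;
        s->dir = -1;
    } else if (s->f_hz <= s->lo_hz) {
        s->f_hz = s->lo_hz;
        s->dir = +1;
    }
    return s->f_hz;
}
```

Starting at 1000 Hz with limits 1000 to 1003 Hz and a 1 Hz step, successive calls return 1001, 1002, 1003, 1002, 1001, 1000, 1001, and so on.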

**Control Room Software**

A PC running a LabVIEW Virtual Instrument (VI) sends the parameters to the FPGA software. Buffers of up to 2048 bytes are transferred using the LabVIEW TCP Write and TCP Read functions. Initial setup is done with a laptop at the transverse kicker racks. During user operations, the same VI runs on the control room PCs to turn the stimulus on and off and to adjust the stimulus frequency range as required.

**CONCLUSION**

The ALS BunchCleaner project shows that a Xilinx Virtex-4 FPGA can serve as a versatile gated signal generator, combining the functionality of an arbitrary waveform generator with capabilities normally associated with accelerator timing systems. Using the ML403 demonstration board greatly speeds up prototyping and simplifies the implementation of an embedded processor system that supports networked user control of bunch cleaning.

**REFERENCES**

