COMMISSIONING OF THE FPGA-BASED TRANSVERSE FEEDBACK SYSTEM AT THE ADVANCED PHOTON SOURCE*

N. DiMonte#, E. Norum, C.-Y. Yao, Argonne National Laboratory, Argonne, IL 60439, U.S.A.

Abstract
The Advanced Photon Source installed a transverse feedback system to correct the instability in the electron beam during single-bunch mode. This instability manifests itself when a large amount of current is present in the beam. The only method formerly available to correct the instability was through chromaticity correction. The transverse feedback system deals with the instability without requiring changes to the ring chromaticity. Initial testing revealed issues with the input and output electronics. This paper will discuss these issues, their resolution, and many other enhancements to the FPGA-based system.

INTRODUCTION
The Advanced Photon Source (APS) experiences beam instabilities in both the transverse and longitudinal planes. The P0 feedback system, in its initial version, will correct these instabilities in a bunch pattern that has up to 24 bunches. This is accomplished by using a pick-up stripline, drive stripline, four drive amplifiers, a 3-tap comb filter for front-end signal conditioning, and an Altera Stratix II FPGA-based DSP development board coupled with a Coldfire CPU. The Coldfire CPU uses EPICS [1] with RTEMS [2] for all the remote monitoring and control. Figure 1 shows a block diagram of the feedback system. This paper discusses FPGA performance, added features and modifications, as well as the imminent upgrade.

SYSTEM DESCRIPTION
The system consists of a pick-up stripline, a front-end signal-processing unit, an Altera Stratix II DSP processor unit, drive amplifiers, and a driver stripline. This system has been described in detail [3]. Figure 1 shows a block diagram of the current system without the remote DAC hardware. At the core of the FPGA processor are the 648 32-tap FIR filters. The algorithm used for the filter is based on the least square fitting method to determine filter coefficients [4]. Our pickup and drive striplines are installed at different points in the ring. This is a bigger issue for the Y-channel, where the pickup and drive are seven sectors apart, about 188 m. A remote DAC linked to the main P0 feedback system via high-speed fiber is used to overcome this distance. Figure 2 shows the addition of the transceiver in the main P0 feedback chassis used to connect the remote DAC chassis.
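As a rough illustration of what one of the per-bucket FIR filters does, the following minimal sketch keeps a turn-by-turn history for a single bucket and forms the weighted sum of the most recent samples. The class name `BucketFir` and the coefficient values are hypothetical; the actual coefficients come from the least-squares method of [4], which is not reproduced here.

```python
from collections import deque


class BucketFir:
    """Turn-by-turn FIR filter for one bucket (illustrative sketch only)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        # History holds the most recent samples, newest first, zero-initialized.
        self.history = deque([0.0] * len(self.coeffs), maxlen=len(self.coeffs))

    def step(self, sample):
        """Push this turn's sample and return the filtered output."""
        self.history.appendleft(sample)  # oldest sample falls off the right end
        return sum(c * x for c, x in zip(self.coeffs, self.history))


# Placeholder 32-tap coefficients (hypothetical, not the APS filter).
placeholder_coeffs = [1.0 / 32] * 32

# One independent filter per sampled bucket; 4 buckets here to keep it small.
bank = [BucketFir(placeholder_coeffs) for _ in range(4)]

# One turn of position samples, one value per bucket.
turn_samples = [0.5, -0.25, 0.0, 1.0]
outputs = [fir.step(x) for fir, x in zip(bank, turn_samples)]
```

Each bucket is filtered independently, so the filter bank is simply one such object per sampled bucket and plane.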

SYSTEM IMPROVEMENTS
Based on our observation and test results we have made significant changes in both the hardware and firmware.

Figure 1: Block diagram of the feedback system.

The current hardware uses an Altera DSP development board with a Stratix II FPGA [5] as the main component. Two 12-bit ADCs and two 14-bit DACs are also part of the DSP board. However, the earlier design [3] used a monopulse receiver as part of the input circuit, requiring a 14-bit ADC daughterboard that plugged onto the DSP board. This interface proved to be noisier than expected, so a significant amount of effort was devoted to improving the performance of the front-end unit. The input signals were moved to the 12-bit on-board ADCs, which have transformer-coupled inputs. Then a 3-tap comb filter was added to flatten the peak of the down-converted signal. Replacing the mixers and amplifiers with higher-powered ones has increased the dynamic range of the system. The addition of a sum-signal compensation circuit, which was connected to one of the spare 14-bit ADCs, reduced the orbit part of the input signal, thus avoiding saturation at high beam charge.

The original clock source was derived from a 44-MHz timing distribution on the APS site, which was determined to contain excessive sidebands and jitter. We doubled the clock in the FPGA to 88 MHz through a PLL circuit, but this only added to the jitter. Our solution was to use the 352-MHz clock source directly from the RF system master synthesizer. Since the Altera DSP board could barely handle this frequency, an updated event receiver daughterboard was developed to include a clock input circuit to handle 352 MHz. This board had the option to divide down the clock to 176 MHz, which was used to feed the FPGA. Once inside the FPGA, a PLL once again divided the clock to 88 MHz. This simple change reduced our clock jitter, which provided a cleaner signal. With the bunch patterns being selected, the 88-MHz clock did not let us see all the bunches we wanted, so the clock was changed to 117.333 MHz. This allowed us to sample 432 buckets out of the 1296 instead of the 324 buckets we could see when running at 88 MHz. Both setups work fine; the sample rate determines which bunch patterns can be stabilized. The 117-MHz setup covers all the existing bunch patterns except for the 324-bucket fill. In addition to the clock change, the PLL circuit was changed to allow on-the-fly programming of the sample clock phase. At 117.333 MHz, 48 steps are provided to shift the clock at 178 ps per step, allowing us to center the sample clock on the desired bucket and thus reducing the sample noise due to timing jitter.
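The clock figures quoted above follow from simple arithmetic, sketched below. The sketch assumes a nominal RF frequency of exactly 352 MHz and that the sample clock is an integer division of it (by 4 for 88 MHz, by 3 for 117.333 MHz); the function names are illustrative only.

```python
RF_MHZ = 352.0        # nominal RF frequency quoted in the text
TOTAL_BUCKETS = 1296  # buckets in the storage ring


def sample_clock_mhz(divider):
    """Sample clock obtained by dividing the RF clock by an integer."""
    return RF_MHZ / divider


def sampled_buckets(divider):
    """Sampling every Nth bucket sees 1/N of the ring."""
    return TOTAL_BUCKETS // divider


def phase_step_ps(clock_mhz, steps):
    """Size of one phase step when a clock period is split into equal steps."""
    period_ps = 1.0e6 / clock_mhz
    return period_ps / steps


for divider in (4, 3):
    print(divider, round(sample_clock_mhz(divider), 3), sampled_buckets(divider))

# 48 phase steps across one 117.333-MHz period, as stated in the text.
print(round(phase_step_ps(sample_clock_mhz(3), 48), 1), "ps per step")
```

One period of the 117.333-MHz clock is about 8.52 ns, so 48 equal steps give roughly 178 ps each, consistent with the value stated above.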

**Figure 2: FPGA block diagram of the P0 feedback.**

It became obvious that a single gain setting for all 432 buckets was not efficient. Since each bucket might be slightly different with non-uniform bunch currents, each bucket has its own programmable gain that can be easily adjusted during operations.

The delay circuit was also updated for each plane to provide better resolution, a half clock sample per bucket, which eliminated the need for external devices to delay the output signal.

**SOFTWARE IMPROVEMENTS**

To facilitate system diagnostics and tuning we also added some EPICS process variables for both ADC inputs and DAC outputs in addition to some firmware features.

For each sample, we define a 3-bit control pattern as shown in Table 1. These control functions are programmable for purposes such as timing alignment, AC coupling effect compensation of DAC output, and bandwidth matching of the amplifiers. Option 000 or 111 will set the DAC output to zero volts. Option 001 simply passes the data through to the DAC, while option 010 negates the data before going to the DAC. Options 011 and 100 each select a separate register holding a preprogrammed value. Option 101 toggles data from the previous bucket for the current bucket, and option 110 stretches the data from the previous bucket into the current bucket. These last four options ignore the processed data from the FIR filters.
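The control options described above can be sketched in software as a per-bucket selector. This is only an illustration of the selection logic, not the firmware itself. It assumes that "toggle" means the sign-inverted value of the previous bucket's DAC output, that "stretch" repeats that output unchanged, and that the value before the first bucket is zero; `reg_a` and `reg_b` are hypothetical names for the two preprogrammed registers.

```python
def apply_dac_control(fir_out, controls, reg_a=0, reg_b=0):
    """Return the DAC values for one pass over the buckets.

    fir_out  -- processed FIR output for each bucket
    controls -- 3-bit control code for each bucket
    """
    dac = []
    prev = 0  # assumed DAC value preceding the first bucket
    for value, code in zip(fir_out, controls):
        if code in (0b000, 0b111):
            out = 0          # force zero output
        elif code == 0b001:
            out = value      # pass FIR data through
        elif code == 0b010:
            out = -value     # negate FIR data
        elif code == 0b011:
            out = reg_a      # first preprogrammed register
        elif code == 0b100:
            out = reg_b      # second preprogrammed register
        elif code == 0b101:
            out = -prev      # "toggle": assumed to invert previous output
        elif code == 0b110:
            out = prev       # "stretch": repeat previous output
        else:
            raise ValueError(f"control code out of range: {code}")
        dac.append(out)
        prev = out
    return dac


example = apply_dac_control([5, 7, 9, 11], [0b001, 0b110, 0b101, 0b010])
```

In this example the second bucket repeats the first bucket's output, the third inverts it, and the fourth sends its own FIR data negated.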

High-level software has been developed to facilitate machine studies, e.g., a graphic interface for loading and saving DAC output control patterns; generating FIR filter from lattice files; and saving, reviewing and loading FIR filters. This software is based on the Tcl/Tk interpreter using the SDDS Toolkit.

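<table>
<thead>
<tr>
<th colspan="2">Table 1: DAC Output Control Bit Patterns</th>
</tr>
<tr>
<th>Control bits</th>
<th>DAC output</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>Zero volts</td>
</tr>
<tr>
<td>001</td>
<td>Pass data through</td>
</tr>
<tr>
<td>010</td>
<td>Negate data</td>
</tr>
<tr>
<td>011</td>
<td>Preprogrammed register value</td>
</tr>
<tr>
<td>100</td>
<td>Preprogrammed register value (separate register)</td>
</tr>
<tr>
<td>101</td>
<td>Toggle data from previous bucket</td>
</tr>
<tr>
<td>110</td>
<td>Stretch data from previous bucket</td>
</tr>
<tr>
<td>111</td>
<td>Zero volts</td>
</tr>
</tbody>
</table>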

**HARDWARE UPGRADES**

Currently the P0 feedback system uses an Altera DSP development board [5], but because of limitations in the horizontal drive we were unable to put the horizontal drive line in Sector 2 at the Advanced Photon Source. Another section in the ring that can accommodate the necessary horizontal drive stripline is located in Sector 35, about 180 meters away.

To deal with the distance we had two options available to us: analog and digital. The analog option is feasible since we already have unused cables in the tunnel that meet our needs. Originally the APS had a loss monitor system that used a large, well-insulated 7/8-in cable throughout the whole storage ring, but this system was decommissioned, which allowed us to use this cable. We will use about 200 meters of this cable for our application until our digital fiber link is installed. Our preliminary tests have shown that we only have about a 2-dB loss from Sector 2 to Sector 35, but we plan to use amplifiers with variable attenuators at both ends to ensure the best transmission of the horizontal drive signal.

However, we are still concerned that the horizontal drive signal may not meet our specifications with the analog option. Currently we are developing a 2.34-Gb/s fiber-optic link that will allow us to have a remote DAC in Sector 35. To do this we purchased two Altera PCI Express Development kits, Stratix II GX edition [6], and two Altera Data Conversion HSMC boards [7], and developed a new Coldfire daughterboard with HSMC connectors to interface to the new Altera PCI kit. The Data Conversion board upgrades the P0 feedback system to a 14-bit ADC compared to the 12-bit ADCs used in the former Altera DSP board. Two units will need to be created with this hardware. The first will be the main unit (the P0 feedback box), and the second will be the remote DAC box used to drive the Y-plane stripline. The main reasons for choosing the Altera PCI Express Development kits were the transceiver logic built into the FPGA and the two SFP (small form-factor pluggable) cages on this board. An optical SFP transceiver from Finisar, model FTLF1324P2BTL [8], will be fitted at each end, with single-mode optical fiber connecting the two units.

Converting the FPGA logic to the PCI board was relatively simple except for connecting the clock signal to the Data Conversion board. The new kit is performing just as well as the older kit in lab simulations. However, the Altera Stratix II GX FPGA does not allow an internal FPGA clock to drive the transceiver logic in the FPGA; it requires that an external clock drive the transceiver logic. With this development board the Stratix II GX chip did not provide the flexibility to route the reference clock to the desired clock pins. While the compiler flagged multiple warnings for jitter violations due to our not using the correct transceiver clock pins, we noticed no performance losses once we had a working system. If this development board used the Stratix IV GX chip, then the clock could have been routed in the FPGA, and the issues we had would have been avoided.

Incorporating the transceiver logic had to be done in the main P0 feedback box acting as the transmitter while the remote DAC box would incorporate a receiver. Both transceivers would incorporate 16-bit transceiver logic to accommodate the 14-bit DAC in the remote DAC box. The configuration was simple: tap off the DAC data that would have gone to the local Y-plane stripline and send it to the remote DAC via the fiber optic connected to the transceivers.

We configured the transceivers to basic mode with 8b/10b encoding, single lane, 16-bit data width, using a reference clock of 117.333 MHz, which gave us an effective data rate of 2346.667 Mbps (20 bits at 117.333 MHz).
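The quoted rate can be checked with a short calculation: 8b/10b encoding sends 10 line bits for every 8 data bits, so a 16-bit word becomes 20 bits on the fiber. The sketch below assumes the reference clock is exactly 352/3 MHz; the function name is illustrative.

```python
def line_rate_mbps(ref_clock_mhz, data_bits):
    """Serial line rate for 8b/10b-encoded words sent once per clock."""
    encoded_bits = data_bits * 10 // 8  # 8b/10b: 10 line bits per data byte
    return ref_clock_mhz * encoded_bits


rate_16 = line_rate_mbps(352.0 / 3.0, 16)
print(f"{rate_16:.3f} Mbps")
```

The same function shows why an 8-bit-wide interface needs twice the word clock to carry the same payload at the same line rate.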

To synchronize the transceiver link, we developed a method to insert a sync byte (K28.5 comma character) whenever we detected two DAC words that were the same. We would take one of these words, replace the lower byte with K28.5 and the upper byte with all zeros, and then set the control bit. This sync method revealed a problem with the Stratix II GX chip that Altera was not aware of. It turned out that we could not send another sync unless three data words were sent first; otherwise the receiver would get into an undefined state and would not output any data it received. Another issue with the receiver logic occurred if the transmission was interrupted. In this case, the receiver would at times not reorder the bytes correctly, which meant that the sync byte was not aligned correctly, and again we would see junk at the output. Our solution was to replace the 16-bit transceiver with an 8-bit version. The 8-bit version has been solid with no errors detected. An added benefit of this approach is that it allowed us to half-step the delay at the receiver end, since we had to double our clock to receive the data. As mentioned before, zeros were passed in the upper byte when a sync byte was sent. We now send the delay in the upper byte, which takes two sync cycles to pass the full 14-bit delay value to the remote DAC.
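The sync-insertion rule for the original 16-bit link can be sketched as follows. This is a software model under stated assumptions, not the FPGA logic: it assumes the second of two identical words is the one replaced, that the receiver holds its last data value while a sync word arrives, and that at least three data words precede every sync, including the first. The names `insert_syncs` and `recover` are hypothetical; 0xBC is the standard data-byte value of the K28.5 character.

```python
K28_5 = 0xBC  # byte value of the K28.5 comma character


def insert_syncs(dac_words, min_gap=3):
    """Return (word16, is_control) frames with syncs replacing repeated words."""
    frames = []
    prev = None
    since_sync = 0  # data words sent since the last sync
    for word in dac_words:
        if prev is not None and word == prev and since_sync >= min_gap:
            # Upper byte zero, lower byte K28.5, control bit set.
            frames.append((K28_5, True))
            since_sync = 0
        else:
            frames.append((word & 0xFFFF, False))
            since_sync += 1
        prev = word
    return frames


def recover(frames):
    """Receiver model: hold the last data word whenever a sync arrives."""
    out = []
    last = 0
    for word, is_control in frames:
        if not is_control:
            last = word
        out.append(last)
    return out


words = [1, 1, 2, 2, 2, 2, 2, 3]
frames = insert_syncs(words)
```

Because a sync only ever replaces a word equal to the one before it, holding the last value at the receiver reproduces the original stream.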

## Future Plans

The P0 feedback system has great potential as a bunch cleaner for the APS; however, since it can only monitor every third bucket, it appears that the present system cannot fill that role. We are now discussing a P0 feedback system that will monitor and control all 1296 buckets, but this will require us to design and build our own board based on our experience with the Altera development boards. During the ICALEPCS 2009 conference, tests will be performed on the remote DAC configuration, which will determine our next system upgrade.

## Conclusion

The P0 feedback system has proven to be very flexible in that it was easy to adapt when the need arose, e.g., increasing the number of buckets and adding a remote DAC function. The preliminary test data have shown some promise that this system can stabilize 432 bunches in both the horizontal and vertical planes at the APS and possibly function as a bunch cleaner.

## Acknowledgments

The authors would like to acknowledge Altera for their on-site support and training in the use of their transceiver logic circuits.

## References