DEVELOPMENT OF BUNCH CURRENT AND OSCILLATION RECORDER FOR SuperKEKB ACCELERATOR

Makoto Tobiyama# and John W. Flanagan,
KEK Accelerator Laboratory, 1-1 Oho, Tsukuba 305-0801, Japan

Abstract

A high-speed digital signal memory has been developed for the bunch current and oscillation recorder for SuperKEKB. It consists of an 8-bit ADC and a commercially available FPGA daughter card, carrying a Spartan-6 FPGA and DDR2 memory, mounted on a double-width VME card. The block RAM of the FPGA is used to transfer bunch current data with low latency for prompt bunch current measurements, and the large DDR2 memory is used for long-duration position recording, such as post-mortem bunch oscillation recording. The performance of the board, including the data transfer rate, is presented.

INTRODUCTION

The construction of the SuperKEKB accelerators, an upgrade of the KEKB B-factory, started in FY 2010 and has so far progressed almost on schedule. In the SuperKEKB rings, we will almost double the stored current, reduce the beam emittance to about 1/10 and squeeze the betatron functions at the interaction point to achieve a luminosity 40 times larger than that of KEKB. As the physical and dynamic apertures of the rings will be much smaller than those of KEKB, a positron damping ring is under construction to reduce the size of the injected beam. Table 1 shows the main parameters of the SuperKEKB accelerators (High Energy Ring: HER, Low Energy Ring: LER and Damping Ring: DR).

Since the luminosity of a collider is proportional to the product of the bunch currents of the two beams, and the bunch current is usually pushed as close as possible to the beam-beam limit, it is important to measure the bunch currents and to keep the filling pattern as flat as possible for stable operation and effective tuning. In other rings, such as the damping ring or storage rings for synchrotron radiation (SR) use, prompt bunch current measurement to flatten the filling pattern has a lower priority, but it is still meaningful to record or control the bunch filling information in order to understand the beam behaviour, for example when studying collective effects.

In KEKB, we used a bunch current monitor and a bunch oscillation recorder based on the hardware two-tap filter board developed for the bunch feedback systems[1]. The board had an 8-bit ADC (MAX101), a fast data demultiplexer that split the 8-bit data 16 ways, and 20 MB of SRAM, all controlled through an extended VME interface. During injection, the injection trigger signal stopped the bunch current recorder and then initiated, through a VME interrupt, the transfer of the bunch current information to a reflective memory board on the same VME bus. A second reflective memory board, connected by optical fibre, was placed at the gun control rack of the linac (about 600 m from the bunch current monitor) and was used to select the injection bunch from the bunch current information[2]. This bucket selection system (bunch current equalizer) worked very well and kept the bunch filling pattern fairly flat even after large bunch losses caused by beam-beam kicks or beam instabilities. This enabled fine tuning of the collision conditions to obtain a higher and more stable luminosity. Indeed, when trouble with the bunch current monitor, such as a jump in timing, prevented the use of the bunch current equalizer, operation of the ring was difficult and both the peak and the integrated luminosity in such periods were clearly lower than at other times.

We have developed a new bunch current recorder (18K10) based on current FPGA technology. Thanks to the flexibility and capacity of the FPGA, the board supports the harmonic numbers h=5120 for the SuperKEKB LER and HER, h=230 for the SuperKEKB Damping Ring, h=640 for PF-AR and h=312 for the PF ring simply by setting the DIP switch on the board. We have also designed the board to be usable as a large-scale memory board that records all bunch oscillations over a long time, 0.16 s for the SuperKEKB rings. The performance of the board, including the data transfer rate, is reported.
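The relation between harmonic number, revolution frequency and recording time can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the RF frequency of Table 1 and taking "16k turns" to mean 16 × 1024 turns; the function names are illustrative only. PF-AR and PF are left out because their RF frequencies are not given in the text.

```python
RF_FREQUENCY_HZ = 508.886e6  # RF frequency from Table 1

# Harmonic numbers quoted in the text for the SuperKEKB rings.
HARMONIC_NUMBERS = {
    "SuperKEKB LER/HER": 5120,
    "SuperKEKB DR": 230,
}


def revolution_period_s(harmonic_number, rf_frequency_hz=RF_FREQUENCY_HZ):
    """One turn lasts h RF buckets, so T_rev = h / f_RF."""
    return harmonic_number / rf_frequency_hz


def record_duration_s(turns, harmonic_number, rf_frequency_hz=RF_FREQUENCY_HZ):
    """Time span covered by a recording of the given number of turns."""
    return turns * revolution_period_s(harmonic_number, rf_frequency_hz)


if __name__ == "__main__":
    for ring, h in HARMONIC_NUMBERS.items():
        f_rev_mhz = 1e-6 / revolution_period_s(h)
        print(f"{ring}: h={h}, f_rev={f_rev_mhz:.4f} MHz")
    # Assumption: "16k turns" means 16 * 1024 turns.
    print(f"16k turns at h=5120: {record_duration_s(16 * 1024, 5120):.3f} s")
```

Under these assumptions the revolution frequencies agree with Table 1 and the 16k-turn record spans roughly the 0.16 s quoted above.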

Table 1: Main parameters of SuperKEKB rings (HER: electron, LER: positron, DR: positron damping ring)

<table>
<thead>
<tr>
<th></th>
<th>LER</th>
<th>HER</th>
<th>DR</th>
</tr>
</thead>
<tbody>
<tr>
<td>Energy (GeV)</td>
<td>4.0</td>
<td>7.0</td>
<td>1.05</td>
</tr>
<tr>
<td>Circumference (m)</td>
<td>3016</td>
<td>3016</td>
<td>135</td>
</tr>
<tr>
<td>RF frequency (MHz)</td>
<td>508.886</td>
<td>508.886</td>
<td>508.886</td>
</tr>
<tr>
<td>Revolution frequency (MHz)</td>
<td>0.0994</td>
<td>0.0994</td>
<td>2.2</td>
</tr>
<tr>
<td>Beam current (A)</td>
<td>3.8</td>
<td>2.6</td>
<td>0.08</td>
</tr>
<tr>
<td>Harmonic number</td>
<td>5120</td>
<td>5120</td>
<td>230</td>
</tr>
<tr>
<td>Bunch number</td>
<td>2500</td>
<td>2500</td>
<td>4</td>
</tr>
<tr>
<td>Bunch length (mm)</td>
<td>6</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td>Horizontal emittance (nm rad)</td>
<td>3</td>
<td>5</td>
<td>13</td>
</tr>
<tr>
<td>Coupling</td>
<td>0.4</td>
<td>0.3</td>
<td>10</td>
</tr>
</tbody>
</table>

OUTLINE OF THE SYSTEM

Hardware Design

The block diagram of the bunch current recorder board (18K10) is shown in Fig. 1. It has a trigger input, a revolution input, an RF clock input and two complementary DATA inputs. Both the trigger and revolution inputs have buffered outputs. All inputs and outputs are designed for an impedance of 50 Ω. On the RF clock input, we have inserted a digital delay (MC10E195) with a resolution of 10 ps and a range from 0 to about 10 ns, which is controlled through the VME bus. With this delay, we can adjust the ADC timing without inserting analog delays before the bunch signal inputs. The two complementary signal inputs (SIG IN+ and SIG IN-) are amplified by DC amplifiers (THS4303, DC-1.8 GHz bandwidth) with a gain of 14 dB. Since a 6 dB attenuator is inserted between the DC amplifier and the ADC input, the total gain from the input to the ADC is 8 dB (2.5 times). A MAX108 (8-bit, DC-2.2 GHz bandwidth, 1.5 GSPS max) is used as the analog-to-digital converter (ADC). The ADC has a built-in 1:2 demultiplexer and reference clock outputs (RF/2), which make the subsequent signal processing much easier than handling the 508 MSPS signal directly. The reset timing of the built-in demultiplexer is supplied from the FPGA card if needed. To cool the ADC, we added a heat sink with a small DC fan. The temperature of the ADC is monitored by the control CPLD using the built-in diode of the ADC.
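The gain budget and the demultiplexed data rate above follow from simple arithmetic. A minimal sketch, assuming the gains are voltage gains expressed in dB and that the ADC is clocked at the RF frequency of Table 1; the names are illustrative:

```python
def db_to_voltage_ratio(gain_db):
    """Convert a voltage gain in dB to a linear amplitude ratio."""
    return 10 ** (gain_db / 20)


AMPLIFIER_GAIN_DB = 14.0   # THS4303 stage, from the text
ATTENUATOR_DB = 6.0        # attenuator between amplifier and ADC
RF_FREQUENCY_MHZ = 508.886
DEMUX_RATIO = 2            # built-in 1:2 demultiplexer of the ADC

net_gain_db = AMPLIFIER_GAIN_DB - ATTENUATOR_DB
demux_word_rate_mhz = RF_FREQUENCY_MHZ / DEMUX_RATIO

if __name__ == "__main__":
    print(f"net gain: {net_gain_db:.0f} dB = x{db_to_voltage_ratio(net_gain_db):.2f}")
    print(f"demultiplexed word rate: {demux_word_rate_mhz:.3f} MHz")
```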

For the signal processing, we use an FPGA daughter-card module (Mars MX1-45-3C) commercially available from Enclustra[3]. Figure 2 shows the Mars MX1-45-3C daughter card. It carries a Spartan-6 FPGA (XC6SLX45-3CSG324C) and 128 MB of DDR2 SDRAM on a SO-DIMM form factor card (68 x 30 mm, 200 pins). A total of 108 I/O pins are usable. We have assigned 35 pairs as LVDS signals (70 pins) and 38 pins as LVTTL signals. Timing signals are converted to LVDS level to fit this interface, except for the MAX108 outputs. The LVPECL outputs of the ADC are terminated with 50 Ω just before the SO-DIMM connector and then received as LVDS signals by the FPGA. With this scheme, the attenuation of the signal level caused by the large stray capacitance of the SO-DIMM connector is compensated. The card also has a 16 MB Quad SPI flash, re-programmable through the external JTAG connector, which boots the FPGA code automatically after a power cycle. To extract the data directly to the VME data bus, an 8-bit data bus with data strobe timing is used. The 32-bit data, which fit the D32 data width of the VME bus, are multiplexed with the data strobe timing. As the FPGA card has on-board voltage regulators (1.8 V and 1.2 V) for the FPGA and DDR2 memory, only 3.3 V is supplied from the mother board.
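As an illustration of how four 8-bit samples can occupy one D32 word, the following sketch packs a byte stream into 32-bit words. The byte order within the word is an assumption made for the example; the text does not state which sample ends up in which byte lane, and `pack_bytes_to_d32` is a hypothetical helper, not part of the board firmware.

```python
def pack_bytes_to_d32(samples, big_endian=True):
    """Pack 8-bit samples into 32-bit words, four samples per word.

    The byte order is an assumption for illustration only.
    """
    if len(samples) % 4 != 0:
        raise ValueError("number of samples must be a multiple of 4")
    byte_order = "big" if big_endian else "little"
    words = []
    for start in range(0, len(samples), 4):
        chunk = bytes(samples[start:start + 4])  # rejects values outside 0..255
        words.append(int.from_bytes(chunk, byte_order))
    return words


if __name__ == "__main__":
    print([hex(w) for w in pack_bytes_to_d32([0x12, 0x34, 0x56, 0x78])])
    # With this packing, one turn of h=5120 samples needs 1280 D32 words.
    print(len(pack_bytes_to_d32([0] * 5120)))
```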

The FPGA contains the logic for SuperKEKB (h=5120), PF-AR (h=640), PF (h=312) and the Damping Ring (h=230) in both the bunch current mode and the large-scale memory mode. The mode is selected with the DIP switch on the mother board.

To control the FPGA and to communicate with the VME bus, we use two CPLDs. The first handles communication with the VME bus (ALTERA EPM7256A), and the second controls the FPGA and the ADC (ALTERA EPM7256A). Both CPLDs are also re-programmable through an external JTAG connector. A photo of the 18K10 is shown in Fig. 3.

Recording Function

In the bunch current recording mode, the memory is limited to a ring memory whose length equals the harmonic number, so only the block RAM in the FPGA is used. On the start command, the board memory address is first synchronized using the external revolution timing, and the data are then recorded until the stop signal arrives. After the stop signal is received, the recording stops at the end of the ring memory address and the interrupt is then raised. Data acquisition does not restart until the start command is received again from the VME.
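The ring-memory behaviour described above can be modelled in software. The sketch below is a simplified behavioural model, not the FPGA logic: it assumes that the first sample coincides with the revolution marker and that the stop signal is represented by a sample index (`stop_index`, a name introduced here for illustration).

```python
def record_ring_memory(samples, harmonic_number, stop_index):
    """Overwrite a ring memory of length h until the turn containing the stop ends.

    samples[0] is assumed to be aligned with the revolution marker, so the
    memory address of sample n is n modulo h. Returns the final memory
    contents and the index of the last recorded sample.
    """
    memory = [None] * harmonic_number
    for n, value in enumerate(samples):
        address = n % harmonic_number
        memory[address] = value
        stop_requested = n >= stop_index
        # Recording only ends at the last address, so the memory always
        # holds one complete, correctly ordered turn.
        if stop_requested and address == harmonic_number - 1:
            return memory, n
    raise ValueError("sample stream ended before the ring memory was closed")


if __name__ == "__main__":
    memory, last = record_ring_memory(list(range(10)), harmonic_number=4, stop_index=5)
    print(memory, last)
```

Stopping only at the end of the ring means that address 0 always corresponds to the same bucket, whenever the stop signal arrives.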

In the memory mode, the mechanism that synchronizes the memory address to the revolution timing is the same as in the bunch current mode. The memory size can be selected from 4k, 8k and 16k turns. For SuperKEKB, 4k turns correspond to 40 ms, which is roughly the same as the transverse radiation damping time. The trigger position can be selected by a command from the VME as follows: (1) -100%: recording stops just after the stop signal is received. (2) 0%: recording stops after half of the memory has been filled following the trigger. (3) 100%: recording stops after the full length of the memory has been filled following the trigger. The ADC data are buffered in the FPGA block RAM and then transferred to the DDR2 memory in burst mode. On readout, the data in the DDR2 memory are first transferred to the block RAM in the FPGA and then to the VME bus through a long FIFO buffer that compensates for the refresh cycle of the memory.
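The three trigger positions map linearly onto the number of turns recorded after the trigger, and the record sizes follow from the harmonic number. A minimal sketch, assuming that "4k" means 4096 turns and that each bucket is stored as one 8-bit sample per turn (an assumption consistent with the 20 MB and 80 MB figures quoted elsewhere in the text); the function names are illustrative:

```python
RF_FREQUENCY_HZ = 508.886e6  # from Table 1
HARMONIC_NUMBER = 5120       # SuperKEKB LER/HER


def post_trigger_turns(memory_turns, trigger_position_percent):
    """Turns recorded after the trigger for positions -100, 0 and 100 %."""
    if trigger_position_percent not in (-100, 0, 100):
        raise ValueError("trigger position must be -100, 0 or 100")
    return memory_turns * (trigger_position_percent + 100) // 200


def record_bytes(memory_turns, harmonic_number=HARMONIC_NUMBER):
    """Record size, assuming one 8-bit sample per bucket per turn."""
    return memory_turns * harmonic_number


def record_duration_ms(memory_turns, harmonic_number=HARMONIC_NUMBER):
    """Time span of the record in milliseconds."""
    return 1e3 * memory_turns * harmonic_number / RF_FREQUENCY_HZ


if __name__ == "__main__":
    for turns in (4096, 8192, 16384):
        print(turns, f"{record_duration_ms(turns):.1f} ms",
              f"{record_bytes(turns) / 2**20:.0f} MiB",
              [post_trigger_turns(turns, p) for p in (-100, 0, 100)])
```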

VME Functions

The 18K10 is designed to support the following VME functions:

- A32D32 supervisory data access (AM code: 0x0D).
- A32 supervisory block transfer (BLT, 0x0F) on ADC data region only.
- VME Interrupter (ROAK).

The VME address, IRQ number and IRQ vector are defined by the DIP and rotary switches on the mother card. The board occupies 512 bytes of VME address space. For the ADC data, only one 32-bit address (D32) is provided, and the data are accessed through this window using FIFO logic. For BLT access, a 256-byte address space is available for the ADC data area, but the VME address itself is ignored during BLT access and is treated as if it pointed to the first address of the ADC address space.

EVALUATION OF THE BOARD

EPICS Settings

We have used EPICS as the main control system for the KEKB accelerators and will continue to use it for the SuperKEKB rings. The EPICS version used in this evaluation is R314.12 on VxWorks 6.2. As the VME system controller (IOC), we used the Emerson MVME5500 (MPC7457 1 GHz processor), which supports block data transfers such as MBLT or BLT through the functions of the Universe II chip. Figure 4 shows a photo of the evaluation setup. Independent EPICS device support and databases are prepared for the bunch current mode and the large-scale memory mode. In the interrupt handler, the device support first reads the board status register, which automatically resets the interrupt status of the board, and the forward link of the EPICS database then initiates the data readout. During the readout, both the interrupt and the board restart function are disabled. After the readout is finished, the device support enables the interrupt and restarts the board.
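The order of operations in the interrupt handling can be illustrated with a small mock. This is only a sketch of the sequence described above: `MockRecorder` is a hypothetical stand-in and does not reflect the real register map, the status value or the actual EPICS device support code.

```python
class MockRecorder:
    """Hypothetical stand-in for the board; not the real register map."""

    def __init__(self, data):
        self._data = list(data)
        self.interrupt_pending = True   # a recording has just finished
        self.interrupt_enabled = True
        self.restart_enabled = True
        self.running = False

    def read_status(self):
        # Reading the status register clears the interrupt, as in the text.
        self.interrupt_pending = False
        return 0x01  # hypothetical "data ready" flag

    def read_data(self):
        return list(self._data)


def service_interrupt(board):
    """Follow the order described in the text and return (data, steps)."""
    steps = []
    board.read_status()
    steps.append("status read, interrupt cleared")
    board.interrupt_enabled = False
    board.restart_enabled = False
    steps.append("interrupt and restart disabled")
    data = board.read_data()
    steps.append("data read")
    board.interrupt_enabled = True
    board.restart_enabled = True
    board.running = True
    steps.append("interrupt enabled, board restarted")
    return data, steps


if __name__ == "__main__":
    data, steps = service_interrupt(MockRecorder([1, 2, 3]))
    print(data)
    print("\n".join(steps))
```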

Figure 4: 18K10 in the VME subrack. The VME extension to the left of the 18K10 is used to monitor the VME bus timing with a logic analyser.

Synchronization Test

It is essential that the recorded data are correctly synchronized to the revolution of the accelerator, especially when the recorded bunch current data are used by the bucket selection system to select the bucket with the minimum bunch current in order to equalize the filling. With the simple algorithm used to equalize the bunch filling pattern, jitter of the bunch current monitor can easily have fatal effects, such as the injection of a huge bunch current into one particular bunch within a fairly short time.

We examined the jitter rate by feeding a known pulse signal (an attenuated revolution signal with a pulse width of about 150 ns) into the signal input and checking for changes in the ADC data at the falling edge with an EPICS sequencer. The bunch current mode for SuperKEKB (h=5120) was used in this examination. By monitoring two 18K10 boards simultaneously with a stop trigger rate of about 300 Hz, the jitter error was confirmed to be less than $8 \times 10^{-5}$ on both boards after the timing of the revolution input had been adjusted to a point with sufficient timing margin. We conclude that the synchronization mechanism used on this board works as expected.
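The sensitivity of a minimum-current selection to a timing slip can be seen in a toy example. The sketch below is an assumption about what a "simple algorithm" might look like, not the actual bucket selection code; the function name and the four-bucket pattern are purely illustrative.

```python
def select_injection_bucket(bunch_currents, fill_pattern):
    """Return the index of the filled bucket with the lowest recorded current."""
    candidates = [i for i, filled in enumerate(fill_pattern) if filled]
    if not candidates:
        raise ValueError("fill pattern contains no buckets")
    return min(candidates, key=lambda i: bunch_currents[i])


def shift_record(bunch_currents, buckets):
    """Model a synchronization slip by rotating the record by whole buckets."""
    n = len(bunch_currents)
    return [bunch_currents[(i - buckets) % n] for i in range(n)]


if __name__ == "__main__":
    currents = [1.0, 0.4, 1.0, 1.0]   # bucket 1 is the weakest bunch
    pattern = [True, True, True, True]
    print(select_injection_bucket(currents, pattern))                   # correct record
    print(select_injection_bucket(shift_record(currents, 1), pattern))  # slipped record
```

With the slipped record the selection points at a bucket that is already full, so repeated injections would keep raising its current while the truly weak bunch is never refilled.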

ADC Test

The peak-to-peak and RMS errors of the system were measured by fitting the recorded data of a clean sinusoidal input signal. The bunch current recording mode was used in this test.

The RMS error of the fitted data was 0.98 bit and the peak-to-peak error was about 3.7 bit. No significant pattern was observed in the amplitude error distribution. We conclude that the fast ADC and the pre-amplifier are working well. In the large-scale memory mode, the data sometimes show a large spike, especially when the data pass from 0x7F to 0x80. Since the ADC works well in the bunch current mode, this spike probably comes from a timing error in the FPGA or the DDR2 memory. A detailed investigation is now in progress.
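One common way to obtain such error figures is a least-squares sine fit followed by an analysis of the residuals. The text does not describe the fitting procedure actually used, so the following is only a sketch under the assumption that the input frequency is known (a three-parameter fit); all names are illustrative.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sine_fit_errors(samples, cycles_per_sample):
    """Fit a*sin + b*cos + c at a known frequency.

    Returns (rms, peak_to_peak) of the residuals in ADC counts.
    """
    w = 2 * math.pi * cycles_per_sample
    basis = [(math.sin(w * n), math.cos(w * n), 1.0) for n in range(len(samples))]
    # Normal equations (A^T A) x = A^T y for the three fit parameters.
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(basis, samples)) for i in range(3)]
    coeffs = solve_3x3(ata, aty)
    residuals = [y - sum(c * f for c, f in zip(coeffs, r))
                 for r, y in zip(basis, samples)]
    rms = math.sqrt(sum(e * e for e in residuals) / len(residuals))
    return rms, max(residuals) - min(residuals)


if __name__ == "__main__":
    # Synthetic test signal: an ideal sine quantized to integer ADC counts.
    freq = 0.0371
    ideal = [round(127.5 + 100 * math.sin(2 * math.pi * freq * n)) for n in range(2000)]
    rms, p2p = sine_fit_errors(ideal, freq)
    print(f"rms = {rms:.2f} counts, peak-to-peak = {p2p:.2f} counts")
```

For an ideal quantizer the residual RMS is expected to be close to 1/sqrt(12) of a count, which gives a reference against which measured values can be compared.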

Data Transfer Rate

As the maximum injection rate of KEKB and SuperKEKB is 50 Hz, the data transfer time from the 18K10 to the IOC must be much shorter than 20 ms to leave time for the calculation of the bunch current equalizer. The bunch current monitor used in KEKB showed an interrupt response time of typically 21 µs and a data transfer time of about 1.4 ms (5120 bunches) under EPICS R313 with a Power PC 6750 IOC (PPC750). For SuperKEKB, the situation might be more severe if the transient beam loading effect is much stronger than in KEKB; in that case we would need I/Q detection using two 18K10 boards per ring and would have to calculate the amplitude (and phase) using the CPU resources of the IOC. In the large-scale memory mode, a large amount of data has to be written to a relatively slow remote disk using NFS or the tftp protocol through a heavily loaded control network, so the data transfer from the 18K10 to the IOC might not be the rate-limiting step. Nevertheless, a faster data transfer shortens the total waiting time when large records such as 16k turns (80 MB) are transferred.

The measured interrupt response time of the 18K10 with the MVME5500 was 9.5 µs. Figure 5 shows a typical data transfer cycle (A32, D32) with an AM code of 0x0D (A32D32 supervisory data access). A typical cycle took around 1.5-1.7 µs per D32 read. Transferring 5120 data with the MVME5500 under VxWorks 6.2 required 2.1 ms, which is much slower than the typical 1.4 ms obtained with the PPC750 as IOC. It is well known that the VME data cycle of the MVME5500 is slower than that of the PPC750 because of the large overhead in the PCI-VME bridge, and that it is difficult to speed up by normal methods.

We have implemented a block transfer function using the DMA access of the Universe II chip on the MVME5500. In the block transfer mode[4], we first allocate memory space on the CPU (not in the VME address space, but memory directly accessible from the CPU). The VME master first asserts the first VME address and the AM code 0x0F (A32 supervisory block transfer: BLT mode) and then drives AS* (AS* is kept low during the entire BLT cycle). The 18K10 places D32 data on the VME bus and then drives DTACK*. The VME master transfers the D32 data to the allocated CPU memory and releases DS*. The 18K10 releases the D32 bus and prepares the next data. While AS* is kept low, this cycle repeats up to a pre-defined cycle limit. After the block transfer finishes, we copy the data from the allocated CPU space to the device support memory space, which must also be in the CPU memory space, to perform the necessary calculations and to pass the data to the EPICS database. As the memory space on the 18K10 for the ADC data is 256 bytes, that is 64 words of 4 bytes, the maximum length of a BLT cycle is limited to 64 cycles per access, although the maximum number of BLT cycles allowed by the VME64x specification is 256. Figure 6 shows the VME bus timing during BLT access.
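The effect of the 64-cycle limit on a readout can be sketched as a simple chunking calculation. The example assumes that one turn of 5120 8-bit samples is packed four to a D32 word, giving 1280 words; this packing is an assumption for illustration, and `blt_block_lengths` is a hypothetical helper.

```python
BLT_WINDOW_BYTES = 256   # ADC data window on the 18K10, from the text
WORD_BYTES = 4           # one D32 word


def blt_block_lengths(total_words, words_per_block=BLT_WINDOW_BYTES // WORD_BYTES):
    """Split a transfer into BLT accesses of at most words_per_block D32 cycles."""
    full_blocks, remainder = divmod(total_words, words_per_block)
    return [words_per_block] * full_blocks + ([remainder] if remainder else [])


if __name__ == "__main__":
    # Assumption: 5120 8-bit samples packed four per D32 word.
    blocks = blt_block_lengths(5120 // 4)
    print(len(blocks), "BLT accesses of", blocks[0], "cycles each")
```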

The typical access cycle time was around 550 ns per D32 read. Transferring 5120 data with BLT access took 1.1 ms, which is about half of the normal data transfer time and about 30% faster than with a PPC750 as IOC. By extrapolating this transfer rate, we can estimate the transfer time for the large memory. For example, a transfer of 4k turns of data (40 ms = 20 MB) will need about 4 s, which is much shorter than the measured value of about 16 s for the PPC750. The overhead caused by the limited BLT length (64 cycles) is estimated to be negligibly small both in the KEKB bunch current mode (about 20 µs) and for 4k turns of memory (about 20 ms).
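The extrapolation above can be reproduced from the measured times. A minimal sketch, assuming that each of the 5120 data is one byte, that 4k turns means 4096 turns of 5120 bytes, and that the transfer time scales linearly with the record size; the names are illustrative.

```python
BUNCH_RECORD_BYTES = 5120  # one turn at h=5120, assumed one byte per bucket

# Measured transfer times for one 5120-data record on the MVME5500, from the text.
MEASURED_TRANSFER_S = {
    "single-cycle D32": 2.1e-3,
    "BLT": 1.1e-3,
}

INJECTION_RATE_HZ = 50.0   # maximum injection rate, i.e. a 20 ms period


def extrapolated_transfer_time_s(total_bytes, mode):
    """Scale the measured one-turn transfer time linearly with the record size."""
    return total_bytes / BUNCH_RECORD_BYTES * MEASURED_TRANSFER_S[mode]


def fraction_of_injection_period(mode):
    """Share of the injection period spent on one bunch-current transfer."""
    return MEASURED_TRANSFER_S[mode] * INJECTION_RATE_HZ


if __name__ == "__main__":
    four_k_turns_bytes = 4096 * BUNCH_RECORD_BYTES
    for mode in MEASURED_TRANSFER_S:
        print(f"{mode}: 4k turns in {extrapolated_transfer_time_s(four_k_turns_bytes, mode):.1f} s, "
              f"{100 * fraction_of_injection_period(mode):.1f}% of the injection period")
```

Under these assumptions the BLT estimate comes out at roughly 4.5 s, the same order as the value of about 4 s quoted above, and one bunch-current transfer uses only a few percent of the 20 ms injection period.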

For the bunch current equalizer, the bunch current data must be transferred from the 18K10 board to the linac gun trigger control system through the reflective memory before the next injection. We have already started the evaluation of the reflective memory (VMIVME5565) using a long optical fiber, about 500 m, from the central control room to the linac gun control. The data transfer rate between the two boards through the optical fiber is 178 MB/s by specification. As the board supports DMA data transfer on the VME bus, we hope that the latency of the system will be short enough. The total performance of the system will be evaluated within this fiscal year.

SUMMARY

We have developed a bunch current and bunch oscillation recorder (18K10) using a commercially available FPGA daughter card. In the bunch current mode, the performance of the board, including the ADC error, timing jitter and data transfer rate, was found to be sufficient, as expected. In the large-scale memory mode, the board shows some unstable behaviour in both the acquired data and the timing jitter. A detailed investigation is in progress.

ACKNOWLEDGEMENT

The authors would like to express their sincere appreciation to the KEKB control group for their help in developing the device support in the new EPICS environment. The DMA transfer function of the Universe II chip on VxWorks was developed by Mr. T. Okazaki of Tonichi-Giken Co. Ltd.

REFERENCES