# RESEARCH AND DEVELOPMENT OF THE FAST ORBIT FEEDBACK SYSTEM FOR HEPS

P. Zhu<sup>†1,2,3</sup>, D. P. Jin<sup>1,2</sup>, Z. X. Xie<sup>1,2</sup>, Z. Lei<sup>1,2</sup>, Y. L. Zhang<sup>1,2</sup>, Y. C. He<sup>1,2</sup>, D. Y. Wang<sup>1,2</sup> <sup>1</sup>Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China <sup>2</sup>Spallation Neutron Source Science Center, Dongguan, China <sup>3</sup>National Synchrotron Radiation Laboratory, University of Science and Technology of China, Hefei, China

Abstract

As a fourth-generation light source, the High Energy Photon Source (HEPS) places much more stringent requirements on beam orbit stability, in both the horizontal and vertical directions, than earlier sources because of its much smaller beam sizes. A Fast Orbit Feedback (FOFB) system with a closed-loop bandwidth of around 500 Hz is needed to meet these requirements, and the latency of the FOFB system is the key to achieving them.

This paper focuses on the design and implementation of the FOFB system. Based on the ATCA (Advanced Telecom Computing Architecture) standard, a total of 16 sub-stations are connected in a bidirectional daisy chain. Each sub-station is connected to its BPMs and local fast correctors through high-speed point-to-point links, which minimizes transmission delays and improves the closed-loop bandwidth of the system. The large matrix calculation, based on singular value decomposition (SVD), is optimized by using the digital signal processing (DSP) modules of a Virtex-7 field programmable gate array (FPGA) in a parallel pipeline. For performance tuning and additional flexibility, an independent Ethernet controller with a fully hardwired TCP/IP stack, attached through a Serial Peripheral Interface (SPI), provides network connectivity for remote operations such as downloading a new matrix. Compared with other standards, this architecture simplifies the hardware design of the FOFB system and improves its reliability, availability, maintainability, and scalability.

### INTRODUCTION

The High Energy Photon Source (HEPS), a fourth-generation synchrotron radiation source, is one of the major national scientific and technological infrastructures. Its layout is shown in Fig. 1.



Figure 1: Layout of HEPS.

Beam sizes in modern light sources are becoming ever smaller and the brightness ever higher [1]. As the brightest fourth-generation synchrotron radiation source in the world, HEPS places higher requirements on the stability of the beam trajectory: the RMS beam jitter in the storage ring should be less than 10% in both the horizontal and vertical directions.

To meet these stringent requirements, and unlike most third-generation light sources such as SSRF, Diamond, and SLS, the HEPS fast orbit feedback system uses a new architecture based on FPGAs with RocketIO transceivers and built-in DSPs, instead of real-time data transmission links and VME controllers with DSP boards. The response delay of the system should be as small as possible, and the closed-loop bandwidth is designed to be 500 Hz. This architecture has also been adopted by recently designed and upgraded light sources such as NSLS-II and APS-U [2].

### REQUIREMENT

Many factors affect the stability of the beam orbit, including the stability of the magnet power supplies, ground vibration, and temperature effects. To suppress these disturbances and keep the beam orbit stable over long-term operation of the light source, a high-performance, high-speed orbit feedback system is required. It is a typical multiple-input multiple-output (MIMO) system and is commonly based on singular value decomposition (SVD), as shown in Fig. 2.

The correction algorithm is based on an SVD of the orbit response matrix:

$$\Delta \overline{X} = R \Delta \overline{\theta} \quad \text{and} \quad R = USV^T, \tag{1}$$

$$\Delta \overline{\theta} = V S^{-1} U^T \Delta \overline{X}. \tag{2}$$

where $\Delta \overline{X}$ is the deviation of the current orbit from the golden orbit, $U$ and $V$ are matrices whose columns form orthogonal bases of the BPM ($X$) space and the corrector magnet ($\theta$) space, respectively, and $S$ is the diagonal matrix of singular values.

The proportional-integral (PI) control algorithm operates in this diagonal space, so the relevant parameters can be adjusted for each mode separately.
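As an illustration of Eqs. (1) and (2), the following minimal Python sketch builds a hypothetical two-BPM, two-corrector response matrix directly from chosen factors $U$, $S$, and $V$, and then applies Eq. (2) with an optional gain per mode. The matrix values, the orbit error, and the helper names (`rotation`, `corrector_change`, `gains`) are assumptions made for the example and are not taken from the HEPS design, where this calculation runs in FPGA logic on much larger matrices.

```python
import math

def rotation(angle):
    """2x2 rotation matrix, used here as a simple orthogonal matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical SVD factors (illustrative values only).
U = rotation(0.3)
V = rotation(-1.1)
singular_values = [2.0, 0.5]

# Eq. (1): R = U S V^T
R = matmul(matmul(U, diag(singular_values)), transpose(V))

def corrector_change(delta_x, gains=(1.0, 1.0)):
    """Eq. (2): delta_theta = V S^-1 U^T delta_x, with one gain per mode."""
    modes = matvec(transpose(U), delta_x)          # project orbit error onto modes
    scaled = [g * m / s for g, m, s in zip(gains, modes, singular_values)]
    return matvec(V, scaled)                       # map back to corrector space

delta_x = [0.010, -0.004]                          # made-up orbit error
delta_theta = corrector_change(delta_x)
reconstructed = matvec(R, delta_theta)             # R * delta_theta
print(delta_theta, reconstructed)
```

With unit gains, `reconstructed` reproduces `delta_x`, which checks that the product in Eq. (2) inverts $R$ in this full-rank example. Lowering the gain of a mode with a small singular value is one simple way of treating each mode separately.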

Figure 2: The schematic diagram of FOFB of HEPS.

The setting value $S$ of a fast corrector power supply is calculated with Eq. (3), where $P$ is the ratio between the corrector current change $\Delta S$ and the corrector strength change $\Delta\theta$, and $S_D$ is the corrector DC current. Here $S$ denotes a current setting, not the singular-value matrix of Eq. (1).

$$S = P \Delta \theta + S_D \tag{3}$$

All multiplication and calculation stages are completed within one cycle. The fast orbit feedback system must therefore be optimized for low-latency communication with the BPMs and the fast corrector power supplies, which is essential for achieving the required effective bandwidth for orbit stabilization.

### ARCHITECTURE

The HEPS storage ring has a circumference of 1360.4 m and consists of 48 7BA cells. Each 7BA cell has 12 BPMs and 8 fast correctors, giving a total of 576 BPMs and 384 fast correctors, which form the inputs and outputs of the fast orbit feedback system. All operations are synchronized to a 22 kHz signal from the timing system.

Two key issues must be addressed: how to reduce the data transmission error rate and how to maintain the accuracy of the data calculation. Reliable data links are needed between the BPM system and the FOFB sub-stations, between different FOFB sub-stations, and between the FOFB sub-stations and the calibration system, so that the receiving end gets correct data and the system operates normally. In addition, the calculations must be carried out with a high bit width to limit the deviation of the results.

To reduce the overall latency, a total of 16 FOFB sub-stations are distributed around the kilometre-scale ring. Each sub-station serves 36 BPMs and 24 fast corrector power supplies. This arrangement reduces the data transmission delay and keeps the computational load of each sub-station at a reasonable level. As shown in Fig. 3, each FOFB sub-station receives data from 3 BI (Beam Instrumentation) sub-stations, comprising 72 BPM values (X and Y components of 36 BPMs), and outputs 24 fast corrector power supply settings.
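The partitioning described above can be checked with simple arithmetic. The short Python sketch below uses only the numbers quoted in the text; the variable names are chosen for the example, and the derived values (cells per sub-station, number of global readings, update period) are computed here rather than stated in the paper.

```python
# Numbers quoted in the text
CELLS = 48                  # 7BA cells in the storage ring
BPMS_PER_CELL = 12
CORRECTORS_PER_CELL = 8
SUBSTATIONS = 16
UPDATE_RATE_HZ = 22_000     # synchronous rate from the timing system

total_bpms = CELLS * BPMS_PER_CELL
total_correctors = CELLS * CORRECTORS_PER_CELL

# Even split of the ring over the FOFB sub-stations
cells_per_substation = CELLS // SUBSTATIONS
bpms_per_substation = total_bpms // SUBSTATIONS
correctors_per_substation = total_correctors // SUBSTATIONS

# Each BPM delivers an X and a Y reading
readings_per_substation = 2 * bpms_per_substation
global_readings = 2 * total_bpms

# Time available per feedback cycle, in microseconds
update_period_us = 1e6 / UPDATE_RATE_HZ

print(total_bpms, total_correctors)
print(cells_per_substation, bpms_per_substation, correctors_per_substation)
print(readings_per_substation, global_readings)
print(round(update_period_us, 2))
```

The results agree with the figures in the text (576 BPMs, 384 correctors, 36 BPMs and 24 correctors per sub-station, 72 readings per sub-station). Three cells per sub-station is also consistent with each FOFB sub-station being fed by 3 BI sub-stations, and the 22 kHz rate leaves about 45 µs per feedback cycle.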



Figure 3: Layout of FOFB architecture.

To achieve stable, high-speed data transmission, the sub-station hardware uses only the ATCA mechanical architecture; the main logic backplane board and the front boards (a timing front board, a BPM front board, and a fast corrector power supply front board) are custom designs. This gives a more flexible structure, higher data transmission bandwidth, and excellent reliability and thermal performance, as shown in Fig. 4.

Content from this work may be used under the terms of the CC BY 4.0 licence (© 2023).





Figure 4: The hardware architecture of FOFB sub-station.

As shown in Fig. 5, the sub-station hardware mainly includes:

- 1 main logic backplane board;
- 4 front boards for local BPMs, one of which is reserved for beamline XBPMs;
- 1 front board for global BPMs;
- 4 front boards for local fast corrector power supplies;
- 1 front board for global timing.

This design allows the input and output interfaces to be expanded and provides efficient cooling for the high-capacity FPGA. The main logic backplane board and the front boards are interconnected through high-speed TE signal connectors, which also supply power to the front boards.



Figure 5: The functional diagram of hardware boards.

For example, each sub-station uses a mainstream high-performance FPGA (Field Programmable Gate Array) with integrated DSP modules and large logic resources, and achieves high-speed transmission through its 80 GTH transceivers. Each FOFB sub-station acquires BPM data over serial point-to-point links at 2.38 Gbps and shares BPM data with the other sub-stations over a bidirectional daisy-chain link at 4.76 Gbps.

Additionally, to prevent damage to the core chips, over-temperature alarm and immediate power cut-off functions are provided, which improve the reliability and availability of the system, as shown in Fig. 6.



Figure 6: The functional diagram of the protection system.

In detail, the temperature protection function is controlled by a PLC. The heat sink is equipped with a thermal resistor and a thermocouple sensor. When the temperature exceeds the set threshold, the PLC opens the relay to disconnect the external 12 V power supply, forcing the system to stop working. The chip temperatures are monitored in real time over the network through a CSS operator interface.
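The protection behaviour described above can be sketched as a small latching trip function. This is only an illustrative Python model, not the PLC program: the class name `OverTemperatureTrip`, the 85 °C threshold, and the assumption that the relay stays open until an explicit reset are all hypothetical.

```python
class OverTemperatureTrip:
    """Latching over-temperature trip (assumed behaviour, for illustration)."""

    def __init__(self, threshold_c):
        self.threshold_c = threshold_c   # hypothetical trip threshold in deg C
        self.tripped = False

    def update(self, temperature_c):
        """Return True while the external 12 V relay may stay closed."""
        if temperature_c > self.threshold_c:
            self.tripped = True          # latch: power stays off after a trip
        return not self.tripped

    def reset(self):
        """Manual reset after the fault has been cleared."""
        self.tripped = False


trip = OverTemperatureTrip(threshold_c=85.0)
relay_closed = [trip.update(t) for t in (60.0, 84.9, 85.1, 70.0)]
print(relay_closed)
```

In this model the relay opens at the first reading above the threshold and stays open even when the temperature drops again, which is one conservative way to realize an immediate power cut-off.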

### HARDWARE DESIGN

Based on the overall evaluation, the FOFB sub-station hardware is built on the ATCA mechanical architecture and consists of a main logic backplane board, a global timing front board, a global BPM front board, four local BPM front boards, and four fast corrector front boards. All the boards are designed in-house.

#### Main Logic Backplane Board

For stable and reliable long-term operation and future upgrades, the core backplane of the FOFB system adopts a large-capacity, high-density architecture, as shown in Fig. 7.


Figure 7: The functional diagram of core backplane.

The Xilinx Virtex-7 xc7vx690tffg1927-2 was chosen as the core processing chip. It provides 3600 DSP units and 80 GTH high-speed transceivers, which meet the requirements for large-scale data processing and high-speed data transmission. It also provides 693120 logic units for the logic resources, 1470 36 Kb BRAM units for storage, and 20 clock management modules (CMT) and 32 global clock buffers (BUFG) for clocking. The system configuration chip is the Xilinx Kintex-7 xc7k325tffg676-2, which provides 8 GTX transceivers, 840 DSP units, 326080 logic units, 445 36 Kb BRAM units, and 10 CMT modules. In addition, the board must meet the requirements for power distribution and signal routing to ensure good power quality and signal integrity, as shown in Fig. 8.



Figure 8: The PCB of core backplane.

Twenty pairs of LVDS differential I/O channels are reserved between the Kintex-7 and Virtex-7 chips. For data transfer between the two chips, 2 pairs carry unidirectional data, 2 pairs carry the handshake signals (one per data pair), and 1 pair carries the clock, which is driven by the Kintex-7 chip and received by the Virtex-7 chip.

#### Front Board

According to the requirements of the fast orbit feedback system, three types of front boards based on the ATCA standard (322 mm × 280 mm) have been designed: a timing front board, a BPM front board, and a fast corrector power supply front board. They handle timing signal reception, BPM data reception and distribution, and the output of fast corrector power supply setting values, respectively. In this design the ATCA crate has 14 slots. Slot 1 holds the timing front board; slots 3, 5, 11, and 13 hold the fast corrector power supply front boards; and slots 6, 7, 8, 9, and 10 hold the BPM front boards (3 for local BPM data reception, 1 for global data distribution, and 1 as a backup). Slots 2, 4, 12, and 14 are reserved for future system upgrades. Figure 9 shows the signals and functions of the front boards.
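The slot allocation can be written down as a simple lookup table, which makes it easy to check that all 14 slots are accounted for. The Python sketch below only restates the assignment given in the text; the dictionary name `SLOT_MAP` and the board labels are chosen for the example. The text does not say which of slots 6 to 10 carries the global or the backup BPM board, so that distinction is left out.

```python
from collections import Counter

# Slot assignment as described in the text (labels are illustrative).
SLOT_MAP = {
    1: "timing",
    2: "reserved",
    3: "corrector_ps",
    4: "reserved",
    5: "corrector_ps",
    6: "bpm",
    7: "bpm",
    8: "bpm",
    9: "bpm",
    10: "bpm",
    11: "corrector_ps",
    12: "reserved",
    13: "corrector_ps",
    14: "reserved",
}

# Count how many slots each board type occupies.
counts = Counter(SLOT_MAP.values())
print(dict(counts))
```

The counts give 1 timing board, 4 fast corrector power supply boards, 5 BPM boards, and 4 reserved slots, which together fill the 14-slot crate.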

To provide external storage, a Titan2-series AXP390 FPGA development board with a PG2T390HFFBG900 FPGA chip is integrated onto the front board. The development board carries 4 DDR4 SDRAM chips, each with a storage capacity of 2 GB, 4 high-speed DDR3 SDRAM chips with a storage capacity of 512 MB, and 4 QSPI flash chips with a storage capacity of 128 Mb. The onboard interfaces include 1 PCIe x8 interface, 4 10G SFP fiber interfaces, 1 40G QSFP+ fiber interface, 1 UART serial interface, 1 gigabit Ethernet port, 1 FMC expansion interface, 1 SD card interface, and a 40-pin expansion port.







Figure 9: The diagram of signal and function of the front boards.



Figure 10: The diagram of pipeline design.

The PID controller uses the incremental PID algorithm:

$$u(k) = u(k-1) + K_p[e(k) - e(k-1)] + K_i e(k) + K_d[e(k) - 2e(k-1) + e(k-2)], \tag{4}$$

where e(k-2), e(k-1), and e(k) are the BPM data received by the system in the last three consecutive feedback periods, in sequence.
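Equation (4) can be illustrated with a short software model. The Python sketch below is a floating-point illustration only, not the FPGA implementation; the class name `IncrementalPID`, the gain values, and the zero initial state are assumptions made for the example. Here `u` holds the previous controller output u(k-1), and `e1` and `e2` hold e(k-1) and e(k-2).

```python
class IncrementalPID:
    """Incremental PID following Eq. (4); state starts at zero (assumption)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0   # e(k-1)
        self.e2 = 0.0   # e(k-2)
        self.u = 0.0    # u(k-1)

    def step(self, e):
        """Consume the newest error sample e(k) and return u(k)."""
        du = (self.kp * (e - self.e1)
              + self.ki * e
              + self.kd * (e - 2.0 * self.e1 + self.e2))
        self.u += du
        # Shift the error history for the next feedback period.
        self.e2, self.e1 = self.e1, e
        return self.u


# Integral-only example with a constant error (illustrative gains).
pid = IncrementalPID(kp=0.0, ki=0.5, kd=0.0)
outputs = [pid.step(1.0) for _ in range(3)]
print(outputs)
```

Because only the increment is computed in each period, the controller needs to store just the previous output and the two previous error samples, which suits a pipelined hardware implementation.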

### IMPLEMENTATION AND TEST

A test system was set up for latency evaluation, logic development, and verification, as shown in Fig. 11 and Fig. 12. The loop latency is divided into the following components:

- The delay of local BPM data transmission, T1
- The delay of local BPM encoding, T2
- The delay of global data transmission, T3
- The delay of local data operation, T4
- The delay of global data operation, T5
- The delay of PS setting values transmission, T6

Using the measured values in Table 1, the total delay of the FOFB system is calculated as

$$T_{total} = T_1 + T_2 + 8T_3 + T_5 + T_6 = 10.4 \,\mu s. \tag{5}$$
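The sum in Eq. (5) can be reproduced with a few lines of Python. The sketch below assumes that the six values in Table 1 correspond to T1 to T6 in order, and it interprets the factor 8 as the largest number of hops between two of the 16 sub-stations on a bidirectional ring; this interpretation is a plausible reading and is not stated explicitly in the text. The function name `farthest_hops` is introduced for the example.

```python
# Table 1 values in nanoseconds, assumed to be in the order T1..T6.
latency_ns = {"T1": 821, "T2": 705, "T3": 950, "T4": 874, "T5": 773, "T6": 620}

def farthest_hops(n_nodes):
    """Largest hop count to any node on a ring that forwards in both directions."""
    return max(min(d, n_nodes - d) for d in range(n_nodes))

hops = farthest_hops(16)

# Eq. (5): T_total = T1 + T2 + 8*T3 + T5 + T6 (T4 does not enter the sum).
total_ns = (latency_ns["T1"] + latency_ns["T2"]
            + hops * latency_ns["T3"]
            + latency_ns["T5"] + latency_ns["T6"])

# Feedback period at the 22 kHz synchronous rate, in nanoseconds.
period_ns = 1e9 / 22_000

print(hops, total_ns, round(total_ns / period_ns, 3))
```

With these inputs the sum is 10519 ns, or about 10.5 µs, close to the 10.4 µs quoted in Eq. (5); the text does not explain the small difference. The total is roughly a quarter of the 45 µs feedback period.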

### CONCLUSION

The design of the feedback logic and the latency evaluation of the FOFB system for HEPS have been presented. According to the simulation and the measurements, a closed-loop bandwidth of 500 Hz can be achieved. Further bench tests and optimization, followed by field installation and commissioning, will be carried out later.





Figure 11: The diagram of signal and function of time consumption.



Figure 12: The transmission data (between adjacent sub stations).

Table 1: Test Results of the System Delay

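|         | <i>T1</i> [ns] | <i>T2</i> [ns] | <i>T3</i> [ns] | <i>T4</i> [ns] | <i>T5</i> [ns] | <i>T6</i> [ns] |
|---------|----------------|----------------|----------------|----------------|----------------|----------------|
| Latency | 821            | 705            | 950            | 874            | 773            | 620            |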

In Eq. (4), the error samples are taken from consecutive feedback periods in sequence, and u(k-1) is the output of the PID controller in the previous feedback period. A binary-tree algorithm is used in the inverse response matrix operation to minimize the number of accumulation stages.
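The benefit of a binary-tree accumulation can be seen in a small Python model that sums a list pairwise, level by level, and counts the levels. This is a software illustration of the general adder-tree idea, not the FPGA logic; the function name `tree_sum` and the choice of 72 terms (the number of BPM readings handled by one sub-station) are made for the example.

```python
def tree_sum(values):
    """Sum values pairwise, level by level; return (total, number_of_levels)."""
    level = list(values)
    stages = 0
    while len(level) > 1:
        # Add neighbouring pairs in parallel (one adder stage).
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])   # odd element passes to the next stage
        level = paired
        stages += 1
    return (level[0] if level else 0), stages


# 72 illustrative integer terms, e.g. products of one matrix row with BPM data.
terms = list(range(1, 73))
total, stages = tree_sum(terms)
print(total, stages)
```

For 72 terms the tree needs 7 adder stages, whereas a sequential accumulator would need 71 dependent additions, so the tree shortens the critical path at the cost of more adders working in parallel.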

### REFERENCES

- [1] S. Kongtawong et al., "Recent improvements in beam orbit feedback at NSLS-II", Nucl. Instrum. Methods Phys. Res., Sect. A, vol. 976, p. 164250, 2020. doi:10.1016/j.nima.2020.164250
- [2] Y. Tian et al., "NSLS-II Fast Orbit Feedback System", in Proc. ICALEPCS'15, Melbourne, Australia, Oct. 2015, pp. 34-37. doi:10.18429/JACoW-ICALEPCS2015-MOC3O05

TUMBCM016

**System Modelling**