**THCA4** 

# DEVELOPMENT OF A NETWORK-BASED TIMING AND TAG INFORMATION DISTRIBUTION SYSTEM FOR SYNCHROTRON RADIATION EXPERIMENTS AT SPring-8

T. Masuda<sup>†</sup>, Japan Synchrotron Radiation Research Institute (JASRI), Sayo, Hyogo, Japan

## Abstract

Time-resolved measurements in synchrotron radiation experiments require an RF clock synchronized with the storage ring accelerator and a fundamental revolution frequency (zero-address) signal. To use these signals at an experimental station, long RF cables from the accelerator timing station, divider modules, and delay modules must be deployed. These installations are costly and require significant effort from experts to adjust the timing. To lower this cost and effort, a system that distributes the revolution frequency signal, which is ~209 kHz at the SPring-8 storage ring, together with tag information has been studied, based on a high-precision time synchronization technology over a network. In this study, the White Rabbit (WR) technology is adopted. The proof of concept consists of a master PC, a slave PC, and two WR switches. The master PC detects the zero-address signal and distributes time stamps with tag information to the slave PC. The slave PC then generates ~209 kHz signals synchronized with the target bunch by adding an offset time calculated by software. The output signals from the slave PC achieved a measured one-sigma jitter of less than 100 ps.

## INTRODUCTION

High-precision timing signals synchronized with a storage ring accelerator are indispensable for time-resolved measurements in synchrotron radiation experiments utilizing short pulse characteristics of the synchrotron radiation. These timing signals have been introduced into the required beamline by deploying long distance RF cables from the accelerator timing station. In this method, beamlines that can utilize the precise timing signals are quite restricted, as their expansion is difficult. Furthermore, introducing these RF signals demands high cost and effort since various types of electronics have to be deployed and adjusted to divide the frequency or to tune the phase.

In order to optimize experimental results, a timing distribution system with higher expandability is desirable. In other words, the timing signal should be distributed as universal digital information closely coupled with a control system, instead of directly distributing the analog RF signal. At any beamline that requires precise timing signals, experimental users should be able to employ and manipulate these signals easily by means of GUI programs in a control system.

I have studied a timing distribution system based on a high-precision time synchronization technology over a standard network. Since the system delivers "digital information" related to timing signals over the network, it can be handled and expanded without difficulty. The timing signal generated from this information can be adjusted precisely and easily by software. In addition, other useful information, such as a shot number, can easily be attached.

As a high-precision time synchronization technology, I have adopted White Rabbit (WR) [1], which is promoted as an international collaborative project. It extends the IEEE 1588 standard and achieves synchronization with sub-nanosecond accuracy by combining clock synchronization at the hardware layer through Synchronous Ethernet with phase detection based on digital dual-mixer time difference. The WR technology is open to the public at the Open Hardware Repository (OHR) [2], and all information required for development is freely available under open licenses such as the CERN Open Hardware License [3].

To confirm the feasibility of the digital-based timing distribution system, I have built a proof-of-concept (PoC) system with the minimum configuration. For SPring-8, the aim of the PoC system is to generate a 208.8 kHz revolution frequency signal synchronized with the target bunch of the storage ring, which is accelerated by 508.58 MHz RF, and to deliver tag information such as a shot number.

## **BUILDING THE POC SYSTEM**

The PoC system has been constructed as the first step for verification of the new timing and tag information distribution system, as shown in Fig. 1. The system consisted of a master PC, a slave PC, two WR switches (WR-A and WR-B), a GPS receiver, and single mode fibers (SMFs).



Figure 1: A block diagram of the PoC system.

<sup>†</sup> masuda@spring8.or.jp

12th Int. Workshop on Emerging Technologies and Scientific Facilities Controls, PCaPAC2018, Hsinchu, Taiwan, JACoW Publishing, ISBN: 978-3-95450-200-4, doi:10.18429/JACoW-PCaPAC2018-THCA4

Both the master and slave PCs are equipped with an FMC DEL 1ns 4cha [4] (Fig. 2), which is a fine-delay FPGA Mezzanine Card (FMC), mounted on a PCI Express FMC carrier card called SPEC [5] (Fig. 3) equipped with a Xilinx Spartan-6 [6] FPGA. The FMC DEL 1ns 4cha has one trigger input channel and four output channels with a user-programmable fine-delay function. It operates in three modes: pulse delay, pulse generation, and time-to-digital converter.



Figure 2: Picture of an FMC DEL 1ns 4cha SPEC card.



Figure 3: Picture of a SPEC board.

By installing WR-core firmware into the FPGA of the SPEC, both the master and slave PCs served as WR nodes. With a 10 MHz clock and a 1 PPS signal fed from the GPS receiver into the WR-A switch, the master and slave PCs as well as the WR switches were synchronized with the GPS via the SMFs. Ubuntu [7] 12.04 LTS was adopted as the OS for both PCs.

The master PC received the 208.8 kHz zero-address signal, i.e., the fundamental revolution frequency signal. This signal was treated as a pre-trigger for the output issued N turns later at the slave side, where N is a programmable parameter at the master side. The master recorded the absolute time  $T_I$  of the received zero-address signal and calculated the output time  $T_N$  of the trigger after N turns.

$$T_N = T_I + \frac{N}{208.8 \, kHz} \tag{1}$$

The master PC transmitted  $T_N$  and D to the slave PC once every D turns, where D denotes the decimation rate of the transmission and is assigned by the software. The slave PC received both  $T_N$  and D, and output the revolution frequency signal synchronized with the target bunch address K at the absolute time  $T_K$  defined as follows.

$$T_K = T_N + \frac{K}{508.58 \, MHz} \tag{2}$$
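As a minimal sketch of Eqs. (1) and (2), the two time calculations can be written in Python as follows. The function names and the use of exact rational arithmetic are illustrative assumptions and are not taken from the PoC software. The revolution frequency is taken here as 508.58 MHz divided by the harmonic number 2436, which is about 208.8 kHz.

```python
from fractions import Fraction

# Values taken from the text; exact rationals avoid floating-point rounding.
F_RF_HZ = Fraction(508_580_000)        # 508.58 MHz RF
HARMONIC_NUMBER = 2436                 # RF buckets per turn
F_REV_HZ = F_RF_HZ / HARMONIC_NUMBER   # about 208.8 kHz


def trigger_time(t_i, n_turns):
    """Eq. (1): output time T_N, N turns after the detected time T_I (seconds)."""
    return Fraction(t_i) + Fraction(n_turns) / F_REV_HZ


def bunch_output_time(t_n, k_bunch):
    """Eq. (2): output time T_K for target bunch address K (seconds)."""
    return Fraction(t_n) + Fraction(k_bunch) / F_RF_HZ


if __name__ == "__main__":
    # Hypothetical parameters: T_I = 0 s, N = 900 turns, K = 100 buckets.
    t_n = trigger_time(0, 900)
    t_k = bunch_output_time(t_n, 100)
    print(f"T_N - T_I = {float(t_n) * 1e3:.4f} ms")
    print(f"T_K - T_N = {float(t_k - t_n) * 1e9:.2f} ns")
```

In this model, setting K equal to the harmonic number shifts the output by exactly one revolution period, which is a quick consistency check between the two equations.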

The slave PC interpolated the decimated output by means of the pulse-train generation function of the FMC DEL 1ns 4cha and thereby realized continuous output of the 208.8 kHz revolution frequency signal. Figure 4 shows the timing chart of the consecutive processes on the master and slave PCs. Figure 5 illustrates the block diagrams of the FPGA logic developed to achieve the functions shown in Fig. 4. These functions were mostly embedded into *modified\_fine\_delay\_cores* in both the master and slave PCs, which were newly developed to fulfill the original functions mentioned above.
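The interpolation described above can be pictured with the following Python sketch. It is only an illustrative model, not the FPGA logic: it assumes that the first pulse of each block is placed at the absolute time received from the master and that the remaining D - 1 pulses follow at a fixed programmed period. The function names, the 1 ps time unit, and the idea that a small period error accumulates over the block are assumptions made for illustration.

```python
# 2436 / 508.58 MHz, rounded to the nearest picosecond (assumed resolution).
REV_PERIOD_PS = 4_789_807


def interpolated_output_times(t_k_ps, d, period_ps=REV_PERIOD_PS):
    """Times (ps) of the D pulses that follow one update from the master.

    The first pulse sits at the absolute time T_K; the others are spaced
    by the programmed pulse-train period.
    """
    return [t_k_ps + i * period_ps for i in range(d)]


def drift_at_last_pulse(d, period_error_ps):
    """Offset (ps) of the last interpolated pulse for a given period error."""
    return (d - 1) * period_error_ps


if __name__ == "__main__":
    times = interpolated_output_times(t_k_ps=0, d=4)
    print(times)
    # A hypothetical period error of 0.1 ps over blocks of different length.
    for d in (1000, 300, 50):
        print(d, drift_at_last_pulse(d, 0.1))
```

Under these assumptions, the offset of the last pulse in a block grows linearly with D, which is one possible reason why fine tuning of the output frequency and a smaller D both matter for the jitter.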



Figure 5: Block diagrams of FPGA logics.



Figure 4: Timing chart of consecutive processes on the master and the slave PCs.


When development of the PoC system was initiated,  $T_N$  and D could not be delivered directly from an SFP module of the SPEC board due to the limitation of FPGA logic on the master side. Therefore, the master sent this information from the LAN port on the PC via software. Here, only the case of D > N was considered in order to simplify the logic construction.

Since utility software was prepared to change the output frequency, delay, pulse width, etc., in addition to the required parameters for the FPGA logic operation, it was relatively easy to adjust the output signal in the PoC system.

## **EVALUATION OF THE POC SYSTEM**

Evaluation measurements of the PoC system were carried out on a test bench. Figure 6 shows a photo of the measurement environment. For simplicity, a synthesizer was substituted for the GPS receiver because the system did not need to be synchronized with Coordinated Universal Time (UTC) in the evaluation phase. A 208.8 kHz signal simulating the zero-address signal was generated by dividing the 508.58 MHz output of the synthesizer by the harmonic number 2436. A 10 MHz clock from the synthesizer was input to the WR-A switch.



Figure 6: Measurement environment of the PoC system.

### Jitter Measurement of the Output Signal

After fine-tuning the output frequency to minimize the jitter, I measured the jitter of the 208.8 kHz signals from the slave PC for D = 1000 and N = 900. Figure 7 shows the measurement results. The data indicate the presence of unnatural structures in the output jitter, and large jitter beyond  $\pm 2$  ns was observed. The standard deviation  $\sigma$  of the entire jitter distribution was about 250 ps, while it was only about 170 ps for the center structure alone.



Figure 7: Result of the jitter measurement.
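The distinction between the standard deviation of the whole distribution and that of the center structure can be illustrated with a short Python sketch. The selection rule used here, a symmetric window around zero, is a hypothetical choice; the text does not state how the center structure was isolated, and the sample values below are made up for illustration.

```python
import statistics


def jitter_sigma_ps(samples_ps, center_window_ps=None):
    """Population standard deviation of jitter samples in picoseconds.

    If center_window_ps is given, only samples within +/- that window of
    zero are used (a hypothetical way to isolate the center structure).
    """
    if center_window_ps is not None:
        samples_ps = [s for s in samples_ps if abs(s) <= center_window_ps]
    return statistics.pstdev(samples_ps)


if __name__ == "__main__":
    # Made-up samples for illustration only, not measured data.
    samples = [-150.0, -40.0, 0.0, 60.0, 130.0, 2100.0, -2300.0]
    print("whole :", round(jitter_sigma_ps(samples), 1), "ps")
    print("center:", round(jitter_sigma_ps(samples, center_window_ps=1000.0), 1), "ps")
```

A few outliers beyond the window are enough to raise the whole-distribution value well above the center value, which is the pattern reported for Fig. 7.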

### Measurement of the Decimation-Rate Dependence of Jitter

Table 1 shows the measured jitter for different values of the decimation rate D. As expected, the jitter decreases as D decreases. Therefore, D should be reduced as much as possible to lower the jitter. However, a smaller D increases the network traffic. The range D = 600 - 1000 is considered practical because the system hung up when a smaller D was specified. These values were larger than those I initially assumed.

Table 1: Dependence of the Jitter on the Decimation Rate D

| Parameters        | σ, whole (ps) | σ, center (ps) | Number of samples |
|-------------------|---------------|----------------|-------------------|
| D = 1000, N = 900 | 254           | 171            | 163,233           |
| D = 600, N = 500  | 221           | 139            | 68,321            |
| D = 300, N = 200  | 201           | 131            | 71,249            |
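The trade-off between jitter and network traffic can be made concrete with a small calculation. The sketch below assumes that the master sends exactly one message every D turns, so that the message rate is the revolution frequency divided by D; the function name is hypothetical and the size of each message is not considered.

```python
# Revolution frequency from the text: 508.58 MHz divided by 2436.
F_REV_HZ = 508.58e6 / 2436


def message_rate_hz(d):
    """Messages per second if one message is sent every D turns."""
    return F_REV_HZ / d


if __name__ == "__main__":
    for d in (1000, 600, 300):
        print(f"D = {d:4d}: {message_rate_hz(d):7.1f} messages/s")
```

Under this assumption, halving D doubles the number of messages that the master software and the network must handle per second.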

## INVESTIGATION AND COUNTER-MEASURE OF THE PROBLEMS

As the evaluation described in the previous section showed, the PoC system had the following problems:

1. The output signal from the slave was sometimes missing.
2. Large jitter beyond  $\pm 2$  ns was sometimes observed.
3. The operation parameters were limited to D > N.

In particular, 1 and 2 would pose serious problems when the system is set into actual operation. Therefore, I investigated the cause of these two problems with the goal of solving them.

### Lack of Output Signals

Detailed examination showed that the occasional lack of the output signal was the consequence of an execution delay in the software that ran on the master PC and was responsible for transmitting the output time  $T_N$  and the decimation rate D.
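One way to see why a software delay can cause missing output is to compare the delay with the lead time between detection and output, which is N revolution periods according to Eq. (1). The following Python sketch is a simplified model under the assumption that the slave cannot produce the output if  $T_N$  arrives after that time has already passed; the function names and the delay values are hypothetical.

```python
# Revolution frequency from the text: 508.58 MHz divided by 2436.
F_REV_HZ = 508.58e6 / 2436


def lead_time_s(n_turns):
    """Time between the detected zero-address signal and the output, N / f_rev."""
    return n_turns / F_REV_HZ


def arrives_in_time(n_turns, delivery_delay_s):
    """True if T_N reaches the slave before the output time has passed.

    delivery_delay_s is the total delay from detection to reception at the
    slave (software execution plus network), a hypothetical input.
    """
    return delivery_delay_s < lead_time_s(n_turns)


if __name__ == "__main__":
    print(f"lead time for N = 900: {lead_time_s(900) * 1e3:.3f} ms")
    for delay_ms in (1.0, 10.0):  # hypothetical delays
        print(delay_ms, arrives_in_time(900, delay_ms * 1e-3))
```

With N = 900 the lead time is a few milliseconds, so an occasional scheduling delay of that order on a general-purpose OS would be enough to lose an output in this model.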

In order to solve this problem, the FPGA logic on the master side should perform the data transmission, as originally planned. Fortunately, a sample version of the Etherbone [8] master core, which had been a missing piece at the construction phase, was released by OHR at that time, and I planned to improve the PoC system by employing it.

### Large Jitter Beyond $\pm 2$ ns

In order to investigate the cause of the large jitter beyond  $\pm 2$  ns, the jitter was measured again using a new FMC DEL 1ns 4cha card. The results of this measurement are shown in Table 2 and Fig. 8. The large off-center jitters as shown in Fig. 7 were absent.

After numerous measurements carried out with the new FMC DEL 1ns 4cha card over an extended period of time, the large jitter reappeared. At one point, I noticed that the FMC DEL card was heating up and that the core temperature of the FPGA had exceeded 70 °C. This phenomenon was documented in articles on the OHR website related to the heat problem of the FMC DEL 1ns 4cha. When a side panel of the PC was removed and the FMC was cooled from the outside with an electric fan, the large jitter disappeared. Subsequently, I decided to attach a small fan to the SPEC board, in accordance with the design guide provided on the OHR website, in order to cool the FMC DEL 1ns 4cha continuously.

Table 2: Results of the Jitter Measurement with a New FMC DEL 1ns 4cha

| Conditions       | Ave. (ps) | Min. (ps) | Max. (ps) | σ (ps) | Num. of samples |
|------------------|-----------|-----------|-----------|--------|-----------------|
| D = 300, N = 200 | 20.4      | -522      | 595       | 127.7  | 121.2k          |

Figure 8: Hardcopy of an oscilloscope display at the jitter measurement with a new FMC DEL 1ns 4cha.

## **IMPROVEMENT OF THE POC SYSTEM**

In order to solve the problems described in the previous section, the PoC system has been improved as follows.

### Realization of Data Transmission by FPGA

In order to prevent the lack of the output signal from the slave, the Etherbone master core has been integrated into the FPGA logic for the master, as illustrated in Fig. 9. The red rectangle marks the part modified from the previous version in Fig. 5(a). Initially, I assumed that this function could be realized by implementing only the Etherbone master core, but in fact an Etherbone slave core was required as well. With the Etherbone master core installed, the master was able to transmit data directly from the SFP module on the SPEC board without software assistance. Figure 10 illustrates the block diagram of the improved PoC system.

### Implementation of FPGA Logic in the Case of D < N

As mentioned above, the functionality for the D < N case was not implemented in the initial PoC system for the sake of logic simplicity.



Figure 9: Block diagram of the improved FPGA logic for the master PC.



Figure 10: Block diagram of the improved PoC system.

Implementation was difficult because, in this case, the slave receives the next data transmitted from the master before the first signal has been output.

In the improved PoC, the logic for this case has been successfully integrated into the system. This functionality makes it possible for the PoC system to specify smaller *D* values, and smaller jitter is expected.
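The bookkeeping that the D < N case requires can be estimated with a short Python sketch. It assumes an idealized model in which the master sends a message at turns 0, D, 2D, and so on, transmission latency is ignored, and the first output is due at turn N. The function name is hypothetical and the sketch says nothing about how the FPGA logic actually stores the data.

```python
def pending_timestamps(n_turns, d):
    """Count the messages the master sends before the first output is due.

    Messages are sent at turns 0, D, 2D, ...; the first output is due at
    turn N, so every message sent at a turn earlier than N is still pending.
    """
    return len(range(0, n_turns, d))


if __name__ == "__main__":
    for n_turns, d in ((900, 1000), (1000, 1000), (1000, 50)):
        print(f"N = {n_turns}, D = {d}: {pending_timestamps(n_turns, d)} pending")
```

In this model a single stored timestamp is enough whenever D is at least N, whereas D = 50 with N = 1000 would require 20 timestamps to be held at once.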

### Installation of a Cooling Fan on SPEC

In order to solve the heat problem of FMC DEL 1ns 4cha, which exacerbates the jitter, a cooling fan was installed on the SPEC board according to the OHR design guidelines.

## EVALUATION OF THE IMPROVED POC SYSTEM

Evaluation measurements of the improved PoC system were performed with the same setup as shown in Fig. 6. The system has a serious stability problem: output from the slave PC stops within a few milliseconds, except in the case where the target bunch address K = 0 and the fine delay time F = 0 in the operation parameters of the slave.

### Jitter Measurement of the Output Signal

Evaluation of the improved PoC system is limited by the stability problem described above. Nevertheless, the jitter of the 208.8 kHz output signal from the slave was measured while varying the parameter D on the master. The operation parameters of the slave were fixed to K = 0, F = 0, and fine tuning of the output frequency = -31.2 ps. Table 3 shows the results of the jitter measurements. Figure 11 shows the hardcopy of an oscilloscope display for D = 50 and N = 1000.

A standard deviation of the measured jitter below 100 ps was achieved thanks to the smaller D enabled in the improved system. This result indicates that the jitter performance is improved by basing the output on absolute time as much as possible instead of relying on the complementary pulse-train generation function of the FMC DEL 1ns 4cha.


Table 3: Results of Jitter Measurement with the Improved PoC System

| Conditions         | Ave. (ps) | Min. (ps) | Max. (ps) | σ (ps) | Num. of samples |
|--------------------|-----------|-----------|-----------|--------|-----------------|
| D = 1000, N = 1000 | 4.89      | 4.24      | 5.53      | 155.2  | 108.21          |
| D = 50, N = 1000   | 5.06      | 4.55      | 5.62      | 97.33  | 458.31          |



Figure 11: Hardcopy of an oscilloscope display at the jitter measurement with the improved PoC system.

### Stability of the Improved PoC System

As mentioned at the beginning of this section, the improved PoC system has a severe stability problem. For example, with K = 0 and F = 0, which is the only condition under which the PoC system generates output from the slave PC, the system stopped immediately on one occasion, while on another occasion it ran continuously for over 18 hours. Additionally, there was an occasion where the slave PC hung up during the test.

Furthermore, after a series of measurements, the PoC system ceased to function with any parameter choice. I suspected a hardware malfunction in the SPEC board, the FMC DEL 1ns 4cha card, or the PC, and replaced all of these parts with new ones. However, the improved PoC system still did not operate. The stability issue persists.

## **FUTURE PLAN**

I have recently obtained new funding to study 508.58 MHz clock distribution over the WR network. In time-resolved measurements in synchrotron radiation experiments, there are many opportunities to utilize the 508.58 MHz clock rather than the 208.8 kHz zero-address signal. Since the feasibility of RF clock distribution over WR has already been shown in the proof-of-concept system reported in [9], I intend to apply it to the 508.58 MHz clock of the SPring-8 storage ring. Figure 12 shows an overview of the system that I plan to build.

In this study, distribution of the 208.8 kHz revolution frequency signal is also planned for integration into the future system. Since I expect the 208.8 kHz signal distribution to be realized on the basis of the PoC system, the PoC system will be debugged to fix the stability problem. However, considering that debugging the FPGA logic is quite difficult, I will also have to consider other implementation approaches. One such approach would be to count the RF clock up to the harmonic number 2436 locally at the slave side and to reset the local counter at the absolute time transmitted from the master.
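The counter-based approach mentioned above can be sketched as a simple software model. The class below is purely illustrative: its name and interface are assumptions, and it models only the divide-by-2436 counting and the reset, not the hardware or the way the reset time would be derived from the WR time.

```python
HARMONIC_NUMBER = 2436  # RF buckets per turn, from the text


class RevolutionDivider:
    """Software model of a local RF counter that emits one pulse per turn."""

    def __init__(self, harmonic_number=HARMONIC_NUMBER):
        self.harmonic_number = harmonic_number
        self.count = 0

    def reset(self):
        """Realign the counter, e.g. at the absolute time sent by the master."""
        self.count = 0

    def tick(self):
        """Advance one RF cycle; return True on the cycle that starts a turn."""
        pulse = self.count == 0
        self.count = (self.count + 1) % self.harmonic_number
        return pulse


if __name__ == "__main__":
    divider = RevolutionDivider()
    pulses = sum(divider.tick() for _ in range(3 * HARMONIC_NUMBER))
    print("pulses in three turns:", pulses)
```

In such a scheme the master would only need to send an occasional absolute reset time, while the per-turn pulses would be derived locally from the distributed RF clock.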



Figure 12: Overview of a planned system for distribution of the 508.58 MHz clock as well as the 208.8 kHz revolution frequency signal over WR network.

## SUMMARY

I have developed the PoC system, which distributes the 208.8 kHz revolution frequency signal synchronized with the dedicated electron bunch of the SPring-8 storage ring, with the aim of building a trigger and tag information distribution system based on a high precision time synchronization technology.

The initial PoC system achieved an output jitter of 127 ps (1 $\sigma$ ) at the decimation rate D = 300. In the realistic parameter range of D = 600 - 1000, the resulting jitter was 140 - 170 ps (1 $\sigma$ ). Because the master transmitted information such as the absolute time by software, output signals from the slave PC were sometimes dropped.

The improved PoC system has successfully realized automatic transmission of the information by hardware, i.e., by integrating an Etherbone master core into the FPGA logic of the master. This improvement has made it possible to operate the system at smaller decimation rates, with which the system achieved a jitter of 97 ps (1 $\sigma$ ), i.e., below 100 ps. However, the improved PoC system has a severe stability problem that remains unresolved.

At present, I have commenced the study of a 508.58 MHz clock distribution over the WR network. In this new study, I plan to integrate the distribution of 208.8 kHz revolution frequency based on the PoC system as well.

## REFERENCES

- [1] https://www.ohwr.org/projects/white-rabbit
- [2] https://www.ohwr.org
- [3] https://www.ohwr.org/projects/cernohl
- [4] https://www.ohwr.org/projects/fmc-delay-1ns-8cha
- [5] https://www.ohwr.org/projects/spec
- [6] https://www.xilinx.com/products/silicon-devices/fpga/spartan-6.html
- [7] https://www.ubuntu.com
- [8] https://www.ohwr.org/projects/etherbone-core
- [9] T. Włostowski et al., "Trigger and RF Distribution Using White Rabbit", in Proc. ICALEPCS'15, Melbourne, Australia, Oct. 2015, pp. 619-623. doi:10.18429/JACoW-ICALEPCS2015-WEC3O01