Paper |
Title |
Page |
TUDPP01 |
A Monitoring System for the New ALICE O2 Farm |
835 |
|
- G. Vino, D. Elia
INFN-Bari, Bari, Italy
- V. Chibante Barroso, A. Wegrzynek
CERN, Meyrin, Switzerland
|
|
|
The ALICE Experiment has been designed to study the physics of strongly interacting matter with heavy-ion collisions at the CERN LHC. A major upgrade of the detector and computing model (O2, Offline-Online) is currently ongoing. The ALICE O2 farm will consist of almost 1000 nodes able to read out and process on the fly about 27 Tb/s of raw data. To increase the efficiency of computing farm operations, a general-purpose near-real-time monitoring system has been developed. It relies on high performance, high availability, modularity, and open-source components. The core component (Apache Kafka) provides high throughput, data pipelines, and fault-tolerant services. Additional monitoring functionality is based on Telegraf as the metric collector, Apache Spark for complex aggregation, InfluxDB as the time-series database, and Grafana as the visualization tool. A logging service based on the Elasticsearch stack is also included. The system handles metrics coming from the operating system, network, custom hardware, and in-house software. A prototype version is currently running at CERN and has also been successfully deployed by the ReCaS Datacenter at INFN Bari for both monitoring and logging.
|
|
|
Slides TUDPP01 [1.128 MB]
|
|
DOI • |
reference for this paper
※ https://doi.org/10.18429/JACoW-ICALEPCS2019-TUDPP01
|
|
About • |
paper received ※ 30 September 2019 paper accepted ※ 10 October 2019 issue date ※ 30 August 2020 |
|
|
|
|
WEDPL02 |
AliECS: A New Experiment Control System for the ALICE Experiment |
956 |
|
- T. Mrnjavac, K. Alexopoulos, V. Chibante Barroso, G.C. Raduta
CERN, Geneva, Switzerland
|
|
|
The ALICE Experiment at the CERN LHC (Large Hadron Collider) is undertaking a major upgrade during Long Shutdown 2 in 2019-2020, which includes a new computing system called O² (Online-Offline). To ensure the efficient operation of the upgraded experiment and its newly designed computing system, a reliable, high-performance, and automated experiment control system is being developed. Its goals are to manage all O² synchronous processing software and to handle data taking by interacting with the detectors, the trigger system, and the LHC. The ALICE Experiment Control System (AliECS) is a distributed system based on state-of-the-art cluster management and microservices technologies that have recently emerged in the distributed computing ecosystem. Such technologies will allow the ALICE collaboration to benefit from a vibrant and innovative open-source community. This communication illustrates the AliECS architecture. It provides an in-depth overview of the system's components, features, and design elements, as well as its performance. It also reports on the experience with AliECS as part of ALICE Run 3 detector commissioning setups.
|
|
|
Slides WEDPL02 [2.858 MB]
|
|
DOI • |
reference for this paper
※ https://doi.org/10.18429/JACoW-ICALEPCS2019-WEDPL02
|
|
About • |
paper received ※ 30 September 2019 paper accepted ※ 09 October 2019 issue date ※ 30 August 2020 |
|
|
|
|