The Joint Accelerator Conferences Website (JACoW) is an international collaboration that publishes the proceedings of accelerator conferences held around the world.
TY  - CONF
AU  - Vino, G.
AU  - Chibante Barroso, V.
AU  - Elia, D.
AU  - Wegrzynek, A.
ED  - White, Karen S.
ED  - Brown, Kevin A.
ED  - Dyer, Philip S.
ED  - Schaa, Volker RW
TI  - A Monitoring System for the New ALICE O2 Farm
J2  - Proc. of ICALEPCS2019, New York, NY, USA, 05-11 October 2019
CY  - New York, NY, USA
T2  - International Conference on Accelerator and Large Experimental Physics Control Systems
T3  - 17
LA  - english
AB  - The ALICE Experiment has been designed to study the physics of strongly interacting matter with heavy-ion collisions at the CERN LHC. A major upgrade of the detector and computing model (O2, Offline-Online) is currently ongoing. The ALICE O2 farm will consist of almost 1000 nodes enabled to readout and process on-the-fly about 27 Tb/s of raw data. To increase the efficiency of computing farm operations a general-purpose near real-time monitoring system has been developed: it lays on features like high-performance, high-availability, modularity, and open source. The core component (Apache Kafka) ensures high throughput, data pipelines, and fault-tolerant services. Additional monitoring functionality is based on Telegraf as metric collector, Apache Spark for complex aggregation, InfluxDB as time-series database, and Grafana as visualization tool. A logging service based on Elasticsearch stack is also included. The designed system handles metrics coming from operating system, network, custom hardware, and in-house software. A prototype version is currently running at CERN and has been also successfully deployed by the ReCaS Datacenter at INFN Bari for both monitoring and logging.
PB  - JACoW Publishing
CP  - Geneva, Switzerland
SP  - 835
EP  - 840
KW  - monitoring
KW  - detector
KW  - network
KW  - database
KW  - controls
DA  - 2020/08
PY  - 2020
SN  - 2226-0358
SN  - 978-3-95450-209-7
DO  - doi:10.18429/JACoW-ICALEPCS2019-TUDPP01
UR  - https://jacow.org/icalepcs2019/papers/tudpp01.pdf
ER  -
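
The record above uses the RIS tagged format, in which tags such as AU, ED, KW, and SN may repeat. As a minimal sketch of reading such a record into a dictionary of lists, the Python below assumes the usual RIS layout of one tag per line. The names `parse_ris` and `TAG_LINE` are hypothetical helpers, not part of any JACoW tooling, and the sample is an abbreviated excerpt of the fields listed in the record.

```python
import re
from collections import defaultdict

# An RIS line: two-character tag, whitespace, hyphen, optional value.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s?(.*)$")


def parse_ris(text):
    """Parse a single RIS record into a dict mapping tag -> list of values.

    Hypothetical helper: assumes one tag per line; lines without a tag
    are treated as continuations of the previous value.
    """
    record = defaultdict(list)
    last_tag = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match:
            tag, value = match.groups()
            if tag == "ER":  # end-of-record marker
                break
            record[tag].append(value.strip())
            last_tag = tag
        elif last_tag is not None:
            # Continuation line: append to the most recent value.
            record[last_tag][-1] += " " + line
    return dict(record)


# Abbreviated excerpt of the record, for illustration only.
sample = """\
TY  - CONF
AU  - Vino, G.
AU  - Chibante Barroso, V.
TI  - A Monitoring System for the New ALICE O2 Farm
SP  - 835
EP  - 840
ER  -
"""

entry = parse_ris(sample)
print(entry["AU"])     # all authors, in order of appearance
print(entry["TI"][0])  # the title
```

Storing every tag as a list keeps repeated fields (authors, editors, keywords) in their original order without special-casing single-valued tags.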