Author: Galli, D.
Paper Title Page
WEBHAUST01 LHCb Online Infrastructure Monitoring Tools 618
 
  • L.G. Cardoso, C. Gaspar, C. Haen, N. Neufeld, F. Varela
    CERN, Geneva, Switzerland
  • D. Galli
    INFN-Bologna, Bologna, Italy
 
  The Online System of the LHCb experiment at CERN comprises a very large number of PCs: around 1500 in a CPU farm running the High Level Trigger; around 170 for the control system, running the SCADA system PVSS; and several others for data monitoring, reconstruction, storage, and infrastructure tasks such as databases. Some PCs run Linux and some run Windows, but all of them must be remotely controlled and monitored to verify that they are running correctly and, for example, to reboot them whenever necessary. A set of tools was developed to centrally monitor the status of all PCs and PVSS projects needed to run the experiment: a Farm Monitoring and Control (FMC) tool, which provides lower-level access to the PCs, and a System Overview Tool (developed within the Joint Controls Project, JCOP), which provides a centralized interface to the FMC tool and adds PVSS project monitoring and control. These tools have provided a reliable and efficient way to manage the system, both during normal operations and during shutdowns, upgrades, and maintenance operations. This paper presents the particular implementation of these tools in the LHCb experiment and the benefits of their use in a large-scale heterogeneous system.
Slides WEBHAUST01 [3.211 MB]