ICALEPCS2011 - List of Authors (Brarda, L.)

Paper

Title

Page

Fabric Management with Diskless Servers and Quattor on LHCb

691

P. Schweitzer, E. Bonaccorsi, L. Brarda, N. Neufeld
CERN, Geneva, Switzerland

Large scientific experiments nowadays very often are using large computer farms to process the events acquired from the detectors. In LHCb a small sysadmin team manages 1400 servers of the LHCb Event Filter Farm, but also a wide variety of control servers for the detector electronics and infrastructure computers : file servers, gateways, DNS, DHCP and others. This variety of servers could not be handled without a solid fabric management system. We choose the Quattor toolkit for this task. We will present our use of this toolkit, with an emphasis on how we handle our diskless nodes (Event filter farm nodes and computers embedded in the acquisition electronic cards). We will show our current tests to replace the standard (RedHat/Scientific Linux) way of handling diskless nodes to fusion filesystems and how it improves fabric management.

Slides WEMMU005 [0.119 MB]

Poster WEMMU005 [0.602 MB]

WEPMU037

Virtualization for the LHCb Experiment

1157

E. Bonaccorsi, L. Brarda, M. Chebbi, N. Neufeld
CERN, Geneva, Switzerland
F. Sborzacchi
INFN/LNF, Frascati (Roma), Italy

The LHCb Experiment, one of the four large particle physics detectors at CERN, counts in its Online System more than 2000 servers and embedded systems. As a result of ever-increasing CPU performance in modern servers, many of the applications in the controls system are excellent candidates for virtualization technologies. We see virtualization as an approach to cut down cost, optimize resource usage and manage the complexity of the IT infrastructure of LHCb. Recently we have added a Kernel Virtual Machine (KVM) cluster based on Red Hat Enterprise Virtualization for Servers (RHEV) complementary to the existing Hyper-V cluster devoted only to the virtualization of the windows guests. This paper describes the architecture of our solution based on KVM and RHEV as along with its integration with the existing Hyper-V infrastructure and the Quattor cluster management tools and in particular how we use to run controls applications on a virtualized infrastructure. We present performance results of both the KVM and Hyper-V solutions, problems encountered and a description of the management tools developed for the integration with the Online cluster and LHCb SCADA control system based on PVSS.

THCHAUST05

LHCb Online Log Analysis and Maintenance System

1228

J.C. Garnier, L. Brarda, N. Neufeld, F. Nikolaidis
CERN, Geneva, Switzerland

History has shown, many times computer logs are the only information an administrator may have for an incident, which could be caused either by a malfunction or an attack. Due to huge amount of logs that are produced from large-scale IT infrastructures, such as LHCb Online, critical information may overlooked or simply be drowned in a sea of other messages . This clearly demonstrates the need for an automatic system for long-term maintenance and real time analysis of the logs. We have constructed a low cost, fault tolerant centralized logging system which is able to do in-depth analysis and cross-correlation of every log. This system is capable of handling O(10000) different log sources and numerous formats, while trying to keep the overhead as low as possible. It provides log gathering and management, offline analysis and online analysis. We call offline analysis the procedure of analyzing old logs for critical information, while Online analysis refer to the procedure of early alerting and reacting. The system is extensible and cooperates well with other applications such as Intrusion Detection / Prevention Systems. This paper presents the LHCb Online topology, problems we had to overcome and our solutions. Special emphasis is given to log analysis and how we use it for monitoring and how we can have uninterrupted access to the logs. We provide performance plots, code modification in well known log tools and our experience from trying various storage strategies.

Slides THCHAUST05 [0.377 MB]

Paper	Title	Page
WEMMU005	Fabric Management with Diskless Servers and Quattor on LHCb	691
	P. Schweitzer, E. Bonaccorsi, L. Brarda, N. Neufeld CERN, Geneva, Switzerland
	Large scientific experiments nowadays very often are using large computer farms to process the events acquired from the detectors. In LHCb a small sysadmin team manages 1400 servers of the LHCb Event Filter Farm, but also a wide variety of control servers for the detector electronics and infrastructure computers : file servers, gateways, DNS, DHCP and others. This variety of servers could not be handled without a solid fabric management system. We choose the Quattor toolkit for this task. We will present our use of this toolkit, with an emphasis on how we handle our diskless nodes (Event filter farm nodes and computers embedded in the acquisition electronic cards). We will show our current tests to replace the standard (RedHat/Scientific Linux) way of handling diskless nodes to fusion filesystems and how it improves fabric management.
	Slides WEMMU005 [0.119 MB]
	Poster WEMMU005 [0.602 MB]

WEPMU037	Virtualization for the LHCb Experiment	1157
	E. Bonaccorsi, L. Brarda, M. Chebbi, N. Neufeld CERN, Geneva, Switzerland F. Sborzacchi INFN/LNF, Frascati (Roma), Italy
	The LHCb Experiment, one of the four large particle physics detectors at CERN, counts in its Online System more than 2000 servers and embedded systems. As a result of ever-increasing CPU performance in modern servers, many of the applications in the controls system are excellent candidates for virtualization technologies. We see virtualization as an approach to cut down cost, optimize resource usage and manage the complexity of the IT infrastructure of LHCb. Recently we have added a Kernel Virtual Machine (KVM) cluster based on Red Hat Enterprise Virtualization for Servers (RHEV) complementary to the existing Hyper-V cluster devoted only to the virtualization of the windows guests. This paper describes the architecture of our solution based on KVM and RHEV as along with its integration with the existing Hyper-V infrastructure and the Quattor cluster management tools and in particular how we use to run controls applications on a virtualized infrastructure. We present performance results of both the KVM and Hyper-V solutions, problems encountered and a description of the management tools developed for the integration with the Online cluster and LHCb SCADA control system based on PVSS.

THCHAUST05	LHCb Online Log Analysis and Maintenance System	1228
	J.C. Garnier, L. Brarda, N. Neufeld, F. Nikolaidis CERN, Geneva, Switzerland
	History has shown, many times computer logs are the only information an administrator may have for an incident, which could be caused either by a malfunction or an attack. Due to huge amount of logs that are produced from large-scale IT infrastructures, such as LHCb Online, critical information may overlooked or simply be drowned in a sea of other messages . This clearly demonstrates the need for an automatic system for long-term maintenance and real time analysis of the logs. We have constructed a low cost, fault tolerant centralized logging system which is able to do in-depth analysis and cross-correlation of every log. This system is capable of handling O(10000) different log sources and numerous formats, while trying to keep the overhead as low as possible. It provides log gathering and management, offline analysis and online analysis. We call offline analysis the procedure of analyzing old logs for critical information, while Online analysis refer to the procedure of early alerting and reacting. The system is extensible and cooperates well with other applications such as Intrusion Detection / Prevention Systems. This paper presents the LHCb Online topology, problems we had to overcome and our solutions. Special emphasis is given to log analysis and how we use it for monitoring and how we can have uninterrupted access to the logs. We provide performance plots, code modification in well known log tools and our experience from trying various storage strategies.
	Slides THCHAUST05 [0.377 MB]