WEBHAUST — Infrastructure Management (12-Oct-11, 10:45–12:15)
Chair: S.G. Azevedo, LLNL, Livermore, California, USA
Paper | Title | Page
WEBHAUST01 LHCb Online Infrastructure Monitoring Tools 618
 
  • L.G. Cardoso, C. Gaspar, C. Haen, N. Neufeld, F. Varela
    CERN, Geneva, Switzerland
  • D. Galli
    INFN-Bologna, Bologna, Italy
 
  The Online System of the LHCb experiment at CERN is composed of a very large number of PCs: around 1500 in a CPU farm for performing the High Level Trigger; around 170 for the control system, running the SCADA system PVSS; and several others for data monitoring, reconstruction, storage, and infrastructure tasks such as databases. Some PCs run Linux and some run Windows, but all of them need to be remotely controlled and monitored to ensure they are running correctly and, for example, to reboot them whenever necessary. A set of tools was developed to centrally monitor the status of all PCs and PVSS projects needed to run the experiment: a Farm Monitoring and Control (FMC) tool, which provides the lower-level access to the PCs, and a System Overview Tool (developed within the Joint Controls Project, JCOP), which provides a centralized interface to the FMC tool and adds PVSS project monitoring and control. The implementation of these tools has provided a reliable and efficient way to manage the system, both during normal operations and during shutdowns, upgrades, or maintenance. This paper presents the particular implementation of these tools in the LHCb experiment and the benefits of their usage in a large-scale, heterogeneous system.
Slides WEBHAUST01 [3.211 MB]
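
  As a rough illustration of the centralized monitoring pattern described in WEBHAUST01 above, the short Python sketch below polls a list of farm nodes for reachability and prints their status much as a top-level overview panel might. The host names, the SSH-port liveness check, and the reporting format are assumptions made for this sketch; it is not the LHCb FMC tool or its interface.

# Minimal sketch of a centralized node-status poller (illustrative only).
# Host names below are hypothetical, not actual LHCb nodes.
import socket
from datetime import datetime

FARM_NODES = ["hlt-node-001", "hlt-node-002", "ctrl-pc-01"]

def node_alive(host, port=22, timeout=2.0):
    """Treat a node as 'up' if its SSH port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def poll_farm(nodes):
    """Return a {host: 'UP'/'DOWN'} map, as a central overview might show it."""
    return {host: ("UP" if node_alive(host) else "DOWN") for host in nodes}

if __name__ == "__main__":
    print(datetime.now().isoformat())
    for host, state in sorted(poll_farm(FARM_NODES).items()):
        print(f"{host:>15}  {state}")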
 
WEBHAUST02 Optimizing Infrastructure for Software Testing Using Virtualization 622
 
  • O. Khalid, B. Copy, A.A. Shaikh
    CERN, Geneva, Switzerland
 
  Virtualization technology and cloud computing have brought a paradigm shift in the way we utilize, deploy, and manage computer resources. They allow fast deployment of multiple operating systems as containers on physical machines, which can be either discarded after use or snapshotted for later re-deployment. At CERN, we have been using virtualization/cloud computing to quickly set up virtual machines for our developers with pre-configured software, enabling them to test and deploy a new version of a software patch for a given application. We have also been using the infrastructure for security analysis of control systems, as virtualization provides a degree of isolation in which control systems such as SCADA systems can be evaluated against simulated network attacks. This paper reports on the techniques used for security analysis, involving network configuration and isolation to prevent interference with other systems on the network, and gives an overview of the technologies used to deploy such an infrastructure, based on VMware and the OpenNebula cloud management platform.
Slides WEBHAUST02 [2.899 MB]
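
  The "disposable test VM" workflow described in WEBHAUST02 above could look roughly like the sketch below, which drives a recent OpenNebula installation through its command-line tools. The template name, VM name, and the assumption that the onetemplate/onevm commands are on the PATH are all illustrative choices, not details taken from the paper.

# Sketch of a disposable test-VM lifecycle via the OpenNebula CLI.
# Assumes a recent OpenNebula with `onetemplate`/`onevm` on PATH and a
# template called "dev-test" -- both are assumptions for illustration.
import subprocess

def run(cmd):
    """Run a CLI command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def create_test_vm(template="dev-test", name="patch-test-vm"):
    # `onetemplate instantiate` prints a line like "VM ID: 42".
    out = run(["onetemplate", "instantiate", template, "--name", name])
    return int(out.strip().split()[-1])

def destroy_vm(vm_id):
    # Discard the VM once the software patch has been tested.
    run(["onevm", "terminate", "--hard", str(vm_id)])

if __name__ == "__main__":
    vm = create_test_vm()
    print(f"instantiated test VM {vm}; run the test suite, then discard it")
    # destroy_vm(vm)  # uncomment once testing is finished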
 
WEBHAUST03 Large-bandwidth Data Acquisition Network for XFEL Facility, SACLA 626
 
  • T. Sugimoto, Y. Joti, T. Ohata, R. Tanaka, M. Yamaga
    JASRI/SPring-8, Hyogo-ken, Japan
  • T. Hatsui
    RIKEN/SPring-8, Hyogo, Japan
 
  We have developed a large-bandwidth data acquisition (DAQ) network for user experiments at the SPring-8 Angstrom Compact Free Electron Laser (SACLA) facility. The network connects detectors, on-line visualization terminals, and the high-speed storage of the control and DAQ system, transferring beam diagnostic data for each X-ray pulse as well as the experimental data. The development of the DAQ network system (DAQ-LAN) was one of the critical elements of the project because data arriving at rates of up to 5 Gbps must be stored and visualized with high availability. The DAQ-LAN is also used for instrument control. In order to guarantee both high-speed data transfer and instrument control, we have implemented physical and logical network separation. The DAQ-LAN currently consists of six 10-GbE-capable network switches used exclusively for data transfer, and ten 1-GbE-capable network switches for instrument control and on-line visualization. High availability is achieved by link aggregation (LAG), with a typical convergence time of 500 ms, faster than RSTP (about 2 s). To prevent network trouble caused by broadcast traffic, the DAQ-LAN is logically separated into twelve network segments. The logical segmentation is based on DAQ applications such as data transfer, on-line visualization, and instrument control. The DAQ-LAN will connect the control and DAQ system to the on-site high-performance computing system and to next-generation supercomputers in Japan, including the K computer, for instant data mining during beamtime and for post-analysis.
Slides WEBHAUST03 [5.795 MB]
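
  The segmentation-by-application scheme described in WEBHAUST03 above can be pictured with a small sketch like the one below, which groups hosts into VLAN-style segments by DAQ role and checks that a sustained 5 Gbps stream fits within an aggregated pair of 10 GbE links. Only the 5 Gbps figure comes from the abstract; the segment IDs, host names, two-link LAG, and utilization ceiling are invented for illustration.

# Illustrative sketch: segment DAQ hosts by application role and check
# bandwidth headroom on a 2 x 10 GbE link aggregation group.
# VLAN IDs, host names, link count and the 70% ceiling are assumptions.

SEGMENTS = {
    "data-transfer":   {"vlan": 101, "hosts": ["detector-fe-1", "storage-gw-1"]},
    "visualization":   {"vlan": 102, "hosts": ["viz-term-1", "viz-term-2"]},
    "instrument-ctrl": {"vlan": 103, "hosts": ["ioc-motor-1", "ioc-vacuum-1"]},
}

def lag_headroom(sustained_gbps, links=2, link_gbps=10.0, max_util=0.7):
    """Spare capacity (Gbps) on the LAG while keeping utilization below
    `max_util`, leaving margin for bursts or a degraded link."""
    usable = links * link_gbps * max_util
    return usable - sustained_gbps

if __name__ == "__main__":
    for name, seg in SEGMENTS.items():
        print(f"VLAN {seg['vlan']:>4}  {name:<15} {', '.join(seg['hosts'])}")
    print(f"LAG headroom at 5 Gbps sustained: {lag_headroom(5.0):.1f} Gbps")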
 
WEBHAUST04 A Virtualized Computing Platform for Fusion Control Systems
 
  • T.M. Frazier, P. Adams, J.M. Fisher, A.J. Talbot
    LLNL, Livermore, California, USA
 
  Funding: This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
The National Ignition Facility (NIF) at the Lawrence Livermore National Laboratory is a stadium-sized facility that contains a 192-beam, 1.8-megajoule, 500-terawatt UV laser system together with a 10-meter-diameter target chamber with room for multiple experimental diagnostics. NIF is the world's largest and most energetic laser experimental system, providing a scientific center to study inertial confinement fusion (ICF) and matter at extreme energy densities and pressures. NIF's laser beams are designed to compress fusion targets to the conditions required for thermonuclear burn, liberating more energy than is required to initiate the fusion reactions. A total of 2,500 servers, 400 network devices, and 700 terabytes of network-attached storage provide the foundation for NIF's Integrated Computer Control System (ICCS) and Experimental Data Archive. This talk discusses the rationale and benefits of server virtualization in the context of an operational experimental facility; the requirements-discovery process used by the NIF teams to establish evaluation criteria for virtualization alternatives; the processes and procedures defined to virtualize servers on a timescale that did not delay the execution of experimental campaigns; and the lessons the NIF teams learned along the way. The virtualization architecture ultimately selected for ICCS is based on the open-source Xen computing platform and the 802.1Q open networking standard. The specific server and network configurations needed to ensure performance and high availability of the control system infrastructure will be discussed.
LLNL-CONF-477653
 
Slides WEBHAUST04 [2.201 MB]
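
  WEBHAUST04 above names Xen and 802.1Q as the building blocks of the ICCS virtualization architecture; as a hedged illustration of how those two pieces can meet, the sketch below renders a minimal xl-style Xen guest configuration whose virtual NIC attaches to a per-VLAN bridge. The bridge naming convention, disk path, sizes, and guest name are assumptions, not the NIF configuration.

# Sketch: render a minimal Xen (xl) guest config attaching the VM's
# virtual NIC to an 802.1Q per-VLAN bridge. The br-vlan<NNN> naming
# scheme, disk path and sizes are assumptions for illustration.

XL_TEMPLATE = """\
name    = "{name}"
memory  = {memory_mb}
vcpus   = {vcpus}
disk    = ['phy:/dev/vg0/{name},xvda,w']
vif     = ['bridge=br-vlan{vlan}']
"""

def render_guest(name, vlan, memory_mb=4096, vcpus=2):
    """Return the text of an xl guest config bound to one VLAN bridge."""
    return XL_TEMPLATE.format(name=name, vlan=vlan, memory_mb=memory_mb, vcpus=vcpus)

if __name__ == "__main__":
    # A hypothetical control-system guest placed on VLAN 210.
    print(render_guest("iccs-app-01", vlan=210))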
 
WEBHAUST05 Distributed System and Network Performance Monitoring
 
  • R. Petkus
    BNL, Upton, Long Island, New York, USA
 
  A robust and reliable network and system infrastructure is vital for successful operations at NSLS-II (the National Synchrotron Light Source II). A key component is a monitoring solution that can provide system information in real time for fault detection and problem isolation. Furthermore, this information must be archived for historical trending and post-mortem analysis. With 200+ network switches and dozens of servers comprising our control system, what tools should be selected to monitor system vitals, visualize network utilization, and parse copious syslog files? How can we track latency on a large network and decompose traffic flows to better optimize configuration? This work will examine both open-source and proprietary tools used in the controls group for distributed monitoring, such as Splunk, Nagios, SNMP, sFlow, Brocade Network Advisor, and the perfSONAR Performance Toolkit. We will also describe how these elements are integrated into a cohesive platform.
Slides WEBHAUST05 [1.346 MB]
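
  One concrete example of the per-port utilization tracking that WEBHAUST05 above alludes to is sampling interface octet counters over SNMP. The sketch below shells out to the Net-SNMP snmpget utility and computes an average inbound rate from two samples; the switch address, community string, and interface index are placeholders, and the approach is illustrative rather than the NSLS-II tooling.

# Illustrative sketch: estimate interface utilization by sampling the
# IF-MIB ifHCInOctets counter twice via the Net-SNMP `snmpget` CLI.
# Switch address, community string and ifIndex are placeholders.
import subprocess
import time

def if_in_octets(host, if_index, community="public"):
    """Read IF-MIB::ifHCInOctets for one interface (returns an int)."""
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv",
         host, f"IF-MIB::ifHCInOctets.{if_index}"],
        check=True, capture_output=True, text=True).stdout
    return int(out.strip())

def inbound_mbps(host, if_index, interval=10.0):
    """Average inbound rate over `interval` seconds, in Mbit/s."""
    first = if_in_octets(host, if_index)
    time.sleep(interval)
    second = if_in_octets(host, if_index)
    return (second - first) * 8 / interval / 1e6

if __name__ == "__main__":
    print(f"{inbound_mbps('switch-01.example', 1):.1f} Mbit/s inbound")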
 
WEBHAUST06 Virtualized High Performance Computing Infrastructure of Novosibirsk Scientific Center 630
 
  • A. Zaytsev, S. Belov, V.I. Kaplin, A. Sukharev
    BINP SB RAS, Novosibirsk, Russia
  • A.S. Adakin, D. Chubarov, V. Nikultsev
    ICT SB RAS, Novosibirsk, Russia
  • V. Kalyuzhny
    NSU, Novosibirsk, Russia
  • N. Kuchin, S. Lomakin
    ICM&MG SB RAS, Novosibirsk, Russia
 
  Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers, hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of the Russian Academy of Sciences, including the Budker Institute of Nuclear Physics (BINP), the Institute of Computational Technologies, and the Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG). Since each institute has specific requirements on the architecture of the computing farms involved in its research field, several computing facilities are currently hosted by NSC institutes, each optimized for a particular set of tasks; the largest of these are the NSU Supercomputer Center, the Siberian Supercomputer Center (ICM&MG), and the Grid Computing Facility of BINP. A dedicated optical network with an initial bandwidth of 10 Gbps connecting these three facilities was built to make it possible to share computing resources among the research communities, thus increasing the efficiency of operating the existing computing facilities and offering a common platform for building the computing infrastructure for future scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technology based on the Xen and KVM platforms. Our contribution gives a thorough review of the present status and future development prospects of the NSC virtualized computing infrastructure, focusing on its application to the everyday data-processing tasks of the HEP experiments carried out at BINP.
Slides WEBHAUST06 [14.369 MB]
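
  Because the NSC infrastructure described in WEBHAUST06 above mixes Xen and KVM hosts, one natural way to get a uniform view of the guests is through libvirt, which exposes both hypervisors behind a single API. The short sketch below, using the libvirt-python bindings with placeholder connection URIs, lists the running guests on each host; it only illustrates the unified-management idea and is not the tooling actually used at NSC.

# Sketch: a uniform inventory of running guests across mixed Xen/KVM
# hosts via libvirt. Requires the libvirt-python bindings; the
# connection URIs below are placeholders, not real NSC hosts.
import libvirt

HYPERVISORS = [
    "qemu+ssh://kvm-host-1.example/system",  # KVM node (placeholder)
    "xen+ssh://xen-host-1.example/system",   # Xen node (placeholder)
]

def running_guests(uri):
    """Return the names of the active domains on one hypervisor."""
    conn = libvirt.openReadOnly(uri)
    try:
        return [dom.name() for dom in conn.listAllDomains() if dom.isActive()]
    finally:
        conn.close()

if __name__ == "__main__":
    for uri in HYPERVISORS:
        print(uri)
        for name in running_guests(uri):
            print(f"  {name}")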