Author: Varela, F.
Paper Title Page
WEBHAUST01 LHCb Online Infrastructure Monitoring Tools 618
 
  • L.G. Cardoso, C. Gaspar, C. Haen, N. Neufeld, F. Varela
    CERN, Geneva, Switzerland
  • D. Galli
    INFN-Bologna, Bologna, Italy
 
  The Online System of the LHCb experiment at CERN is composed of a very large number of PCs: around 1500 in a CPU farm that performs the High Level Trigger; around 170 for the control system, running the SCADA system PVSS; and several others for data monitoring, reconstruction, storage, and infrastructure tasks such as databases. Some PCs run Linux and some run Windows, but all of them need to be remotely controlled and monitored to make sure they are running correctly and to be able, for example, to reboot them whenever necessary. A set of tools was developed to centrally monitor the status of all PCs and PVSS projects needed to run the experiment: a Farm Monitoring and Control (FMC) tool, which provides the lower-level access to the PCs, and a System Overview Tool (developed within the Joint Controls Project, JCOP), which provides a centralized interface to the FMC tool and adds PVSS project monitoring and control. The implementation of these tools has provided a reliable and efficient way to manage the system, both during normal operations and during shutdowns, upgrades or maintenance operations. This paper will present the particular implementation of this tool in the LHCb experiment and the benefits of its use in a large-scale heterogeneous system.
Slides WEBHAUST01 [3.211 MB]
 
WEPMU033 Monitoring Control Applications at CERN 1141
 
  • F. Varela, F.B. Bernard, M. Gonzalez-Berges, H. Milcent, L.B. Petrova
    CERN, Geneva, Switzerland
 
  The Industrial Controls and Engineering (EN-ICE) group of the Engineering Department at CERN has produced, and is responsible for the operation of, around 60 applications, which control critical processes in the domains of cryogenics, quench protection systems, power interlocks for the Large Hadron Collider and other sub-systems of the accelerator complex. These applications require 24/7 operation and a quick reaction to problems. For this reason, the EN-ICE group is presently developing a monitoring tool to detect, anticipate and report possible anomalies in the integrity of the applications. The tool builds on top of the Simatic WinCC Open Architecture (formerly PVSS) SCADA system and makes use of the Joint COntrols Project (JCOP) and UNICOS frameworks developed at CERN. The tool provides centralized monitoring of the different elements that make up the control systems, such as Windows and Linux servers, PLCs and applications. Although the primary aim of the tool is to assist the members of the EN-ICE Standby Service, the tool may present different levels of detail of the systems depending on the user, which enables experts to diagnose and troubleshoot problems. In this paper, the scope, functionality and architecture of the tool are presented and some initial results on its performance are summarized.
Poster WEPMU033 [1.719 MB]