Joint Accelerator Conferences Website

The Joint Accelerator Conferences Website (JACoW) is an international collaboration that publishes the proceedings of accelerator conferences held around the world.


BibTeX citation export for MOPHA117: Big Data Archiving From Oracle to Hadoop

@InProceedings{prietobarreiro:icalepcs2019-mopha117,
  author       = {Prieto Barreiro, I. and Sobieszek, M.},
  title        = {{Big Data Archiving From Oracle to Hadoop}},
  booktitle    = {Proc. ICALEPCS'19},
  pages        = {497--501},
  paper        = {MOPHA117},
  language     = {english},
  keywords     = {database, network, monitoring, operation, SCADA},
  venue        = {New York, NY, USA},
  series       = {International Conference on Accelerator and Large Experimental Physics Control Systems},
  number       = {17},
  publisher    = {JACoW Publishing, Geneva, Switzerland},
  month        = aug,
  year         = {2020},
  issn         = {2226-0358},
  isbn         = {978-3-95450-209-7},
  doi          = {10.18429/JACoW-ICALEPCS2019-MOPHA117},
  url          = {https://jacow.org/icalepcs2019/papers/mopha117.pdf},
  note         = {https://doi.org/10.18429/JACoW-ICALEPCS2019-MOPHA117},
  abstract     = {The CERN Accelerator Logging Service (CALS) is used to persist the data of around 2 million predefined signals coming from heterogeneous sources such as the electricity infrastructure, industrial controls like cryogenics and vacuum, or beam-related data. This old Oracle-based logging system will be phased out at the end of the LHC’s Long Shutdown 2 (LS2) and replaced by the Next CERN Accelerator Logging Service (NXCALS), which is based on Hadoop. As a consequence, the different data sources must be adapted to persist their data in the new logging system. This paper describes the solution implemented to archive into NXCALS the data produced by the QPS (Quench Protection System) and SCADAR (Supervisory Control And Data Acquisition Relational database) systems, which generate a total of around 175,000 values per second. To cope with such a data volume, the new service has to be extremely robust, scalable and fail-safe, with guaranteed data delivery and no data loss. The paper also explains how to recover from failure scenarios such as network disruption, and how to manage and monitor this highly distributed service.},
}
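
A minimal usage sketch for this export (assumptions: the entry above is saved to a file named refs.bib next to the .tex source, and a standard numeric bibliography style is acceptable; the nonstandard JACoW fields such as paper and venue are simply ignored by stock BibTeX styles):

  % main.tex -- cite the exported entry with plain BibTeX
  \documentclass{article}
  \begin{document}
  The migration of QPS and SCADAR archiving to NXCALS is described
  in~\cite{prietobarreiro:icalepcs2019-mopha117}.
  \bibliographystyle{unsrt}   % numeric style, references in citation order
  \bibliography{refs}         % reads the entry from refs.bib
  \end{document}

Compile with pdflatex main, then bibtex main, then pdflatex twice more so the citation and reference list resolve. Users of biblatex can instead load the entry with \addbibresource{refs.bib}; the extra JACoW fields are likewise ignored there unless a custom data model maps them.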