Author: Boychenko, S.
Paper Title Page
THPHA036 Multi-Criteria Partitioning on Distributed File Systems for Efficient Accelerator Data Analysis and Performance Optimization 1436
 
  • S. Boychenko, M.A. Galilée, J.C. Garnier, M. Zerlauth
    CERN, Geneva, Switzerland
  • M. Zenha-Rela
    University of Coimbra, Coimbra, Portugal
 
  Since the introduction of the map-reduce paradigm, relational databases have increasingly been replaced by more efficient and scalable architectures, in particular in environments where a single query processes TBytes or even PBytes of data. The same tendency is observed at CERN, where data archiving systems for operational accelerator data are already working well beyond their initially provisioned capacity. Most modern data analysis frameworks are not optimized for heterogeneous workloads such as those that arise in the dynamic environment of one of the world's largest accelerator complexes. This contribution presents Mixed Partitioning Scheme Replication (MPSR) as a solution that will outperform conventional distributed processing environment configurations for almost the entire phase-space of data analysis use cases and performance optimization challenges that arise during the commissioning and operational phases of an accelerator. We will present the results of a statistical analysis as well as the benchmarking of the implemented prototype, which allow us to define the characteristics of the proposed approach and to confirm the expected performance gains.
Poster THPHA036 [0.280 MB]
DOI: https://doi.org/10.18429/JACoW-ICALEPCS2017-THPHA036