MapReduce supports the processing of large data sets in parallel. It has been shown that MapReduce is an example for the use of the bulk synchronous parallel (BSP) bridging model, a model for parallel computation on a fixed set of processors comprising alternating computation and communication phases. In this article we extend the normal execution of MapReduce from processing large finite data sets to processing stream queries with input data stream assumed to continue indefinitely. We classify stream queries into three classes, memoryless, semi-memoryless and memorable, and provide the model for each class using MapReduce based on BSP. In addition, as some stream queries require large amounts of computing sources, the BSP computation model is extended to a model with unbounded many agents, but preserving the barrier synchronization. A behavioral theory is developed for this model extending the behavioral theory of the BSP model. This comprises an axiomatization, the definition of Infinite-Agent BSP abstract state machines (Inf-Ag-BSP-ASM) and the proof that such ASMs capture the unbounded synchronized computations. Finally, we show how MapReduce processing can be further improved on grounds of the unbounded extension.
If you cannot see the document below, the PDF document is most likely not freely accessible. In this case, please try to access the document via this link.
% BibTex
@inproceedings{LiHDGS21,
author = {Zilinghan Li and
Shilan He and
Yiqing Du and
Sen{\'{e}}n Gonz{\'{a}}lez and
Klaus{-}Dieter Schewe},
editor = {Alexander Raschke and
Dominique M{\'{e}}ry},
title = {Unbounded Barrier-Synchronized Concurrent ASMs for Effective MapReduce
Processing on Streams},
booktitle = {Rigorous State-Based Methods - 8th International Conference, {ABZ}
2021, Ulm, Germany, June 9-11, 2021, Proceedings},
series = {Lecture Notes in Computer Science},
volume = {12709},
pages = {3--16},
publisher = {Springer},
year = {2021},
url = {https://doi.org/10.1007/978-3-030-77543-8\_1},
doi = {10.1007/978-3-030-77543-8\_1},
timestamp = {Wed, 09 Jun 2021 12:14:31 +0200},
biburl = {https://dblp.org/rec/conf/asm/LiHDGS21.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}