An enhanced distributed framework for real-time performance testing of large scale IoT dataset using big data analytic tools

Vijay Hasanpuri, Chander Diwaker

Abstract


Demand for analyzing enormous IoT datasets is rising in parallel with the popularity of the IoT. The volume, velocity, and variety of IoT data pose considerable obstacles to effective processing and analysis. In this research, we present a distributed framework that uses big data analytic tools such as Apache Hive, Spark, and Hadoop to efficiently test the performance of massive IoT datasets. The framework addresses the lack of a comprehensive solution by providing a scalable and fault-tolerant architecture. We discuss the motivation for real-time performance testing in the context of big data analytics for IoT datasets and highlight the need for a distributed framework. A literature review explores existing performance testing frameworks, big data analytic tools, and approaches to performance testing of big data analytics. We describe the proposed framework’s key components: dataset generation, test scenario specification, cluster configuration, performance metrics collection, and analysis and visualization modules. We also discuss implementation details, including tool choices. An experimental evaluation is conducted to validate the framework’s performance, and we suggest incorporating blockchain technology. Overall, the proposed framework offers a comprehensive solution for real-time performance testing of large-scale IoT datasets, giving organizations and researchers a valuable tool for ensuring efficient and reliable IoT data processing and analysis.


Keywords


big-data; IoT; Hadoop; MapReduce; Apache Hive; benchmark; Spark; WordCount; TeraSort

DOI: https://doi.org/10.32629/jai.v7i1.746


Copyright (c) 2023 Vijay Hasanpuri, Chander Diwaker

License URL: https://creativecommons.org/licenses/by-nc/4.0/