banner

Mining timed sequential patterns: The Minits-AllOcc technique

Somayah Karsoum, Clark Barrus, Le Gruenwald, Eleazar Leal

Abstract


Sequential pattern mining is one of the data mining tasks used to find the subsequences in a sequence dataset that appear together in order based on time. Sequence data can be collected from devices, such as sensors, GPS, or satellites, and ordered based on timestamps, which are the times when they are generated/collected. Mining patterns in such data can be used to support many applications, including transportation recommendation systems, transportation safety, weather forecasting, and disease symptom analysis. Numerous techniques have been proposed to address the problem of how to mine subsequences in a sequence dataset; however, current traditional algorithms ignore the temporal information between the itemset in a sequential pattern. This information is essential in many situations. Though knowing that measurement Y occurs after measurement X is valuable, it is more valuable to know the estimated time before the appearance of measurement Y, for example, to schedule maintenance at the right time to prevent railway damage. Considering temporal relationship information for sequential patterns raises new issues to be solved, such as designing a new data structure to save this information and traversing this structure efficiently to discover patterns without re-scanning the database. In this paper, we propose an algorithm called Minits-AllOcc (MINIng Timed Sequential Pattern for All-time Occurrences) to find sequential patterns and the transition time between itemsets based on all occurrences of a pattern in the database. We also propose a parallel multi-core CPU version of this algorithm, called MMinits-AllOcc (Multi-core for MINIng Timed Sequential Pattern for All-time Occurrences), to deal with Big Data. Extensive experiments on real and synthetic datasets show the advantages of this approach over the brute-force method. Also, the multi-core CPU version of the algorithm is shown to outperform the single-core version on Big Data by 2.5X.

Keywords


Data Mining; Sequential Pattern Mining; Timed Sequential Patterns; Singe-core and Multi-core Processor

Full Text:

PDF

References


1. Agrawal R, Srikant R. Mining sequential patterns. In: Proceedings of the eleventh international conference on data engineering; 1995 Mar 6–10; Taipei. New York: IEEE; 2002. p. 3–14. doi: 10.1109/ICDE.1995.380415.

2. Brock FV, Crawford KC, Elliott RL, et al. The Oklahoma Mesonet: A technical overview. Journal of Atmospheric and Oceanic Technology 1995; 12(1): 5–19. doi: 10.1175/1520-0426(1995)012<0005:tomato>2.0.co;2.

3. McPherson RA, Friedrich CA, Crawford KC, et al. Statewide monitoring of the mesoscale environment: A technical update on the Oklahoma Mesonet. Journal of Atmospheric and Oceanic Technology 2007; 24(3): 301–321. doi: 10.1175/JTECH1976.1.

4. Jay N, Herengt G, Albuisson E, Kohler F. Sequential pattern mining and classification of patient path. Medinfo 2004; 1667.

5. Pramono YWT, Suhardi. Anomaly-based intrusion detection and prevention system on website usage using rule-growth sequential pattern analysis: Case study: Statistics of Indonesia (BPS) website. In: 2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA); 2014 Aug 20–21; Bandung. New York: IEEE; 2015. p. 203–208. doi: 10.1109/ICAICTA.2014.7005941.

6. Dermy O, Brun A. Can we take advantage of time-interval pattern mining to model students activity? In: International Conference on Educational Data Mining; 2020 Jul 10–13; Online. Massachusetts: International Educational Data Mining Society; 2020. p. 69–80.

7. Rossetti MA. Analysis of weather events on US railroads [Report]. Volpe National Transportation Systems Center; 2007.

8. Simes T. A blow to train operations, can strong winds cause derailment [Report]. Australian Transport Safety Bureau; 2011.

9. Han J, Pei J, Mortazavi-Asl B, et al. FreeSpan: Frequent pattern-projected sequential pattern mining. In: Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2000 Aug 20–23; Boston. New York: Association for Computing Machinery; 2000. p. 355–359.

10. Han J, Pei J, Mortazavi-Asl B, et al. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the 17th International Conference on Data Engineering; 2001 Apr 2–6; Heidelberg. New York: IEEE; 2002. p. 215–224.

11. Zaki MJ. Spade: An efficient algorithm for mining frequent sequences. Machine Learning 2001; 42: 31–60. doi: 10.1023/A:1007652502315.

12. Jou C, Shyur HJ, Yen CY. Timed sequential pattern mining based on confidence in accumulated intervals. In: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014); 2014 Aug 13–15; Redwood City. New York: IEEE; 2015. p. 771–778. doi: 10.1109/IRI.2014.7051967.

13. Kumar S, Mohbey KK. A review on big data based parallel and distributed approaches of pattern mining. Journal of King Saud University-Computer and Information Sciences 2022; 34(5): 1639–1662. doi: 10.1016/j.jksuci.2019.09.006.

14. Ghorbani M, Abessi M. A new methodology for mining frequent itemsets on temporal data. IEEE Transactions on Engineering Management 2017; 64(4): 566–573. doi: 10.1109/TEM.2017.2712606.

15. Zhao P, Jonietz D, Raubal M. Applying frequent-pattern mining and time geography to impute gaps in smartphone-based human-movement data. International Journal of Geographical Information Science 2021; 35(11): 2187–2215. doi: 10.1080/13658816.2020.1862126.

16. Aggarwal A, Toshniwal D. Frequent pattern mining on time and location aware air quality data. IEEE Access 2019; 7: 98921–98933. doi: 10.1109/ACCESS.2019.2930004.

17. Ritika, Gupta SK. HUFTI-SPM: High-utility and frequent time-interval sequential pattern mining from transactional databases. International Journal of Data Science and Analytics 2022; 13: 239–250. doi: 10.1007/s41060-021-00297-7.

18. Huang JW, Jaysawal BP, Chen KY, Wu YB. Mining frequent and top-k high utility time interval-based events with duration patterns. Knowledge and Information Systems 2019; 61: 1331–1359. doi: 10.1007/s10115-019-01333-6.

19. Mirbagheri SM, Hamilton HJ. Mining high utility patterns in interval-based event sequences. Data & Knowledge Engineering 2021; 135: 101924. doi: 10.1016/j.datak.2021.101924.

20. Giannotti F, Nanni M, Pedreschi D. Efficient mining of temporally annotated sequences. In: Frasconi P, Landwehr N, Manco G, Vreeken J (editors). Proceedings of the 2006 SIAM International Conference on Data Mining; 2006 Apr 20–22; Bethesda. Philadelphia: Society for Industrial and Applied Mathematics; 2006. p. 348–359. doi: 10.1137/1.9781611972764.31.

21. Yang H, Gruenwald L, Boulanger M. A novel real-time framework for extracting patterns from trajectory data streams. In: Proceedings of the 4th ACM SIGSPATIAL International Workshop on GeoStreaming; 2013 Nov 5; Orlando. New York: Association for Computing Machinery; 2013. p. 26–32. doi: 10.1145/2534303.2534313.

22. Titarenko SS, Titarenko VN, Aivaliotis G, Palczewski J. Fast implementation of pattern mining algorithms with time stamp uncertainties and temporal constraints. Journal of Big Data 2019; 6(1): 1–34. doi: 10.1186/s40537-019-0200-9.

23. Karsoum S, Gruenwald L, Barrus C, Leal E. Using timed sequential patterns in the transportation industry. In: 2019 IEEE International Conference on Big Data (Big Data); 2019 Dec 9–12; Los Angeles. New York: IEEE; 2020. p. 3573–3582. doi: 10.1109/BigData47090.2019.9006394.

24. Srikant R, Agrawal R. Mining sequential patterns: Generalizations and performance improvements. In: Apers P, Bouzeghoub M, Gardarin G (editors). Advances in Database Technology—EDBT’96: 5th International Conference on Extending Database Technology; 1996 Mar 25–29; Avignon. Heidelberg: Springer; 1996. p. 1–17.

25. Fournier-Viger P, Lin JCW, Kiran RU, et al. A survey of sequential pattern mining. Data Science and Pattern Recognition 2017; 1: 54–77.

26. Huynh B, Vo B, Snasel V. An efficient method for mining frequent sequential patterns using multi-core processors. Applied Intelligence 2017; 46: 703–16. doi: 10.1007/s10489-016-0859-y.

27. Li H, Zhou X, Pan C. Study on GSP algorithm based on Hadoop. In: 2015 IEEE 5th International Conference on Electronics Information and Emergency Communication; 2015 May 14–16; Beijing. New York: IEEE; 2015. p. 321–324. doi: 10.1109/ICEIEC.2015.7284549.

28. Wei Y, Liu D, Duan L. Distributed PrefixSpan algorithm based on MapReduce. In: 2012 International Symposium on Information Technologies in Medicine and Education; 2012 Aug 3–5; Hokkaido. New York: IEEE; 2012. p. 901–904. doi: 10.1109/ITiME.2012.6291449.

29. Yu X, Li Q, Liu J. Scalable and parallel sequential pattern mining using spark. World Wide Web 2019; 22(1): 295–324. doi: 10.1007/s11280-018-0566-1.

30. Gan W, Lin JCW, Fournier-Viger P, et al. A survey of parallel sequential pattern mining. ACM Transactions on Knowledge Discovery from Data (TKDD) 2019; 13(3): 1–34. doi: 10.1145/3314107.

31. Dong G, Pei J. Sequence data mining. New York: Springer Science & Business Media; 2007.

32. Patnaik D, Butler P, Ramakrishnan N, et al. Experiences with mining temporal event sequences from electronic medical records: Initial successes and some challenges. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2011 Aug 21–24; San Diego. New York: Association for Computing Machinery; 2011. p. 360–368. doi: 10.1145/2020408.2020468.

33. Chen YL, Chiang MC, Ko MT. Discovering time-interval sequential patterns in sequence databases. Expert Systems with Applications 2003; 25(3): 343–354. doi: 10.1016/S0957-4174(03)00075-7.

34. Hu YH, Huang TCK, Yang HR, Chen YL. On mining multi-time-interval sequential patterns. Data & Knowledge Engineering 2009; 68(10): 1112–1127. doi: 10.1016/j.datak.2009.05.003.

35. AlZahrani MY, Mazarbhuiya FA. Discovering constraint-based sequential patterns from medical datasets. International Journal of Recent Technology and Engineering 2019; 8(4): 724–728. doi: 10.35940/ijrte.D7011.118419.

36. Giannotti F, Nanni M, Pinelli F, Pedreschi D. Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2007 Aug 12–15; San Jose. New York: Association for Computing Machinery; 2007. p. 330–339. doi: 10.1145/1281192.1281230.

37. Mannila H, Toivonen H, Inkeri Verkamo A. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1997; 1: 259–289. doi: 10.1023/A:1009748302351.

38. Zimmermann A. Understanding episode mining techniques: Benchmarking on diverse, realistic, artificial data. Intelligent Data Analysis 2014; 18(5): 761–791. doi: 10.3233/IDA-140668.

39. Zhang D, Lee K, Lee I. Mining medical periodic patterns from spatio-temporal trajectories. In: Siuly S, Lee I, Huang Z (editors). Health Information Science: 7th International Conference; 2018 Oct 5–7; Cairns. Berlin: Springer International Publishing; 2018. p. 123–133.

40. Zhang D, Lee K, Lee I. Mining hierarchical semantic periodic patterns from GPS-collected spatio-temporal trajectories. Expert Systems with Applications 2019; 122: 85–101. doi: 10.1016/j.eswa.2018.12.047.

41. Yuan J, Zheng Y, Zhang C, et al. T-drive: Driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems; 2010 Nov 2; San Jose. New York: Association for Computing Machinery; 2010. p. 99–108. doi: 10.1145/1869790.1869807.

42. Yuan J, Zheng Y, Xie X, Sun G. T-drive: Enhancing driving directions with taxi drivers’ intelligence. IEEE Transactions on Knowledge and Data Engineering 2011; 25(1): 220–232. doi: 10.1109/TKDE.2011.200.

43. Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis E, Han J, Fayyad U (editors). Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; 1996 Aug 2–4; Portland. Washington, D.C.: AAAI Press; 1996. p. 226–231.

44. Karsoum S, Gruenwald L, Leal E. Impact of trajectory segmentation on discovering trajectory sequential patterns. In: 2018 IEEE International Conference on Big Data (Big Data); 2018 Dec 10–13; Seattle. New York: IEEE; 2019. p. 3432–3441. doi: 10.1109/BigData.2018.8622209.

45. What is the heat index? [Internet]. Amarillo: Weather Forecast Office; [2021 Oct 17]. Available from: https://www.weather.gov/ama/heatindex.

46. Water Science School. The 100-year flood [Internet]. Virginia: USGS; 2018 [cited 2021 Oct 17]. Available from: https://www.usgs.gov/special-topic/water-science-school/science/100-year-flood?qt-science_center_objects=0#qt-science_center_objects.

47. The Beaufort wind scale [Internet]. London: MetMatters; [cited 2021 Oct 17]. Available from: https://www.rmets.org/resource/beaufort-scale.

48. Dew point vs humidity [Internet]. La Crosse: Weather Forecast Office; [cited 2021 Oct 17]. Available from: https://www.weather.gov/arx/why_dewpoint_vs_humidity.

49. Fournier-Viger P, Lin JCW, Gomariz A, et al. The SPMF open-source data mining library version 2. In: Berendt B, Bringmann B, Fromont É, et al. (editors). 19th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2016) Part III; 2016 Sept 19–23; Riva del Garda. Berlin: Springer; 2016. p. 36–40. doi: 10.1007/978-3-319-46131-1_8.




DOI: https://doi.org/10.32629/jai.v6i1.593

Refbacks

  • There are currently no refbacks.


Copyright (c) 2023 Somayah Karsoum, Clark Barrus, Le Gruenwald, Eleazar Leal

License URL: https://creativecommons.org/licenses/by-nc/4.0