One shot alpha numeric weight based clustering algorithm with user threshold

Durga Venkata Prasad Maradana, Srikanth Thota

Abstract


Information retrieval from files, databases, and similar data sources is a major issue nowadays. After retrieval, clustering is one of the most important steps. Many clustering algorithms are available, but the choice of algorithm depends on the user's requirements. This paper studies the agglomerative approach under different constraints, metrics, or user preferences: the number of levels in the clustering process, the number of clusters to be generated at each level, and the range of the attributes used at each level when clustering the given data set. We give a brief overview of the agglomerative clustering approach together with these user preferences.
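As a rough illustration of the idea of letting the user fix the number of clusters at each level, the following minimal Python sketch performs generic single-link agglomerative clustering on one-dimensional numeric values. It is not the algorithm proposed in the paper: the function names, the `clusters_per_level` parameter, the single-link distance, and the sample values are all assumptions made for illustration, and the weight-based and attribute-range aspects mentioned in the abstract are not modelled.

```python
from typing import List, Sequence


def single_link_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Smallest absolute difference between any member of a and any member of b."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(
    values: Sequence[float], clusters_per_level: Sequence[int]
) -> List[List[List[float]]]:
    """Repeatedly merge the two closest clusters and record a snapshot each
    time the cluster count reaches the next user-requested target.

    clusters_per_level is a hypothetical user preference: the number of
    clusters wanted at each level, expected to be non-increasing.
    """
    clusters = [[v] for v in sorted(values)]  # start with one cluster per value
    levels = []
    for target in clusters_per_level:
        while len(clusters) > max(target, 1):
            # Find the pair of clusters with the smallest single-link distance.
            i, j = min(
                (
                    (i, j)
                    for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))
                ),
                key=lambda p: single_link_distance(clusters[p[0]], clusters[p[1]]),
            )
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
            clusters.sort(key=min)  # keep clusters ordered by smallest member
        levels.append([list(c) for c in clusters])  # snapshot for this level
    return levels


if __name__ == "__main__":
    # Illustrative data only: ask for 3 clusters at level 1, then 2 at level 2.
    for depth, level in enumerate(agglomerate([1, 2, 10, 11, 50], [3, 2]), start=1):
        print(f"level {depth}: {level}")
```

Each level reuses the clusters of the previous one, so later levels can only merge existing groups, which is what distinguishes the agglomerative (bottom-up) approach from a divisive one.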


Keywords


clustering; k-means; hierarchical agglomerative clustering; weight of object positional value for a term/field/attribute; clustering ranges


DOI: https://doi.org/10.32629/jai.v7i2.984


Copyright (c) 2023 Durga Venkata Prasad Maradana, Srikanth Thota

License URL: https://creativecommons.org/licenses/by-nc/4.0/