
Can Artificial Intelligence help a clinical laboratory to draw useful information from limited data sets? Application to mixed connective tissue disease

Daniel Bertin, Pierre Bongrand, Nathalie Bardin

Abstract


Diagnosis is a key step in patient management. For decades, refined decision algorithms and numerical scores based on conventional statistical methods have been developed to ensure optimal reliability. More recently, a number of machine learning tools have been developed and applied to increasingly large data sets, comprising up to millions of items, yielding sophisticated classification models. While this approach has proved impressively efficient in some cases, it has practical limitations. A model may require a high number of parameters, which increases the cost of decision making and delays it. Also, information on the specific features of local patient recruitment may be lost, which hampers any simplification of universal models. Here, we explored the capacity of currently available artificial intelligence tools to classify patients seen in a single health center on the basis of a limited number of parameters. As a model, we studied the discrimination between systemic lupus erythematosus (SLE) and mixed connective tissue disease (MCTD) on the basis of thirteen biological parameters, using eight widely used classifiers (including logistic regression, support vector machines, nearest neighbor classifiers, random forests and neural networks). A retrospective study including 44 patients (34 SLE, 10 MCTD) was conducted in the Marseilles hospital organization. The best area under the ROC curve (AUC) obtained on test sets was 0.83 ± 0.03 (standard error, SE) with classifiers using all 13 parameters, and 0.86 ± 0.02 (SE) with 5 selected parameters. We conclude that classification efficiency may be significantly improved by a knowledge-based selection of discriminating parameters.
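The evaluation protocol summarized above (several classifiers, AUC on held-out test sets reported as mean ± SE, all 13 parameters versus 5 selected ones) can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The real cohort is replaced by hypothetical synthetic data of the same size (34 SLE, 10 MCTD, 13 parameters). The split scheme (20 stratified 70/30 train/test splits), the four classifiers and their settings, and the column indices in `SELECTED` are all hypothetical choices. Neural networks are omitted for brevity.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_cohort(seed=0):
    """Hypothetical stand-in for the real cohort: 34 SLE (label 0), 10 MCTD (label 1), 13 parameters."""
    rng = np.random.default_rng(seed)
    y = np.array([0] * 34 + [1] * 10)
    X = rng.normal(size=(44, 13))
    X[y == 1, :5] += 1.0  # assumption: only the first 5 parameters differ between groups
    return X, y


def auc_mean_se(model, X, y, n_splits=20, test_size=0.3, seed=0):
    """Mean test-set ROC AUC and its standard error over repeated stratified train/test splits."""
    splitter = StratifiedShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    aucs = []
    for train_idx, test_idx in splitter.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        # Use class probabilities when available, otherwise a decision score (e.g. SVC).
        if hasattr(model, "predict_proba"):
            scores = model.predict_proba(X[test_idx])[:, 1]
        else:
            scores = model.decision_function(X[test_idx])
        aucs.append(roc_auc_score(y[test_idx], scores))
    aucs = np.asarray(aucs)
    return aucs.mean(), aucs.std(ddof=1) / np.sqrt(len(aucs))


# Hypothetical classifier choices and settings (illustrative only).
classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM (RBF kernel)": SVC(),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
# Hypothetical indices of 5 parameters fixed a priori from domain knowledge, not from the data.
SELECTED = [0, 1, 2, 3, 4]

X, y = make_synthetic_cohort()
results = {}
for name, clf in classifiers.items():
    for label, cols in (("all 13", list(range(13))), ("5 selected", SELECTED)):
        # Scale features inside the pipeline so test data never influence the scaling.
        model = make_pipeline(StandardScaler(), clone(clf))
        mean, se = auc_mean_se(model, X[:, cols], y)
        results[(name, label)] = (mean, se)
        print(f"{name:20s} {label:10s} AUC = {mean:.2f} ± {se:.2f} (SE)")
```

The five columns are fixed before any test data are seen, which mirrors the idea of a knowledge-based rather than data-driven selection; picking them from the same data used to compute the AUC would inflate the estimate. Note also that a standard error computed across overlapping resampled splits tends to understate the true uncertainty.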

Keywords


diagnostic algorithms; feature selection; machine learning; learning from data; systemic lupus erythematosus; mixed connective tissue disease; medical decision support; scikit-learn






DOI: https://doi.org/10.32629/jai.v6i2.664



Copyright (c) 2023 Daniel Bertin, Pierre Bongrand, Nathalie Bardin

License URL: https://creativecommons.org/licenses/by-nc/4.0