An Experimental Study on Microarray Expression Data from Plants under Salt Stress by using Clustering Methods.
DOI: https://doi.org/10.9781/ijimai.2020.05.004
Keywords: Clustering, Clustering Quality Indexes, Gene Expression
Abstract
Current genome-wide advances in gene chip technology give "omics" research (genomics, proteomics and transcriptomics) the opportunity to analyze the expression levels of thousands of genes across multiple experiments. Many machine learning approaches have been proposed to deal with this deluge of information, and clustering methods are one of them. Clustering groups data (here, gene expression profiles) into homogeneous clusters based on distance measurements. Various clustering techniques are in use, but there is no consensus on which one is best. In this context, seven clustering algorithms were compared and tested on the gene expression datasets of three model plants under salt stress. The techniques were evaluated with internal and relative validity measures. The AGNES algorithm appears to perform best on the internal validity measures for all three plant datasets, while K-Means shows a favorable trend on the relative validity measures for the same datasets.
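To make "grouping gene profiles into homogeneous clusters using distance measurements" concrete, here is a minimal sketch of the two algorithms the abstract singles out, K-Means and AGNES (agglomerative nesting), written in plain Python with no third-party libraries. The abstract does not state the distance, linkage or implementation used in the study. Euclidean distance, average linkage for AGNES, and the small toy matrix are therefore assumptions for illustration only. They are not the study's datasets or code.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(profiles, k, n_iter=100, seed=0):
    """Lloyd's K-Means; returns one cluster label per profile."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct profiles chosen at random.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agnes(profiles, k):
    """AGNES with average linkage (assumed), stopped when k clusters remain."""
    clusters = [[i] for i in range(len(profiles))]  # start: one gene per cluster

    def average_linkage(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        total = sum(euclidean(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        x, y = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x stays valid

    labels = [0] * len(profiles)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Hypothetical toy matrix: 6 genes x 4 conditions (log-ratios); NOT the study's data.
profiles = [
    [2.0, 2.1, 1.9, 2.2],      # genes 0-2: up-regulated pattern
    [1.8, 2.0, 2.1, 1.9],
    [2.2, 1.9, 2.0, 2.1],
    [-1.9, -2.0, -2.1, -1.8],  # genes 3-5: down-regulated pattern
    [-2.1, -1.8, -2.0, -2.2],
    [-2.0, -2.2, -1.9, -2.0],
]
km_labels = kmeans(profiles, k=2)
ag_labels = agnes(profiles, k=2)
print("K-Means:", km_labels)
print("AGNES:  ", ag_labels)
```

The two methods use different strategies. K-Means iteratively refines k centroids, and its result can depend on the random initialisation (fixed here by the hypothetical `seed` parameter). AGNES builds the hierarchy bottom-up and is deterministic once the distance and linkage are fixed. On well-separated data like this toy matrix, both should recover the same two groups.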
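The abstract reports that the algorithms were scored with internal validity measures. These judge a partition using only the data themselves, without external class labels. The abstract does not list which indexes were used. The sketch below therefore assumes two common ones: the mean silhouette width and the Dunn index, both of which appear among the internal measures offered by the clValid R package. It is plain Python on a hypothetical toy matrix, not the study's evaluation code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_dist(profiles, labels, i, cluster):
    """Mean distance from profile i to the other members of `cluster`."""
    d = [euclidean(profiles[i], profiles[j])
         for j in range(len(profiles)) if labels[j] == cluster and j != i]
    return sum(d) / len(d) if d else 0.0


def silhouette(profiles, labels):
    """Mean silhouette width in [-1, 1]; needs at least two clusters."""
    scores = []
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            scores.append(0.0)  # convention: a singleton's silhouette is 0
            continue
        a = mean_dist(profiles, labels, i, own)  # cohesion within own cluster
        b = min(mean_dist(profiles, labels, i, c)  # separation: nearest other cluster
                for c in set(labels) if c != own)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn(profiles, labels):
    """Dunn index: min between-cluster distance / max cluster diameter."""
    n = len(profiles)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    separation = min(euclidean(profiles[i], profiles[j])
                     for i, j in pairs if labels[i] != labels[j])
    diameter = max((euclidean(profiles[i], profiles[j])
                    for i, j in pairs if labels[i] == labels[j]), default=0.0)
    return separation / diameter if diameter > 0 else math.inf


# Hypothetical toy matrix: 6 genes x 4 conditions (log-ratios); NOT the study's data.
profiles = [
    [2.0, 2.1, 1.9, 2.2], [1.8, 2.0, 2.1, 1.9], [2.2, 1.9, 2.0, 2.1],
    [-1.9, -2.0, -2.1, -1.8], [-2.1, -1.8, -2.0, -2.2], [-2.0, -2.2, -1.9, -2.0],
]
good = [0, 0, 0, 1, 1, 1]   # follows the two expression patterns
mixed = [0, 1, 0, 1, 0, 1]  # ignores them
for name, labels in (("good", good), ("mixed", mixed)):
    print(name, round(silhouette(profiles, labels), 3), round(dunn(profiles, labels), 3))
```

Both indexes reward compact, well-separated clusters. The partition that follows the two expression patterns should therefore score higher on each index than the mixed one. In a comparison like the one described in the abstract, each algorithm's output would be scored this way and the scores ranked.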