Background Texts

ARTICLE FOR A TECHNICAL AUDIENCE

Rafi Shaik and Ramakrishna Wusirika, “Machine Learning Approaches Distinguish Multiple Stress Conditions Using Stress-Responsive Genes and Identify Candidate Genes for Broad Resistance in Rice” From Plant Physiology, January 2014

3

Rafi Shaik and Ramakrishna Wusirika, biologists at Michigan Technological University, wrote this article for an audience of plant biologists. The full article is not reproduced here, but the abstract and introduction will give you a sense of the writers’ tone and sentence structure. The writers provide background on the purpose and significance of their research, use technical and scientific terms that are appropriate for an audience of fellow biologists, and incorporate sources from their scientific field.

Abstract

Abiotic and biotic stress responses are traditionally thought to be regulated by discrete signaling mechanisms. Recent experimental evidence revealed a more complex picture where these mechanisms are highly entangled and can have synergistic and antagonistic effects on each other. In this study, we identified shared stress-responsive genes between abiotic and biotic stresses in rice (Oryza sativa) by performing meta-analyses of microarray studies. About 70% of the 1,377 common differentially expressed genes showed conserved expression status, and the majority of the rest were down-regulated in abiotic stresses and up-regulated in biotic stresses. Using dimension reduction techniques, principal component analysis, and partial least squares discriminant analysis, we were able to segregate abiotic and biotic stresses into separate entities. The supervised machine learning model, recursive-support vector machine, could classify abiotic and biotic stresses with 100% accuracy using a subset of differentially expressed genes. Furthermore, using a random forests decision tree model, eight out of 10 stress conditions were classified with high accuracy. Comparison of genes contributing most to the accurate classification by partial least squares discriminant analysis, recursive-support vector machine, and random forests revealed 196 common genes with a dynamic range of expression levels in multiple stresses. Functional enrichment and coexpression network analysis revealed the different roles of transcription factors and genes responding to phytohormones or modulating hormone levels in the regulation of stress responses. We envisage the top-ranked genes identified in this study, which highly discriminate abiotic and biotic stresses, as key components to further our understanding of the inherently complex nature of multiple stress responses in plants.

Introduction

4

The need to breed robust and high-productivity crops is more important than ever due to increasingly adverse environmental conditions and scarce natural resources. Food productivity has to be raised by as much as 70% to 100% to meet the nutritional needs of the growing population, which is expected to rise to 9 billion by 2050 (Godfray et al., 2010; Lutz and Samir, 2010). Rice (Oryza sativa) is both a major food crop and a model organism that shares extensive synteny and collinearity with other grasses. Thus, the development of rice that can sustain a wide variety of adverse conditions is vital to meet the imminent global energy demands.

A broad range of stress factors divided into two major categories, namely abiotic stresses encompassing a variety of unfavorable environmental conditions, such as drought, submergence, salinity, heavy metal contamination or nutrient deficiency, and biotic stresses caused by infectious living organisms, such as bacteria, viruses, fungi, or nematodes, negatively affect the productivity and survival of plants. Advancements in whole-genome transcriptome analysis techniques like microarrays and RNA sequencing have revolutionized the identification of changes in gene expression in plants under stress, making it possible now to chart out individual stress-specific biomolecular networks and signaling pathways. However, in field conditions, plants are often subjected to multiple stresses simultaneously, requiring efficient molecular mechanisms to perceive a multitude of signals and to elicit a tailored response (Sharma et al., 2013). Increasing evidence from experimental studies suggests that the cross talk between individual stress response signaling pathways via key regulatory molecules, resulting in the dynamic modulation of downstream effectors, is at the heart of multiple stress tolerance. A number of studies have identified many genes, especially transcription factors (TFs) and hormone response factors, that play a central role in multiple stresses and manifest a signature expression specific to the stress condition. For example, abscisic acid (ABA) response factors are up-regulated in the majority of abiotic stresses, activating an oxidative response to protect cells from reactive oxygen species damage, but were found to be down-regulated in a number of biotic conditions, possibly suppressed by immune response molecules (Cao et al., 2011).

The wide range of abiotic and biotic stress factors and their numerous combinations in natural conditions generate a customized stress response. This suggests that the identification and characterization of key genes and their coexpression partners, which show an expression profile that discriminates abiotic and biotic stress responses, would increase our understanding of plant stress response manyfold and provide targets for genetic manipulation to improve their stress tolerance. The availability of multiple genome-wide transcriptome data sets for the same stress condition provides an opportunity to identify, compare, and contrast the stress-specific gene expression profile of one stress condition with other stresses. Meta-analysis combining similar studies provides a robust statistical framework to reevaluate original findings, improve sensitivity with increased sample size, and test new hypotheses. Meta-analysis of microarray studies is widely used, especially in clinical research, to improve statistical robustness and detect weak signals (Liu et al., 2013; Rung and Brazma, 2013). For instance, thousands of samples belonging to hundreds of cancer types were combined, which provided new insights into the general and specific transcriptional patterns of tumors (Lukk et al., 2010).

5

Microarray studies are burdened with a high dimensionality of feature space, also called the “curse of dimensionality” (i.e. the availability of very many variables [genes] for very few observations [samples]). Machine learning algorithms (supervised and unsupervised), such as principal component analysis (PCA), decision trees, and support vector machines (SVM), provide a way to efficiently classify two or more classes of data. Further feature selection procedures like recursive-support vector machines (R-SVM) provide means to identify the top features contributing most to the accuracy of classification.

In this study, we performed a meta-analysis of stress response studies in rice using publicly available microarray gene expression data conducted on a single platform (AffymetrixRiceArray). Meta-analysis of abiotic and biotic stresses was performed separately to identify differentially expressed genes (DEGs) involved in multiple stress conditions. The lists of abiotic and biotic DEGs were then compared to identify common genes with conserved and nonconserved gene expression (i.e., whether up-regulated, down-regulated, or oppositely regulated in both the categories), revealing the broad patterns of their involvement in the stress response. In order to test the efficiency of identified common DEGs in the classification of abiotic and biotic stresses as well as individual stresses within abiotic and biotic stresses, we systematically investigated various classification and machine learning techniques, including PCA, partial least squares discriminant analysis (PLS-DA), SVM, and random forest (RF). We characterized the shared DEGs through functional enrichment analysis of gene ontologies, metabolic pathways, TF families, and microRNAs (miRNAs) targeting them. We also analyzed the correlation of coexpression between the common DEGs to find sets of genes showing high coexpression and identify hub genes that show the greatest number of edges over a very high cutoff value.

Literature Cited

Cao FY, Yoshioka K, Desveaux D (2011) The roles of ABA in plant pathogen interactions. J Plant Res 124: 489–499

Godfray HC, Beddington JR, Crute IR, Haddad L, Lawrence D, Muir JF, Pretty J, Robinson S, Thomas SM, Toulmin C (2010) Food security: the challenge of feeding 9 billion people. Science 327: 812–818

Liu Z, Xie M, Yao Z, Niu Y, Bu Y, Gao C (2013) Three meta-analyses define a set of commonly overexpressed genes from microarray datasets on astrocytomas. Mol Neurobiol 47: 325–336

Lukk M, Kapushesky M, Nikkilä J, Parkinson H, Goncalves A, Huber W, Ukkonen E, Brazma A (2010) A global map of human gene expression. Nat Biotechnol 28: 322–324

Lutz W, Samir KC (2010) Dimensions of global population projections: what do we know about future population trends and structures? Philos Trans R Soc Lond B Biol Sci 365: 2779–2791

Rung J, Brazma A (2013) Reuse of public genome-wide gene expression data. Nat Rev Genet 14: 89–99

Sharma R, De Vleesschauwer D, Sharma MK, Ronald PC (2013) Recent advances in dissecting stress-regulatory crosstalk in rice. Mol Plant 6: 250–260

ARTICLE FOR A GENERAL AUDIENCE

Marcia Goodrich, “Scientists ID Genes That Could Lead to Tough, Disease-Resistant Varieties of Rice” From Michigan Tech News, March 31, 2014

6

This article reports on the research done by Rafi Shaik and Ramakrishna Wusirika. It was published on Michigan Tech News, a Michigan Technological University website that reports on campus news and the accomplishments of members of the campus community.

image