|Year : 2021 | Volume : 5 | Issue : 4 | Page : 425-434|
Gene expression analysis to network construction for the identification of hub genes involved in neurodevelopment
Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, Lucknow, Uttar Pradesh, India
|Date of Submission||30-Sep-2021|
|Date of Acceptance||13-Oct-2021|
|Date of Web Publication||14-Dec-2021|
Source of Support: None, Conflict of Interest: None
Background: Gene expression data can be used to identify not only differentially expressed genes but also co-expressed genes, which can give insight into protein interaction networks. The present study used microarray data to predict genes associated with neurodevelopmental processes, to construct a network of co-expressed genes, and to annotate their functions. Methods: Mesenchymal stem cells (MSCs) were exposed to resveratrol (RV), nerve growth factor (NGF), and RV+NGF to study the neuroprotective role of RV (data submitted to NCBI's Gene Expression Omnibus, GEO Series accession number GSE121261). Bioinformatics software, tools, and databases such as R and Bioconductor, the Affy package, CoExpress 1.0b, the Metascape tool, and the Gene Ontology database were used for the prediction and functional enrichment of co-expressed genes. Normalization was done using RMA (Robust Multi-array Average) as implemented in the Affy package, and co-expressed genes were identified using CoExpress 1.0b with default parameters. Results: Co-expression analysis shows that a total of 135 genes have the same expression pattern across the microarray chip; these genes function in different biological processes such as developmental processes, the MAPK TRK pathway, and muscle structure development. A total of fifteen genes were identified that function in nervous system development. Conclusions: This study identifies a list of co-expressed genes that were expressed at the neurodevelopmental stage. These genes can be used further as neuronal markers and for the identification and diagnosis of neuronal injury at the developmental stage. Further verification is required before these predicted proteins can be applied in the drug development process.
Keywords: Co-expressed, microarray experiment, network construction, neurodevelopment, neuronal injury, protein–protein interaction
|How to cite this article:|
Yadav R. Gene expression analysis to network construction for the identification of hub genes involved in neurodevelopment. Biomed Biotechnol Res J 2021;5:425-34
|How to cite this URL:|
Yadav R. Gene expression analysis to network construction for the identification of hub genes involved in neurodevelopment. Biomed Biotechnol Res J [serial online] 2021 [cited 2022 Jan 25];5:425-34. Available from: https://www.bmbtrj.org/text.asp?2021/5/4/425/332460
| Introduction|| |
Gene expression experiments have become important techniques for uncovering the biological information and function of specific biological systems such as pathways, disease states, and cancers. A major advantage of these experiments is that samples can be prepared under different exposures, such as chemicals, radiation, and stress, and the resulting variation in gene expression can be identified. To date, a number of gene expression techniques have been developed, such as DNA microarray, serial analysis of gene expression, expressed sequence tags, and RNA-seq, all of which deal with the transcript information of the cell. Microarray is one of the most powerful techniques for identifying gene expression under various experimental conditions.
To date, microarray results have mostly been analyzed to identify differentially expressed genes (DEGs). These are the genes whose expression differs between biological conditions. With advances in computational science and the availability of high-performance, sophisticated software and tools, these high-throughput experiments have been explored in various directions, such as pathway analysis, co-expression analysis, and functional enrichment, that reveal crucial information about the cell.
A number of bioinformatics software packages and tools, such as R and Bioconductor, MeV, Genesis, Genowiz, Qlucore Omics Explorer, and GeneSpring GX, are available for analyzing gene expression. Most of these are either commercial or available only as a 1-month trial version. R and Bioconductor have emerged as free and open-source software that facilitates the analysis of various high-throughput genomics and proteomics data.
Traditional methods of microarray data analysis need to be carefully evaluated for their suitability for a particular type of data set. Microarray experiments help to answer key questions of modern genomics. The key objectives of a microarray experiment fall into the following categories.
- DEG identification: a traditional microarray experiment is used to identify DEGs, that is, genes that behave differently in one condition and show variation in their expression, e.g. DEGs between cancer cells and normal cells. This type of analysis is called statistical analysis and uses comparison statistics
- Clustering or co-expression analysis: this type of analysis identifies genes that show the same expression behavior across the microarray experiment. It is important for identifying groups of genes that belong to the same molecular and biological process (BP), for instance the group of genes expressed during neurodevelopment
- Classification techniques: these aim to identify novel subtypes of a population when data on that population are already known. Here, the aim is to classify microarray data into subtypes or classes based on a known set of genes and parameters, for example, identifying classes of cancer that are difficult to distinguish morphologically but can be easily identified from gene expression data
- Gene ontology and pathway analysis: this type of annotation covers microarray studies that aim to identify sets of genes belonging to common pathways, BPs, or cellular components (CCs), e.g. the group of genes expressed when cells are treated with a drug.
An Affymetrix microarray experiment generates a raw intensity file called a CEL file. This file contains thousands of numerical values that correspond to the intensity measurement of each feature probe spotted on the chip, so these files are very large.
The initial step in microarray data analysis is quality control (QC) analysis after reading in the raw files. [Figure 1] represents the flowchart for microarray data analysis, from raw files to the data analysis steps used for the prediction of statistically significant and functional genes. The quality control step is important for visualizing all microarray intensity values per sample and for identifying any error in the microarray results. It generates various visualization plots such as the intensity plot, RNA degradation plot, and box plot.
|Figure 1: Flowchart of microarray data analysis from raw files (.CEL file) to data analysis such as DEGs to extract significant and biologically relevant genes|
After QC analysis, the next step is to remove any sources of error or bias that might have been introduced by experimental or instrumental errors, commonly known as systematic errors. Bias may be introduced into the microarray results by differences in dye intensity, probe labeling, spotting, or the laser scanner. Before any analysis is done, the primary step is to remove these errors from the microarray results, so that the CEL files generated per sample can be compared and biologically significant genes can be identified.
Normalization generates a normalized gene expression matrix in which the columns represent the samples and the rows represent the feature probes (Affymetrix IDs). [Table 1] shows the format of the gene expression matrix generated after the normalization and background correction steps.
|Table 1: Gene expression matrix, column represents the sample probes, and rows represent the feature probes ids, and normalized intensity values per sample|
Statistical analysis is done on this normalized gene expression file, and different comparison statistics can be applied to it, such as the simple t-test, paired t-test, moderated variance t-test, two-way ANOVA, and significance analysis of microarrays (SAM). Various clustering methods such as hierarchical clustering, K-means clustering, and self-organizing maps can also be run on this gene expression matrix to group samples, feature probes, or both (biclustering). Clustering methods can also be applied to the list of DEGs extracted from comparison statistics to group genes with similar expression profiles.
Another crucial step, done after screening for statistically significant genes, is biological annotation and pathway analysis. Functional enrichment is a method for identifying the genes that are involved in the particular biological condition under study. It gives information about the sets of genes linked with particular biological processes such as reproduction, cell synthesis, signaling, and response to stimulus. Functional studies can be done with the Gene Ontology database, which classifies genes into three distinct classes: BP, molecular function, and CC. The functional characteristics of genes have to be studied carefully and should be supported by the literature. Other resources are also available for functional enrichment of Affymetrix IDs, such as the DAVID database and the Metascape tool.
Pathway analysis is important for understanding and establishing the relation between the functional and molecular mechanisms of the identified genes. Different databases can be used to study pathways, such as the KEGG, Panther, and Reactome databases. Functional study of genes can also be done at the protein level through the functional relationship between gene and protein. Protein details for each gene can be obtained from the UniProt database, the STRING database (for protein–protein interaction study), the GeneMANIA database, etc.
After screening a few genes (around 2–10) that correspond to the sample of interest, biological verification has to be done by experimental methods such as EST analysis to verify the expression of the identified genes under different sample conditions. Microarrays also reduce the time, money, and manpower needed to investigate thousands of genes, compared with studying only a few genes at a time.
Microarray experiment “Neuronal Development Injury and Repair”
A microarray experiment was used to detail the global program of gene expression underlying the neurotoxicity of monocrotophos (MCP) and the neuroprotection of resveratrol (RV) using human umbilical cord blood mesenchymal stem cell (MSC)-derived neuronal cells.
[Figure 2] shows the microarray experiment that was designed to identify the genes expressed under different experimental exposures. MSCs were exposed to RV, nerve growth factor (NGF), and RV + NGF to study the neuroprotective role of RV; the details of this microarray experiment have already been described in our previous publications (Yadav and Srivastava, Clustering, pathway enrichment, and protein–protein interaction analysis of gene expression in neurodevelopmental disorders. Advances in Pharmacological Sciences. https://doi.org/10.1155/2018/3632159; and Yadav et al.), and the data are available in NCBI's Gene Expression Omnibus (GEO Series accession number GSE121261; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE121261). [Figure 2] also describes the combination of the wet laboratory microarray experiment and the computational methods used to identify biologically relevant genes.
|Figure 2: Workflow of the Affymetrix microarray experiment used for the current study. MSCs were exposed to RV and NGF, and to MCP pre- and postdifferentiation. With the use of different protocols, multiple analyses can be done, such as prediction of DEGs, co-expressed genes, and pathways. MDA: microarray data analysis|
Protocol development, with the use of appropriate statistical and comparison methods, is a key part of microarray data analysis. By carefully observing the expression patterns of all genes, crucial genes can be identified that are important for different cell functions, such as neuronal development, overall cell growth and development, regeneration, and apoptosis. For a deep understanding of microarray data, clear objectives must be framed, and different protocols should be designed to meet these objectives and predict the genes of interest.
| Materials and Methods|| |
The microarray gene expression data used in this study were initially deposited in NCBI's Gene Expression Omnibus database with the accession number GSE121261 and have been described in our previous publications.
Computational software and tools were used for the identification of genes co-expressed across the microarray experiment and for network construction. This paper also describes a protocol that can be used for network construction and identification of hub genes from a microarray gene expression experiment, together with the annotation methods. [Figure 3] shows the protocol, software, and databases that were used for co-expression analysis, network construction, and functional enrichment analysis.
|Figure 3: Representation of materials and methods used for this study. Bioinformatics software and databases such as R and Bioconductor, CoExpress 1.0b, Cytoscape, Metascape, and the STRING database were used for the prediction and functional annotation of identified genes|
R, Bioconductor, and the Affy package were used for data normalization and data transformation. The gene expression matrix was generated using robust multiarray average (RMA) as implemented in the Affy package, and the normalized data were deposited in NCBI's GEO database. The gene expression matrix was then used to predict co-expressed genes and to identify hub genes that have the same biological function.
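RMA combines background correction, quantile normalization, and probe-set summarization. As an illustration only, the quantile normalization step can be sketched in plain Python. This is not the code used in the study, which relied on the Affy package in R; the function name and the toy intensities are hypothetical, and ties are handled naively.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length sample columns.

    Each column holds the intensities of one sample (one array).
    After normalization every sample has the same value distribution.
    Simplification: tied values are ranked in input order, not averaged.
    """
    n_probes = len(columns[0])
    n_samples = len(columns)
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(sc[k] for sc in sorted_cols) / n_samples for k in range(n_probes)
    ]
    normalized = []
    for col in columns:
        # Indices of the probes ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy example: two samples, three probes (made-up intensities).
toy_samples = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
toy_normalized = quantile_normalize(toy_samples)
```

After this step each sample shares the same set of values, so differences between arrays that come only from overall intensity are removed before genes are compared.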
Co-expressed genes were identified using CoExpress 1.0b software. The identified co-expressed genes were then used for network construction with the Cytoscape tool. Network analysis provides the list of hub genes that are connected and belong to the same cluster. It also identifies the genes that are not linked with the hub genes and belong to a different cluster.
Functional enrichment of hub genes was done using the Metascape tool. Hub genes identified from the network analysis were further used for protein–protein interaction study. Functional annotation of hub genes was done using the Gene Ontology database, and genes that function in neurodevelopmental processes were screened.
| Results|| |
Identification of co-expressed genes
The gene expression matrix was first analyzed for the overall expression of genes on the microarray chip. The size of the overall gene expression matrix was 49495 × 10 = 494950 values. [Figure 4] shows the overall expression of all genes across the ten samples. The expression graph shows that gene expression varied across the microarray chip, ranging from 0.62 (minimum expression) to 1.04 (maximum expression). Sample 4 shows the maximum expression and sample 7 the minimum.
|Figure 4: Overall expression of microarray samples. X-axis shows the sample probes number used in microarray experiment and y-axis shows the expression level of each sample probes|
The initial step in co-expression analysis was gene filtering. Genes were filtered out of the gene expression matrix if their expression was less than 4.0 and their standard deviation was less than 0.40. Genes were also filtered out if more than 20% of their values were missing. Before filtering, the microarray data had a mean value of 4.270 and a standard deviation of 1.590. A total of 136 genes were retained after filtering with the above parameters, with a mean of 6.96 and a standard deviation of 1.810. The filtered data set was used for co-expression analysis. [Figure 5] shows the values of the 136 genes in the 10 samples; the expression values range from −2 to 3.
|Figure 5: Values of 136 genes over 10 samples. Peaks of each gene show downregulation (peaks downward) and upregulation (peaks upward)|
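The filtering rule described above can be sketched as follows. This is a minimal illustration, not the CoExpress 1.0b implementation. It assumes that a gene is removed when its mean expression is below 4.0 or its standard deviation is below 0.40 (the text could also be read as requiring both conditions), and that missing values are marked with `None`. The gene identifiers and values are hypothetical.

```python
from statistics import mean, stdev


def filter_genes(matrix, min_mean=4.0, min_sd=0.40, max_missing=0.20):
    """Return the ids of genes that pass the expression filters.

    matrix maps a gene id to its list of per-sample values,
    with None marking a missing value.
    """
    kept = []
    for gene, values in matrix.items():
        present = [v for v in values if v is not None]
        missing_fraction = 1 - len(present) / len(values)
        # Drop genes with too many missing values (or too few to get an SD).
        if missing_fraction > max_missing or len(present) < 2:
            continue
        # Assumption: failing either threshold removes the gene.
        if mean(present) < min_mean or stdev(present) < min_sd:
            continue
        kept.append(gene)
    return kept


# Toy matrix with made-up probe ids and values.
toy_matrix = {
    "probe_1": [6.0, 7.0, 8.0, 6.5, 7.5],    # passes every filter
    "probe_2": [3.0, 3.1, 2.9, 3.0, 3.2],    # mean expression below 4.0
    "probe_3": [5.0, 5.0, 5.1, 5.0, 5.0],    # standard deviation below 0.40
    "probe_4": [6.0, None, None, 8.0, 7.0],  # 40% of values missing
}
kept_genes = filter_genes(toy_matrix)
```

Filtering on both level and variability keeps genes that are clearly expressed and that change across samples, which is what a correlation-based analysis needs.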
The dataset was normalized before co-expression analysis: the data were centered (mean = 0) and scaled (variance = 1). [Figure 6] shows the normalized dataset of filtered values (considered values), with a minimum value of −2.320 and a maximum value of 3.600; this is the expression range of the filtered genes.
|Figure 6: Normalization of filtered genes. The graph shows that gene expression ranges from a minimum value of −2.320 to a maximum value of 3.600|
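Centering and scaling correspond to the usual z-score transformation. A minimal sketch is given below; it assumes that each gene profile is standardized on its own and that the population standard deviation is used, neither of which is stated in the text. The function name and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Center a profile to mean 0 and scale it to variance 1 (z-scores)."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("a constant profile cannot be scaled")
    return [(v - m) / s for v in values]


# Toy profile of one gene across five samples (made-up values).
toy_profile = [6.0, 7.0, 8.0, 6.5, 7.5]
z_scores = standardize(toy_profile)
```

After this step all genes are on a common scale, so a highly expressed gene does not dominate a comparison purely because of its magnitude.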
After gene filtering and normalization, co-expression analysis was done using Pearson's correlation coefficient, with a correlation power of 1 and a filtering threshold of 0.9. A total of 87 co-expressed genes were identified with the above parameters. [Figure 7] shows the distribution of co-expressed (CE) genes; the CE genes were normally distributed, with values ranging from −0.9 to +0.9 and a frequency of occurrence ranging from 0.002 to 0.044.
|Figure 7: Distribution of co-expressed genes (CE) with the frequency of occurrence of these genes. Graph shows that CE genes were normally distributed with the values ranging from -0.9 to +0.9|
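The correlation step can be sketched as a pairwise Pearson calculation followed by a threshold. This is an illustration under assumptions rather than the CoExpress 1.0b procedure: a pair is treated as co-expressed when the absolute value of r is at least 0.9, and the gene names and profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Both profiles must vary; a constant profile gives a zero denominator.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    pairs = []
    for g1, g2 in combinations(profiles, 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:  # assumption: sign of r is ignored
            pairs.append((g1, g2, r))
    return pairs


# Toy profiles over four samples (made-up values).
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
toy_pairs = coexpressed_pairs(toy_profiles)
```

Each retained pair can be read as an edge between two genes, which is the form of input a network tool expects.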
Network construction and identification of hub genes
The predicted co-expressed genes (87 genes) were further used for network construction to identify genes that were uniquely expressed. The network of co-expressed genes also identifies the hub genes that are connected [Figure 8] and show interaction. The gene network was further subclustered to identify closely related and distinct genes. Network analysis shows that seven genes, namely CLTC, CYP51A1, HSPA5, EIF5B, TRPS1, FOS, and PURA, lie outside the hub gene network and are distinct.
|Figure 8: Network of co-expressed genes visualized using Cytoscape software. The network shows the genes that are coregulated and those outside the network (CLTC, CYP51A1, HSPA5, EIF5B, TRPS1, FOS, and PURA)|
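One common way to separate hub genes from unconnected genes is to count how many edges each gene has (its degree). The sketch below assumes this degree-based definition and a hypothetical cutoff of two edges; the study itself used Cytoscape, and the gene names and edges here are invented placeholders.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the number of edges attached to each gene."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def hubs_and_isolated(genes, edges, min_degree=2):
    """Split genes into hubs (degree >= min_degree) and isolated (degree 0)."""
    degree = node_degrees(edges)
    hubs = sorted(g for g in genes if degree.get(g, 0) >= min_degree)
    isolated = sorted(g for g in genes if degree.get(g, 0) == 0)
    return hubs, isolated


# Toy network with placeholder gene names; edges are co-expressed pairs.
toy_genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
]
toy_hubs, toy_isolated = hubs_and_isolated(toy_genes, toy_edges)
```

Genes with no edges in such a network correspond to the genes described above as lying outside the hub gene network.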
Functional enrichment and gene ontology study of network genes
The identified co-expressed genes were further annotated for biological function and pathway analysis. The list of co-expressed genes was searched against pathway and GO databases. Each gene was studied for its pathway and process enrichment score to assess the statistical significance of genes in each BP.
Functional enrichment of network genes was done to identify the function of co-expressed genes [Figure 9]. Functional enrichment indicates that some genes were involved in developmental processes, negative regulation of cytoplasmic translation, the PID MAPK TRK pathway, muscle structure development, and positive regulation of multiorganism process.
|Figure 9: Functional enrichment of 87 co-expressed genes predicted using Metascape tool|
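Enrichment scores of this kind are commonly based on a hypergeometric test, which asks how likely it is to see at least the observed overlap between a gene list and an annotation term by chance. The sketch below illustrates that test only; it is not Metascape's code, and the parameter names and numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(background, annotated, list_size, overlap):
    """Hypergeometric upper-tail probability P(X >= overlap).

    background: number of genes in the background set
    annotated:  number of background genes annotated to the term
    list_size:  number of genes in the list being tested
    overlap:    number of listed genes annotated to the term
    """
    upper = min(annotated, list_size)
    total = comb(background, list_size)
    favorable = sum(
        comb(annotated, i) * comb(background - annotated, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Toy numbers: 10 background genes, 4 annotated, list of 3, overlap of 2.
toy_p = enrichment_p_value(background=10, annotated=4, list_size=3, overlap=2)
```

A small p-value means the term is over-represented in the list; with many terms tested, the p-values would normally be corrected for multiple testing before interpretation.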
[Table 2] lists the gene ontology details of the co-expressed genes, along with the functional detail and gene symbols in each GO category. The brain development category (GO: 0007420) contains ten genes, namely ZFHX3, ATRX, HSPA5, KIF5B, KRAS, PTEN, ZNF148, XRN2, NIPBL, and PTBP2. Functional analysis across all GO categories showed that two genes, ATRX and NIPBL, were co-expressed and share the same function in neurodevelopment as well as in brain development processes.
|Table 2: Gene ontology of co-expressed genes along with their functional description and gene associated with gene ontologies|
Other important genes such as FOS, KRAS, RAP1A, PTEN, SRSF5, NIPBL, QKI, SOBP, CLTC, RICTOR, HSPA5, KIF5B, USP34, and NAA15 were predicted; their functional details show that these genes function in the PID MAPK TRK pathway (Canonical Pathways database, M270).
GO: 0061061 includes 8 genes, namely ZFHX3, FOS, KRAS, MBNL1, PTEN, QKI, ZBTB18, and ANKRD17, and functional study shows that these genes function in muscle structure development. Another important category is response to radiation (GO: 0009314), with the genes FMR1, FOS, HSPA5, KRAS, SMC1A, NIPBL, ATRX, CPEB4, GNAI2, and PTEN. Other enriched categories were negative regulation of cytoplasmic translation, cohesin loading onto chromatin, negative regulation of calcium ion-dependent exocytosis, mRNA processing, regulation of translational initiation, transcriptional regulation by RUNX3, ribonucleoprotein complex subunit organization, estrogen-dependent gene expression, gamete generation, and positive regulation of multiorganism process.
Some genes function in repair mechanism pathways such as cohesin loading onto chromatin (Reactome Gene Sets, R-HSA-2470946), with the gene symbols SMC1A, WAPL, NIPBL, ATRX, PURA, REV3L, ANKRD17, FMR1, MTF2, CLTC, PTEN, SUZ12, BRD2, DEK, KIF5B, RICTOR, and MBNL1, and the protein refolding mechanism (GO: 0042026), with the genes HSPA5, ST13, PPIG, and GNAI2.
Gene ontology and functional analysis of the co-expressed genes show that the identified genes function in developmental processes, repair mechanisms, regulation processes, and response to radiation. To extend the analysis, protein–protein interaction analysis was done to relate genes to proteins and their functions.
Protein–protein interaction study and identification of neurodevelopmental genes
Protein–protein interaction analysis was carried out using the STRING database. [Figure 10] shows the protein interaction map of the co-expressed genes; proteins shown as red nodes function in neurodevelopment. This protein interaction map was used to identify the proteins that function in neurodevelopment. The analysis describes the function of the genes at the protein level and supports the role of the identified genes in developmental processes.
|Figure 10: Protein–protein interaction map of co-expressed genes and identification of proteins involved in neurodevelopment (red nodes represent proteins that have function in neurodevelopment)|
A total of fifteen proteins, namely ATRX, FMR1, FOS, KIF5B, KRAS, MBNL1, NIPBL, PTEN, PURA, QKI, RAP1A, TCF4, ZFHX3, ZNF148, and ZNF238, were identified that function in nervous system development (GO ID: GO:0007399). [Table 3] lists these proteins together with their functions and pathways.
|Table 3: Details of 15 genes that have function in neurodevelopment along with their function and pathway information|
The PTEN protein functions in the PI3K-Akt signaling pathway and the p53 signaling pathway. The FOS protein is involved in the MAPK signaling pathway, TNF signaling pathway, cholinergic synapse, and Wnt signaling pathway. KIF5B belongs to the dopaminergic synapse and the PID cadherin pathway. FMR1 functions in RNA transport. The RAP1A and KRAS proteins are also involved in the MAPK signaling pathway, Ras signaling pathway, and neurotrophin signaling pathway. ZFHX3 is involved in signaling pathways regulating pluripotency of stem cells. These proteins have important functions in signaling pathways and can be further verified for their potential role in neurodevelopment.
| Discussion|| |
Co-expression analysis identifies the genes that were co-expressed across the microarray experiment. Co-expressed genes are genes that show the same pattern of variation in expression across the experimental samples. It is important to identify these genes because co-expression information can be used to find genes that belong to the same biological pathway. These genes can also give evidence of protein–protein interaction. The current study identified proteins that are associated with neuronal injury caused by MCP. The experiment was carried out using microarrays, and the analysis was done to identify proteins that are expressed under different conditions. This paper also describes a method that can be adopted to analyze any microarray data set to identify protein networks and hub proteins that are expressed together for a function. Using computational software and tools for gene expression analysis, we have identified a few proteins, such as PTEN, KIF5B, FMR1, KRAS, and ZFHX3, that function in neurodevelopmental processes and are also expressed after organophosphate-induced injury. These proteins can also act as important markers for identifying neurodevelopmental disorders. Further verification is required before these predicted proteins can be applied in the drug development process.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
| References|| |
Xu XF, Li J, Cao YX, Chen DW, Zhang ZG, He XJ, et al. Differential expression of long noncoding RNAs in human cumulus cells related to embryo developmental potential: A microarray analysis. Reprod Sci 2015;22:672-8.
Pushparaj PN. Introduction to functional bioinformatics. In: Essentials of Bioinformatics. Vol. I. Cham: Springer; 2019. p. 235-54.
Liang B, Li C, Zhao J. Identification of key pathways and genes in colorectal cancer using bioinformatics analysis. Med Oncol 2016;33:111.
Chen Z, Dodig-Crnković T, Schwenk JM, Tao SC. Current applications of antibody microarrays. Clin Proteomics 2018;15:7.
Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 2015;12:115-21.
Sauer UG, Deferme L, Gribaldo L, Hackermüller J, Tralau T, van Ravenzwaay B, et al. The challenge of the application of 'omics technologies in chemicals risk assessment: Background and outlook. Regul Toxicol Pharmacol 2017;91 Suppl 1:S14-26.
Saber HB, Elloumi M. DNA microarray data analysis: A new survey on biclustering. Int J Comput Biol 2015;4:21-37.
Jiang J, Sun X, Wu W, Li L, Wu H, Zhang L, et al. Construction and application of a co-expression network in Mycobacterium tuberculosis. Sci Rep 2016;6:28422.
Bolón-Canedo V, Sánchez-Maroño N, Alonso-Betanzos A. Distributed feature selection: An application to microarray data classification. Appl Soft Comput 2015;30:136-50.
Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent advances and emerging applications in text and data mining for biomedical discovery. Brief Bioinform 2016;17:33-42.
Hung JH, Weng Z. Analyzing microarray data. Cold Spring Harb Protoc 2017;2017(3):pdb-rot093112.
Agapito G, Cannataro M. A software pipeline for multiple microarray data analysis. In: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Kansas City, MO, USA: IEEE; 2017. p. 1941-4.
Kim GT, Kim Y, Kwon MS, Park T. Quality control plot for high dimensional omics data. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Shenzhen, China: IEEE; 2016. p. 1763-6.
Babichev SA, Kornelyuk AI, Lytvynenko VI, Osypenko VV. Computational analysis of microarray gene expression profiles of lung cancer. Biopolym Cell 2016;32:70-9.
Wu D, Gantier MP. Normalization of Affymetrix miRNA Microarrays for the Analysis of Cancer Samples. Methods Mol Biol 2016;1375:1-10.
Ramasamy P, Kandhasamy P. Effect of intuitionistic fuzzy normalization in microarray gene selection. Turk J Elec Eng Comp Sci 2018;26:1141-52.
Arslan MT, Kalinli A. A comparative study of statistical and artificial intelligence based classification algorithms on central nervous system cancer microarray gene expression data. Int J Intell Syst Appl Eng 2016;26:78-81.
Tamayo P, Steinhardt G, Liberzon A, Mesirov JP. The limitations of simple gene set enrichment analysis assuming gene independence. Stat Methods Med Res 2016;25:472-87.
Chen L, Chu C, Lu J, Kong X, Huang T, Cai YD. Gene ontology and KEGG pathway enrichment analysis of a drug target-based classification system. PLoS One 2015;10:e0126492.
Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4:44-57.
Curtis RK, Oresic M, Vidal-Puig A. Pathways to the analysis of microarray data. Trends Biotechnol 2005;23:429-35.
Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009;37:1-13.
Russo G, Zegar C, Giordano A. Advantages and limitations of microarray technology in human cancer. Oncogene 2003;22:6497-507.
Yadav R, Srivastava P. Clustering, pathway enrichment, and protein-protein interaction analysis of gene expression in neurodevelopmental disorders. Adv Pharmacol Sci 2018;2018:3632159.
Yadav R, Srivastava P. Significant analysis of microarray (SAM) to identify synergistic effect of RV and NGF in repairing damaged neuronal cells. Toxicol Int 2019;25:26-39.