Identification of Differentially Expressed Genes and Prognostic Biomarkers of Breast Cancer Based on RNA-Seq and KEGG Pathway Network
1. College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
2. Software College, East China University of Technology, Nanchang, 330013, China
3. The 2nd Affiliated Hospital, Harbin Medical University, Harbin, 150081, China
Cancer Genetics and Epigenetics, 2016, Vol. 4, No. 2 doi: 10.5376/cge.2016.04.0002
Received: 25 Jul., 2016 Accepted: 26 Jul., 2016 Published: 18 Oct., 2016
© 2016 BioPublisher Publishing Platform
This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Zhang S.M., Gu Y., Wu S.Y., Kang Y., Liu S. and Zhang D., 2016, Identification of Differentially Expressed Genes and Prognostic Biomarkers of Breast Cancer Based on RNA-Seq and KEGG Pathway Network, Cancer Genetics and Epigenetics, 4(2): 1-9 (doi: 10.5376/cge.2016.04.0002)
Breast cancer arises through a complex biological process whose regulation involves multiple genes. Differences in tumor cell gene expression between patients determine differences in treatment and prognosis. Investigating breast cancer at the genetic level, including the identification of differentially expressed genes and prognostic markers, can therefore facilitate the development of appropriate and effective treatments.
We obtained RNA-Seq Level 3 gene expression data from the TCGA database and used the SAM algorithm to identify differentially expressed genes. The DAVID bioinformatics tool was then used to analyze the functions of these genes and to obtain the pathways in which they were significantly enriched. Gene interaction information was extracted from these pathways and integrated to build a KEGG pathway network, and the topology of the network was analyzed. Hub nodes extracted from the network were taken as candidate genes. Among the candidates, genes with a significant impact on survival were identified using a Cox proportional hazards regression model. These genes were then entered into a multivariate analysis to calculate a risk score for each sample, and the samples were divided into a high-risk group and a low-risk group according to their scores. The survival difference between the two groups was analyzed using the Kaplan-Meier method, and the log-rank test was used to assess statistical significance.
Analysis of the TCGA gene expression dataset identified a total of 5880 differentially expressed genes. Enrichment analysis yielded eight significant pathways. The KEGG pathway network built from the gene interaction information in these pathways yielded 32 candidate genes. Cox proportional hazards regression identified three genes (AARS, ADK, and ADORA2A) with a significant impact on the prognosis of breast cancer.
These three genes can serve as new prognostic biomarkers for breast cancer and provide guidance for its treatment. Of the three, AARS has previously been shown to be associated with breast cancer risk. Using multivariate analysis, we divided the breast cancer samples into a high-risk group and a low-risk group, and there was a significant difference between the two groups.
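The hub-node step described above can be illustrated with a minimal sketch. It assumes that a hub is any gene whose degree (number of distinct interaction partners) meets a chosen threshold. The abstract does not state the criterion actually used, so the `min_degree` parameter, the function names, and the placeholder gene labels `G1` to `G5` are all hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Build an undirected adjacency map from (gene_a, gene_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def hub_nodes(adjacency, min_degree):
    """Return genes with degree >= min_degree, highest degree first.

    Using degree as the hub criterion is an assumption of this sketch.
    """
    degrees = {gene: len(partners) for gene, partners in adjacency.items()}
    hubs = [gene for gene, degree in degrees.items() if degree >= min_degree]
    return sorted(hubs, key=lambda gene: (-degrees[gene], gene))


# Toy interactions with placeholder gene labels (not data from the study).
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
toy_hubs = hub_nodes(build_network(toy_edges), min_degree=3)
```

In a real analysis the edge list would come from the interactions extracted from the enriched KEGG pathways, and other topological measures could be substituted for degree.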
Breast Cancer; Differentially Expressed; KEGG Pathway Network; Gene; Prognosis
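The risk-score grouping and log-rank comparison described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the risk score is taken to be a linear combination of gene expression values weighted by Cox coefficients, samples are split at the median score (the abstract does not state the cutoff), and the coefficients, expression values, and survival times below are invented toy numbers. Fitting the Cox model itself is not shown.

```python
import math
from statistics import median


def risk_scores(samples, coefficients):
    """Risk score per sample: sum of coefficient * expression over the genes."""
    return [sum(coefficients[g] * s[g] for g in coefficients) for s in samples]


def split_by_median(scores):
    """True marks the high-risk group (score above the median; an assumption)."""
    cutoff = median(scores)
    return [score > cutoff for score in scores]


def logrank_test(times, events, high_risk):
    """Two-group log-rank test; returns (chi-square statistic, p-value, 1 df)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, high_risk) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, high_risk)
                 if ti == t and e and g)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected events
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with one degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical coefficients and toy data, for illustration only.
toy_coefficients = {"AARS": 0.5, "ADK": -0.3, "ADORA2A": 0.2}
toy_samples = [
    {"AARS": 2.0, "ADK": 1.0, "ADORA2A": 1.0},
    {"AARS": 1.0, "ADK": 2.0, "ADORA2A": 0.0},
    {"AARS": 3.0, "ADK": 0.0, "ADORA2A": 1.0},
    {"AARS": 0.0, "ADK": 1.0, "ADORA2A": 1.0},
    {"AARS": 2.0, "ADK": 0.0, "ADORA2A": 0.0},
    {"AARS": 1.0, "ADK": 1.0, "ADORA2A": 0.0},
]
toy_times = [5, 20, 3, 25, 8, 18]   # follow-up time per sample
toy_events = [1, 0, 1, 1, 1, 0]     # 1 = death observed, 0 = censored

toy_groups = split_by_median(risk_scores(toy_samples, toy_coefficients))
toy_chi2, toy_p = logrank_test(toy_times, toy_events, toy_groups)
```

The log-rank test is written out by hand here so that the sketch depends only on the standard library. A dedicated survival analysis package would normally be used for both the Cox fit and the Kaplan-Meier curves.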