GTEx - Database Commons
GTEx
General information
URL: https://www.gtexportal.org
Full name: Genotype-Tissue Expression
Description: GTEx established a data resource and tissue bank to study the relationship between genetic variation and gene expression in multiple human tissues. This release includes genotype data from approximately 714 donors and approximately 11688 RNA-seq samples across 53 tissue sites and 2 cell lines, with adequate power to detect Expression Quantitative Trait Loci in 48 tissues.
Year founded: 2013
Last update: 2019-7-24
Version: v8
Accessibility (manual check): Accessible
Country/Region: United States
Classification & Tag
Data type: DNA, RNA
Data object: Animal
Database category: Expression; Genotype phenotype and variation
Major species: Homo sapiens
Keywords: normal tissue, tissue site, eQTL, RNA-seq
Contact information
University/Institution: Broad Institute
Address: 9000 Rockville Pike, Bethesda, Maryland 20892
City: Bethesda
Province/State: Maryland
Country/Region: United States
Contact name (PI/Team): GTEx consortium
Contact email (PI/Helpdesk): volpis@mail.nih.gov
Publications
GTEx project maps wide range of normal human genetic variation: A unique catalog and follow-up effort associate variation with gene expression across dozens of body tissues. [PMID: 29334591]
Am J Med Genet A. 2018:176(2) | 4 Citations (from Europe PMC, 2024-03-09)
Enhancing GTEx by bridging the gaps between genotype, gene expression, and disease. [PMID: 29019975]
eGTEx Project.
Abstract
Genetic variants have been associated with myriad molecular phenotypes that provide new insight into the range of mechanisms underlying genetic traits and diseases. Identifying any particular genetic variant's cascade of effects, from molecule to individual, requires assaying multiple layers of molecular complexity. We introduce the Enhancing GTEx (eGTEx) project that extends the GTEx project to combine gene expression with additional intermediate molecular measurements on the same tissues to provide a resource for studying how genetic differences cascade through molecular phenotypes to impact human health.
Nat Genet. 2017:49(12) | 92 Citations (from Europe PMC, 2024-03-09)
Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. [PMID: 25954001]
GTEx Consortium.
Abstract
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
Science. 2015:348(6235) | 2871 Citations (from Europe PMC, 2024-03-09)
Human genomics. The human transcriptome across tissues and individuals. [PMID: 25954002]
Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, Sammeth M, Young TR, Goldmann JM, Pervouchine DD, Sullivan TJ, Johnson R, Segrè AV, Djebali S, Niarchou A, GTEx Consortium, Wright FA, Lappalainen T, Calvo M, Getz G, Dermitzakis ET, Ardlie KG, Guigó R.
Abstract
Transcriptional regulation and posttranscriptional processing underlie many cellular and organismal phenotypes. We used RNA sequence data generated by Genotype-Tissue Expression (GTEx) project to investigate the patterns of transcriptome variation across individuals and tissues. Tissues exhibit characteristic transcriptional signatures that show stability in postmortem samples. These signatures are dominated by a relatively small number of genes—which is most clearly seen in blood—though few are exclusive to a particular tissue and vary more across tissues than individuals. Genes exhibiting high interindividual expression variation include disease candidates associated with sex, ethnicity, and age. Primary transcription is the major driver of cellular specificity, with splicing playing mostly a complementary role; except for the brain, which exhibits a more divergent splicing program. Variation in splicing, despite its stochasticity, may play in contrast a comparatively greater role in defining individual phenotypes.
Science. 2015:348(6235) | 697 Citations (from Europe PMC, 2024-03-09)
A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. [PMID: 26484571]
Carithers LJ, Ardlie K, Barcus M, Branton PA, Britton A, Buia SA, Compton CC, DeLuca DS, Peter-Demchok J, Gelfand ET, Guan P, Korzeniewski GE, Lockhart NC, Rabiner CA, Rao AK, Robinson KL, Roche NV, Sawyer SJ, Segrè AV, Shive CE, Smith AM, Sobin LH, Undale AH, Valentino KM, Vaught J, Young TR, Moore HM, GTEx Consortium.
Abstract
The Genotype-Tissue Expression (GTEx) project, sponsored by the NIH Common Fund, was established to study the correlation between human genetic variation and tissue-specific gene expression in non-diseased individuals. A significant challenge was the collection of high-quality biospecimens for extensive genomic analyses. Here we describe how a successful infrastructure for biospecimen procurement was developed and implemented by multiple research partners to support the prospective collection, annotation, and distribution of blood, tissues, and cell lines for the GTEx project. Other research projects can follow this model and form beneficial partnerships with rapid autopsy and organ procurement organizations to collect high quality biospecimens and associated clinical data for genomic studies. Biospecimens, clinical and genomic data, and Standard Operating Procedures guiding biospecimen collection for the GTEx project are available to the research community.
Biopreserv Biobank. 2015:13(5) | 423 Citations (from Europe PMC, 2024-03-09)
The Genotype-Tissue Expression (GTEx) project. [PMID: 23715323]
GTEx Consortium.
Abstract
Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.
Nat Genet. 2013:45(6) | 4159 Citations (from Europe PMC, 2024-03-09)
Ranking
All databases: 13/5981 (99.799%)
Genotype phenotype and variation: 4/850 (99.647%)
Expression: 3/1137 (99.824%)
Total Rank: 13
Citations: 8,148
z-index: 740.727
Record metadata
Created on: 2019-07-30
Curated by: Lina Ma [2019-07-31], Lina Ma [2019-07-30]
Introduction to the GTEx Database (1) - Zhihu
Author: HuaMD医学大数据 (Hua+ Medical Big Data), sharing knowledge about medical big data
Series: Medical Big Data and Its Integrated Analysis (4). Produced by Hua+ Medical Big Data; please cite the source link when reposting. (HuaPlusMD integrates multiple human and animal databases to offer integrated analysis of disease animal models and clinical big data: https://www.huaplusmd.com)

Foreword: The concept of "big data" has been around for a long time, but how much do we actually know about (medical) big data? This platform gives a systematic introduction to medical big data and shares integrated analyses, updated weekly. Topics mainly cover large databases (gene and protein databases, etc.) and biobanks (UK Biobank, Finnish Biobanks, China Kadoorie Biobank, BioBank Japan, TCGA, GWAS Catalog, GTEx, etc.), disease animal model databases (such as GeneNetwork and BXD), combined use of large databases (such as Mendelian randomization), and omics data analysis. (Other posts in the series: https://www.huaplusmd.com/knowledge)

Every organ and tissue in an individual carries the same genes, so why does one become liver tissue, which handles metabolism, and another muscle tissue, which lets us move? The reason is that different tissues express different genes. The GTEx project collects samples of different tissues from healthy human donors in order to understand tissue- and organ-specific gene expression in humans.

Starting with this post, we introduce the GTEx database, which is well worth studying in depth. GTEx stands for Genotype-Tissue Expression. The project was funded mainly by the Common Fund of the US National Institutes of Health (NIH) for ten consecutive years (2010-2019). (We very much hope that China will also support this kind of long-term, large-cohort basic research on humans, and make the non-sensitive data open to scrutiny by international peers. It would be an achievement for this generation and a benefit for generations to come.)

The GTEx project studies tissue-specific gene expression and regulation in humans. The final GTEx release (version 8, V8) includes:
· DNA data from 838 donors who were healthy in life, comprising Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES);
· 17,382 RNA-seq samples from nearly 1,000 individuals, covering 54 tissue and organ sites (according to the author, currently the only resource in the world with such a complete collection of healthy human tissue samples);
· 2 cell lines derived from donor blood and skin.

Applications of the database:
· evaluating tissue-specific gene expression and regulation;
· supporting GWAS (genome-wide association study) research;
· exploring the effect of genetic variation on complex diseases and traits.

Example application: researchers working with GTEx data designed a statistical method called PrediXcan. The method estimates a gene's activity, or expression level, from genetic sequence. It then associates the estimated expression with observed disease traits in order to find genes linked to disease. PrediXcan has successfully identified genes associated with several diseases, including coronary artery disease, Crohn's disease, rheumatoid arthritis, type 1 diabetes and bipolar disorder.

The project created the GTEx Portal (https://gtexportal.org/home/), which provides open-access data including gene expression, QTLs and histology images.

The GTEx project also built its own biobank (https://gtexportal.org/home/biobank), which holds tissue specimens from about 960 donors who were healthy in life, including lung, brain, pancreas, skin and more. Stored biospecimens can be requested if needed.

Representative papers published by the GTEx Consortium in Science and Nature:

· 2015: GTEx released its first-phase results, with 3 papers published together in Science:
The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. The GTEx Consortium. Science. 8 May 2015. 348(6235):648-660. doi:10.1126/science. PMID: 25954001
The human transcriptome across tissues and individuals. Melé M, Ferreira PG, Reverter F, DeLuca DS, Monlong J et al. Science. 8 May 2015. 348(6235):660-665. doi:10.1126/science.aaa0355
Effect of predicted protein-truncating genetic variants on the human transcriptome. Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK et al. Science. 8 May 2015. 348(6235):666-669. doi:10.1126/science.1261877

· 2017: GTEx released further results, with 4 papers published together in Nature:
Genetic effects on gene expression across human tissues. The GTEx Consortium. Nature. 12 Oct 2017. 550:204-213. Epub 11 Oct 2017. doi:10.1038/nature24277
The impact of rare variation on gene expression across tissues. Li X, Kim Y, Tsang EK, Davis JR, Damani FN et al. Nature. 12 Oct 2017. 550:239-243. Epub 11 Oct 2017. doi:10.1038/nature24267
Landscape of X chromosome inactivation across human tissues. Tukiainen T, Villani AC, Yen A, Rivas MA, Marshall JL et al. Nature. 12 Oct 2017. 550:244-248. Epub 11 Oct 2017. doi:10.1038/nature24265
Dynamic landscape and regulation of RNA editing in mammals. Tan MH, Li Q, Shanmugam R, Piskol R, Kohler J et al. Nature. 12 Oct 2017. 550:249-254. Epub 11 Oct 2017. doi:10.1038/nature24041

· 2019-2022: GTEx continued to release results, with 7 papers published in Science:
2022
Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Eraslan G, et al. Science. 376 (abl4290), 13 May 2022. doi:10.1126/science.abl4290
2020
The GTEx Consortium atlas of genetic regulatory effects across human tissues. The GTEx Consortium. Science. 369 (1318-1330), 10 Sep 2020. doi:10.1126/science.aaz1776
Cell type specific genetic regulation of gene expression across human tissues. Kim-Hellmuth* S, Aguet* F, Oliva M, Muñoz-Aguirre M, Kasela S, et al. Science. 369 (eaaz8528), 10 Sep 2020. doi:10.1126/science.aaz8528
Transcriptomic signatures across human tissues identify functional rare genetic variation. Ferraro* NM, Strober* BJ, Einson J, Abell NS, Aguet F, et al. Science. 369 (aaz5900), 10 Sep 2020. doi:10.1126/science.aaz5900
Determinants of telomere length across human tissues. Demanelis K, Jasmine F, Chen LS, Chernoff M, Tong L, et al. Science. 369 (aaz6876), 10 Sep 2020. doi:10.1126/science.aaz6876
The impact of sex on gene expression across human tissues. Oliva* M, Muñoz-Aguirre* M, Kim-Hellmuth* S, Wucher V, Gewirtz ADH, et al. Science. 369 (aba3066), 10 Sep 2020. doi:10.1126/science.aba3066
2019
RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Yizhak K, Aguet F, Kim J, Hess JM, Kübler K et al. Science. 07 June 2019. 364(6444). doi:10.1126/science.aaw0726

If you can access YouTube, Prof. Eric Lander (founding director, Broad Institute) and others give a short introduction to GTEx: https://www.youtube.com/watch?v=PhK186A7Ryo

Original link: https://www.huaplusmd.com/knowledge
References:
[1] https://commonfund.nih.gov/GTex
[2] https://gtexportal.org/home/
Published 2022-10-24 04:27
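The PrediXcan idea described in the post can be illustrated with a toy sketch. Expression is first estimated from genotype as a weighted sum of allele dosages, and the estimated expression is then correlated with a trait. Everything concrete here is an assumption made for illustration: the SNP names, weights, dosages and trait values are invented, and `predict_expression` and `pearson` are hypothetical helpers, not part of the PrediXcan software, which trains its weights on reference data and uses proper regression models.

```python
from math import sqrt

def predict_expression(dosages, weights):
    """Estimate expression for one person as sum(weight * allele dosage)."""
    return sum(weights[snp] * dosages.get(snp, 0.0) for snp in weights)

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-SNP weights for one gene in one tissue.
weights = {"rs1": 0.5, "rs2": -0.2}

# Hypothetical genotype dosages (0, 1 or 2 copies of the effect allele).
people = [
    {"rs1": 0, "rs2": 2},
    {"rs1": 1, "rs2": 1},
    {"rs1": 2, "rs2": 0},
    {"rs1": 2, "rs2": 1},
]

# Hypothetical trait measurements for the same four people.
trait = [1.0, 2.1, 3.9, 3.2]

predicted = [predict_expression(person, weights) for person in people]
r = pearson(predicted, trait)
print(predicted, round(r, 3))
```

A strong correlation between estimated expression and the trait is what would flag the gene as a candidate; in practice this is tested gene by gene with multiple-testing correction.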
Genotype-Tissue Expression Project (GTEx)
National Human Genome Research Institute
An NIH Common Fund Project
The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes contribute to common human diseases, in order to improve health care for future generations.
GTEx Publishes Final Dataset (V8)
On Sept. 11, 2020, the final set of analyses from the GTEx Consortium was published in Science. The latest GTEx data release represents the largest atlas of human gene expression and catalog of trait loci to date.
Overview
Launched by the National Institutes of Health (NIH) in September 2010 (See: NIH launches Genotype-Tissue Expression project), GTEx will create a resource that researchers can use to study how inherited changes in genes lead to common diseases. It will establish a database and a tissue bank that can be used by many researchers around the world for future studies.
GTEx researchers are studying genes in different tissues obtained from many different people. Thus every donor's generous gift of tissues and medical information to the GTEx project makes possible research that will help improve our understanding of diseases, giving hope that we will find better ways to prevent, diagnose, treat and eventually cure these diseases in the future.
In addition, the GTEx project includes a study to explore the effectiveness of the GTEx donor consent process. We hope to better understand how participating in the study might affect the attitudes, beliefs and feelings of donors and the families of deceased donors using interviews and surveys of participants and their families. This study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations of participants in the study.
GTEx is a pioneering project that uses state-of-the-art protocols for obtaining and storing a large range of organs and tissues and for testing them in the lab. These tissues and organs are collected and stored through the National Cancer Institute's cancer Human Biobank initiative on behalf of GTEx. Until now, no project has analyzed genetic variation and expression in as many tissues in such a large population as planned for GTEx.
GTEx is funded through the NIH Common Fund, which supports innovative projects involving multiple NIH Institutes. GTEx is managed by the NIH Office of the Director, in partnership with the National Human Genome Research Institute, National Institute of Mental Health, National Cancer Institute, and numerous other NIH institutes. Additional information about the NIH Common Fund can be found at http://commonfund.nih.gov.
To learn more about the science behind the GTEx project, we invite you to visit: http://commonfund.nih.gov/GTEx.
Donors
The generosity of donors and donor families makes this project possible. The goal of GTEx is to increase our understanding of how changes in genes contribute to common human diseases. This knowledge will improve health care for future generations.
GTEx will create information that will be useful to many researchers, studying many different diseases. The gift of your tissue or your loved one's tissue may lead to research which could help improve treatment for many people in the future.
There are two types of donor groups that participate in the GTEx project: 1) organ and tissue donors, and 2) surgical donors.
Organ and tissue donors include individuals who have agreed to donate organs (like kidneys, heart, and liver) and/or tissues (like bone and cornea) for use as medical transplants after they die. Family members may also make the decision to give consent for organ or tissue donation after their loved one has passed on. These donors or their family members have the opportunity to indicate whether any organs or tissues ineligible for transplants may be donated to benefit research studies like GTEx. Donating to GTEx would not interfere with the use of the organ or tissues for transplantation, which takes priority. Compared to surgical donors, many more types of tissues can be obtained for research studies from organ and tissue donors. People who may not qualify to donate organs or tissue for transplants may still qualify to donate tissues to GTEx for research.
Surgical tissue donors include people who undergo certain kinds of surgery. If a surgery patient agrees ahead of time, tiny amounts of tissue removed during surgery, such as fat, skin, or muscle, can be donated for use in the GTEx project. Only tissue which needs to be removed for medical reasons can be donated to the GTEx project. Donating to the GTEx project will not cause any additional tissue to be removed.
GTEx Findings
It has been said that someone has "good genes" when they are particularly healthy, but what does that mean? How does understanding of genetics translate into better health? NIH designed the Genotype Tissue Expression (GTEx) project to start to answer this question. The project is looking at the differences in people's genes.
Genes are made up of DNA and DNA is made up of different pieces too. One of GTEx's goals is to identify the pieces of DNA that control how genes behave. These pieces of DNA are called expression quantitative trait loci or eQTLs. These eQTLs control the behavior of genes like a thermostat regulates the temperature of a home. GTEx studies found that the number of eQTLs varies from person to person and from tissue to tissue. Researchers also discovered eQTLs act in different ways. Some eQTLs may affect a set of genes in one tissue, while other eQTLs affect genes in many tissues.
The GTEx consortium has also built an eQTL web-browser (http://www.gtexportal.org/home/) to help visualize and discover new relationships between genes and the DNA that affects them. This website provides a resource for the many researchers who are exploring the human genome. Understanding how the eQTLs change gene behavior in different tissues can help us understand how diseases develop in people. This knowledge, in turn, may help us develop new therapies and treatments, improving our health overall.
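The thermostat analogy can be made concrete with a toy calculation. An eQTL is commonly detected by regressing a gene's expression level on the number of copies of a variant allele that each donor carries; a slope clearly different from zero suggests the variant influences expression in that tissue. The dosages and expression values below are invented, and `eqtl_slope` is a hypothetical helper written for this sketch. Real analyses also adjust for covariates and test statistical significance.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on allele dosage (0, 1 or 2)."""
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_e = sum(expression) / n
    cov = sum((d - mean_d) * (e - mean_e) for d, e in zip(dosages, expression))
    var = sum((d - mean_d) ** 2 for d in dosages)
    return cov / var

# Invented data: six donors, one variant, one gene measured in two tissues.
dosages = [0, 0, 1, 1, 2, 2]
liver = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]  # expression rises with each allele copy
blood = [4.0, 4.1, 4.0, 4.1, 4.0, 4.1]  # expression unrelated to genotype

slope_liver = eqtl_slope(dosages, liver)
slope_blood = eqtl_slope(dosages, blood)
print(slope_liver, slope_blood)
```

In this made-up case the variant behaves like an eQTL in liver but not in blood, which mirrors the point that some eQTLs act in one tissue while others act in many.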
Progress
As of December 2015, GTEx finished enrollment of the additional donors, for a total of 961 donors. Analysis of the samples and data will continue for another 18 months. Over 30,000 samples have been collected.
In fall of 2015, information on gene expression for over 450 donors was released to the scientific community through the database of Genotype and Phenotype (dbGaP). Additionally, the new version of the GTEx Genome Browser has been launched and features new visualization tools.
In 2014, the National Institutes of Health awarded eight new grants to researchers to use tissues donated to GTEx to explore how human genes are expressed and regulated in different tissues.
In 2020, the GTEx Consortium published its final set of studies analyzing genotype data from approximately 948 post-mortem donors and approximately 17,382 RNA-seq samples across 54 tissue sites and 2 cell lines, with adequate power to detect Expression Quantitative Trait Loci in 48 tissues.
Program Staff
Simona Volpi, Ph.D.
Program Director
Division of Genomic Medicine
Related Projects
Developmental Genotype-Tissue Expression (dGTEx)
Last updated: September 24, 2020
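TCGA + GTEx Joint Analysis in 2 Hours, or I Lose - Zhihu
Author: 益加医 (a public account focused on sharing training videos on medical research and clinical skills)

Introduction
When mining the TCGA database, we usually find that the project includes very few sequencing results from normal tissue. In other words, most patients have no transcriptome data for their own normal tissue. Take breast cancer: of roughly 1,200 transcriptome profiles, about 1,100 come from tumor tissue and only about 100 are normal controls. In this situation we need a way to increase the number of normal samples, and since TCGA does not have them, we turn to other databases. The one most worth recommending is the Genotype-Tissue Expression (GTEx) database.

1 Data preparation
The GTEx (Genotype-Tissue Expression) database studied more than 7,000 autopsy samples from 449 human donors who were healthy in life, covering 44 tissues (42 distinct tissue types): 31 solid-organ tissues, 10 brain subregions, whole blood, and 2 cell lines derived from donor blood and skin. The authors used these samples to study how gene expression differs across tissues and individuals.

Data download. The GTEx website can be hard to reach, so we can download both the GTEx and the TCGA data from the UCSC Xena website instead. First click Launch Xena to reach the data page, then click DATA SETS at the top to open the dataset page. The dataset page lists TCGA, TARGET, GTEx and other databases. Click GTEX to open the GTEx download page. We need the FPKM file and the phenotype file. For the FPKM file, click TOIL RSEM fpkm to open its page, then click the link in the download row to start the download. The phenotype file is downloaded the same way.

For TCGA, the dataset page has two entries, GDC TCGA and TCGA; GDC TCGA is generally the one to choose. Its page is similar to the GTEx one, but there are two annotation files, the phenotype file and the clinical data, and both are used in later steps. The expression file to download is again the FPKM file.

Data tidying. Once the downloads are complete, the data can be tidied. Start with ID conversion for the GTEx data: unpack the downloaded archive and run the ID-conversion script from the command prompt, using human.gtf as the annotation file. The method is similar to the ID conversion we previously did for TCGA. When the run finishes, a new file named GTExSymbol appears in the folder; this is the converted file.

Because GTEx sequenced samples from all tissues, we need to extract the expression data for the tissue of interest. The sample information comes from the sample file downloaded earlier: unpack and open it, find the tissue in the site column, and paste the selected sample IDs into a new TXT file. Then run the script to extract the expression data for those samples. The script also reports the number of samples; note it down, because the differential analysis needs it later.

Next, tidy the TCGA file. Unpack the TCGA FPKM archive and process it with Perl. Unlike the GTEx step, the name of the file to process must be given after the Perl script name. When the run finishes, it reports the numbers of normal and tumor samples, and the tidied TCGA file has normal and tumor samples separated.

Then convert the IDs in the TCGA data, in much the same way as before. Prepare the annotation file human.gtf and the script GTEx.symbol.pl, and run the script from the command prompt. This script has the same name as the GTEx ID-conversion script, but its content differs: the TCGA data need no +1 adjustment to FPKM, whereas the original GTEx FPKM values have not had 1 added, so the FPKM+1 adjustment is applied during the GTEx ID conversion.

With both datasets tidied, the GTEx and TCGA data can be merged. There are two input files: the data extracted from GTEx and the converted TCGA file. The merge is done in R; change the path and run the script. When it finishes, a new merge file appears in the folder. This is the combined GTEx and TCGA file, and it is the input for the differential analysis and later steps.

2 GTEx plots
Because GTEx holds expression data for the various human tissues, we can summarize a gene's expression in each tissue and draw anatomical diagrams, box plots and similar figures.

First summarize expression in each tissue. Prepare the GTEx phenotype file and the gene expression file. Tidy the phenotype file by copying the patient ID, tissue and sex columns into a new txt file named site; the later scripts look for this file name. With the site file and the expression file ready, run the script that merges them and writes the files needed for plotting. An anatomical diagram can only be drawn for one gene at a time, so a gene name must be supplied at the merge step. The gene must be present in the expression file, and the name must match the expression file exactly, for example TP53. The run generates one file for males, one for females, and one merged expression-and-site file.

Now the anatomical diagrams and box plots can be drawn. There are two anatomical diagrams, one male and one female. Change the working path in the script and run it to see TP53 expression in each tissue. A box plot of TP53 expression across tissues is drawn the same way: change the path and run the script.

3 Differential analysis
The differential analysis uses the merge file produced earlier. Be sure to set the numbers of normal and tumor samples; the normal count is the number of TCGA normal samples plus the number of GTEx normal samples. As in our earlier differential analyses, the run produces a differential expression file, a file of expression values for the differential genes, and so on. The usual expression heat map can then be drawn: set the path and sample counts and run the script.

After the differential analysis, we can run survival analysis and output the results for all differential genes in one go. First prepare the input files. The survival data come from the TCGA survival file downloaded earlier. Keep only survival status, survival time and sample ID, then edit the header and move survival time to the second column. A survival status of 1 means dead and 0 means alive. Paste the tidied table into a new file named time.txt. With the inputs ready, merge the clinical data with the expression data by running the GTEx.mergeExpTime.pl script from the command prompt. The merged file is then used for batch survival analysis of the differential genes. This yields survival curves for the differential genes, although plots are generated only when p < 0.05. A survival file containing the survival p values of the differential genes is generated as well.

4 Functional analysis
Start with ID conversion, using the same method shared previously: paste the gene symbols and logFC values into a new txt file and run the R script. With the converted IDs, run GO and KEGG enrichment analysis and generate the enrichment plots. Two kinds of plotting scripts are provided. One outputs the usual bar plots and bubble plots; for these, the GO and KEGG scripts are sufficient. The other outputs GO and KEGG circle plots. The circle-plot script does not include the enrichment step, but it needs the enrichment results, so when drawing circle plots on their own, first run GO and KEGG enrichment and generate the result files, following the enrichment code in the bar-plot and bubble-plot scripts.

This article is original content from the public account 益加医. To repost it, reply "转载" in the public account's message box.
Edited 2020-08-10 16:23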
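The merge step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the article's Perl or R scripts, which are not reproduced in the text. The gene names, sample IDs and values are invented, and `sample_ids` and `merge_expression` are hypothetical helpers. The sketch keeps the genes present in all inputs, places normal samples (GTEx plus TCGA normal) before tumor samples, and returns the two sample counts that the differential analysis needs. It assumes the tables are already on the same scale, which the article handles in its FPKM+1 step.

```python
def sample_ids(table):
    """Sorted sample IDs of a {gene: {sample: value}} table."""
    first_gene = next(iter(table))
    return sorted(table[first_gene])

def merge_expression(gtex, tcga_normal, tcga_tumor):
    """Merge three gene-by-sample tables on their shared genes.

    Returns (genes, samples, matrix, n_normal, n_tumor), where matrix maps
    each gene to its values in the same order as the samples list.
    """
    genes = sorted(set(gtex) & set(tcga_normal) & set(tcga_tumor))
    normal_parts = [(gtex, sample_ids(gtex)),
                    (tcga_normal, sample_ids(tcga_normal))]
    tumor_parts = [(tcga_tumor, sample_ids(tcga_tumor))]

    samples = []
    matrix = {gene: [] for gene in genes}
    for table, ids in normal_parts + tumor_parts:
        samples.extend(ids)
        for gene in genes:
            matrix[gene].extend(table[gene][s] for s in ids)

    n_normal = sum(len(ids) for _, ids in normal_parts)
    return genes, samples, matrix, n_normal, len(samples) - n_normal

# Invented example tables; XIST is missing from TCGA and is therefore dropped.
gtex = {
    "TP53": {"GTEX-A": 3.1, "GTEX-B": 2.9},
    "BRCA1": {"GTEX-A": 1.2, "GTEX-B": 1.0},
    "XIST": {"GTEX-A": 0.4, "GTEX-B": 6.0},
}
tcga_normal = {"TP53": {"TCGA-N1": 3.0}, "BRCA1": {"TCGA-N1": 1.1}}
tcga_tumor = {
    "TP53": {"TCGA-T1": 4.5, "TCGA-T2": 4.8},
    "BRCA1": {"TCGA-T1": 0.6, "TCGA-T2": 0.7},
}

genes, samples, matrix, n_normal, n_tumor = merge_expression(
    gtex, tcga_normal, tcga_tumor
)
print(genes, samples, n_normal, n_tumor)
```

Keeping normal samples first means a downstream script only needs the two counts to split the columns into groups, which matches how the article passes sample numbers to its differential analysis.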
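Introduction to the GTEx database (3): obtaining the data - Zhihu
Author: HuaMD医学大数据 (sharing knowledge about medical big data)

Medical big data and its integrated analysis (4). Produced by Hua+医学大数据 (please credit the source link when reposting). HuaPlusMD integrates multiple human and animal databases into a reliable large database and offers integrated analysis of disease animal models and clinical big data. Link: https://www.huaplusmd.com

Foreword: The concept of "big data" has been around for a long time, but how much do we actually know about (medical) big data? This platform introduces medical big data systematically and shares integrated analyses, updated weekly. Topics include large databases (gene and protein databases) and biobanks (UK Biobank, Finnish Biobanks, China Kadoorie Biobank, BioBank Japan, TCGA, GWAS catalog, etc.), disease animal model databases (such as GeneNetwork and BXD), combined use of large databases (such as Mendelian randomization), and omics data analysis, along with periodic worked examples. Other series are listed at https://www.huaplusmd.com/knowledge

This issue introduces how to download and use GTEx data. The main strength of GTEx is that it provides gene expression for a wide range of human tissues and organs. In research or drug development we usually want a drug or intervention to act in a specific tissue or organ to reduce side effects; obesity research, for example, tends to focus on adipose tissue. Most current databases do not provide tissue-specific gene expression, and a human resource of this kind is especially rare.

How to obtain data from GTEx:
- Open the GTEx Portal, https://gtexportal.org/home/, and click download >> Open Access Data.
- On the download page, the left side lists the analysis releases. All of them can be used, but V8 and V9 are recommended. V9 currently provides only snRNA-Seq data (single-nucleus RNA sequencing) and Long Read RNASeq data (long-read transcriptomes, mainly for studying the role of genetic variation in transcript structure).
- V8 in more detail. The V8 data mainly comprise: 1) RNAseq BAM files, whole-exome sequencing and whole-genome sequencing; 2) genotype calls; 3) OMNI SNP array files; 4) Affymetrix expression arrays, and more.
- Annotations: download the annotation files (marked with a red box in the original screenshot). They describe the basic sample information, including sample ID, tissue or organ type, RIN, and the assay technology used.
- RNAseq data: the data we use most often. They include read counts, TPM, exon-exon junction read counts, transcript read counts/TPM, and exon read counts. The data can also be downloaded per tissue (as read counts or TPM).
- GTEx also ran many QTL analyses (readers unfamiliar with QTL can refer to the earlier post introducing eQTL, cis-eQTL and trans-eQTL): Single-Tissue cis-QTL Data, Single-Tissue trans-QTL Data, Multi-Tissue QTL Data, Single Tissue cis-RNA Editing QTL Data, and others.

Original link: https://www.huaplusmd.com/knowledge
Produced by Hua+医学大数据. For integrated analysis of medical big data, contact: https://www.huaplusmd.com/

Previous issues:
Medical big data and its integrated analysis (overview)
Medical big data and its integrated analysis (1): GEO database (1)
Medical big data and its integrated analysis (1): GEO database (2)
Medical big data and its integrated analysis (2): BXD mouse database (1)
Medical big data and its integrated analysis (2): BXD mouse database/GeneNetwork (2)
Medical big data and its integrated analysis (2): BXD mouse database/GeneNetwork (3)
Medical big data and its integrated analysis (2): BXD mouse database/GeneNetwork (4)
Medical big data and its integrated analysis (3): eQTLGen Consortium database (1)
Medical big data and its integrated analysis (3): eQTLGen Consortium database (2)
Medical big data and its integrated analysis (4): GTEx database (1)
Medical big data and its integrated analysis (4): GTEx database (2)
Medical big data and its integrated analysis (5): the IAEA doubly labelled water database (IAEA DLW)
Medical big data and its integrated analysis (X): worked example 1, middle-age weight gain is not the fault of metabolic rate
Case fatality rate of COVID-19

References:
[1] https://gtexportal.org/home/
Published 2022-12-21 10:09, IP location: Canada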
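The open-access expression matrices mentioned above are distributed as GCT files, which start with a version line and a dimension line before the table header. The sketch below shows one way to read such a file in base R. It writes a tiny made-up GCT file first so that it is self-contained; the sample IDs, Ensembl version suffixes and values are hypothetical.

```r
# Write a tiny made-up GCT file so the example runs on its own.
gct_path <- tempfile(fileext = ".gct")
writeLines(c(
  "#1.2",                 # format version line
  "2\t3",                 # number of genes, number of samples
  "Name\tDescription\tGTEX-AAA-0001\tGTEX-AAA-0002\tGTEX-BBB-0001",
  "ENSG00000141510.16\tTP53\t10.5\t12.1\t8.3",
  "ENSG00000012048.20\tBRCA1\t1.2\t0.9\t2.4"
), gct_path)

# skip = 2 drops the version and dimension lines;
# check.names = FALSE keeps the hyphens in the sample IDs.
expr <- read.delim(gct_path, skip = 2, check.names = FALSE,
                   stringsAsFactors = FALSE)

# Columns 1-2 hold gene ID and symbol; the rest are samples.
tpm <- as.matrix(expr[, -(1:2)])
rownames(tpm) <- expr$Description
```

For the real multi-hundred-megabyte files, a faster reader such as data.table::fread with the same skip value is a common substitute.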
Genotype-Tissue Expression (GTEx) | NIH Common Fund
Genotype-Tissue Expression Program (GTEx)
Program Snapshot
The Common Fund's Genotype-Tissue Expression (GTEx) Program established a data resource and tissue bank to study the relationship between genetic variants (inherited changes in DNA sequence) and gene expression (how genes are turned on and off) in multiple human tissues and across individuals. GTEx also increased our understanding of how gene expression varies between males and females.
The GTEx program has transitioned from Common Fund support. Common Fund programs are strategic investments that achieve a set of high-impact goals within a 5-10 year timeframe. At the conclusion of each program, deliverables will transition to other sources of support or use within the scientific community.
The GTEx program was supported by the Common Fund from 2010 to 2019. Currently, GTEx data are widely used as a reference dataset to design new methods and tools, such as a statistical method called PrediXcan. This novel method is used to predict the expression of a gene using DNA sequence data. PrediXcan also predicts visible traits of diseases. GTEx researchers used this method to identify specific genes associated with five diseases: bipolar disorder, coronary artery disease, Crohn's disease, rheumatoid arthritis and type 1 diabetes. The final GTEx dataset (V8) contains DNA data from 838 postmortem donors and 17,382 RNA-seq samples across 54 tissue sites and two cell lines. GTEx data are accessible through the National Center for Biotechnology Information's database of Genotypes and Phenotypes (dbGaP), the National Human Genome Research Institute's (NHGRI) Genomic Analysis and Visualization and Informatics Labspace (AnVIL) and the GTEx Portal. GTEx resources are valuable tools for exploring the impact of genetic variation on complex traits and diseases.
Program Major Accomplishments
Highlights of the Genotype-Tissue Expression (GTEx) Program major accomplishments are:
Established a comprehensive catalog of genetic variants that affect gene expression across multiple tissues, so that the research community can evaluate tissue-specific gene expression and regulation in many different tissues. Genetic variants that influence how genes behave are called expression quantitative trait loci (eQTLs). Researchers are using GTEx data to enhance the functional interpretation of genome-wide association study (GWAS) findings and the identification of disease-relevant genes.
Created an online data resource (GTEx Portal) for storing, cataloging, searching, and sharing aggregated level data. Researchers used data from the GTEx Portal to publish over 7,000 papers.
GTEx data was integrated into genomics browsers including the UCSC Genome Browser and Ensembl to visualize gene and variant information.
Developed a biobank of tissue biospecimens (e.g. lung, brain, pancreas, skin, etc) as well as RNA, DNA, blood samples and cell lines from ~960 donors. The GTEx biobank also features an image library of the tissue samples for researchers to browse the complete collection. These biospecimens are stored at the Broad Institute of Harvard and MIT.
Please note that since the GTEx program is no longer supported by the Common Fund, the program website is being maintained as an archive and will not be updated on a regular basis.
Video
Watch a video on the GTEx project for more details.
The GTEx (Genotype-Tissue Expression) Project identified genetic variants that influence how genes are turned on and off in human tissues and organs. Genetic variants that influence how genes behave are called expression quantitative trait loci (eQTLs). These eQTLs regulate the behavior of genes like a light switch turns on a light in a room. A GTEx pilot study found that the number of eQTLs differs across tissues and individuals.
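To make the eQTL idea concrete, here is a minimal sketch on simulated data: expression of one gene is regressed on allele dosage at one variant, and a non-zero slope indicates an association. The effect size, noise level and sample size are invented for illustration, and this single linear model is only the core idea, not the GTEx analysis pipeline, which also handles covariates and multiple testing.

```r
set.seed(1)
n <- 200

# Allele dosage per donor: 0, 1 or 2 copies of the alternate allele.
genotype <- sample(0:2, n, replace = TRUE)

# Simulated expression: each extra allele adds 0.5 units, plus noise.
expression <- 1 + 0.5 * genotype + rnorm(n, sd = 0.5)

# Linear regression of expression on dosage.
fit <- lm(expression ~ genotype)

# The slope estimates the per-allele effect on expression.
slope   <- coef(fit)[["genotype"]]
p_value <- summary(fit)$coefficients["genotype", "Pr(>|t|)"]
```

Repeating this test for every variant near a gene, in every tissue, is what turns a single regression into a tissue-resolved eQTL catalog.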
GTEx collected over 30,000 samples of multiple human tissues (e.g., brain, heart, lung, breast, skin and whole blood) from ~960 donors. These tissues and samples are stored through the National Cancer Institute's Cancer Human Biobank initiative on behalf of GTEx. The GTEx database is available to researchers through the GTEx Portal. GTEx is helping researchers understand the inherited susceptibility to common diseases such as cancer, heart disease, Parkinson's and diabetes.
GTEx also included a study to understand the ethical, legal and social issues (ELSI) related to donor recruitment and consent to tissue donation for biobanking purposes. In 2017, the GTEx ELSI researchers published a paper in Genetic Testing and Molecular Biomarkers. The findings indicated that a clear discussion about risks and benefits associated with participation in biobanking research is needed during the consent process.
Program Initiatives
The GTEx Program supported the following initiatives:
Online data resource (GTEx Portal) for storing, cataloging, searching, and sharing aggregated level data
Novel Statistical Methods for Human Gene Expression Quantitative Trait Loci (eQTL) Analysis
Laboratory, Data Analysis, and Coordinating Center (LDACC) for acquiring and analyzing DNA and RNA from multiple human tissues
Enhanced GTEx projects: adding dimensions beyond gene expression to the GTEx data
Announcements
Expanding Our View of The Genomic Landscape Using the Genotype-Tissue Expression (GTEx) Data Set
GTEx Data Set Used to Study Biological Changes After Death
GTEx Creates a Reference Data Set to Study Genetic Changes and Gene Expression
GTEx Data Uncovering How Genetic Alterations Contribute to Schizophrenia
GTEx Dataset Helps Determine How Gene Duplications Lead to Genes with New Biological Functions
The GTEx version 8 is now available
The GTEx Portal has been updated to data release V8 (dbGaP accession phs000424.v8.p2)! This release includes genotype data from approximately 948 post-mortem donors and approximately 17,382 RNA-seq samples across 54 tissue sites and 2 cell lines, with adequate power to detect Expression Quantitative Trait Loci in 48 tissues. Full gene expression datasets are available for download through the GTEx Portal while genotypes and RNA-seq bam files are available via dbGaP.
Genotype-Tissue Expression Project (GTEx) Biospecimens Access Policy
The policy is a mechanism to allow researchers access to tissues in the GTEx biobank. The policy and related forms can be found on the GTEx Portal. Go directly to GTEx Sample Request Forms.
This page last reviewed on
January 8, 2024
GTEx: a database linking genotype and gene expression - Tencent Cloud Developer Community
Column: 生信修炼手册. Published 2019-12-19 10:50:50; first published 2019-08-13 on the WeChat public account 生信修炼手册.

GTEx stands for Genotype-Tissue Expression. The project performed both transcriptome sequencing and genotyping on samples from many human tissues and organs, and built a database of tissue-specific gene expression and regulation. Website: https://gtexportal.org/home/

The site lists the tissue types included and the number of samples for each. Three main analyses were run on all samples.

1. RNA seq
polyA+ libraries were built with the Illumina TruSeq kit and sequenced on HiSeq 2000/2500. Reads were aligned with STAR against the GENCODE v19 GTF annotation, and expression was quantified at three levels:
gene-level: raw counts and TPM, computed with RNA-SeQC
exon-level: raw counts per exon
transcript-level: quantification with RSEM

2. genotype
Samples were genotyped by WGS following the GATK germline variant calling workflow, with these steps:
bwa-mem alignment
picard markduplicate
BQSR
indel realign
haplotypeCaller

3. eQTL
cis-eQTL analysis was run with FastQTL to associate genotype with gene expression.

The official site displays both gene expression and eQTL results. Taking TP53 as an example, each gene comes with three levels of expression, Isoform Expression, Exon Expression and Junction Expression, corresponding to transcripts, exons and splice junctions. Expression across tissues is shown as a heatmap, and the gene structure is visualized as well. eQTL results have two visualizations: the eQTL violin plot, which shows a single tissue, and the Multi-tissue eQTL plot, which compares tissues.

All analysis results can be downloaded from the official site. GTEx is more than a database of gene expression in normal tissue; its eQTL analysis strategy is well worth learning from.
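The article reports gene-level quantification as raw counts and TPM. As a reminder of how the two relate, the sketch below applies the textbook TPM definition to made-up counts and gene lengths. It illustrates the formula only and is not the RNA-SeQC or RSEM implementation.

```r
# Made-up read counts and gene lengths (in kilobases) for three genes.
counts    <- c(geneA = 100, geneB = 200, geneC = 300)
length_kb <- c(geneA = 1,   geneB = 2,   geneC = 6)

# Step 1: length-normalize the counts (reads per kilobase).
rate <- counts / length_kb

# Step 2: rescale so the values in the sample sum to one million.
tpm <- rate / sum(rate) * 1e6
```

Because TPM values always sum to one million within a sample, they are comparable across samples in a way that raw counts are not, which is why count-based tools and TPM-based comparisons call for different statistical methods.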
Differential analysis combining the GTEx and TCGA databases (updated) – 王进的个人网站
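Differential analysis combining the GTEx and TCGA databases (updated)
Posted on 2022-03-16; updated 2023-09-03
The GTEx (Genotype-Tissue Expression) study covers more than 7,000 autopsy samples from 449 donors who were healthy before death, spanning 44 tissues (42 distinct tissue types): 31 solid-organ tissues, 10 brain subregions, whole blood, and 2 cell lines derived from donor blood and skin. The authors used these samples to study how gene expression differs across tissues and individuals.
GTEx examined the expression patterns of nearly all transcribed genes, which makes it possible to pinpoint specific regions of the genome that affect gene expression.
In addition, merging GTEx with TCGA data effectively addresses the shortage of normal tissue samples in TCGA and so improves the accuracy of comparisons.
1. Data sources
tcga_RSEM_gene_tpm.gz
TCGA_phenotype_denseDataOnlyDownload.tsv.gz
gtex_RSEM_gene_tpm.gz
GTEX_phenotype.gz
gencode.v23.annotation.gene.probemap
samplepair.txt (TCGA and GTEx sample information)
(The files are large; leave a comment if you have trouble downloading them.)
2. Annotate the samples from TCGA and GTEx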
library(stringr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(data.table)
#################======= step1: clean GTEx pheno data =======#################
gtex <- read.table("samplepair.txt",header=T,sep='\t')
tcga_ref <- gtex[,1:2]
gtex$type <- paste0(gtex$TCGA,"_normal_GTEx")
gtex$sample_type <-"normal"
gtex <- gtex[,c("TCGA","GTEx","type","sample_type")]
names(gtex)[1:2] <- c("tissue","X_primary_site")
gp <- read.delim(file="GTEX_phenotype.gz",header=T,as.is = T)
gtex2tcga <- merge(gtex,gp,by="X_primary_site")
gtex_data <- gtex2tcga[,c(5,2:4)]
names(gtex_data)[1] <- "sample"
#write.table(gtex_data,"GTEx_pheno.txt",row.names=F,quote=F,sep='\t')
#################======= step2: clean a TCGA pheno data =======#################
tcga <- read.delim(file="TCGA_phenotype_denseDataOnlyDownload.tsv.gz",header=T,as.is = T)
tcga <- merge(tcga_ref,tcga,by.y="X_primary_disease",by.x="Detail",all.y = T)
tcga <- tcga[tcga$sample_type %in% c("Primary Tumor","Solid Tissue Normal"),]
tcga$type <- ifelse(tcga$sample_type=='Solid Tissue Normal',
paste(tcga$TCGA,"normal_TCGA",sep="_"),paste(tcga$TCGA,"tumor_TCGA",sep="_"))
tcga$sample_type <- ifelse(tcga$sample_type=='Solid Tissue Normal',"normal","tumor")
tcga<-tcga[,c(3,2,6,5)]
names(tcga)[2] <- "tissue"
#write.table(tcga,"tcga_pheno.txt",row.names = F,quote=F,sep='\t')
#################======= step3: remove samples without tpm data =======############
gtex_exp <- fread("gtex_RSEM_gene_tpm.gz",data.table = F)
gtexS <- gtex_data[ gtex_data$sample%in%colnames(gtex_exp)[-1],]
tcga_exp <- fread("tcga_RSEM_gene_tpm.gz",data.table = F)
tcgaS <- tcga[tcga$sample %in%colnames(tcga_exp)[-1],]
tcga_gtex <- rbind(tcgaS,gtexS)
write.table(tcga_gtex,"tcga_gtex_sample.txt",row.names = F,quote=F,sep='\t')
3. Extract the gene of interest
library(stringr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(data.table)
library(tibble)
rm(list = ls())
options(stringsAsFactors = FALSE)
target <- "YTHDC2"
idmap <- read.delim("gencode.v23.annotation.gene.probemap",as.is=T)
tcga_exp <- fread("tcga_RSEM_gene_tpm.gz",data.table = F)
gtex_exp <- fread("gtex_RSEM_gene_tpm.gz",data.table=F)
tcga_gtex <- read.table("tcga_gtex_sample.txt",sep='\t',header = T)
id <- idmap$id[which(idmap$gene==target)]
tcga_data <- t(tcga_exp[tcga_exp$sample==id,colnames(tcga_exp)%in%c("sample",tcga_gtex$sample)])
tcga_data <- data.frame(tcga_data[-1,])
tcga_data <- rownames_to_column(tcga_data,"sample")
names(tcga_data)[2] <- "tpm"
gtex_data <- t(gtex_exp[gtex_exp$sample==id,colnames(gtex_exp)%in%c("sample",tcga_gtex$sample)])
gtex_data <- data.frame(gtex_data[-1,])
gtex_data <- rownames_to_column(gtex_data,"sample")
names(gtex_data)[2] <- "tpm"
tmp <- rbind(tcga_data,gtex_data)
exp <- merge(tmp,tcga_gtex,by="sample",all.x=T)
exp <- exp[,c("tissue","sample_type","tpm")]
exp <- arrange(exp,tissue)
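# Create the output folder if it does not exist yet, then save the merged table
dir.create("Merge gene expression", showWarnings = FALSE)
write.table(exp,"Merge gene expression/YTHDC2 expression.txt",row.names = F,quote=F,sep='\t')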
4. Visualize gene expression
library(ggplot2)
library(ggpubr)
library(dplyr)   # provides %>%, group_by() and tally() used below
library(RColorBrewer)
rm(list = ls())
options(stringsAsFactors = FALSE)
exp <- read.table("Merge gene expression/YTHDC2 expression.txt",header=T,sep='\t')
ylabname <- paste("YTHDC2", "expression")
colnames(exp) <- c("Tissue", "Group", "Gene")
p1 <- ggboxplot(exp, x = "Tissue", y = "Gene", fill = 'Group',
ylab = ylabname,
color = "Group",
palette = c("#00AFBB", "#FC4E07"),
ggtheme = theme_minimal())
## Count the normal and tumor samples for each tumor type
count_N<-exp %>% group_by(Tissue, Group) %>% tally
count_N$n <- paste("n =",count_N$n)
## Add the "n = " labels to the plot
p1 <-p1+geom_text(data=count_N, aes(label=n, y=-9,color=Group), position=position_dodge2(0.9),size = 3,angle=90, hjust = 0)+
theme(axis.text.x = element_text(angle = 45,hjust = 1.2))
# Compute t-test significance within each tissue
comp<- compare_means(Gene ~ Group, group.by = "Tissue", data = exp,
method = "t.test", symnum.args = list(cutpoints = c(0,0.001, 0.01, 0.05, 1), symbols = c( "***", "**", "*", "ns")),
p.adjust.method = "holm")
# Add the significance marks
p2 <- p1 + stat_pvalue_manual(comp, x = "Tissue", y.position = 7.5,
label = "p.signif", position = position_dodge(0.8))
p2
#dev.off()
## Save the figure
### pdf version
dir.create("figure", showWarnings = FALSE)   # make sure the output folder exists
ggsave("figure/pancancer_Plot.pdf", plot = p2, width = 14, height = 5)
### png version
#png("figure/pancancer_Plot.png", width = 465, height = 225, units='mm', res = 300)
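The significance step above tests tumor against normal within each tissue and then corrects for the number of tissues tested. The sketch below reproduces that idea in base R on simulated data, assuming the same Tissue, Group and Gene columns. The tissue labels, sample sizes and effect size are invented, and this is an illustration of the approach rather than ggpubr's internal implementation.

```r
set.seed(7)

# Simulated table with the same three columns as the plotting code.
exp_toy <- data.frame(
  Tissue = rep(c("BRCA", "LUAD"), each = 40),
  Group  = rep(rep(c("normal", "tumor"), each = 20), times = 2),
  stringsAsFactors = FALSE
)

# Only BRCA tumors are shifted upward; LUAD has no true difference.
shift <- ifelse(exp_toy$Tissue == "BRCA" & exp_toy$Group == "tumor", 3, 0)
exp_toy$Gene <- rnorm(80) + shift

# One Welch t-test per tissue, tumor versus normal.
p_raw <- sapply(split(exp_toy, exp_toy$Tissue), function(d) {
  t.test(Gene ~ Group, data = d)$p.value
})

# Holm correction across the tissues that were tested.
p_adj <- p.adjust(p_raw, method = "holm")
```

With dozens of tumor types on one pan-cancer plot, the adjusted values are the ones worth reporting, since some raw p-values below 0.05 are expected by chance alone.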
Code adapted from GitHub: https://github.com/cmutd/TCGA_GTEx
Updated code from the Bilibili video (for data downloaded from Xena); the rest of the code is unchanged:
exp <- read.table("YTHDC2.tsv",header=T,sep='\t')
exp <- exp[c(1,3)]
exp <- merge(exp, tcga_gtex,by="sample")
colnames(exp)[c(2,3,5)] <- c("Gene","Tissue","Group")
ylabname <- paste("YTHDC2", "expression")
# Drop samples sitting at the floor value log2(0 + 0.001) = -9.9658, i.e. TPM = 0
exp <- exp %>% dplyr::filter(Gene > -9.96)
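The value filtered out above comes from the log2(TPM + 0.001) transform that the author describes in the replies for this dataset. The short sketch below, on made-up TPM values, shows why a TPM of 0 lands near -9.9658, why TPM values between 0 and 1 give negative numbers, and how to convert back to TPM.

```r
pseudo <- 0.001                 # pseudocount used in the transform
tpm    <- c(0, 0.5, 1, 8)       # made-up TPM values

# Forward transform as applied to the downloaded matrices.
log_tpm <- log2(tpm + pseudo)

# A TPM of exactly 0 maps to log2(0.001), about -9.9658.
floor_value <- log2(pseudo)

# Keep only samples above the floor, with a small tolerance.
keep <- log_tpm > floor_value + 1e-6

# Back-transform to the original TPM scale.
back <- 2^log_tpm - pseudo
```

Comparing against a threshold rather than testing for exact equality avoids missing floor values that were rounded differently when the file was written.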
By 进哥哥
R语言Tags: GTEx, R语言, TCGA
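194 Replies to "Differential analysis combining the GTEx and TCGA databases (updated)"
东东 said:
2023-12-17 09:03
Hello Dr. Wang. Where was tcga_RSEM_gene_tpm.gz downloaded from, and what should I do if I want to use count data instead?
Nan said:
2023-10-12 22:40
Hello Dr. Wang, this tutorial is very useful. When I ran the same code with a different gene, for some tumors the lower edge of the normal-tissue box falls below -10 and is not shown in the figure. What causes this?
进哥哥 said:
2023-10-15 13:20
Hello. The gene's expression is very low and there are some missing values, which appear in this dataset as -9.9658. Try removing them.
南闾 said:
2023-10-12 13:52
Could you explain in detail how to download the data sources listed at the start of the article?
进哥哥 said:
2023-10-15 13:24
They were downloaded from XENA, and the links are provided. There is a more convenient method in my Bilibili video, the one at the bottom; have a look at that first.
慧 said:
2023-09-14 13:24
Hello, where can I download this file (samplepair.txt)? Also, can I use count values instead of FPKM values?
RONG said:
2023-09-12 15:11
Thank you!
RONG said:
2023-09-11 19:08
Hello, do batch effects need to be removed here?
进哥哥 said:
2023-09-12 10:18
Over 20,000 RNA-seq samples from 3 major databases uniformly reprocessed: on whether TCGA-GTEx needs normalization – 王进的个人网站
https://www.jingege.wang/2023/05/24/3%e5%a4%a7%e6%95%b0%e6%8d%ae%e5%ba%93%e8%b6%852%e4%b8%87rna-seq%e6%95%b0%e6%8d%ae%e9%87%8d%e6%96%b0%e7%bb%9f%e4%b8%80%e5%a4%84%e7%90%86-%e5%85%b3%e4%ba%8etcga-gtex%e6%98%af%e5%90%a6/
This is explained there; the data have already been processed.
梪子 said:
2023-09-07 14:56
Hello. Do the TPM values of TCGA and GTEx samples need batch-effect removal before differential analysis? After I removed batch effects, negative expression values appeared and I could not continue with the analysis.
进哥哥 said:
2023-09-12 10:33
Over 20,000 RNA-seq samples from 3 major databases uniformly reprocessed: on whether TCGA-GTEx needs normalization – 王进的个人网站
https://www.jingege.wang/2023/05/24/3%e5%a4%a7%e6%95%b0%e6%8d%ae%e5%ba%93%e8%b6%852%e4%b8%87rna-seq%e6%95%b0%e6%8d%ae%e9%87%8d%e6%96%b0%e7%bb%9f%e4%b8%80%e5%a4%84%e7%90%86-%e5%85%b3%e4%ba%8etcga-gtex%e6%98%af%e5%90%a6/
Batch effects have already been removed. Negative values are also normal, because the data are log-transformed.
何 said:
2023-09-06 18:27
Hello 进哥, I'm a beginner. When I run line 54 of the code from your video I get: Error in ggboxplot(exp, x = "Tissue", y = "Gene", fill = "Group", ylab = ylabname, :
could not find function "ggboxplot"
What is the reason?
何 said:
2023-09-06 20:29
Solved, thank you 进哥.
进哥哥 said:
2023-09-12 10:37
Load the package with library(ggpubr).
何 said:
2023-09-06 18:24
Hello 进哥, when I run line 54 from your video I get: Error in ggboxplot(exp, x = "Tissue", y = "Gene", fill = "Group", ylab = ylabname, : could not find function "ggboxplot". What is the reason?
进哥哥 said:
2023-09-12 10:38
library(ggpubr)
Load the package.
任阿七 said:
2023-09-03 16:29
Hello, isn't differential analysis usually done on counts? Why is TPM used here?
进哥哥 said:
2023-09-04 17:06
Not necessarily. Counts are unnormalized data; for counts, use DESeq2 for differential analysis.
If you have TPM or FPKM, limma can also be used for differential analysis.
For a single gene, either a rank-sum test or a t-test works. Choose according to your analysis requirements and state the method used.
李 said:
2023-06-26 19:28
Hello, where is the network-drive link to the data split by tumor type that is mentioned in the comments? I can't seem to find it.
进哥哥 said:
2023-06-29 15:30
It is under the older comments.
Reposting: https://pan.baidu.com/s/17blyTZb-Kni8u9yqIsweyA?pwd=eqsp
Extraction code: eqsp
李 said:
2023-06-29 21:09
Thank you very much.
沈一 said:
2023-06-12 12:29
Hello Dr. Wang, how was the sample information obtained?
进哥哥 said:
2023-06-13 13:17
Hello, do you mean this one? https://xenabrowser.net/datapages/?dataset=TCGA_GTEX_category.txt&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
Have a look.
沈一 said:
2023-06-15 11:05
Thanks for the reply. I don't think it is that one; I mean the information in the file samplepair.txt.
进哥哥 said:
2023-06-19 12:02
OK. For that, you can download the sample information for each cancer type and combine it yourself.
阿飞 said:
2023-05-30 11:32
Has the network-drive link been deleted? I can't see it.
进哥哥 said:
2023-06-01 07:28
The file names above link directly to the Xena downloads.
孙西瓜 said:
2023-05-25 06:04
I see that your data are in TPM format. Isn't differential analysis usually done on count values? Why is TPM used here? Thank you.
孙西瓜 said:
2023-05-24 19:57
Were these data downloaded from UCSC? I have read that the data there are updated slowly and may differ from the TCGA database.
进哥哥 said:
2023-05-24 21:17
Yes, there is a lag. If you must analyze the latest data, you will need to download them yourself.
ggboy said:
2023-04-17 21:51
How can the data for a single tumor type be split out?
进哥哥 said:
2023-04-19 10:53
There is a network-drive link in the comments below with the data already split.
杨贤森 said:
2023-04-16 16:11
Hello. I wrote my own code to analyze the GTEx and TCGA data. I first converted both to TPM, normalized each, removed batch effects, and finally adjusted the data to plot single-gene pan-cancer expression. Some samples ended up with expression below 0. I then found your code online and ran it, and the results were similar to mine: some samples again have negative expression, which makes the plot look odd. How should this be handled?
进哥哥 said:
2023-04-17 08:52
Hello. If you use log2-transformed data, negative values are inevitable: they correspond to TPM between 0 and 1, which is normal. TPM itself is never negative.
A TPM of exactly 0 becomes about -9.9658 after log2(TPM+0.001). If you do not want to include samples with zero expression, you can delete those values.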
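One of the replies above notes that for a single gene, either a rank-sum test or a t-test is acceptable as long as the method is reported. A minimal sketch of both tests on simulated log-scale expression follows; the group means, spread and sample sizes are invented for illustration.

```r
set.seed(42)

# Simulated log-scale expression of one gene in two groups.
normal <- rnorm(30, mean = 2, sd = 1)
tumor  <- rnorm(30, mean = 4, sd = 1)

df <- data.frame(
  expr  = c(normal, tumor),
  group = rep(c("normal", "tumor"), each = 30)
)

# Rank-sum (Wilcoxon) test: makes no normality assumption.
res_wilcox <- wilcox.test(expr ~ group, data = df)

# Welch t-test: compares means, allowing unequal variances.
res_ttest <- t.test(expr ~ group, data = df)
```

The rank-sum test is the more conservative choice when many samples sit at the zero-expression floor, since those ties distort means more than ranks.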
TCGA in 28 tutorials: the GTEx database, a good helper for TCGA data mining - Tencent Cloud Developer Community
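Column: 生信技能树. Published 2018-12-19 11:26:56; first published 2018-11-27 on the WeChat public account 生信技能树.

When mining the TCGA database, we usually find that the project includes very few sequencing results from normal tissue; many patients have no transcriptome data for their own normal tissue. Take breast cancer: of roughly 1,200 transcriptome profiles, about 1,100 come from tumor tissue and only about 100 are normal controls. We therefore need a way to enlarge the normal-tissue sample size, and since TCGA cannot supply it, we turn to other databases. The one most worth recommending is the Genotype-Tissue Expression (GTEx) database.

Background
Phase 1
In 2015 GTEx released its first milestone results: three papers published together in Science and featured on the cover. The GTEx study collected 1,641 autopsy samples from 175 deceased donors, taken from 54 different body sites, and examined the expression patterns of nearly all transcribed genes, making it possible to pinpoint specific regions of the genome that affect gene expression. Of the other two papers, one describes gene expression profiles across all human tissues and shows that certain tissue-specific genes tend to determine the regulation of tissue-specific gene expression; the other explains how protein-truncating variants affect gene expression in tissues.
The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans
The human transcriptome across tissues and individuals
Effect of predicted protein-truncating genetic variants on the human transcriptome

Phase 2
In 2017 four papers were published together in Nature. The GTEx consortium collected and studied more than 7,000 autopsy samples from 449 donors who were healthy before death, spanning 44 tissues (42 distinct tissue types): 31 solid-organ tissues, 10 brain subregions, whole blood, and two cell lines derived from donor blood and skin. The authors used these samples to study how gene expression differs across tissues and individuals. The papers titled "Landscape of X chromosome inactivation across human tissues" and "Dynamic landscape and regulation of RNA editing in mammals" use GTEx data to explore how genetic variation associated with gene expression can regulate RNA editing and X-chromosome inactivation.

Database contents
The usual route is to go to https://gtexportal.org/ and find the downloadable datasets. The most important one for us is the expression matrix: download the 496M gene read counts file. The sample IDs in the matrix are defined by the database maintainers, so we also need the annotation for those sample IDs. Most of the remaining material concerns using the portal's web pages, which bioinformatics engineers usually do not need, so it is skipped here. Note the release information: The current release is V7 including 11,688 samples, 53 tissues and 714 donors

First look at the database annotation
The key fields are:
# SMTS Tissue Type, area from which the tissue sample was taken.
# SMTSD Tissue Type, more specific detail of tissue type
These show which tissue each sample belongs to, which makes it easy to extract the relevant samples for your own research. Import the 496M gene read counts matrix into R:
if(F){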
options(stringsAsFactors = F)
GTEx=read.table('~/Downloads/GTEx_Analysis_2016-01-15_v7_RNASeQCv1.1.8_gene_reads.gct.gz'
,header = T,sep = '\t',skip = 2)
GTEx[1:4,1:4]
h=head(GTEx)
save(h,file = 'GTEx_head.Rdata')
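}
Selecting the expression matrix for the tissue of interest
Having seen above how samples are annotated to tissues, the code is simple:
# a is the full expression matrix stored in GTEx_all.Rdata
# (the block above only saves its first rows, as h)
load('~/Desktop/GTEx_all.Rdata')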
a[1:4,1:4]
colnames(a)
# SMTS Tissue Type, area from which the tissue sample was taken.
# SMTSD Tissue Type, more specific detail of tissue type
b=read.table('GTEx_v7_Annotations_SampleAttributesDS.txt',
header = T,sep = '\t',quote = '')
table(b$SMTS)
breat_gtex=a[,gsub('[.]','-',colnames(a)) %in% b[b$SMTS=='Breast',1]]
rownames(breat_gtex)=a[,1]
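dat=breat_gtex
This picks out the sample names that belong to the Breast tissue and subsets the expression matrix above. Note that at this point the gene names in the matrix are not symbols, so ID conversion is needed, with the following code:
dat=breat_gtex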
ids=a[,1:2]
head(ids)
colnames(ids)=c('probe_id','symbol')
dat=dat[ids$probe_id,]
dat[1:4,1:4]
ids$median=apply(dat,1,median)
ids=ids[order(ids$symbol,ids$median,decreasing = T),]
ids=ids[!duplicated(ids$symbol),]
dat=dat[ids$probe_id,]
rownames(dat)=ids$symbol
dat[1:4,1:4]
breat_gtex=dat
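save(breat_gtex,file = 'breat_gtex_counts.Rdata')
Analyses that can be run on the normal breast tissue expression matrix
Normally the matrix would be analyzed together with tumor data, which opens up many kinds of analysis. Here is a simpler one, PAM50 classification:
if(T){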
ddata=t(dat)
ddata[1:4,1:4]
s=colnames(ddata);head(s)
library(org.Hs.eg.db)
s2g=toTable(org.Hs.egSYMBOL)
g=s2g[match(s,s2g$symbol),1];head(g)
# probe Gene.symbol Gene.ID
dannot=data.frame(probe=s,
"Gene.Symbol" =s,
"EntrezGene.ID"=g)
ddata=ddata[,!is.na(dannot$EntrezGene.ID)]
dannot=dannot[!is.na(dannot$EntrezGene.ID),]
head(dannot)
library(genefu)
# c("scmgene", "scmod1", "scmod2","pam50", "ssp2006", "ssp2003", "intClust", "AIMS","claudinLow")
s<-molecular.subtyping(sbt.model = "pam50",data=ddata,
annot=dannot,do.mapping=TRUE)
table(s$subtype)
tmp=as.data.frame(s$subtype)
subtypes=as.character(s$subtype)
}
library(genefu)
data(pam50)  # load the PAM50 model object shipped with genefu
pam50genes=pam50$centroids.map[c(1,3)]
pam50genes[pam50genes$probe=='CDCA1',1]='NUF2'
pam50genes[pam50genes$probe=='KNTC2',1]='NDC80'
pam50genes[pam50genes$probe=='ORC6L',1]='ORC6'
x=dat
x=x[pam50genes$probe[pam50genes$probe %in% rownames(x)] ,]
tmp=data.frame(subtypes=subtypes)
rownames(tmp)=colnames(x)
library(pheatmap)
pheatmap(x,show_rownames = T,show_colnames = F,
annotation_col = tmp,
filename = 'ht_by_pam50_raw.png')
x=t(scale(t(x)))
x[x>1.6]=1.6
x[x< -1.6]= -1.6
pheatmap(x,show_rownames = T,show_colnames = F,
annotation_col = tmp,
filename = 'ht_by_pam50_scale.png')
The code above takes the expression matrix of the 50 PAM50 genes alone and clusters it as a heatmap. The first heatmap shows that expression levels differ greatly between genes. We usually do not compare expression across different genes, only the same gene across samples, so there is no need to look at which genes are high or low. The matrix can therefore be normalized to some degree and replotted, which gives the second heatmap.
It is then obvious that even normal-tissue transcriptomes, when run through PAM50, come back with all sorts of subtype calls. But PAM50 is a model trained on microarray expression matrices from breast cancer patients; the odd result arises because we applied it where it does not belong. Compare the classification in METABRIC, where the PAM50 calls are shown above the clinical information: the basal calls are very consistent and fit the definition of TNBC, that is, low expression of PGR, ESR1 and ERBB2.
To really compare the GTEx transcriptome matrix with TCGA, batch effects need to be removed to some degree. I have covered this many times on 生信技能树 and will not repeat it here.
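The ID conversion code earlier in this article keeps, for each gene symbol that maps to several rows, the row with the highest median expression. The toy example below isolates that step so the logic is easier to follow; the IDs, symbols and values are made up.

```r
# Toy matrix: two Ensembl-style IDs map to the same symbol (GENE1).
dat <- matrix(c(1,  2,  3,
                10, 20, 30,
                5,  5,  5),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ENSG_A.1", "ENSG_A.2", "ENSG_B.1"),
                              c("s1", "s2", "s3")))
ids <- data.frame(probe_id = rownames(dat),
                  symbol   = c("GENE1", "GENE1", "GENE2"),
                  stringsAsFactors = FALSE)

# Median expression of each row across samples.
ids$median <- apply(dat, 1, median)

# Sort by symbol, highest median first, then keep the first row per symbol.
ids <- ids[order(ids$symbol, -ids$median), ]
ids <- ids[!duplicated(ids$symbol), ]

# Subset the matrix to the kept rows and relabel them with symbols.
collapsed <- dat[ids$probe_id, , drop = FALSE]
rownames(collapsed) <- ids$symbol
```

Keeping the highest-median row is one common convention; summing or averaging duplicate rows are alternatives, and the choice should be stated when reporting results.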