Open Access

Identifying Scientific Project-generated Data Citation from Full-text Articles: An Investigation of TCGA Data Citation



Figure 1

Computational workflow for identifying TCGA data usage.

Figure 2

Number of TCGA-related publications in PMC.

Figure 3

Geographical distribution of TCGA-related publications.

Figure 4

Distribution of TCGA cancer types.

Figure 5

Distribution of the TCGA high-throughput platform.

Figure 6

Manually linked literature that includes TCGA data.

Examples of TCGA cancer-type concepts.

| Concept ID | Name | TCGA-defined terms ([abbr] – [full name]) | Synonyms | DO mapping |
|---|---|---|---|---|
| D0001 | Glioblastoma | GBM – Glioblastoma Multiforme | Glioblastoma, GBM, adult glioblastoma multiforme, primary glioblastoma multiforme, spongioblastoma multiforme | DOID: 3068 |
| D0002 | Breast cancer | BRCA – Breast Invasive Carcinoma | Breast cancer, breast tumor, breast neoplasm, mammary cancer, mammary tumor, mammary neoplasm, malignant tumor of breast | DOID: 1612 |
| D0003 | Ovarian cancer | OV – Ovarian Serous Cystadenocarcinoma | Ovarian cancer, ovarian tumor, ovarian neoplasm, ovary cancer, ovary tumor, ovary neoplasm, malignant tumor of ovary | DOID: 2394 |
| D0004 | Acute myeloid leukemia | LAML – Acute Myeloid Leukemia | Acute myeloid leukemia, AML, acute myeloblastic leukemia, acute myelogenous leukemia | DOID: 9119 |
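
A concept table like this one lends itself to dictionary-based lookup of cancer-type mentions in article text. The following is a minimal sketch of that idea, not the article's actual implementation: the names `CANCER_CONCEPTS`, `build_matcher`, and `find_concepts` are hypothetical, the synonym lists are abbreviated from the table, and case-insensitive whole-word matching is an assumed simplification.

```python
import re

# Concept records taken from the table (synonym lists abbreviated).
# The dictionary layout itself is an assumption made for illustration.
CANCER_CONCEPTS = {
    "D0001": {"name": "Glioblastoma", "doid": "DOID:3068",
              "terms": ["glioblastoma multiforme", "glioblastoma", "GBM"]},
    "D0002": {"name": "Breast cancer", "doid": "DOID:1612",
              "terms": ["breast invasive carcinoma", "breast cancer",
                        "breast tumor", "mammary cancer", "BRCA"]},
    "D0003": {"name": "Ovarian cancer", "doid": "DOID:2394",
              "terms": ["ovarian serous cystadenocarcinoma", "ovarian cancer",
                        "ovarian tumor", "ovary cancer"]},
    "D0004": {"name": "Acute myeloid leukemia", "doid": "DOID:9119",
              "terms": ["acute myeloid leukemia", "acute myelogenous leukemia",
                        "AML", "LAML"]},
}


def build_matcher(concepts):
    """Return a compiled regex plus a lowercase term -> concept ID lookup."""
    lookup = {}
    for concept_id, record in concepts.items():
        for term in record["terms"]:
            lookup[term.lower()] = concept_id
    # Try longer terms first so "glioblastoma multiforme" is preferred
    # over the shorter "glioblastoma" at the same position.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def find_concepts(text, pattern, lookup):
    """Return the sorted concept IDs whose terms occur in the text."""
    return sorted({lookup[m.group(1).lower()] for m in pattern.finditer(text)})


if __name__ == "__main__":
    pattern, lookup = build_matcher(CANCER_CONCEPTS)
    sentence = "We analysed TCGA glioblastoma multiforme and ovarian cancer samples."
    print(find_concepts(sentence, pattern, lookup))  # ['D0001', 'D0003']
```

Whole-word matching keeps the abbreviation BRCA from firing on gene names such as BRCA1, though short abbreviations would likely need extra disambiguation in practice.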

Distribution of TCGA key terms in full-text articles.

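| Feature | | Retrieved PMC article set (%) | Benchmark dataset (%) |
|---|---|---|---|
| TCGA term position | Title | 1 | 4 |
| | Abstract | 11 | 28 |
| | Introduction/Background | 12 | 20 |
| | Method/Material | 31 | 68 |
| | Result | 74 | 96 |
| | Discussion/Conclusion | 20 | 36 |
| TCGA-related concept mention | Cancer type mention | 73 | 100 |
| | Platform mention | 66 | 96 |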

Examples of TCGA high-throughput platform concepts.

| Concept ID | Name | TCGA-defined terms | Generated data |
|---|---|---|---|
| P0001 | RNASeq | IlluminaGA_RNASeq, IlluminaHiSeq_RNASeq | Nucleotide sequence, gene expression |
| P0002 | miRNASeq | IlluminaGA_miRNASeq | miRNAs, microRNA, microRNA sequence |
| P0003 | SNP | Genome_Wide_SNP | SNPs, single nucleotide polymorphisms, CNV, copy number variation |
| P0004 | Methylation | Human methylation | DNA methylation |

eISSN: 2543-683X
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Information Technology, Project Management, Databases and Data Mining