Phenotype in huang1 is (if you believe the authors): "DNA replication stress" #6

@jaredroach

The authors don't claim to be studying "stress" (GO:0006950), which is what the existing encoding uses:
triple_object:
  value_for_encoding: GO:0006950
Rather, the authors claim to be studying "DNA replication stress". These are distinct concepts, and one is not a subset of the other.

But one cannot take the authors' implied specificity at face value.
If you look at Table S1 in this paper, the authors list all the gene sets they think are related to "DNA replication stress". These pathways are:

KEGG_HOMOLOGOUS_RECOMBINATION
REACTOME_ACTIVATION_OF_ATR_IN_RESPONSE_TO_REPLICATION_STRESS
REACTOME_E2F_ENABLED_INHIBITION_OF_PRE_REPLICATION_COMPLEX_FORMATION
REACTOME_CELL_CYCLE
REACTOME_MITOTIC_SPINDLE_CHECKPOINT
REACTOME_G2_M_DNA_DAMAGE_CHECKPOINT
REACTOME_HOMOLOGOUS_DNA_PAIRING_AND_STRAND_EXCHANGE
REACTOME_HDR_THROUGH_SINGLE_STRAND_ANNEALING_SSA
REACTOME_FORMATION_OF_SENESCENCE_ASSOCIATED_HETEROCHROMATIN_FOCI_SAHF
REACTOME_CYCLIN_A_B1_B2_ASSOCIATED_EVENTS_DURING_G2_M_TRANSITION
REACTOME_CHK1_CHK2_CDS1_MEDIATED_INACTIVATION_OF_CYCLIN_B_CDK1_COMPLEX
REACTOME_CONVERSION_FROM_APC_C_CDC20_TO_APC_C_CDH1_IN_LATE_ANAPHASE
REACTOME_DISEASES_OF_MISMATCH_REPAIR_MMR
REACTOME_G1_S_DNA_DAMAGE_CHECKPOINTS
REACTOME_RECOGNITION_OF_DNA_DAMAGE_BY_PCNA_CONTAINING_REPLICATION_COMPLEX
WHITFIELD_CELL_CYCLE_M_G1
REACTOME_DNA_REPLICATION_PRE_INITIATION
REACTOME_FORMATION_OF_TC_NER_PRE_INCISION_COMPLEX
REACTOME_DNA_DAMAGE_RECOGNITION_IN_GG_NER
KEGG_PYRIMIDINE_METABOLISM
REACTOME_FANCONI_ANEMIA_PATHWAY

The result is that the phenotype the authors are studying is very broad (e.g., the list includes "cell cycle").

This makes it difficult to extract useful triples from most of the Supplemental Tables.

Table S17 might be parsable. It is a list of differentially expressed genes in a meta-analysis of 5 prostate cancer cohorts. The genes in Table S17 are the subjects, the predicate is "differentially expressed in", and the object is "prostate cancer". The authors don't explicitly say what these cohorts are, but they are presumably these: "Somatic mutation and copy number alteration data were retrieved from The Cancer Genome Atlas (TCGA) database (http://cancergenome.nih.gov/). mRNA expression data and corresponding clinical information of the TCGA-PRAD cohort (n = 488) were downloaded from XENA (https://xena.ucsc.edu/) in November 2020. A total of 4 external validating datasets including GSE70769 (n = 92), GSE70768 (n = 111), GSE94767 (n = 132), and DKFZ-PRAD (n = 82) were collected from the Gene Expression Omnibus (GEO,"

These Ns sum to 905, which matches the existing encoding:
sample_size:
  encoding_method: value
  value_for_encoding: 905
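
As a quick sanity check of that total, here is a minimal Python sketch. The cohort names and sizes are copied from the quoted methods text; the variable names are just for illustration.

```python
# Cohort sizes as quoted from the paper's methods text.
cohort_sizes = {
    "TCGA-PRAD": 488,
    "GSE70769": 92,
    "GSE70768": 111,
    "GSE94767": 132,
    "DKFZ-PRAD": 82,
}

# Sum across all five cohorts to compare against the encoded sample_size.
total = sum(cohort_sizes.values())
print(f"{len(cohort_sizes)} cohorts, total n = {total}")
```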

In Table S6, the p-value is in column E, not F as currently encoded:
p_value:
  encoding_method: column_of_values
  value_for_encoding: F

In Table S17, the p-value is in column G, not H as currently encoded:
p_value:
  encoding_method: column_of_values
  value_for_encoding: H
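
Both mismatches are off by one column, which is easy to get wrong when a parser counts from 0 and the spreadsheet counts from A. A small sketch of a letter-to-index helper, in case it helps when checking the corrected values: the function name and the `corrected_p_value_columns` dict are hypothetical and not part of any existing parser here.

```python
def column_letter_to_index(letter: str) -> int:
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to a 0-based index."""
    cleaned = letter.strip().upper()
    if not cleaned:
        raise ValueError("empty column letter")
    index = 0
    for ch in cleaned:
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {letter!r}")
        # Treat the letters as base-26 digits where A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


# Corrected p-value columns proposed in this issue (hypothetical structure).
corrected_p_value_columns = {"S6": "E", "S17": "G"}

for table, letter in corrected_p_value_columns.items():
    print(table, letter, column_letter_to_index(letter))
```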

TL;DR: maybe skip this paper entirely, or just parse S17 as a list of genes differentially expressed in prostate cancer.
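
If S17 does get parsed that way, the triples could look roughly like the sketch below. This is only an illustration: the `Triple` type and `genes_to_triples` function are hypothetical, and the gene symbols in the usage example are placeholders, not values taken from Table S17.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def genes_to_triples(
    genes: Iterable[str],
    predicate: str = "differentially expressed in",
    obj: str = "prostate cancer",
) -> List[Triple]:
    """Build one (gene, predicate, object) triple per non-empty gene symbol."""
    return [Triple(g.strip(), predicate, obj) for g in genes if g.strip()]


# Placeholder gene symbols; the real subjects would come from the gene column of Table S17.
example_triples = genes_to_triples(["GENE_A", " GENE_B ", ""])
for t in example_triples:
    print(t.subject, "|", t.predicate, "|", t.object)
```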
