Datasets

SEA Libraries

SEA Libraries consist of the following files:

    <library_id>_compounds.Rdata: Compound information Rdata file containing a data.frame with the following columns
       compound: ZINC ID (e.g. ZINC57058)
       smiles: character string representing the molecule (e.g. NCCc1c[nH]c2ccc(O)cc12)
       data: Information about the molecule collected from ChEMBL and ZINC, and depicted with CACTVS
       
    <library_id>_sets.Rdata: Target set information Rdata file containing a data.frame with the following columns
       target: Identifier for the target (e.g. DRD2_HUMAN)
       name: Target name
       compound: compound identifier
       affinity: affinity threshold
       description: target description
       
    <library_id>.fit: Parameters for the null background model

    <library_id>.sea: SEA library file

All datasets are prepared using the BioChemPantry R package.

Library ID	Library Source	Library Description	Download Link
chembl27_rdkit_ecfp4	ChEMBL 27	UniProt entry to ZINC ID mappings, with hashed Extended Connectivity Fingerprints of diameter 4	Download
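
As an illustration of how the two data.frames relate, the sketch below groups rows of the sets table by target and looks up the SMILES of each member compound in the compounds table. It assumes the tables have already been loaded into Python as lists of dictionaries keyed by the column names listed above. The function names are hypothetical, and apart from ZINC57058, its SMILES, and DRD2_HUMAN (taken from the examples above), the rows are made-up placeholders rather than real library content.

```python
from collections import defaultdict

# Toy stand-in for <library_id>_compounds.Rdata (placeholder rows, not real data
# except for the ZINC57058 example quoted in the column descriptions).
compounds = [
    {"compound": "ZINC57058", "smiles": "NCCc1c[nH]c2ccc(O)cc12"},
    {"compound": "ZINC_PLACEHOLDER_A", "smiles": "CCO"},
]

# Toy stand-in for <library_id>_sets.Rdata; the target memberships are invented
# purely to show the join and say nothing about real binding data.
sets_rows = [
    {"target": "DRD2_HUMAN", "compound": "ZINC57058"},
    {"target": "DRD2_HUMAN", "compound": "ZINC_PLACEHOLDER_A"},
    {"target": "ESR1_HUMAN", "compound": "ZINC_PLACEHOLDER_A"},
]


def build_target_sets(rows):
    """Group compound identifiers by target identifier."""
    target_sets = defaultdict(set)
    for row in rows:
        target_sets[row["target"]].add(row["compound"])
    return dict(target_sets)


def smiles_for_target(target, target_sets, compound_rows):
    """Return {compound id: SMILES} for every compound in a target's set."""
    smiles_by_id = {row["compound"]: row["smiles"] for row in compound_rows}
    members = sorted(target_sets.get(target, ()))
    return {cid: smiles_by_id[cid] for cid in members if cid in smiles_by_id}


target_sets = build_target_sets(sets_rows)
drd2_smiles = smiles_for_target("DRD2_HUMAN", target_sets, compounds)
```

The join key is the compound column, which appears in both tables.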

SEA Scores

SEA Scores consist of all associations between targets in the reference library and targets in the query library.

The results table contains the following columns:

  
    target1: UniProt entry of reference target (e.g. DRD2_HUMAN)
    entrez_id1: Entrez Gene ID of reference target (e.g. 1813)
    gene_name1: Gene name of reference target (e.g. DRD2)
    description1: Description of reference target (e.g. D(2) dopamine receptor)
    target1_class_1: ChEMBL class_1 for reference target (e.g. Membrane receptor)
    target1_class_2: ChEMBL class_2 for reference target (e.g. 7TM1)
    target1_class_3: ChEMBL class_3 for reference target (e.g. SmallMol)
    target1_class_4: ChEMBL class_4 for reference target (e.g. Monoamine receptor)
    target1_class_5: ChEMBL class_5 for reference target (e.g. Dopamine receptor)
    target1_class_6: ChEMBL class_6 for reference target (e.g. Dopamine receptor)
    target2: UniProt entry of query target (e.g. ESR1_HUMAN)
    entrez_id2: Entrez Gene ID of query target (e.g. 2099)
    gene_name2: Gene name of query target (e.g. ESR1)
    description2: Description of query target (e.g. Estrogen receptor)
    target2_class_1: ChEMBL class_1 for query target (e.g. Transcription Factor)
    target2_class_2: ChEMBL class_2 for query target (e.g. Nuclear Receptor)
    target2_class_3: ChEMBL class_3 for query target (e.g. NR3)
    target2_class_4: ChEMBL class_4 for query target (e.g. NR3A)
    target2_class_5: ChEMBL class_5 for query target (e.g. NR3A1)
    target2_class_6: ChEMBL class_6 for query target (e.g. NR3A1)
    MaxTC: Maximum Tanimoto similarity between compounds from the reference target and compounds from the query target, in [0,1], with 1 meaning identical up to the resolution of the fingerprint
    Zscore: Raw Z-score under the reference SEA background null model
    Pvalue: P-value of Z-score
    Qvalue: FDR-corrected P-value for significance across all comparisons
    Evalue: Bonferroni correction for significance across all comparisons
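
To make the MaxTC column concrete, here is a minimal sketch of a maximum Tanimoto similarity between two compound sets. It assumes each fingerprint is represented as a set of "on" bit positions. The function names and the toy bit sets are hypothetical, and this is not necessarily how the published scores were computed.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity: shared on-bits divided by on-bits in either fingerprint."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def max_tc(ref_fingerprints, query_fingerprints):
    """Largest pairwise Tanimoto similarity between two sets of fingerprints."""
    return max(
        (tanimoto(r, q) for r in ref_fingerprints for q in query_fingerprints),
        default=0.0,  # returned when either set is empty
    )


# Toy fingerprints (made-up bit positions, not real ECFP4 output).
ref_fps = [{1, 2, 3, 4}, {10, 11}]
query_fps = [{2, 3, 4, 5}, {20, 21}]

best = max_tc(ref_fps, query_fps)
```

Because the fingerprints are hashed, a value of 1 means only that two compounds set the same bits, which is why the column description says "identical up to the resolution of the fingerprint".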

Reference Library	Query Library	Fingerprint Type	Download Link
chembl27_rdkit_ecfp4	chembl27_rdkit_ecfp4	Extended Connectivity Fingerprint (Circular, Bits, Hashed, radius 2)	Download
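
The Qvalue and Evalue columns are both derived from Pvalue. The sketch below shows one common way such corrections are computed. It assumes the FDR correction is the Benjamini-Hochberg procedure and that the Bonferroni-style E-value is the P-value multiplied by the number of comparisons. The source names neither procedure explicitly, so treat both as assumptions; the function names and toy P-values are hypothetical.

```python
def bonferroni_evalues(pvalues):
    """E-value sketch: P-value times the number of comparisons (not capped at 1)."""
    n = len(pvalues)
    return [p * n for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (q-values), in the input order."""
    n = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Toy P-values (made up for illustration).
pvalues = [0.01, 0.04, 0.03, 0.005]
evalues = bonferroni_evalues(pvalues)
qvalues = benjamini_hochberg(pvalues)
```

Under these assumptions, an E-value can exceed 1 and reads as an expected number of chance hits at that score, while a q-value always stays in [0,1].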