kg_covid_19.transform_utils.ttd package
Submodules
kg_covid_19.transform_utils.ttd.ttd module
exception kg_covid_19.transform_utils.ttd.ttd.TTDNotEnoughFields
Bases: Exception
class kg_covid_19.transform_utils.ttd.ttd.TTDTransform(input_dir: str = None, output_dir: str = None)
Bases: kg_covid_19.transform_utils.transform.Transform
get_gene_name(data: dict) → str
get_targ_type(data: dict) → str
get_uniproids(data: dict, name_2_id_map: dict, uniprot_curie_prefix: str) → List[str]
parse_line(line: str, id_sep='; ') → list
Parse one line of data from P1-01-TTD_target_download and return a list of the form:
[target_id, abbrev, data_list]
where:
target_id is the target ID
abbrev is one of ‘TARGETID’, ‘FORMERID’, etc. (see parse_ttd_file)
data_list is a list of all items in field 3 through the last field, split on ‘ ‘
Parameters
line – line from P1-01-TTD_target_download
id_sep – string that separates ID strings, as in ID1; ID2 (default: ‘; ’)
Returns
[target_id, abbrev, data_list]
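The return shape described above can be illustrated with a minimal standalone sketch. It is not the package's implementation: the names parse_line_sketch and NotEnoughFieldsSketch are hypothetical, the sample line is made up, and the sketch assumes that fields are tab-separated and that every field from the third onward is split on id_sep.

```python
from typing import List


class NotEnoughFieldsSketch(Exception):
    """Hypothetical stand-in for TTDNotEnoughFields (illustration only)."""


def parse_line_sketch(line: str, id_sep: str = "; ") -> list:
    """Return [target_id, abbrev, data_list] for one line of input."""
    # ASSUMPTION: fields are separated by tabs.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise NotEnoughFieldsSketch(
            f"expected at least 3 fields, got {len(fields)}"
        )
    target_id, abbrev = fields[0], fields[1]
    data_list: List[str] = []
    for field in fields[2:]:
        # ASSUMPTION: each remaining field may hold several IDs
        # joined by id_sep, e.g. "ID1; ID2".
        data_list.extend(field.split(id_sep))
    return [target_id, abbrev, data_list]


# Made-up sample line, not taken from the real download file.
example = parse_line_sketch("T00001\tUNIPROID\tID1; ID2\n")
print(example)  # ['T00001', 'UNIPROID', ['ID1', 'ID2']]
```

A line with fewer than three fields raises the stand-in exception in this sketch, which is one plausible reading of what TTDNotEnoughFields is for.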
parse_ttd_file(file: str) → dict
Parse the entire TTD download file (a few megabytes; not very memory-efficient, but this should be okay) and return a dict of dicts of lists:
[target_id] -> [abbreviation] -> [list with data]
where ‘abbreviation’ is one of: [‘TARGETID’, ‘FORMERID’, ‘UNIPROID’, ‘TARGNAME’, ‘GENENAME’, ‘TARGTYPE’, ‘SYNONYMS’, ‘FUNCTION’, ‘PDBSTRUC’, ‘BIOCLASS’, ‘ECNUMBER’, ‘SEQUENCE’, ‘DRUGINFO’, ‘KEGGPATH’, ‘WIKIPATH’, ‘WHIZPATH’, ‘REACPATH’, ‘NET_PATH’, ‘INTEPATH’, ‘PANTPATH’, ‘BIOCPATH’]
Parameters
file – the TTD download file to parse
Returns
dict of dicts of lists
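The nested structure [target_id] -> [abbreviation] -> [list with data] can be sketched as follows. This is an illustration under stated assumptions, not the module's code: build_nested_sketch is a hypothetical name, it takes an iterable of lines rather than a file path so that it runs without the real download, it assumes tab-separated fields, and the sample rows are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_nested_sketch(lines: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Collect lines into target_id -> abbreviation -> list of data items."""
    nested = defaultdict(lambda: defaultdict(list))
    for line in lines:
        # ASSUMPTION: fields are separated by tabs.
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # ASSUMPTION: short lines (blank lines, headers) are skipped.
            continue
        target_id, abbrev, data = fields[0], fields[1], fields[2:]
        nested[target_id][abbrev].extend(data)
    # Convert to plain dicts so missing keys raise KeyError as usual.
    return {tid: dict(by_abbrev) for tid, by_abbrev in nested.items()}


# Made-up rows, not taken from the real download file.
sample = [
    "T00001\tTARGETID\tT00001\n",
    "T00001\tGENENAME\tGENE1\n",
    "\n",
    "T00002\tTARGETID\tT00002\n",
]
parsed = build_nested_sketch(sample)
print(parsed["T00001"]["GENENAME"])  # ['GENE1']
```

Holding the whole structure in memory matches the trade-off noted above: simple lookups by target ID and abbreviation, at the cost of loading the full file at once.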
run(data_file: Optional[str] = None)