kg_covid_19.transform_utils.ttd package

Submodules

kg_covid_19.transform_utils.ttd.ttd module

exception kg_covid_19.transform_utils.ttd.ttd.TTDNotEnoughFields

Bases: Exception

class kg_covid_19.transform_utils.ttd.ttd.TTDTransform(input_dir: str = None, output_dir: str = None)

Bases: kg_covid_19.transform_utils.transform.Transform

get_gene_name(data: dict) → str
get_targ_type(data: dict) → str
get_uniproids(data: dict, name_2_id_map: dict, uniprot_curie_prefix: str) → List[str]
parse_line(line: str, id_sep='; ') → list

Parse one line of data from P1-01-TTD_target_download, and return list comprised of:

[target_id, abbrev, data_list]

where:
  • target_id – the target ID

  • abbrev – one of ‘TARGETID’, ‘FORMERID’, etc. (see the list under parse_ttd_file)

  • data_list – a list of all items in field 3 through the last field, split on ‘ ‘

Parameters
  • line – line from P1-01-TTD_target_download

  • id_sep – character string that separates ID strings, as in ID1; ID2 (default: ‘; ‘)

Returns

[target_id, abbrev, data_list]
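The return shape described above can be illustrated with a minimal standalone sketch. It is not the package's implementation: the function name parse_line_sketch, the NotEnoughFields class (a stand-in for TTDNotEnoughFields), the tab-separated field layout, the choice to split every data field on id_sep, and the sample IDs are all assumptions made for illustration.

```python
from typing import List


class NotEnoughFields(Exception):
    """Hypothetical stand-in for TTDNotEnoughFields."""


def parse_line_sketch(line: str, id_sep: str = "; ") -> list:
    """Return [target_id, abbrev, data_list] for one line.

    Assumption: fields are tab-separated and a valid line has at
    least three of them.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise NotEnoughFields(f"expected at least 3 fields, got {len(fields)}")

    target_id, abbrev = fields[0], fields[1]

    # Assumption: every field from the third onward may hold several
    # items joined by id_sep, so each one is split and flattened.
    data_list: List[str] = []
    for field in fields[2:]:
        data_list.extend(field.split(id_sep))

    return [target_id, abbrev, data_list]


# Made-up identifiers, not real TTD records.
example = parse_line_sketch("T00001\tUNIPROID\tPROT1_HUMAN; PROT2_HUMAN\n")
print(example)
```

Raising a dedicated exception on short lines lets a caller decide whether to skip such lines or abort the parse.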

parse_ttd_file(file: str) → dict

Parse the entire TTD download file and return a dict of dicts of lists. This is not very memory-efficient, but the file is only a few megabytes, so it should be okay.

[target_id] -> [abbreviation] -> [list with data]

where ‘abbreviation’ is one of: [‘TARGETID’, ‘FORMERID’, ‘UNIPROID’, ‘TARGNAME’, ‘GENENAME’, ‘TARGTYPE’, ‘SYNONYMS’, ‘FUNCTION’, ‘PDBSTRUC’, ‘BIOCLASS’, ‘ECNUMBER’, ‘SEQUENCE’, ‘DRUGINFO’, ‘KEGGPATH’, ‘WIKIPATH’, ‘WHIZPATH’, ‘REACPATH’, ‘NET_PATH’, ‘INTEPATH’, ‘PANTPATH’, ‘BIOCPATH’]

Parameters
  • file – the TTD download file

Returns

dict of dicts of lists
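The nested [target_id] -> [abbreviation] -> [list with data] structure can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the function name parse_ttd_lines_sketch is hypothetical, it takes an iterable of lines rather than a file path so that it runs without a download, it assumes tab-separated fields, it assumes lines with fewer than three fields (such as header text) can be skipped, and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_ttd_lines_sketch(
    lines: Iterable[str], id_sep: str = "; "
) -> Dict[str, Dict[str, List[str]]]:
    """Build {target_id: {abbreviation: [data, ...]}} from raw lines."""
    parsed = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            # Assumption: short lines are headers or blanks, so skip them.
            continue
        target_id, abbrev = fields[0], fields[1]
        for field in fields[2:]:
            # Repeated abbreviations for one target accumulate in one list.
            parsed[target_id][abbrev].extend(field.split(id_sep))
    # Convert to plain dicts so missing keys raise KeyError as usual.
    return {tid: dict(by_abbrev) for tid, by_abbrev in parsed.items()}


# Made-up sample lines, not real TTD records.
sample = [
    "Made-up header line\n",
    "T00001\tTARGETID\tT00001\n",
    "T00001\tUNIPROID\tPROT1_HUMAN; PROT2_HUMAN\n",
    "T00001\tGENENAME\tGENE1\n",
    "T00002\tTARGETID\tT00002\n",
]
result = parse_ttd_lines_sketch(sample)
print(result["T00001"]["UNIPROID"])
```

With a real file, the same function could be called as parse_ttd_lines_sketch(open(path)), since an open text file iterates line by line.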

run(data_file: Optional[str] = None)

Module contents