kg_covid_19.utils package

Submodules

kg_covid_19.utils.download_utils module

kg_covid_19.utils.download_utils.download_from_api(yaml_item, outfile) → None

Args::
    yaml_item: item to be downloaded, parsed from yaml
    outfile: where to write out the file
Returns:: None.

kg_covid_19.utils.download_utils.download_from_yaml(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None

Given download info from a download.yaml file, download all files.

Args::
    yaml_file: A string pointing to the download.yaml file, to be parsed for things to download.
    output_dir: A string pointing to where to write out downloaded files.
    ignore_cache: Ignore the cache and download files even if they already exist [False].
Returns:: None.
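
As a rough illustration of the ignore_cache behaviour described above, the decision to fetch a file can be read as a simple existence check. This is a minimal sketch, not the package's implementation; the helper name needs_download_sketch is hypothetical.

```python
import os


def needs_download_sketch(outfile: str, ignore_cache: bool = False) -> bool:
    """Decide whether a file should be fetched (hypothetical helper)."""
    # With ignore_cache set, always download; otherwise download only
    # when the target file is not already present on disk.
    return ignore_cache or not os.path.exists(outfile)
```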

kg_covid_19.utils.download_utils.elastic_search_query(es_connection, index, query, scroll: str = '1m', request_timeout: int = 60, preserve_order: bool = True)

Fetch all records matching the given query from an Elasticsearch index.

Args::
    es_connection: Elasticsearch connection
    index: the Elasticsearch index to query
    query: the query
    scroll: scroll parameter passed to Elasticsearch
    request_timeout: timeout parameter passed to Elasticsearch
    preserve_order: preserve_order parameter passed to Elasticsearch
Returns:: All records for the query

kg_covid_19.utils.transform_utils module

exception kg_covid_19.utils.transform_utils.ItemInDictNotFound

Bases: kg_covid_19.utils.transform_utils.TransformError

Raised when an item is not found in a dict

exception kg_covid_19.utils.transform_utils.TransformError

Bases: Exception

Base class for other exceptions

kg_covid_19.utils.transform_utils.collapse_uniprot_curie(uniprot_curie: str) → str

Given a UniProtKB CURIE for an isoform, such as UniprotKB:P63151-1 or UniprotKB:P63151-2, collapse it to the parent protein (UniprotKB:P63151 in both cases).

Parameters: uniprot_curie – UniProtKB CURIE, possibly for an isoform
Returns: collapsed UniProtKB ID
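
The collapse described above can be sketched with a regular expression that drops a trailing numeric isoform suffix. This is an illustration under assumptions, not the package's code: the name collapse_uniprot_curie_sketch is hypothetical, and leaving CURIEs with other prefixes untouched is a guess at sensible behaviour.

```python
import re


def collapse_uniprot_curie_sketch(uniprot_curie: str) -> str:
    """Strip an isoform suffix such as "-1" from a UniProtKB CURIE."""
    # Assumption: only CURIEs with a UniprotKB prefix (any capitalisation)
    # are collapsed; everything else is returned unchanged.
    if re.match(r"^uniprotkb:", uniprot_curie, flags=re.IGNORECASE):
        return re.sub(r"-\d+$", "", uniprot_curie)
    return uniprot_curie
```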

kg_covid_19.utils.transform_utils.data_to_dict(these_keys, these_values) → dict

Zip up two lists to make a dict

Parameters

these_keys – keys for new dict
these_values – values for new dict

Returns

dictionary
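
A one-line sketch of "zipping up" two lists, assuming the function behaves like Python's built-in zip (the name data_to_dict_sketch is hypothetical). Note that zip stops at the shorter list, so surplus keys or values are silently dropped in this version.

```python
def data_to_dict_sketch(these_keys, these_values) -> dict:
    """Pair each key with the value at the same position."""
    # zip() truncates to the shorter of the two inputs.
    return dict(zip(these_keys, these_values))
```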

kg_covid_19.utils.transform_utils.get_header_items(table_data: Any) → List

Utility function to get the header from (the first page of) a table.

Args:: table_data: Data, as a list of dicts from tabula.io.read_pdf().
Returns:: header_items: A list of header items.

kg_covid_19.utils.transform_utils.get_item_by_priority(items_dict: dict, keys_by_priority: list) → str

Retrieve an item from a dict using a list of keys, in descending order of priority.

Parameters

items_dict – dict to retrieve the item from
keys_by_priority – list of keys to use to find values

Returns

str: the value in items_dict for the first key in keys_by_priority whose value isn’t blank, or None
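
The priority lookup can be sketched as a loop over the keys that returns the first non-blank value. This follows the documented return value only; the name get_item_by_priority_sketch is hypothetical, and treating missing keys, None, and whitespace-only strings as "blank" is an assumption.

```python
from typing import Optional


def get_item_by_priority_sketch(items_dict: dict, keys_by_priority: list) -> Optional[str]:
    """Return the value for the highest-priority key that is not blank."""
    for key in keys_by_priority:
        value = items_dict.get(key)
        # Assumption: missing keys, None, and whitespace-only strings are blank.
        if value is not None and str(value).strip() != "":
            return value
    return None
```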

kg_covid_19.utils.transform_utils.guess_bl_category(identifier: str) → str

Guess category for a given identifier.

Note: This is a temporary solution and should not be used long term.

Args:: identifier: A CURIE
Returns:: The category for the given CURIE

kg_covid_19.utils.transform_utils.multi_page_table_to_list(multi_page_table: Any) → List[Dict]

Turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.

Args:: multi_page_table: Table data returned from tabula.io.read_pdf().
Returns:: table_data: A list of dicts, where each dict holds the items from one row.

kg_covid_19.utils.transform_utils.parse_header(header_string: str, sep: str = '\t') → List

Parses header data.

Args:: header_string: A string containing header items. sep: A string containing a delimiter.
Returns:: A list of header items.
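
Header parsing of this kind usually amounts to splitting on the delimiter. The following is a minimal sketch, not the package's code; the name parse_header_sketch is hypothetical, and stripping the trailing newline and surrounding whitespace from each item is an assumption.

```python
from typing import List


def parse_header_sketch(header_string: str, sep: str = "\t") -> List[str]:
    """Split a header line into its items."""
    # Drop the line ending first so the last item does not carry a newline,
    # then split on the delimiter and trim each item.
    return [item.strip() for item in header_string.rstrip("\r\n").split(sep)]
```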

kg_covid_19.utils.transform_utils.ungzip_to_tempdir(gzipped_file: str, tempdir: str) → str

kg_covid_19.utils.transform_utils.uniprot_make_name_to_id_mapping(dat_gz_file: str) → dict

Given a UniProt dat.gz file such as ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping.dat.gz, make a dict with the name-to-ID mapping.

Parameters: dat_gz_file – path to the UniProt dat.gz file
Returns: dict with mapping
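
UniProt idmapping .dat files have three tab-separated columns: the UniProtKB accession, the ID type, and the mapped ID. Building a name-to-ID dict from such a file can be sketched as below. This is not the package's implementation: the name make_name_to_id_mapping_sketch is hypothetical, and mapping every third-column value regardless of ID type, keeping the first accession seen for each name, is an assumption.

```python
import gzip


def make_name_to_id_mapping_sketch(dat_gz_file: str) -> dict:
    """Map each value in the third column to the accession in the first."""
    name_to_id: dict = {}
    with gzip.open(dat_gz_file, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # skip lines that do not have exactly three columns
            accession, _id_type, name = fields
            # Assumption: keep the first accession seen for a given name.
            name_to_id.setdefault(name, accession)
    return name_to_id
```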

kg_covid_19.utils.transform_utils.uniprot_name_to_id(name_to_id_map: dict, name: str) → Optional[str]

Uniprot name to ID mapping

Parameters

name_to_id_map – mapping dict[name] -> id
name – name

Returns

id string, or None
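
Given the documented "id string, or None" return value, the lookup can be sketched with dict.get, which returns None for unknown names. The name uniprot_name_to_id_sketch is hypothetical.

```python
from typing import Optional


def uniprot_name_to_id_sketch(name_to_id_map: dict, name: str) -> Optional[str]:
    """Look up an ID by name, returning None when the name is unknown."""
    return name_to_id_map.get(name)
```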

kg_covid_19.utils.transform_utils.unzip_to_tempdir(zip_file_name: str, tempdir: str) → None

kg_covid_19.utils.transform_utils.write_node_edge_item(fh: Any, header: List, data: List, sep: str = '\t')

Write out a single line for a node or an edge in *.tsv

Parameters

fh – file handle of node or edge file
header – list of header items
data – data for line to write out
sep – separator [\t]
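
Writing one node or edge row can be sketched as joining the data items with the separator. This is an illustration only: the name write_node_edge_item_sketch is hypothetical, and using the header solely to check that the row has the expected number of columns (raising ValueError on a mismatch) is an assumption about how the header is meant to be used.

```python
import io
from typing import Any, List


def write_node_edge_item_sketch(fh: Any, header: List, data: List, sep: str = "\t") -> None:
    """Write one delimited line to an open file handle."""
    # Assumption: the header is used only to validate the column count.
    if len(header) != len(data):
        raise ValueError(
            f"header has {len(header)} items but data has {len(data)}"
        )
    fh.write(sep.join(str(item) for item in data) + "\n")


# Usage with an in-memory handle standing in for a nodes.tsv file.
buffer = io.StringIO()
write_node_edge_item_sketch(buffer, ["id", "name"], ["UniprotKB:P63151", "PPP2R2A"])
```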

Module contents

kg_covid_19.utils.download_from_yaml(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None

Given download info from a download.yaml file, download all files.

Args::
    yaml_file: A string pointing to the download.yaml file, to be parsed for things to download.
    output_dir: A string pointing to where to write out downloaded files.
    ignore_cache: Ignore the cache and download files even if they already exist [False].
Returns:: None.

kg_covid_19.utils.multi_page_table_to_list(multi_page_table: Any) → List[Dict]

Turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.

Args:: multi_page_table: Table data returned from tabula.io.read_pdf().
Returns:: table_data: A list of dicts, where each dict holds the items from one row.

kg_covid_19.utils.write_node_edge_item(fh: Any, header: List, data: List, sep: str = '\t')

Write out a single line for a node or an edge in *.tsv

Parameters

fh – file handle of node or edge file
header – list of header items
data – data for line to write out
sep – separator [\t]