kg_covid_19.utils package
Submodules
kg_covid_19.utils.download_utils module
-
kg_covid_19.utils.download_utils.
download_from_api
(yaml_item, outfile) → None
- Args:
yaml_item: item to be downloaded, parsed from yaml
outfile: where to write out file
- Returns:
None.
-
kg_covid_19.utils.download_utils.
download_from_yaml
(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None
Given download info from a download.yaml file, download all files.
- Args:
yaml_file: A string pointing to the download.yaml file, to be parsed for things to download.
output_dir: A string pointing to where to write out downloaded files.
ignore_cache: Ignore the cache and download files even if they already exist [default: False].
- Returns:
None.
-
kg_covid_19.utils.download_utils.
elastic_search_query
(es_connection, index, query, scroll: str = '1m', request_timeout: int = 60, preserve_order: bool = True)
Fetch all records matching a query from the given Elasticsearch index.
- Args:
es_connection: Elasticsearch connection
index: the Elasticsearch index to query
query: query
scroll: scroll parameter passed to Elasticsearch
request_timeout: timeout parameter passed to Elasticsearch
preserve_order: preserve_order parameter passed to Elasticsearch
- Returns:
All records for query
kg_covid_19.utils.transform_utils module
-
exception
kg_covid_19.utils.transform_utils.
ItemInDictNotFound
Bases:
kg_covid_19.utils.transform_utils.TransformError
Raised when a requested item is not found in a dict
-
exception
kg_covid_19.utils.transform_utils.
TransformError
Bases:
Exception
Base class for other exceptions
-
kg_covid_19.utils.transform_utils.
collapse_uniprot_curie
(uniprot_curie: str) → str
Given a UniProtKB CURIE for an isoform, such as UniprotKB:P63151-1 or UniprotKB:P63151-2, collapse it to the parent protein (UniprotKB:P63151 in both cases).
- Parameters
uniprot_curie –
- Returns
collapsed UniProtKB ID
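The collapsing step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the name `collapse_uniprot_curie_sketch` is hypothetical, and it assumes an isoform suffix is a hyphen followed by digits at the end of the CURIE.

```python
import re


def collapse_uniprot_curie_sketch(uniprot_curie: str) -> str:
    """Strip a trailing isoform suffix such as '-1' from a UniProtKB CURIE.

    Hypothetical helper: assumes the suffix is '-<digits>' at the end.
    """
    # Only touch CURIEs with a UniprotKB prefix; return anything else unchanged.
    if re.match(r"^uniprotkb:", uniprot_curie, flags=re.IGNORECASE):
        return re.sub(r"-\d+$", "", uniprot_curie)
    return uniprot_curie


print(collapse_uniprot_curie_sketch("UniprotKB:P63151-1"))  # UniprotKB:P63151
print(collapse_uniprot_curie_sketch("UniprotKB:P63151"))    # UniprotKB:P63151
```

A CURIE that already names the parent protein passes through unchanged, so the function is safe to apply to every identifier in a column.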
-
kg_covid_19.utils.transform_utils.
data_to_dict
(these_keys, these_values) → dict
Zip up two lists to make a dict
- Parameters
these_keys – keys for new dict
these_values – values for new dict
- Returns
dictionary
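As a rough illustration of "zipping up" two lists, the sketch below uses a hypothetical stand-in named `data_to_dict_sketch`. How the real function treats lists of unequal length is not stated here; this version silently truncates to the shorter list, which is how Python's built-in `zip` behaves.

```python
def data_to_dict_sketch(these_keys, these_values) -> dict:
    """Pair keys with values positionally (hypothetical stand-in)."""
    # zip() stops at the shorter input, so surplus keys or values are dropped.
    return dict(zip(these_keys, these_values))


row = data_to_dict_sketch(["id", "name"], ["X:1", "toy node"])
print(row)  # {'id': 'X:1', 'name': 'toy node'}
```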
-
kg_covid_19.utils.transform_utils.
get_header_items
(table_data: Any) → List
Utility function to get the header from (the first page of) a table.
- Args:
table_data: Data, as list of dicts from tabula.io.read_pdf().
- Returns:
header_items: An array of header items.
-
kg_covid_19.utils.transform_utils.
get_item_by_priority
(items_dict: dict, keys_by_priority: list) → str
Retrieve item from a dict using a list of keys, in descending order of priority
- Parameters
items_dict –
keys_by_priority – list of keys to use to find values
- Returns
str: first value in dict for first item in keys_by_priority
that isn’t blank, or None
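The priority lookup can be pictured with the sketch below, which follows the Returns description only. The name `get_item_by_priority_sketch` is hypothetical, and treating `None` and the empty string as "blank" is an assumption.

```python
from typing import Optional


def get_item_by_priority_sketch(items_dict: dict, keys_by_priority: list) -> Optional[str]:
    """Return the first non-blank value found by trying keys in order."""
    for key in keys_by_priority:
        value = items_dict.get(key)
        # Assumption: a missing key, None, or an empty string counts as blank.
        if value:
            return value
    return None


record = {"symbol": "", "name": "toy protein", "id": "X:1"}
print(get_item_by_priority_sketch(record, ["symbol", "name", "id"]))  # toy protein
```

Here `symbol` has the highest priority but is blank, so the lookup falls through to `name`.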
-
kg_covid_19.utils.transform_utils.
guess_bl_category
(identifier: str) → str
Guess category for a given identifier.
Note: This is a temporary solution and should not be used long term.
- Args:
identifier: A CURIE
- Returns:
The category for the given CURIE
-
kg_covid_19.utils.transform_utils.
multi_page_table_to_list
(multi_page_table: Any) → List[Dict]
Turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.
- Args:
multi_page_table:
- Returns:
table_data: A list of dicts, where each dict holds the items from one row.
-
kg_covid_19.utils.transform_utils.
parse_header
(header_string: str, sep: str = '\t') → List
Parses header data.
- Args:
header_string: A string containing header items.
sep: A string containing a delimiter.
- Returns:
A list of header items.
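A minimal sketch of this kind of header parsing is shown below. The name `parse_header_sketch` is hypothetical, and stripping the trailing newline and surrounding whitespace from each item is an assumption about reasonable behavior, not a statement of what the package does.

```python
from typing import List


def parse_header_sketch(header_string: str, sep: str = "\t") -> List[str]:
    """Split a header line into its items (hypothetical stand-in)."""
    # Remove only the line ending first, so a trailing empty column survives.
    fields = header_string.rstrip("\r\n").split(sep)
    # Assumption: whitespace around each item is not meaningful.
    return [field.strip() for field in fields]


print(parse_header_sketch("id\tname\tcategory\n"))  # ['id', 'name', 'category']
print(parse_header_sketch("id, name", sep=","))     # ['id', 'name']
```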
-
kg_covid_19.utils.transform_utils.
ungzip_to_tempdir
(gzipped_file: str, tempdir: str) → str
-
kg_covid_19.utils.transform_utils.
uniprot_make_name_to_id_mapping
(dat_gz_file: str) → dict
Given a UniProt dat.gz file such as ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping.dat.gz, make a dict with a name-to-ID mapping.
- Parameters
dat_gz_file –
- Returns
dict with mapping
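To make the idea concrete, the sketch below builds a name-to-ID dict from a gzipped, tab-separated file with three columns (accession, ID type, value), which is the layout of UniProt idmapping `.dat` files. Everything else is an assumption: the name `make_name_to_id_mapping_sketch` is hypothetical, the rows are made-up toy data, and which ID types the real function keeps, or how it resolves a name seen twice, is not stated here.

```python
import gzip
import os
import tempfile


def make_name_to_id_mapping_sketch(dat_gz_file: str) -> dict:
    """Map each value in the third column to the accession in the first."""
    name_to_id = {}
    with gzip.open(dat_gz_file, "rt", encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 3:
                continue  # skip malformed lines
            accession, _id_type, name = fields
            # Assumption: the first accession seen for a name wins.
            name_to_id.setdefault(name, accession)
    return name_to_id


# Toy input (made-up rows, not real UniProt data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_idmapping.dat.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("P00000\tGene_Name\tTOY1\n")
        out.write("P00000\tUniProtKB-ID\tTOY1_HUMAN\n")
        out.write("P00001\tGene_Name\tTOY1\n")
    toy_mapping = make_name_to_id_mapping_sketch(path)

print(toy_mapping)  # {'TOY1': 'P00000', 'TOY1_HUMAN': 'P00000'}
```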
-
kg_covid_19.utils.transform_utils.
uniprot_name_to_id
(name_to_id_map: dict, name: str) → Optional[str]
Uniprot name to ID mapping
- Parameters
name_to_id_map – mapping dict[name] -> id
name – name
- Returns
id string, or None
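Given the `Optional[str]` return type, the lookup presumably behaves like a plain dict lookup that yields `None` for an unknown name. The sketch below shows that reading; `uniprot_name_to_id_sketch` is a hypothetical name and the mapping contents are toy data.

```python
from typing import Optional


def uniprot_name_to_id_sketch(name_to_id_map: dict, name: str) -> Optional[str]:
    """Look up a name and return its ID, or None when the name is unknown."""
    # dict.get returns None for a missing key instead of raising KeyError.
    return name_to_id_map.get(name)


toy_map = {"TOY1": "P00000"}  # made-up mapping
print(uniprot_name_to_id_sketch(toy_map, "TOY1"))     # P00000
print(uniprot_name_to_id_sketch(toy_map, "UNKNOWN"))  # None
```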
-
kg_covid_19.utils.transform_utils.
unzip_to_tempdir
(zip_file_name: str, tempdir: str) → None
-
kg_covid_19.utils.transform_utils.
write_node_edge_item
(fh: Any, header: List, data: List, sep: str = '\t')
Write out a single line for a node or an edge in *.tsv
- Parameters
fh – file handle of node or edge file
header – list of header items
data – data for line to write out
sep – separator [default: tab]
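Writing one delimited line per node or edge can be sketched as below. The name `write_node_edge_item_sketch` is hypothetical. Checking that `data` has as many items as `header` is an assumption about why the header is passed in; the real function may use it differently.

```python
import io


def write_node_edge_item_sketch(fh, header, data, sep="\t"):
    """Write one delimited line to an open file handle (hypothetical stand-in)."""
    # Assumption: the header is used to validate the row width.
    if len(header) != len(data):
        raise ValueError(
            f"expected {len(header)} items to match the header, got {len(data)}"
        )
    fh.write(sep.join(str(item) for item in data) + "\n")


buffer = io.StringIO()
header = ["id", "name", "category"]
write_node_edge_item_sketch(buffer, header, header)  # header row
write_node_edge_item_sketch(buffer, header, ["X:1", "toy node", "biolink:NamedThing"])
print(buffer.getvalue(), end="")
```

An in-memory `io.StringIO` stands in for the node or edge file here; any object with a `write` method works the same way.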
Module contents
-
kg_covid_19.utils.
download_from_yaml
(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None
Given download info from a download.yaml file, download all files.
- Args:
yaml_file: A string pointing to the download.yaml file, to be parsed for things to download.
output_dir: A string pointing to where to write out downloaded files.
ignore_cache: Ignore the cache and download files even if they already exist [default: False].
- Returns:
None.
-
kg_covid_19.utils.
multi_page_table_to_list
(multi_page_table: Any) → List[Dict]
Turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.
- Args:
multi_page_table:
- Returns:
table_data: A list of dicts, where each dict holds the items from one row.