kg_idg.utils.transform_utils

Functions

collapse_uniprot_curie(uniprot_curie)

Given a UniProtKB CURIE for an isoform, such as UniprotKB:P63151-1 or UniprotKB:P63151-2, collapse it to the parent protein (UniprotKB:P63151 in both cases).
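A minimal sketch of this kind of isoform collapsing follows. It is not the module's implementation: the function name `collapse_isoform_curie` is hypothetical, and the sketch assumes that an isoform is marked by a trailing hyphen followed by digits.

```python
def collapse_isoform_curie(curie: str) -> str:
    """Strip a trailing isoform suffix such as "-1" from a CURIE.

    Hypothetical helper for illustration only.
    """
    prefix, colon, local_id = curie.partition(":")
    if not colon:
        # Not a prefixed identifier, so leave it untouched.
        return curie
    # Assumption: isoforms append "-<digits>" to the parent accession.
    base, dash, suffix = local_id.rpartition("-")
    if dash and suffix.isdigit():
        local_id = base
    return f"{prefix}:{local_id}"


print(collapse_isoform_curie("UniprotKB:P63151-2"))
```

An identifier with no isoform suffix passes through unchanged, so the helper is safe to apply to a mixed list of parent and isoform CURIEs.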

data_to_dict(these_keys, these_values)

Zip up two lists to make a dict
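As a rough illustration of zipping two lists into a dict, the sketch below uses the built-in `zip`. The name `zip_to_dict` is hypothetical, and how the module's own function treats lists of unequal length is not stated here; in this sketch the extra items are silently dropped.

```python
def zip_to_dict(keys, values):
    """Pair keys with values positionally (hypothetical helper).

    zip() stops at the shorter input, so surplus items are ignored.
    """
    return dict(zip(keys, values))


row = zip_to_dict(["id", "name"], ["X:1", "example"])
print(row)
```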

get_header_items(table_data)

Utility function to get the header from (the first page of) a table.

get_item_by_priority(items_dict, …)

Retrieve item from a dict using a list of keys, in descending order of priority
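One way to picture a priority lookup is the sketch below. The name `first_by_priority`, its two-argument signature, and the choice to raise `KeyError` when nothing matches are all assumptions made for illustration; the module's function may take other arguments and may signal a miss differently.

```python
def first_by_priority(items, keys_by_priority):
    """Return the value of the first key that is present in items.

    Hypothetical helper: keys are tried in the order given, so earlier
    keys have higher priority.
    """
    for key in keys_by_priority:
        if key in items:
            return items[key]
    # Assumption: a miss is reported with the built-in KeyError.
    raise KeyError(f"none of {list(keys_by_priority)} found")


record = {"synonym": "alt name", "symbol": "ABC"}
print(first_by_priority(record, ["name", "symbol", "synonym"]))
```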

guess_bl_category(identifier)

Guess category for a given identifier.

multi_page_table_to_list(multi_page_table)

Method to turn table data returned from tabula.io.read_pdf(), possibly broken over several pages, into a list of dicts, one dict for each row.
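The idea of stitching a table that spans several pages into one list of row dicts can be sketched with plain lists. This is not the module's code: `pages_to_rows` is a hypothetical name, each page is represented as a list of rows rather than the objects tabula returns, and the sketch assumes that only the first page carries the header row.

```python
def pages_to_rows(pages):
    """Merge per-page row lists into one list of dicts (hypothetical helper).

    pages: list of pages; each page is a list of rows; each row is a list
    of cell values. Assumption: the first row of the first page is the
    header, and later pages contain data rows only.
    """
    if not pages or not pages[0]:
        return []
    header = pages[0][0]
    rows = []
    for page_number, page in enumerate(pages):
        # Skip the header row on the first page only.
        body = page[1:] if page_number == 0 else page
        for cells in body:
            rows.append(dict(zip(header, cells)))
    return rows


pages = [
    [["id", "name"], ["1", "first"]],
    [["2", "second"]],
]
print(pages_to_rows(pages))
```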

parse_header(header_string[, sep])

Parses header data.

parse_line(this_line, header_items[, sep])

Processes a line of text from the CSV file.
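How a header and a data line might be parsed together is sketched below. The names `split_header` and `split_line`, the default separator of a comma, and the choice to return a dict keyed by header item are assumptions for illustration, not the documented behavior of `parse_header` or `parse_line`. The sketch also does a plain split and does not handle quoted fields.

```python
def split_header(header_string, sep=","):
    """Split a header line into column names (hypothetical helper)."""
    return [item.strip() for item in header_string.rstrip("\r\n").split(sep)]


def split_line(this_line, header_items, sep=","):
    """Pair the fields of one line with the header items (hypothetical helper).

    Assumption: fields are positional and contain no quoted separators.
    """
    fields = this_line.rstrip("\r\n").split(sep)
    return dict(zip(header_items, fields))


header = split_header("id,name,category\n")
print(split_line("X:1,example,thing\n", header))
```

For real CSV input with quoting, the standard library `csv` module is usually a safer choice than a plain `split`.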

ungzip_to_tempdir(gzipped_file, tempdir)

uniprot_make_name_to_id_mapping(dat_gz_file)

Given a Uniprot dat.gz file such as ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping.dat.gz, makes a dict with the name-to-ID mapping.
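The sketch below shows one way such a mapping could be built. It assumes the idmapping file has three tab-separated columns (accession, ID type, value). The function name `build_name_to_accessions`, the `id_types` parameter, and the decision to treat only `Gene_Name` rows as names are hypothetical; which ID types the module actually uses is not stated here.

```python
import gzip
from collections import defaultdict


def build_name_to_accessions(dat_gz_path, id_types=("Gene_Name",)):
    """Map each name to the accessions that carry it (hypothetical helper).

    Assumption: each line is "<accession>\t<id type>\t<value>".
    """
    mapping = defaultdict(list)
    with gzip.open(dat_gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed lines
            accession, id_type, value = parts
            if id_type in id_types and accession not in mapping[value]:
                mapping[value].append(accession)
    return dict(mapping)
```

A name can map to more than one accession, which is why this sketch stores a list per name rather than a single ID.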

uniprot_name_to_id(name_to_id_map, name)

Look up the ID for a Uniprot name in a name-to-ID mapping.

unzip_to_tempdir(zip_file_name, tempdir)

write_node_edge_item(fh, header, data[, sep])

Write out a single line for a node or an edge in *.tsv

:param fh: file handle of node or edge file
:param header: list of header items
:param data: data for line to write out
:param sep: separator [ ]
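A minimal sketch of writing one delimited row is shown below. The name `write_row` is hypothetical, and the sketch assumes that `data` is a list aligned with `header` and that the separator defaults to a tab; neither is confirmed by the description above.

```python
import io


def write_row(fh, header, data, sep="\t"):
    """Write one delimited line to an open file handle (hypothetical helper).

    Assumption: data holds one value per header item, in header order.
    """
    if len(data) != len(header):
        raise ValueError("data and header differ in length")
    fh.write(sep.join(str(item) for item in data) + "\n")


buffer = io.StringIO()
write_row(buffer, ["id", "name"], ["X:1", "example"])
print(repr(buffer.getvalue()))
```

Checking the length against the header before writing keeps a short or long row from silently shifting columns in the output file.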

Exceptions

ItemInDictNotFound

Raised when a requested item is not found in a dict

TransformError

Base class for other exceptions