kg_covid_19 package

Submodules

kg_covid_19.download module

kg_covid_19.download.download(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None

Downloads data files from a list of URLs (default: download.yaml) into the data directory (default: data/).

Args:

yaml_file: Path to the YAML file that lists the data to download.
output_dir: Path to the directory to download data into.
ignore_cache: If True, ignore the cache and download files even if they already exist (default: False).

Returns:

None.
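The caching behaviour controlled by ignore_cache can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name download_all, the (url, local_name) pairs standing in for the parsed YAML entries, and the injectable fetch argument are all assumptions made so the example runs without network access.

```python
import os
import tempfile
from typing import Callable, Iterable, List, Tuple
from urllib.request import urlretrieve


def download_all(
    items: Iterable[Tuple[str, str]],
    output_dir: str,
    ignore_cache: bool = False,
    fetch: Callable[[str, str], object] = urlretrieve,
) -> List[str]:
    """Fetch each (url, local_name) pair into output_dir.

    Hypothetical helper: returns the paths that were actually fetched.
    """
    os.makedirs(output_dir, exist_ok=True)
    fetched = []
    for url, local_name in items:
        outfile = os.path.join(output_dir, local_name)
        # Cache check: an existing file is reused unless ignore_cache is set.
        if os.path.exists(outfile) and not ignore_cache:
            continue
        fetch(url, outfile)
        fetched.append(outfile)
    return fetched


def fake_fetch(url: str, outfile: str) -> None:
    # Stand-in for a real HTTP download so the demo needs no network.
    with open(outfile, "w") as handle:
        handle.write("contents of " + url)


with tempfile.TemporaryDirectory() as tmp:
    items = [("https://example.org/nodes.tsv", "nodes.tsv")]
    first = download_all(items, tmp, fetch=fake_fetch)   # file missing: fetched
    second = download_all(items, tmp, fetch=fake_fetch)  # file present: skipped
    third = download_all(items, tmp, ignore_cache=True, fetch=fake_fetch)  # forced
    print(len(first), len(second), len(third))
```

Passing the fetcher as an argument keeps the cache logic testable in isolation; with the default, the standard library's urlretrieve performs the actual download.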

kg_covid_19.make_holdouts module

kg_covid_19.make_holdouts.df_to_tsv(df: pandas.core.frame.DataFrame, outfile: str, sep='\t', index=False) → None

kg_covid_19.make_holdouts.make_holdouts(nodes: str, edges: str, output_dir: str, train_fraction: float, validation: bool, seed=42) → None

Prepare positive and negative edges for testing and training (see the run.py holdouts command for documentation).

Args:

nodes: Nodes of the input graph, in KGX TSV format [data/merged/nodes.tsv]
edges: Edges of the input graph, in KGX TSV format [data/merged/edges.tsv]
output_dir: Directory to output edges and new graph [data/edges/]
train_fraction: Fraction of edges to emit as training
validation: Should we make validation edges? [False]
seed: Random seed [42]

Returns:

None.

kg_covid_19.make_holdouts.make_negative_edges(nodes_df: pandas.core.frame.DataFrame, edges_df: pandas.core.frame.DataFrame, edge_label: str = 'negative_edge', relation: str = 'negative_edge') → pandas.core.frame.DataFrame

Given a graph (as pandas dataframes of nodes and edges), select num_edges holdout edges that are NOT present in the graph.

Parameters
  • nodes_df – pandas dataframe containing node info

  • edges_df – pandas dataframe containing edge info

  • relation – string to put in relation column

  • edge_label – string to put in edge_label column

Returns

pandas dataframe containing the selected negative edges
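The idea of picking node pairs that are absent from the graph can be sketched without pandas. This is an illustration under stated assumptions, not the package's code: the function name sample_negative_edges is hypothetical, nodes and edges are plain Python lists rather than dataframes, and edges are treated as undirected when checking membership.

```python
import itertools
import random
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]


def sample_negative_edges(
    nodes: Iterable[str],
    edges: Iterable[Edge],
    num_edges: int,
    seed: int = 42,
) -> List[Edge]:
    """Return num_edges node pairs that are not connected in the graph."""
    # Assumption: direction is ignored, so (a, b) also rules out (b, a).
    present = {frozenset(edge) for edge in edges}
    unique_nodes = sorted(set(nodes))
    # Enumerate every unordered pair and keep those with no existing edge.
    candidates = [
        pair
        for pair in itertools.combinations(unique_nodes, 2)
        if frozenset(pair) not in present
    ]
    if num_edges > len(candidates):
        raise ValueError("graph has fewer unconnected pairs than requested")
    return random.Random(seed).sample(candidates, num_edges)


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "d")]
negatives = sample_negative_edges(nodes, edges, num_edges=3)
print(negatives)
```

Enumerating all pairs is quadratic in the number of nodes, which is acceptable for a small example; on a large graph, drawing random pairs and rejecting those already present would be the more practical choice.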

kg_covid_19.make_holdouts.make_positive_edges(nodes_df: pandas.core.frame.DataFrame, edges_df: pandas.core.frame.DataFrame, train_fraction: float) → List[pandas.core.frame.DataFrame]

Positive edges are randomly selected from the edges in the graph, if and only if both nodes participating in the edge have a degree greater than min_degree (to avoid creating disconnected components). Each selected edge is then removed from the output graph. Negative edges are selected by randomly choosing pairs of nodes that are not connected by an edge.

Parameters
  • nodes_df – pandas dataframe with node info, generated from KGX TSV file

  • edges_df – pandas dataframe with edge info, generated from KGX TSV file

  • train_fraction – fraction of input edges to keep as training edges; the remainder is emitted as test (and optionally validation) edges

Returns

pandas dataframes:

training_edges_df: a dataframe of training edges, with the positive edges we selected for test removed from the graph

test_edges_df: a dataframe with test positive edges
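The degree-constrained selection described above can be sketched in plain Python. This is a hedged illustration, not the package's implementation: the name split_positive_edges, the list-of-tuples edge representation, and the default min_degree=1 are assumptions, since the documentation does not state the value of min_degree.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def split_positive_edges(
    edges: Sequence[Edge],
    train_fraction: float,
    min_degree: int = 1,  # assumed default; not given in the documentation
    seed: int = 42,
) -> Tuple[List[Edge], List[Edge]]:
    """Split edges into (train, test), holding out only 'safe' edges."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    n_test = round(len(edges) * (1 - train_fraction))
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)

    train: List[Edge] = []
    test: List[Edge] = []
    for u, v in shuffled:
        # Hold out an edge only if both endpoints keep at least min_degree
        # edges afterwards, so no node is left isolated in the training graph.
        if len(test) < n_test and degree[u] > min_degree and degree[v] > min_degree:
            test.append((u, v))
            degree[u] -= 1
            degree[v] -= 1
        else:
            train.append((u, v))
    return train, test


# A triangle a-b-c with a pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
train, test = split_positive_edges(edges, train_fraction=0.75)
print(train, test)
```

In this example the edge to d can never be held out, because d has degree 1. Note that a degree check only prevents isolated nodes; it does not by itself guarantee that the remaining graph stays connected.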

kg_covid_19.make_holdouts.tsv_to_df(tsv_file: str, *args, **kwargs) → pandas.core.frame.DataFrame

Read in a TSV file and return a pandas dataframe

Parameters

tsv_file – file to read in

Returns

pandas dataframe

kg_covid_19.query module

kg_covid_19.query.parse_query_rq(rq_file) → dict

Args:

rq_file: SPARQL query in grlc rq format

Returns:

dict with parsed info about the SPARQL query
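For orientation, a grlc rq file is a SPARQL query whose leading comment lines, prefixed with `#+`, carry metadata such as a summary or an endpoint. The sketch below shows one way such a file could be split into metadata and query text. It is not the package's parser: the name parse_rq_text, the keys of the returned dict, and the restriction to flat `key: value` lines are assumptions made for illustration.

```python
from typing import Dict, List


def parse_rq_text(rq_text: str) -> Dict[str, object]:
    """Split grlc-style '#+' metadata lines from the SPARQL query body.

    Hypothetical helper: only flat 'key: value' decorator lines are handled.
    """
    metadata: Dict[str, str] = {}
    query_lines: List[str] = []
    for line in rq_text.splitlines():
        if line.startswith("#+"):
            # Split on the first colon only, so URLs in the value stay intact.
            key, sep, value = line[2:].partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            query_lines.append(line)
    return {"metadata": metadata, "query": "\n".join(query_lines).strip()}


rq_text = """#+ summary: Count triples
#+ endpoint: http://example.org/sparql

SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
"""
parsed = parse_rq_text(rq_text)
print(parsed["metadata"])
print(parsed["query"])
```

The endpoint URL here is a placeholder. Nested metadata would need a real YAML parser rather than this line-by-line split.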

kg_covid_19.query.result_dict_to_tsv(result_dict: dict, outfile: str) → None

kg_covid_19.query.run_query(query: str, endpoint: str, return_format='json') → dict

kg_covid_19.transform module

kg_covid_19.transform.transform(input_dir: str, output_dir: str, sources: List[str] = None) → None

Call scripts in kg_covid_19/transform/[source name]/ to transform each source into a graph format that KGX can ingest directly, in either TSV or JSON format: https://github.com/NCATS-Tangerine/kgx/blob/master/data-preparation.md

Args:

input_dir: Path to the directory to import data from.
output_dir: Path to the directory to output data to.
sources: A list of sources to transform.

Returns:

None.
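The per-source dispatch can be pictured as a lookup from source name to a transform callable. The sketch below is an assumption-laden illustration, not the package's code: the names run_transforms, source_a and source_b are hypothetical, the directory paths are placeholders, and both running every registered source when sources is None and writing to a per-source subdirectory are assumed behaviours.

```python
import os
from typing import Callable, Dict, List, Optional

TransformFn = Callable[[str, str], None]


def run_transforms(
    registry: Dict[str, TransformFn],
    input_dir: str,
    output_dir: str,
    sources: Optional[List[str]] = None,
) -> List[str]:
    """Run the transform registered for each requested source name."""
    # Assumption: no explicit list means "transform every known source".
    selected = list(registry) if sources is None else list(sources)
    unknown = [name for name in selected if name not in registry]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    for name in selected:
        # Assumption: each source writes into its own output subdirectory.
        registry[name](input_dir, os.path.join(output_dir, name))
    return selected


calls: List[str] = []
registry: Dict[str, TransformFn] = {
    "source_a": lambda src, dst: calls.append("source_a -> " + dst),
    "source_b": lambda src, dst: calls.append("source_b -> " + dst),
}
ran_all = run_transforms(registry, "data/raw", "data/transformed")
ran_one = run_transforms(registry, "data/raw", "data/transformed", sources=["source_b"])
print(ran_all, ran_one)
```

Validating the requested names before running anything means a typo in sources fails fast instead of after some transforms have already written output.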

Module contents

kg_covid_19.download(yaml_file: str, output_dir: str, ignore_cache: bool = False) → None

Downloads data files from a list of URLs (default: download.yaml) into the data directory (default: data/).

Args:

yaml_file: Path to the YAML file that lists the data to download.
output_dir: Path to the directory to download data into.
ignore_cache: If True, ignore the cache and download files even if they already exist (default: False).

Returns:

None.

kg_covid_19.transform(input_dir: str, output_dir: str, sources: List[str] = None) → None

Call scripts in kg_covid_19/transform/[source name]/ to transform each source into a graph format that KGX can ingest directly, in either TSV or JSON format: https://github.com/NCATS-Tangerine/kgx/blob/master/data-preparation.md

Args:

input_dir: Path to the directory to import data from.
output_dir: Path to the directory to output data to.
sources: A list of sources to transform.

Returns:

None.