kg_bacdive package

Subpackages

Submodules

kg_bacdive.download module

Download resources from YAML file.

kg_bacdive.download.download(yaml_file, output_dir, snippet_only, ignore_cache=False)

Download data files from list of URLs.

Downloads files based on the config (default: download.yaml) into the data directory (default: data/).

Parameters:
  • yaml_file (str) – A string pointing to the YAML file used to drive the downloading of data.

  • output_dir (str) – A string pointing to the location to download data to.

  • snippet_only (bool) – Download only the first 5 kB of each source, for testing and file checks.

  • ignore_cache (bool) – Ignore the cache and download files even if they already exist (default: False).

Return type:

None
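The interaction between snippet_only and ignore_cache can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper name fetch, the SNIPPET_BYTES constant (5 kB read as 5 * 1024 bytes), and the use of an in-memory stream in place of a remote URL are all assumptions made for the example.

```python
import io
import tempfile
from pathlib import Path

# Assumption: "first 5 kB" is read here as 5 * 1024 bytes.
SNIPPET_BYTES = 5 * 1024


def fetch(stream, target, snippet_only=False, ignore_cache=False):
    """Hypothetical helper: copy a stream to target, honouring both flags.

    Returns True if data was written, False if the cached file was kept.
    """
    target = Path(target)
    # An existing file counts as cached unless the caller overrides it.
    if target.exists() and not ignore_cache:
        return False
    # Read only the leading bytes when a snippet is requested.
    data = stream.read(SNIPPET_BYTES) if snippet_only else stream.read()
    target.write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "source.bin"
    # Stand-in for a remote file: 20,000 dummy bytes.
    print(fetch(io.BytesIO(b"x" * 20_000), out, snippet_only=True))
    print(out.stat().st_size)
    # A second call leaves the cached file alone unless ignore_cache is set.
    print(fetch(io.BytesIO(b"y" * 10), out))
    print(fetch(io.BytesIO(b"y" * 10), out, ignore_cache=True))
```

In this sketch a snippet that is already on disk counts as cached, so switching from a snippet to a full download would require ignore_cache=True. Whether the package behaves the same way is not stated in the source.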

kg_bacdive.query module

Query module.

kg_bacdive.query.parse_query_yaml(yaml_file)

Parse a YAML file and return the results as a dictionary.

Parameters:

yaml_file – YAML file to parse.

Return type:

dict

Returns:

A dictionary of results from the YAML file.

kg_bacdive.query.result_dict_to_tsv(result_dict, outfile)

Write a dictionary to a TSV file.

Parameters:
  • result_dict (dict) – Dictionary to write to TSV file.

  • outfile (str) – TSV file to write to.

Return type:

None
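One way such a writer could work is sketched below. It assumes the dictionary follows the standard SPARQL 1.1 JSON results layout (head.vars for column names, results.bindings for rows); the source does not confirm which layout result_dict_to_tsv expects, and the function name write_bindings_tsv is hypothetical.

```python
import csv
import os
import tempfile


def write_bindings_tsv(result_dict, outfile):
    """Hypothetical sketch: write SPARQL JSON results to a TSV file."""
    # Column order comes from the declared variables.
    columns = result_dict["head"]["vars"]
    with open(outfile, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        for binding in result_dict["results"]["bindings"]:
            # A variable may be unbound in a given row; write an empty cell.
            writer.writerow(
                [binding.get(col, {}).get("value", "") for col in columns]
            )


# Made-up example data in the assumed layout.
example = {
    "head": {"vars": ["item", "label"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://example.org/1"},
                "label": {"type": "literal", "value": "first"},
            },
            {"item": {"type": "uri", "value": "http://example.org/2"}},
        ]
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.tsv")
    write_bindings_tsv(example, path)
    with open(path, encoding="utf-8") as handle:
        print(handle.read())
```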

kg_bacdive.query.run_query(query, endpoint, return_format='json')

Run a SPARQL query and return the results as a dictionary.

Parameters:
  • query (str) – SPARQL query to run.

  • endpoint (str) – SPARQL endpoint to query.

  • return_format – Format of the returned data.

Return type:

dict

Returns:

A dictionary of results from the SPARQL query.
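The source does not say how run_query sends its request. As a rough illustration of what a SPARQL query over HTTP looks like, the sketch below builds (but does not send) a GET request in the style of the SPARQL 1.1 Protocol using only the standard library. The helper name build_sparql_request, the endpoint URL, and the mapping from return_format='json' to an Accept header are assumptions for the example.

```python
import urllib.parse
import urllib.request

# Assumption: 'json' maps to the SPARQL JSON results media type.
ACCEPT_TYPES = {"json": "application/sparql-results+json"}


def build_sparql_request(query, endpoint, return_format="json"):
    """Hypothetical helper: build a GET request carrying a SPARQL query."""
    # The query text travels URL-encoded in the 'query' parameter.
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": ACCEPT_TYPES[return_format]},
    )


request = build_sparql_request(
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
    "https://example.org/sparql",  # placeholder endpoint, not a real service
)
print(request.full_url)
print(request.get_header("Accept"))
```

Sending the request with urllib.request.urlopen and decoding the body with json.load would then yield a dictionary of results, which is the return type documented above.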

kg_bacdive.run module

Drive KG download, transform, merge steps.

kg_bacdive.transform module

Transform module.

kg_bacdive.transform.transform(input_dir, output_dir, sources=None)

Transform based on resource and class declared in DATA_SOURCES.

Call scripts in kg_bacdive/transform/[source name]/ to transform each source into a graph format that KGX can ingest directly, in either TSV or JSON format: https://github.com/biolink/kgx/blob/master/data-preparation.md

Parameters:
  • input_dir (Optional[Path]) – Path to the directory to import data from.

  • output_dir (Optional[Path]) – Path to the directory to output data to.

  • sources (Optional[List[str]]) – A list of sources to transform.

Return type:

None
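The dispatch described above, where each source name is looked up in DATA_SOURCES and its transform class is run, might look roughly like the sketch below. The class names, their run() method, and the dictionary contents are hypothetical; only the DATA_SOURCES name and the three parameters come from the documentation. The sketch returns the list of sources it ran purely for illustration, whereas the documented function returns None.

```python
from pathlib import Path
from typing import Dict, List, Optional


class ExampleTransformA:
    """Hypothetical transform class for one source."""

    def __init__(self, input_dir: Optional[Path], output_dir: Optional[Path]):
        self.input_dir = input_dir
        self.output_dir = output_dir

    def run(self) -> None:
        # A real transform would write KGX-ready node and edge files here.
        pass


class ExampleTransformB(ExampleTransformA):
    """Second hypothetical transform class."""


# Assumption: DATA_SOURCES maps a source name to its transform class.
DATA_SOURCES: Dict[str, type] = {
    "ExampleTransformA": ExampleTransformA,
    "ExampleTransformB": ExampleTransformB,
}


def transform_sketch(
    input_dir: Optional[Path],
    output_dir: Optional[Path],
    sources: Optional[List[str]] = None,
) -> List[str]:
    """Run every requested source, or all of them when none are named."""
    if not sources:
        sources = list(DATA_SOURCES)
    ran = []
    for source in sources:
        # Unknown names are skipped in this sketch.
        if source in DATA_SOURCES:
            DATA_SOURCES[source](input_dir, output_dir).run()
            ran.append(source)
    return ran


print(transform_sketch(Path("data/raw"), Path("data/transformed")))
print(transform_sketch(None, None, sources=["ExampleTransformB", "Unknown"]))
```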

Module contents

kg-bacdive package.

kg_bacdive.download(yaml_file, output_dir, snippet_only, ignore_cache=False)

Download data files from list of URLs.

Downloads files based on the config (default: download.yaml) into the data directory (default: data/).

Parameters:
  • yaml_file (str) – A string pointing to the YAML file used to drive the downloading of data.

  • output_dir (str) – A string pointing to the location to download data to.

  • snippet_only (bool) – Download only the first 5 kB of each source, for testing and file checks.

  • ignore_cache (bool) – Ignore the cache and download files even if they already exist (default: False).

Return type:

None