kg_bacdive package
Subpackages
- kg_bacdive.merge_utils package
- kg_bacdive.transform_utils package
- Subpackages
- Submodules
- kg_bacdive.transform_utils.constants module
- kg_bacdive.transform_utils.transform module
- Module contents
- kg_bacdive.utils package
Submodules
kg_bacdive.download module
Download resources from YAML file.
- kg_bacdive.download.download(yaml_file, output_dir, snippet_only, ignore_cache=False)
Download data files from list of URLs.
Downloads the files listed in the config file (default: download.yaml) into the data directory (default: data/).
- Parameters:
yaml_file (str) – A string pointing to the YAML file utilized to facilitate the downloading of data.
output_dir (str) – A string pointing to the location to download data to.
snippet_only (bool) – Downloads only the first 5 kB of the source, for testing and file checks.
ignore_cache (bool) – Ignore cache and download files even if they exist (default: False).
- Return type:
None
- Returns:
None.
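The interaction between snippet_only and ignore_cache can be illustrated with a minimal sketch. This is not the package's implementation: `save_stream` and `SNIPPET_BYTES` are hypothetical names, an in-memory stream stands in for a remote URL, and "5 kB" is assumed to mean 5 * 1024 bytes.

```python
import io
import tempfile
from pathlib import Path
from typing import BinaryIO

# Assumption: "5 kB" is taken as 5 * 1024 bytes; the package may use another value.
SNIPPET_BYTES = 5 * 1024


def save_stream(source: BinaryIO, target: Path,
                snippet_only: bool = False, ignore_cache: bool = False) -> bool:
    """Write `source` to `target`; return False when an existing file is reused."""
    if target.exists() and not ignore_cache:
        return False  # cache hit: keep the file that is already on disk
    target.parent.mkdir(parents=True, exist_ok=True)
    # Snippet mode reads at most SNIPPET_BYTES, otherwise the whole stream.
    data = source.read(SNIPPET_BYTES) if snippet_only else source.read()
    target.write_bytes(data)
    return True


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "data" / "resource.bin"
    payload = b"x" * 20_000
    first = save_stream(io.BytesIO(payload), out, snippet_only=True)
    snippet_size = out.stat().st_size
    second = save_stream(io.BytesIO(payload), out)  # skipped: file exists
    third = save_stream(io.BytesIO(payload), out, ignore_cache=True)  # rewritten
    full_size = out.stat().st_size
```

In this sketch a cached snippet is reused on the next call unless ignore_cache is set, which is one reason to clear or bypass the cache after a snippet-only test run.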
kg_bacdive.query module
Query module.
- kg_bacdive.query.parse_query_yaml(yaml_file)
Parse a YAML file and return the results as a dictionary.
- Parameters:
yaml_file – YAML file to parse.
- Return type:
dict
- Returns:
A dictionary of results from the YAML file.
- kg_bacdive.query.result_dict_to_tsv(result_dict, outfile)
Write a dictionary to a TSV file.
- Parameters:
result_dict (dict) – Dictionary to write to TSV file.
outfile (str) – TSV file to write to.
- Return type:
None
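The layout of result_dict is not described here. The following sketch assumes it follows the SPARQL 1.1 JSON results structure (`head.vars` plus `results.bindings`); `write_bindings_tsv` is a hypothetical helper, not the package's function, and it writes to an open text handle instead of a file path so it can be run without touching disk.

```python
import csv
import io
from typing import Any, Dict, TextIO


def write_bindings_tsv(result_dict: Dict[str, Any], handle: TextIO) -> None:
    """Write one header row and one row per binding, tab-separated."""
    columns = result_dict["head"]["vars"]
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for binding in result_dict["results"]["bindings"]:
        # Variables that are unbound in a row become empty cells.
        writer.writerow([binding.get(col, {}).get("value", "") for col in columns])


# Assumed input shape (SPARQL 1.1 JSON results); the values are made up.
example = {
    "head": {"vars": ["strain", "label"]},
    "results": {"bindings": [
        {"strain": {"type": "uri", "value": "http://example.org/strain/1"},
         "label": {"type": "literal", "value": "Example strain"}},
        {"strain": {"type": "uri", "value": "http://example.org/strain/2"}},
    ]},
}

buffer = io.StringIO()
write_bindings_tsv(example, buffer)
tsv_text = buffer.getvalue()
```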
- kg_bacdive.query.run_query(query, endpoint, return_format='json')
Run a SPARQL query and return the results as a dictionary.
- Parameters:
query (str) – SPARQL query to run.
endpoint (str) – SPARQL endpoint to query.
return_format – Format of the returned data.
- Return type:
dict
- Returns:
A dictionary of results from the SPARQL query.
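How run_query talks to the endpoint is not shown here. As a rough illustration of what a JSON SPARQL query involves, the sketch below builds (but does not send) a GET request in the style of the SPARQL 1.1 Protocol, using only the standard library. `build_sparql_request` and the example.org endpoint are hypothetical, and the sketch assumes the endpoint URL has no query string of its own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_sparql_request(query: str, endpoint: str) -> Request:
    """Prepare a GET request that asks the endpoint for JSON results."""
    # The query text is URL-encoded into the `query` parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


sparql = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
request = build_sparql_request(sparql, "https://example.org/sparql")

# Decode the URL again to confirm the query survives the round trip.
decoded_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Passing the prepared request to `urllib.request.urlopen` and decoding the body with `json.loads` would yield a dictionary, which matches the documented return type.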
kg_bacdive.run module
Drive KG download, transform, merge steps.
kg_bacdive.transform module
Transform module.
- kg_bacdive.transform.transform(input_dir, output_dir, sources=None)
Transform based on resource and class declared in DATA_SOURCES.
Call scripts in kg_bacdive/transform/[source name]/ to transform each source into a graph format that KGX can ingest directly, in either TSV or JSON format: https://github.com/biolink/kgx/blob/master/data-preparation.md
- Parameters:
input_dir (Optional[Path]) – A string pointing to the directory to import data from.
output_dir (Optional[Path]) – A string pointing to the directory to output data to.
sources (Optional[List[str]]) – A list of sources to transform.
- Return type:
None
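The dispatch described above, where each source name in DATA_SOURCES maps to a transform class, can be sketched as follows. This is an assumption about the general pattern, not the package's code: `ExampleTransform`, the contents of `DATA_SOURCES`, and the returned list of paths are all illustrative (the documented function returns None), and the sketch assumes that omitting sources means "run every registered source".

```python
from pathlib import Path
from typing import Dict, List, Optional, Type


class ExampleTransform:
    """Hypothetical stand-in for a per-source transform class."""

    def __init__(self, input_dir: Path, output_dir: Path) -> None:
        self.input_dir = input_dir
        self.output_dir = output_dir

    def run(self) -> Path:
        # A real transform would write KGX node and edge files here.
        return self.output_dir / "example"


# Hypothetical registry: source name -> class that transforms it.
DATA_SOURCES: Dict[str, Type[ExampleTransform]] = {"example": ExampleTransform}


def transform(input_dir: Path, output_dir: Path,
              sources: Optional[List[str]] = None) -> List[Path]:
    """Run the transform class registered for each requested source."""
    selected = sources if sources else list(DATA_SOURCES)
    outputs = []
    for name in selected:
        if name not in DATA_SOURCES:
            raise ValueError(f"Unknown source: {name}")
        outputs.append(DATA_SOURCES[name](input_dir, output_dir).run())
    return outputs


all_outputs = transform(Path("data/raw"), Path("data/transformed"))
one_output = transform(Path("data/raw"), Path("data/transformed"), ["example"])
```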
Module contents
kg-bacdive package.
- kg_bacdive.download(yaml_file, output_dir, snippet_only, ignore_cache=False)
Download data files from list of URLs.
Downloads the files listed in the config file (default: download.yaml) into the data directory (default: data/).
- Parameters:
yaml_file (str) – A string pointing to the YAML file utilized to facilitate the downloading of data.
output_dir (str) – A string pointing to the location to download data to.
snippet_only (bool) – Downloads only the first 5 kB of the source, for testing and file checks.
ignore_cache (bool) – Ignore cache and download files even if they exist (default: False).
- Return type:
None
- Returns:
None.