kg_bacdive package
Subpackages
- kg_bacdive.merge_utils package
- kg_bacdive.transform_utils package
- Subpackages
- Submodules
- kg_bacdive.transform_utils.constants module
- kg_bacdive.transform_utils.transform module
- Module contents
- kg_bacdive.utils package
Submodules
kg_bacdive.download module
Download resources from YAML file.
- kg_bacdive.download.download(yaml_file, output_dir, snippet_only, ignore_cache=False)
Download data files from list of URLs.
Downloads the files listed in the config (default: download.yaml) into the data directory (default: data/).
- Parameters:
yaml_file (str) – A string pointing to the YAML file utilized to facilitate the downloading of data.
output_dir (str) – A string pointing to the location to download data to.
snippet_only (bool) – Downloads only the first 5 kB of the source, for testing and file checks.
ignore_cache (bool) – Ignore cache and download files even if they exist [false].
- Return type:
None
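The effect of snippet_only and ignore_cache can be illustrated with a minimal sketch. This is not the package's implementation: the helper names needs_download and copy_stream are hypothetical, and reading "5 kB" as 5120 bytes is an assumption.

```python
import io
from pathlib import Path

# Assumption: "5 kB" is read here as 5 * 1024 bytes.
SNIPPET_BYTES = 5 * 1024


def needs_download(target: Path, ignore_cache: bool = False) -> bool:
    """Hypothetical helper: fetch when the file is missing or the cache is ignored."""
    return ignore_cache or not target.exists()


def copy_stream(source: io.BufferedIOBase, target: Path, snippet_only: bool = False) -> int:
    """Hypothetical helper: copy a binary stream to target.

    With snippet_only=True, only the first SNIPPET_BYTES bytes are kept.
    Returns the number of bytes written.
    """
    data = source.read(SNIPPET_BYTES) if snippet_only else source.read()
    target.write_bytes(data)
    return len(data)
```

In this sketch an existing file is skipped unless ignore_cache is set, which matches the parameter descriptions above but may differ in detail from what the package does.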
kg_bacdive.query module
Query module.
- kg_bacdive.query.parse_query_yaml(yaml_file)
Parse a YAML file and return the results as a dictionary.
- Parameters:
yaml_file – YAML file to parse.
- Return type:
dict
- Returns:
A dictionary of results from the YAML file.
- kg_bacdive.query.result_dict_to_tsv(result_dict, outfile)
Write a dictionary to a TSV file.
- Parameters:
result_dict (dict) – Dictionary to write to TSV file.
outfile (str) – TSV file to write to.
- Return type:
None
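As an illustration of writing a result dictionary to TSV, here is a minimal sketch. It assumes the dictionary follows the standard SPARQL JSON results layout (head.vars for column names, results.bindings for rows); the documentation above does not state the expected shape, and the function name sparql_json_to_tsv is hypothetical.

```python
import csv


def sparql_json_to_tsv(result_dict: dict, outfile: str) -> None:
    """Write a SPARQL-JSON-shaped dictionary to a tab-separated file.

    Assumption: result_dict has "head" -> "vars" and "results" -> "bindings".
    """
    columns = result_dict["head"]["vars"]
    with open(outfile, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        for binding in result_dict["results"]["bindings"]:
            # A variable may be unbound in a given row; write an empty cell then.
            writer.writerow([binding.get(col, {}).get("value", "") for col in columns])
```

Unbound variables become empty cells here; that is a design choice of the sketch, not documented behavior of result_dict_to_tsv.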
- kg_bacdive.query.run_query(query, endpoint, return_format='json')
Run a SPARQL query and return the results as a dictionary.
- Parameters:
query (str) – SPARQL query to run.
endpoint (str) – SPARQL endpoint to query.
return_format – Format of the returned data.
- Return type:
dict
- Returns:
A dictionary of results from the SPARQL query.
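For orientation, the sketch below shows how a SPARQL query can be packaged as an HTTP GET request using only the standard library, following the SPARQL 1.1 Protocol (a query parameter plus an Accept header). Which client run_query actually uses is not stated here; build_sparql_request and the ACCEPT_TYPES mapping are hypothetical names, and the endpoint URL in the usage note is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media types for SPARQL query results; only two formats are covered in this sketch.
ACCEPT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
}


def build_sparql_request(query: str, endpoint: str, return_format: str = "json") -> Request:
    """Build (but do not send) a GET request for a SPARQL endpoint."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": ACCEPT_TYPES[return_format]})
```

Passing the returned object to urllib.request.urlopen and decoding the body with json.load would yield a dictionary of results, for example for build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1", "https://example.org/sparql").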
kg_bacdive.run module
Drive KG download, transform, merge steps.
kg_bacdive.transform module
Transform module.
- kg_bacdive.transform.transform(input_dir, output_dir, sources=None)
Transform based on resource and class declared in DATA_SOURCES.
Call scripts in kg_bacdive/transform/[source name]/ to transform each source into a graph format that KGX can ingest directly, in either TSV or JSON format: https://github.com/biolink/kgx/blob/master/data-preparation.md
- Parameters:
input_dir (Optional[Path]) – A string pointing to the directory to import data from.
output_dir (Optional[Path]) – A string pointing to the directory to output data to.
sources (Optional[List[str]]) – A list of sources to transform.
- Return type:
None
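The dispatch described above, where each source name maps to a transform class declared in DATA_SOURCES, could look roughly like the following sketch. The classes, the contents of DATA_SOURCES, and the run() method are all hypothetical stand-ins, and the sketch returns a list of messages (rather than None) only so the behavior is easy to inspect.

```python
from pathlib import Path
from typing import Dict, List, Optional, Type


class ExampleTransform:
    """Hypothetical transform class standing in for a real source transform."""

    def __init__(self, input_dir: Optional[Path], output_dir: Optional[Path]) -> None:
        self.input_dir = input_dir
        self.output_dir = output_dir

    def run(self) -> str:
        # A real transform would write node and edge files for KGX here.
        return f"{type(self).__name__}: {self.input_dir} -> {self.output_dir}"


class OtherTransform(ExampleTransform):
    """Second hypothetical source, to show selection by name."""


# Hypothetical registry: source name -> transform class.
DATA_SOURCES: Dict[str, Type[ExampleTransform]] = {
    "example": ExampleTransform,
    "other": OtherTransform,
}


def transform_sketch(
    input_dir: Optional[Path] = None,
    output_dir: Optional[Path] = None,
    sources: Optional[List[str]] = None,
) -> List[str]:
    """Run the transform for each requested source (all sources when none are given)."""
    names = sources if sources else list(DATA_SOURCES)
    results = []
    for name in names:
        if name not in DATA_SOURCES:
            raise KeyError(f"Unknown source: {name}")
        results.append(DATA_SOURCES[name](input_dir, output_dir).run())
    return results
```

Treating sources=None as "run every registered source" is an assumption of this sketch; the documentation above only gives None as the default.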
Module contents
kg-bacdive package.
- kg_bacdive.download(yaml_file, output_dir, snippet_only, ignore_cache=False)
Download data files from list of URLs.
Downloads the files listed in the config (default: download.yaml) into the data directory (default: data/).
- Parameters:
yaml_file (str) – A string pointing to the YAML file utilized to facilitate the downloading of data.
output_dir (str) – A string pointing to the location to download data to.
snippet_only (bool) – Downloads only the first 5 kB of the source, for testing and file checks.
ignore_cache (bool) – Ignore cache and download files even if they exist [false].
- Return type:
None