kg_obo package¶
Submodules¶
kg_obo.transform module¶
-
kg_obo.transform.
download_ontology
(url: str, file: str, logger: object) → bool¶ Download ontology from URL
- Parameters
url – url to download from
file – file to download into
logger – logger
- Returns
boolean indicating whether download worked
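The download-and-report-a-boolean pattern described above can be sketched as follows. This is a minimal illustration using only the standard library, not the package's own implementation: sketch_download is a hypothetical name, the 30-second timeout is an assumption, and a file:// URL stands in for a remote ontology so the demo runs without network access.

```python
import logging
import shutil
import tempfile
import urllib.request
from pathlib import Path


def sketch_download(url: str, file: str, logger: logging.Logger) -> bool:
    """Fetch `url` into `file`; return True on success, False on failure."""
    try:
        # urlopen is evaluated first, so no output file is created on a bad URL
        with urllib.request.urlopen(url, timeout=30) as response, \
                open(file, "wb") as out:
            shutil.copyfileobj(response, out)
    except (OSError, ValueError) as err:
        # URLError and HTTPError subclass OSError; a malformed URL raises ValueError
        logger.error("Download of %s failed: %s", url, err)
        return False
    return True


# Demo without network access (all paths here are illustrative).
workdir = Path(tempfile.mkdtemp())
source = workdir / "source.owl"
source.write_text("<rdf:RDF/>", encoding="utf-8")
log = logging.getLogger("sketch")

ok = sketch_download(source.as_uri(), str(workdir / "copy.owl"), log)
bad = sketch_download("not-a-url", str(workdir / "missing.owl"), log)
```

Returning a boolean instead of raising lets a caller skip one failed ontology and continue with the rest.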
-
kg_obo.transform.
get_owl_iri
(input_file_name: str) → tuple¶ Extracts version IRI from OWL definitions. Here, the IRI is the full URL of the origin OWL, as naming conventions vary. Avoids much file parsing as the IRI should be near the top of the file. Does some string parsing to get a shorter version number. Versions may take multiple formats across OBOs.
- Parameters
input_file_name – name of OWL format file to extract IRI from
- Returns
tuple of (str of IRI, str of version)
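As a rough illustration of the approach described above (scan only the top of the file, then derive a short version from the IRI), here is a minimal sketch. The function name sketch_owl_iri, the regular expression, the 100-line limit, and the (None, None) fallback are all assumptions, and the sketch handles only one IRI layout, where the version is the path segment before the file name.

```python
import re
from itertools import islice

# Assumed RDF/XML form: <owl:versionIRI rdf:resource="..."/>
VERSION_IRI = re.compile(r'<owl:versionIRI\s+rdf:resource="([^"]+)"')


def sketch_owl_iri(lines, max_lines=100):
    """Return (iri, version) from the first `max_lines` lines, else (None, None).

    `lines` can be any iterable of strings, including an open file handle.
    """
    for line in islice(lines, max_lines):
        match = VERSION_IRI.search(line)
        if match:
            iri = match.group(1)
            segments = iri.rstrip("/").split("/")
            # Assumed layout: .../<version>/<file>.owl
            version = segments[-2] if len(segments) >= 2 else None
            return iri, version
    return None, None


# Illustrative header lines, not read from a real file.
header = [
    '<?xml version="1.0"?>',
    '<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#">',
    '  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/bfo.owl">',
    '    <owl:versionIRI rdf:resource="http://purl.obolibrary.org/obo/bfo/2019-08-26/bfo.owl"/>',
]
iri, version = sketch_owl_iri(header)
```

Because versions take multiple formats across OBOs, a real implementation would need additional parsing rules beyond this single layout.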
-
kg_obo.transform.
kgx_transform
(input_file: list, input_format: str, output_file: str, output_format: str, logger: object) → tuple¶ Call KGX transform and report success status (bool)
- Parameters
input_file – list of files to transform
input_format – input format
output_file – output file root (appended with nodes/edges.[format])
output_format – output format
logger – logger
- Returns
tuple - (bool for did transform work?, bool for any errors encountered)
-
kg_obo.transform.
retrieve_obofoundry_yaml
(yaml_url: str = 'https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/registry/ontologies.yml', skip: list = [], get_only: list = []) → list¶ Retrieve YAML containing list of all ontologies in OBOFoundry
- Parameters
yaml_url – a stable URL containing a YAML file that describes all the OBO ontologies
skip – which ontologies should we skip
- Returns
parsed yaml describing ontologies to transform
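The skip and get_only arguments suggest a simple filter over the parsed registry entries. The sketch below shows one way such a filter could work on entries that carry an id field. The function name filter_ontologies and the rule that get_only is applied before skip are assumptions, not documented behavior.

```python
def filter_ontologies(ontologies, skip=None, get_only=None):
    """Return the registry entries that survive the skip/get_only filters."""
    skip = set(skip or [])
    get_only = set(get_only or [])
    selected = []
    for entry in ontologies:
        oid = entry["id"]
        # Assumption: a non-empty get_only restricts the run to those IDs
        if get_only and oid not in get_only:
            continue
        if oid in skip:
            continue
        selected.append(entry)
    return selected


# Illustrative stand-in for the parsed YAML.
registry = [{"id": "bfo"}, {"id": "go"}, {"id": "pato"}]
without_go = [e["id"] for e in filter_ontologies(registry, skip=["go"])]
only_go = [e["id"] for e in filter_ontologies(registry, get_only=["go"])]
```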
-
kg_obo.transform.
run_transform
(skip: list = [], get_only: list = [], bucket='bucket', save_local=False, s3_test=False, lock_file_remote_path: str = 'kg-obo/lock', log_dir='logs', data_dir='data', remote_path='kg-obo', track_file_local_path: str = 'data/tracking.yaml', tracking_file_remote_path: str = 'kg-obo/tracking.yaml') → bool¶ Perform setup, then kgx-mediated transforms for all specified OBOs.
- Parameters
skip – list of OBOs to skip, by ID
get_only – list of OBOs to transform, by ID (otherwise do all)
bucket – str of S3 bucket, to be specified as argument
save_local – bool for whether to retain transform results on local disk
s3_test – bool for whether to perform mock S3 upload only
lock_file_remote_path – str of path for lock file on S3
log_dir – str of local dir where any logs should be saved
data_dir – str of local dir where data should be saved
remote_path – str of remote path on S3 bucket
track_file_local_path – str of local path for tracking file
tracking_file_remote_path – str of path of tracking file on S3
- Returns
boolean indicating success or existing run encountered (False for unresolved error)
-
kg_obo.transform.
track_obo_version
(name: str = '', iri: str = '', version: str = '', bucket: str = '', track_file_local_path: str = 'data/tracking.yaml', track_file_remote_path: str = 'kg-obo/tracking.yaml') → None¶ Writes OBO version as per IRI to tracking.yaml. Note this tracking file is on the root of the S3 kg-obo directory.
- Parameters
name – name of OBO, as OBO ID
iri – full OBO VersionIRI, as URL
version – short OBO version
track_file_local_path – where to look for local tracking.yaml file
track_file_remote_path – where to look for remote tracking.yaml file
-
kg_obo.transform.
transformed_obo_exists
(name: str, iri: str, s3_test=False, bucket: str = '', tracking_file_local_path: str = 'data/tracking.yaml', tracking_file_remote_path: str = 'kg-obo/tracking.yaml') → bool¶ Read tracking.yaml to determine if transformed version of this OBO exists.
- Parameters
name – string of short logger name, e.g., bfo
iri – iri of OBO version
- Returns
boolean, True if this OBO and version already exist as transformed
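The two tracking functions above, one recording a version and one checking for it, can be illustrated with a plain dictionary standing in for the parsed tracking.yaml. The layout used here (an ontologies mapping with current_iri, current_version, and an archive list) is an assumption made for the sketch, as are both function names. The real file format may differ.

```python
def sketch_track_version(tracking, name, iri, version):
    """Record `iri`/`version` as current for `name`, archiving any older entry."""
    ontologies = tracking.setdefault("ontologies", {})
    entry = ontologies.setdefault(name, {})
    previous = entry.get("current_iri")
    if previous and previous != iri:
        # Keep the superseded version so it is still recognized later
        entry.setdefault("archive", []).append(
            {"iri": previous, "version": entry.get("current_version")}
        )
    entry["current_iri"] = iri
    entry["current_version"] = version


def sketch_transformed_exists(tracking, name, iri):
    """Return True if `iri` is the current or an archived version of `name`."""
    entry = tracking.get("ontologies", {}).get(name) or {}
    if entry.get("current_iri") == iri:
        return True
    return any(old.get("iri") == iri for old in entry.get("archive", []))


# Illustrative IRIs on example.org, not real ontology releases.
tracking = {}
sketch_track_version(tracking, "bfo", "http://example.org/bfo/v1/bfo.owl", "v1")
sketch_track_version(tracking, "bfo", "http://example.org/bfo/v2/bfo.owl", "v2")
```

Checking the tracking record before transforming lets a run skip ontologies whose current version has already been processed.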
kg_obo.upload module¶
-
kg_obo.upload.
check_lock
(s3_bucket: str, s3_bucket_dir: str) → bool¶ Checks on existence of a lock file on S3 to avoid concurrent runs.
- Parameters
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
- Returns
boolean returns True if lock file exists, and False otherwise.
-
kg_obo.upload.
check_tracking
(s3_bucket: str, s3_bucket_dir: str) → bool¶ Checks on existence of the tracking.yaml file on S3.
- Parameters
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
- Returns
boolean returns True if tracking file exists, and False otherwise.
-
kg_obo.upload.
mock_check_lock
(s3_bucket: str, s3_bucket_dir: str) → bool¶ Mock checking on existence of a lock file on S3 to avoid concurrent runs.
- Parameters
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
- Returns
boolean returns True if lock file exists, and False otherwise.
-
kg_obo.upload.
mock_check_tracking
(s3_bucket: str, s3_bucket_dir: str) → bool¶ Mock checking on existence of the tracking.yaml file on S3.
- Parameters
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
- Returns
boolean returns True if tracking file exists, and False otherwise.
-
kg_obo.upload.
mock_set_lock
(s3_bucket: str, s3_bucket_dir: str, unlock: bool) → bool¶ Mocks creating a lock file on S3 to avoid concurrent runs.
- Parameters
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
- Returns
boolean returns True if completed successfully, and False otherwise.
-
kg_obo.upload.
mock_upload_dir_to_s3
(local_directory: str, s3_bucket: str, s3_bucket_dir: str, make_public=False) → None¶ Mock the upload of a local directory to a specified AWS S3 bucket. Though this is a test, it is here so it may be more easily called through command options.
- Parameters
local_directory – str name of directory to upload
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
-
kg_obo.upload.
set_lock
(s3_bucket: str, s3_bucket_dir: str, unlock: bool) → bool¶ Creates a lock file on S3 to avoid concurrent runs.
- Parameters
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
- Returns
boolean returns True if completed successfully, and False otherwise.
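The lock functions above imply a check, lock, work, unlock sequence around each run. The sketch below models that sequence with an in-memory set standing in for the S3 bucket. FakeBucket, the sketch_ function names, and the reading of unlock=True as "remove the lock" are assumptions for illustration. No S3 calls are made.

```python
class FakeBucket:
    """In-memory stand-in for an S3 bucket: just a set of object keys."""

    def __init__(self):
        self.keys = set()


def sketch_check_lock(bucket, lock_key):
    """Return True if the lock object exists."""
    return lock_key in bucket.keys


def sketch_set_lock(bucket, lock_key, unlock):
    """Create the lock object, or remove it when `unlock` is True (assumed)."""
    if unlock:
        bucket.keys.discard(lock_key)
    else:
        bucket.keys.add(lock_key)
    return True


def guarded_run(bucket, lock_key, job):
    """Run `job` only if no other run holds the lock; always release afterwards."""
    if sketch_check_lock(bucket, lock_key):
        return False  # another run appears to be in progress
    sketch_set_lock(bucket, lock_key, unlock=False)
    try:
        job()
    finally:
        sketch_set_lock(bucket, lock_key, unlock=True)
    return True


bucket = FakeBucket()
calls = []
first = guarded_run(bucket, "kg-obo/lock", lambda: calls.append("ran"))
```

A check followed by a separate write is not atomic, so a lock of this kind reduces the chance of overlapping runs but does not strictly prevent two runs that start at the same moment.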
-
kg_obo.upload.
upload_dir_to_s3
(local_directory: str, s3_bucket: str, s3_bucket_dir: str, make_public=False) → None¶ Upload a local directory to a specified AWS S3 bucket.
- Parameters
local_directory – str name of directory to upload
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
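Uploading a directory amounts to walking it and mapping each local file to an object key under the remote directory. The sketch below shows only that mapping step and leaves out the transfer itself. plan_upload is a hypothetical helper, and the directory and file names in the demo are made up.

```python
import os
import tempfile


def plan_upload(local_directory, s3_bucket_dir):
    """Return (local_path, s3_key) pairs for every file under `local_directory`."""
    plan = []
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            local_path = os.path.join(root, filename)
            relative = os.path.relpath(local_path, local_directory)
            # S3 keys always use forward slashes, whatever the local separator
            key = "/".join([s3_bucket_dir.strip("/")] + relative.split(os.sep))
            plan.append((local_path, key))
    return sorted(plan, key=lambda pair: pair[1])


# Demo on a throwaway directory with illustrative file names.
demo_dir = tempfile.mkdtemp()
version_dir = os.path.join(demo_dir, "bfo", "2019-08-26")
os.makedirs(version_dir)
for name in ("bfo_nodes.tsv", "bfo_edges.tsv"):
    with open(os.path.join(version_dir, name), "w", encoding="utf-8") as handle:
        handle.write("id\n")

demo_keys = [key for _path, key in plan_upload(demo_dir, "kg-obo")]
```

Separating the planning step from the transfer makes a mock upload straightforward: the mock can list the planned keys without contacting S3.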
-
kg_obo.upload.
upload_index_files
(bucket: str, remote_path: str, local_path: str, data_dir: str, update_root=False) → bool¶ Checks the OBO directory and version directory, creating index.html where it does not exist (or, if update_root is True, only the given directory and not its parent). If an index exists, it is updated as needed.
- Parameters
bucket – str of S3 bucket
remote_path – str of path to upload to
local_path – str of directory containing the files to create index for
data_dir – str of the data directory, so we can get the relative path
update_root – bool, True to update root index (in this case, local_path will be the data_dir)
- Returns
bool returns True if all index files created successfully
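To make the "create index.html, or update it if needed" behavior concrete, here is a minimal sketch that renders an index page for a list of entries and decides whether an existing index is stale. build_index_html and index_needs_update are hypothetical helpers, and the page layout is an assumption. The real index files may look different.

```python
from html import escape


def build_index_html(title, entries):
    """Render a small HTML page linking to each entry, sorted by name."""
    rows = "\n".join(
        f'    <li><a href="{escape(name, quote=True)}">{escape(name)}</a></li>'
        for name in sorted(entries)
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n  <h1>{escape(title)}</h1>\n  <ul>\n{rows}\n  </ul>\n"
        "</body>\n</html>\n"
    )


def index_needs_update(existing_html, entries, title):
    """Return True when there is no index yet or its content is out of date."""
    return existing_html != build_index_html(title, entries)


# Illustrative directory listing for one version directory.
page = build_index_html("bfo/2019-08-26", ["bfo_nodes.tsv", "bfo_edges.tsv"])
```

Comparing the rendered page with the stored one avoids re-uploading an index that has not changed.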
Module contents¶
-
kg_obo.
base_url_if_exists
(oid)¶
-
kg_obo.
retrieve_obofoundry_yaml
(yaml_url: str = 'https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/registry/ontologies.yml', skip: list = [], get_only: list = []) → list¶ Retrieve YAML containing list of all ontologies in OBOFoundry
- Parameters
yaml_url – a stable URL containing a YAML file that describes all the OBO ontologies
skip – which ontologies should we skip
- Returns
parsed yaml describing ontologies to transform
-
kg_obo.
kgx_transform
(input_file: list, input_format: str, output_file: str, output_format: str, logger: object) → tuple¶ Call KGX transform and report success status (bool)
- Parameters
input_file – list of files to transform
input_format – input format
output_file – output file root (appended with nodes/edges.[format])
output_format – output format
logger – logger
- Returns
tuple - (bool for did transform work?, bool for any errors encountered)
-
kg_obo.
get_owl_iri
(input_file_name: str) → tuple¶ Extracts version IRI from OWL definitions. Here, the IRI is the full URL of the origin OWL, as naming conventions vary. Avoids much file parsing as the IRI should be near the top of the file. Does some string parsing to get a shorter version number. Versions may take multiple formats across OBOs.
- Parameters
input_file_name – name of OWL format file to extract IRI from
- Returns
tuple of (str of IRI, str of version)
-
kg_obo.
track_obo_version
(name: str = '', iri: str = '', version: str = '', bucket: str = '', track_file_local_path: str = 'data/tracking.yaml', track_file_remote_path: str = 'kg-obo/tracking.yaml') → None¶ Writes OBO version as per IRI to tracking.yaml. Note this tracking file is on the root of the S3 kg-obo directory.
- Parameters
name – name of OBO, as OBO ID
iri – full OBO VersionIRI, as URL
version – short OBO version
track_file_local_path – where to look for local tracking.yaml file
track_file_remote_path – where to look for remote tracking.yaml file
-
kg_obo.
download_ontology
(url: str, file: str, logger: object) → bool¶ Download ontology from URL
- Parameters
url – url to download from
file – file to download into
logger – logger
- Returns
boolean indicating whether download worked
-
kg_obo.
run_transform
(skip: list = [], get_only: list = [], bucket='bucket', save_local=False, s3_test=False, lock_file_remote_path: str = 'kg-obo/lock', log_dir='logs', data_dir='data', remote_path='kg-obo', track_file_local_path: str = 'data/tracking.yaml', tracking_file_remote_path: str = 'kg-obo/tracking.yaml') → bool¶ Perform setup, then kgx-mediated transforms for all specified OBOs.
- Parameters
skip – list of OBOs to skip, by ID
get_only – list of OBOs to transform, by ID (otherwise do all)
bucket – str of S3 bucket, to be specified as argument
save_local – bool for whether to retain transform results on local disk
s3_test – bool for whether to perform mock S3 upload only
lock_file_remote_path – str of path for lock file on S3
log_dir – str of local dir where any logs should be saved
data_dir – str of local dir where data should be saved
remote_path – str of remote path on S3 bucket
track_file_local_path – str of local path for tracking file
tracking_file_remote_path – str of path of tracking file on S3
- Returns
boolean indicating success or existing run encountered (False for unresolved error)
-
kg_obo.
upload_dir_to_s3
(local_directory: str, s3_bucket: str, s3_bucket_dir: str, make_public=False) → None¶ Upload a local directory to a specified AWS S3 bucket.
- Parameters
local_directory – str name of directory to upload
s3_bucket – str ID of the bucket to upload to
s3_bucket_dir – str of name of directory to create on S3
-
kg_obo.
upload_index_files
(bucket: str, remote_path: str, local_path: str, data_dir: str, update_root=False) → bool¶ Checks the OBO directory and version directory, creating index.html where it does not exist (or, if update_root is True, only the given directory and not its parent). If an index exists, it is updated as needed.
- Parameters
bucket – str of S3 bucket
remote_path – str of path to upload to
local_path – str of directory containing the files to create index for
data_dir – str of the data directory, so we can get the relative path
update_root – bool, True to update root index (in this case, local_path will be the data_dir)
- Returns
bool returns True if all index files created successfully