kg_obo package

Submodules

kg_obo.obolibrary_utils module

kg_obo.obolibrary_utils.base_url_if_exists(oid)

kg_obo.transform module

kg_obo.transform.download_ontology(url: str, file: str, logger: object) → bool

Download ontology from URL

Parameters
  • url – url to download from

  • file – file to download into

  • logger – logger

Returns

boolean indicating whether download worked
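
The download-and-report pattern described above could be sketched as follows, using only the standard library. The name download_to_file is hypothetical and this is not the kg_obo implementation; it assumes that any network or file-system error should be logged and reported as False rather than raised.

```python
import logging
import shutil
import urllib.request


def download_to_file(url: str, file: str, logger: logging.Logger) -> bool:
    """Hypothetical stand-in: fetch url into file and return True on success."""
    try:
        # urlopen is evaluated first, so no output file is created if it fails.
        with urllib.request.urlopen(url, timeout=30) as response, open(file, "wb") as out:
            shutil.copyfileobj(response, out)
    except (OSError, ValueError) as err:
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers malformed URLs.
        logger.error("Download of %s failed: %s", url, err)
        return False
    logger.info("Downloaded %s to %s", url, file)
    return True
```

Returning a bool lets the caller skip a single ontology and continue with the rest instead of aborting the whole run.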

kg_obo.transform.get_owl_iri(input_file_name: str) → tuple

Extracts version IRI from OWL definitions. Here, the IRI is the full URL of the origin OWL, as naming conventions vary. Avoids much file parsing as the IRI should be near the top of the file. Does some string parsing to get a shorter version number. Versions may take multiple formats across OBOs.

Parameters

input_file_name – name of OWL format file to extract IRI from

Returns

tuple of (str of IRI, str of version)
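
A minimal sketch of the idea, not the kg_obo code: scan only the head of an RDF/XML file for an owl:versionIRI element, then derive a short version from the IRI. The function name sketch_get_owl_iri, the max_lines limit, the "NA" placeholder, and the rule that the version is the second-to-last path segment are all assumptions made for illustration; real OBO version IRIs follow several layouts.

```python
import re
from itertools import islice

# Matches an RDF/XML element such as <owl:versionIRI rdf:resource="..."/>
VERSION_IRI = re.compile(r'<owl:versionIRI\s+rdf:resource="([^"]+)"')


def sketch_get_owl_iri(input_file_name: str, max_lines: int = 200) -> tuple:
    """Hypothetical sketch: return (iri, version) from the head of an OWL file."""
    iri = "NA"
    with open(input_file_name, encoding="utf-8") as owl_file:
        # Read only the first max_lines lines; the ontology header comes first.
        for line in islice(owl_file, max_lines):
            match = VERSION_IRI.search(line)
            if match:
                iri = match.group(1)
                break
    if iri == "NA":
        return iri, "NA"
    # Assumed layout: .../obo/<id>/<version>/<id>.owl, so the version is the
    # second-to-last path segment. Other layouts need their own handling.
    segments = iri.rstrip("/").split("/")
    version = segments[-2] if len(segments) >= 2 else "NA"
    return iri, version
```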

kg_obo.transform.kgx_transform(input_file: list, input_format: str, output_file: str, output_format: str, logger: object) → tuple

Call KGX transform and report success status (bool)

Parameters
  • input_file – list of files to transform

  • input_format – input format

  • output_file – output file root (appended with nodes/edges.[format])

  • output_format – output format

  • logger – logger

Returns

tuple of (bool indicating whether the transform succeeded, bool indicating whether any errors were encountered)
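
To make the output_file root convention concrete, the sketch below derives the node and edge file names from a root and checks that both were written. The helper names are hypothetical, and the underscore separator between the root and nodes/edges is an assumption; check the files your KGX version actually produces.

```python
import os


def expected_outputs(output_file: str, output_format: str) -> list:
    """Hypothetical helper: node and edge paths derived from an output root."""
    # Assumption: the root is joined to "nodes"/"edges" with an underscore.
    return [f"{output_file}_{kind}.{output_format}" for kind in ("nodes", "edges")]


def outputs_present(output_file: str, output_format: str) -> bool:
    """Return True only if both expected files exist and are non-empty."""
    return all(
        os.path.isfile(path) and os.path.getsize(path) > 0
        for path in expected_outputs(output_file, output_format)
    )
```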

kg_obo.transform.retrieve_obofoundry_yaml(yaml_url: str = 'https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/registry/ontologies.yml', skip: list = [], get_only: list = []) → list

Retrieve YAML containing list of all ontologies in OBOFoundry.

Parameters
  • yaml_url – a stable URL containing a YAML file that describes all the OBO ontologies

  • skip – which ontologies should we skip

Returns

parsed yaml describing ontologies to transform
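
The effect of skip and get_only can be sketched on already-parsed registry entries. This is an illustration under assumptions, not the kg_obo code: filter_ontologies is a hypothetical name, each entry is assumed to be a dict with an "id" key (as in the OBO Foundry registry), and letting skip take precedence over get_only is a choice made here.

```python
def filter_ontologies(ontologies: list, skip: list = None, get_only: list = None) -> list:
    """Hypothetical sketch: select registry entries by ontology ID."""
    skip = skip or []
    get_only = get_only or []
    selected = []
    for entry in ontologies:
        oid = entry.get("id")
        if oid in skip:
            continue  # explicitly excluded
        if get_only and oid not in get_only:
            continue  # a whitelist was given and this ID is not on it
        selected.append(entry)
    return selected
```

The sketch uses None defaults rather than the mutable [] defaults shown in the signature, which avoids sharing one list object across calls.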

kg_obo.transform.run_transform(skip: list = [], get_only: list = [], bucket='bucket', save_local=False, s3_test=False, lock_file_remote_path: str = 'kg-obo/lock', log_dir='logs', data_dir='data', remote_path='kg-obo', track_file_local_path: str = 'data/tracking.yaml', tracking_file_remote_path: str = 'kg-obo/tracking.yaml') → bool

Perform setup, then kgx-mediated transforms for all specified OBOs.

Parameters
  • skip – list of OBOs to skip, by ID

  • get_only – list of OBOs to transform, by ID (otherwise do all)

  • bucket – str of S3 bucket, to be specified as argument

  • save_local – bool for whether to retain transform results on local disk

  • s3_test – bool for whether to perform mock S3 upload only

  • lock_file_remote_path – str of path for lock file on S3

  • log_dir – str of local dir where any logs should be saved

  • data_dir – str of local dir where data should be saved

  • remote_path – str of remote path on S3 bucket

  • track_file_local_path – str of local path for tracking file

  • tracking_file_remote_path – str of path of tracking file on S3

Returns

boolean indicating success or existing run encountered (False for unresolved error)

kg_obo.transform.track_obo_version(name: str = '', iri: str = '', version: str = '', bucket: str = '', track_file_local_path: str = 'data/tracking.yaml', track_file_remote_path: str = 'kg-obo/tracking.yaml') → None

Writes OBO version as per IRI to tracking.yaml. Note this tracking file is on the root of the S3 kg-obo directory.

Parameters
  • name – name of OBO, as OBO ID

  • iri – full OBO VersionIRI, as URL

  • version – short OBO version

  • track_file_local_path – where to look for local tracking.yaml file

  • track_file_remote_path – where to look for remote tracking.yaml file

kg_obo.transform.transformed_obo_exists(name: str, iri: str, s3_test=False, bucket: str = '', tracking_file_local_path: str = 'data/tracking.yaml', tracking_file_remote_path: str = 'kg-obo/tracking.yaml') → bool

Read tracking.yaml to determine if transformed version of this OBO exists.

Parameters
  • name – string of short logger name, e.g., bfo

  • iri – iri of OBO version

Returns

boolean, True if this OBO and version already exist as transformed
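
The lookup can be sketched on an already-parsed tracking file. The layout used here (a top-level "ontologies" mapping whose entries hold "current_iri" and an "archive" list of older versions) is an assumption for illustration, as is the name version_already_tracked; consult the actual tracking.yaml for its real structure.

```python
def version_already_tracked(tracking: dict, name: str, iri: str) -> bool:
    """Hypothetical sketch: is this OBO name and version IRI already recorded?"""
    # Assumed layout: {"ontologies": {name: {"current_iri": ..., "archive": [...]}}}
    entry = tracking.get("ontologies", {}).get(name)
    if not entry:
        return False
    if entry.get("current_iri") == iri:
        return True
    # Also treat previously transformed (archived) versions as existing.
    return any(old.get("iri") == iri for old in entry.get("archive", []))
```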

kg_obo.upload module

kg_obo.upload.check_lock(s3_bucket: str, s3_bucket_dir: str) → bool

Checks on existence of a lock file on S3 to avoid concurrent runs.

Parameters
  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

Returns

boolean, True if lock file exists, and False otherwise

kg_obo.upload.check_tracking(s3_bucket: str, s3_bucket_dir: str) → bool

Checks on existence of the tracking.yaml file on S3.

Parameters
  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

Returns

boolean, True if tracking file exists, and False otherwise

kg_obo.upload.mock_check_lock(s3_bucket: str, s3_bucket_dir: str) → bool

Mock checking on existence of a lock file on S3 to avoid concurrent runs.

Parameters
  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

Returns

boolean, True if lock file exists, and False otherwise

kg_obo.upload.mock_check_tracking(s3_bucket: str, s3_bucket_dir: str) → bool

Mock checking on existence of the tracking.yaml file on S3.

Parameters
  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

Returns

boolean, True if tracking file exists, and False otherwise

kg_obo.upload.mock_set_lock(s3_bucket: str, s3_bucket_dir: str, unlock: bool) → bool

Mocks creating a lock file on S3 to avoid concurrent runs.

Parameters
  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

Returns

boolean, True if completed successfully, and False otherwise

kg_obo.upload.mock_upload_dir_to_s3(local_directory: str, s3_bucket: str, s3_bucket_dir: str, make_public=False) → None

Mock the upload of a local directory to a specified AWS S3 bucket. Though this is a test, it is here so it may be more easily called through command options.

Parameters
  • local_directory – str name of directory to upload

  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

kg_obo.upload.set_lock(s3_bucket: str, s3_bucket_dir: str, unlock: bool) → bool

Creates a lock file on S3 to avoid concurrent runs.

Parameters
  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

Returns

boolean, True if completed successfully, and False otherwise
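
The check-then-set lock pattern behind the lock functions can be illustrated with a local-directory analogue. This sketch does not touch S3 and is not the kg_obo implementation; check_local_lock, set_local_lock, and the file name "lock" are hypothetical.

```python
import os


def check_local_lock(lock_dir: str, lock_name: str = "lock") -> bool:
    """Return True if the lock file exists, and False otherwise."""
    return os.path.isfile(os.path.join(lock_dir, lock_name))


def set_local_lock(lock_dir: str, unlock: bool, lock_name: str = "lock") -> bool:
    """Create the lock file, or remove it when unlock is True."""
    path = os.path.join(lock_dir, lock_name)
    try:
        if unlock:
            if os.path.exists(path):
                os.remove(path)
        else:
            os.makedirs(lock_dir, exist_ok=True)
            with open(path, "w") as lock_file:
                lock_file.write("locked\n")
    except OSError:
        return False
    return True
```

A caller would check the lock, set it, do the work, and unlock in a finally clause so that a failed run does not block later ones. Note that a separate check followed by a set is not atomic: two runs starting at the same moment can both see no lock.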

kg_obo.upload.upload_dir_to_s3(local_directory: str, s3_bucket: str, s3_bucket_dir: str, make_public=False) → None

Upload a local directory to a specified AWS S3 bucket.

Parameters
  • local_directory – str name of directory to upload

  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3
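
Uploading a directory amounts to walking it and mapping each file to an object key under s3_bucket_dir. The sketch below performs only that mapping, which is roughly what a dry run needs; plan_upload is a hypothetical name and no S3 call is made.

```python
import os


def plan_upload(local_directory: str, s3_bucket_dir: str) -> list:
    """Hypothetical sketch: return sorted (local_path, s3_key) pairs."""
    pairs = []
    for root, _dirs, files in os.walk(local_directory):
        for filename in files:
            local_path = os.path.join(root, filename)
            relative = os.path.relpath(local_path, local_directory)
            # S3 keys always use forward slashes, whatever the local separator.
            key = "/".join([s3_bucket_dir.strip("/")] + relative.split(os.sep))
            pairs.append((local_path, key))
    return sorted(pairs)
```

A real upload would then hand each pair to an S3 client, for instance boto3's upload_file(Filename, Bucket, Key); whether kg_obo does exactly this is not stated here.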

kg_obo.upload.upload_index_files(bucket: str, remote_path: str, local_path: str, data_dir: str, update_root=False) → bool

Checks the obo directory and version directory, creating index.html where it does not exist. (Or, if update_root is True, just the given directory and not its parent). If index exists, update it if needed.

Parameters
  • bucket – str of S3 bucket

  • remote_path – str of path to upload to

  • versioned_obo_path – str of directory containing the files to create index for (this name does not appear in the signature above; it appears to correspond to local_path)

  • data_dir – str of the data directory, so we can get the relative path

  • update_root – bool, True to update root index (in this case, versioned_obo_path will be the data_dir)

Returns

bool, True if all index files created successfully
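
Creating an index for one directory can be sketched as a plain HTML listing of its contents. This is an illustration only: build_index_html is a hypothetical name, the markup is invented for the example, and the real index files may look different.

```python
import html
import os
import urllib.parse


def build_index_html(directory: str, title: str) -> str:
    """Hypothetical sketch: HTML listing of a directory, skipping index.html."""
    names = sorted(n for n in os.listdir(directory) if n != "index.html")
    rows = []
    for name in names:
        href = urllib.parse.quote(name)  # percent-encode unsafe characters
        rows.append(f'    <li><a href="{href}">{html.escape(name)}</a></li>')
    body = "\n".join(rows)
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{safe_title}</title></head>\n"
        f"<body>\n  <h1>{safe_title}</h1>\n  <ul>\n{body}\n  </ul>\n</body>\n</html>\n"
    )
```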

Module contents

kg_obo.base_url_if_exists(oid)
kg_obo.retrieve_obofoundry_yaml(yaml_url: str = 'https://raw.githubusercontent.com/OBOFoundry/OBOFoundry.github.io/master/registry/ontologies.yml', skip: list = [], get_only: list = []) → list

Retrieve YAML containing list of all ontologies in OBOFoundry.

Parameters
  • yaml_url – a stable URL containing a YAML file that describes all the OBO ontologies

  • skip – which ontologies should we skip

Returns

parsed yaml describing ontologies to transform

kg_obo.kgx_transform(input_file: list, input_format: str, output_file: str, output_format: str, logger: object) → tuple

Call KGX transform and report success status (bool)

Parameters
  • input_file – list of files to transform

  • input_format – input format

  • output_file – output file root (appended with nodes/edges.[format])

  • output_format – output format

  • logger – logger

Returns

tuple of (bool indicating whether the transform succeeded, bool indicating whether any errors were encountered)

kg_obo.get_owl_iri(input_file_name: str) → tuple

Extracts version IRI from OWL definitions. Here, the IRI is the full URL of the origin OWL, as naming conventions vary. Avoids much file parsing as the IRI should be near the top of the file. Does some string parsing to get a shorter version number. Versions may take multiple formats across OBOs.

Parameters

input_file_name – name of OWL format file to extract IRI from

Returns

tuple of (str of IRI, str of version)

kg_obo.track_obo_version(name: str = '', iri: str = '', version: str = '', bucket: str = '', track_file_local_path: str = 'data/tracking.yaml', track_file_remote_path: str = 'kg-obo/tracking.yaml') → None

Writes OBO version as per IRI to tracking.yaml. Note this tracking file is on the root of the S3 kg-obo directory.

Parameters
  • name – name of OBO, as OBO ID

  • iri – full OBO VersionIRI, as URL

  • version – short OBO version

  • track_file_local_path – where to look for local tracking.yaml file

  • track_file_remote_path – where to look for remote tracking.yaml file

kg_obo.download_ontology(url: str, file: str, logger: object) → bool

Download ontology from URL

Parameters
  • url – url to download from

  • file – file to download into

  • logger – logger

Returns

boolean indicating whether download worked

kg_obo.run_transform(skip: list = [], get_only: list = [], bucket='bucket', save_local=False, s3_test=False, lock_file_remote_path: str = 'kg-obo/lock', log_dir='logs', data_dir='data', remote_path='kg-obo', track_file_local_path: str = 'data/tracking.yaml', tracking_file_remote_path: str = 'kg-obo/tracking.yaml') → bool

Perform setup, then kgx-mediated transforms for all specified OBOs.

Parameters
  • skip – list of OBOs to skip, by ID

  • get_only – list of OBOs to transform, by ID (otherwise do all)

  • bucket – str of S3 bucket, to be specified as argument

  • save_local – bool for whether to retain transform results on local disk

  • s3_test – bool for whether to perform mock S3 upload only

  • lock_file_remote_path – str of path for lock file on S3

  • log_dir – str of local dir where any logs should be saved

  • data_dir – str of local dir where data should be saved

  • remote_path – str of remote path on S3 bucket

  • track_file_local_path – str of local path for tracking file

  • tracking_file_remote_path – str of path of tracking file on S3

Returns

boolean indicating success or existing run encountered (False for unresolved error)

kg_obo.upload_dir_to_s3(local_directory: str, s3_bucket: str, s3_bucket_dir: str, make_public=False) → None

Upload a local directory to a specified AWS S3 bucket.

Parameters
  • local_directory – str name of directory to upload

  • s3_bucket – str ID of the bucket to upload to

  • s3_bucket_dir – str of name of directory to create on S3

kg_obo.upload_index_files(bucket: str, remote_path: str, local_path: str, data_dir: str, update_root=False) → bool

Checks the obo directory and version directory, creating index.html where it does not exist. (Or, if update_root is True, just the given directory and not its parent). If index exists, update it if needed.

Parameters
  • bucket – str of S3 bucket

  • remote_path – str of path to upload to

  • versioned_obo_path – str of directory containing the files to create index for (this name does not appear in the signature above; it appears to correspond to local_path)

  • data_dir – str of the data directory, so we can get the relative path

  • update_root – bool, True to update root index (in this case, versioned_obo_path will be the data_dir)

Returns

bool, True if all index files created successfully