kg_bacdive.utils package

Submodules

kg_bacdive.utils.ner_utils module

NLP utilities.

kg_bacdive.utils.ner_utils.annotate(df, prefix, exclusion_list, outfile, llm=False)

Annotate the text of a DataFrame column using oaklib and, optionally, an LLM.

Parameters:
  • df (DataFrame) – Input DataFrame

  • prefix (str) – Ontology to be used.

  • exclusion_list (List) – Tokens that can be ignored.

  • outfile – File to which the annotated output is written.

  • llm (bool) – Whether to also use an LLM for annotation. Defaults to False.

kg_bacdive.utils.pandas_utils module

Pandas utilities.

kg_bacdive.utils.pandas_utils.drop_duplicates(file_path)

Read a TSV file, drop duplicate rows, and write the result back to the same file.

Parameters:
  • file_path (Path) – Path of the TSV file to read and overwrite.
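
The read, deduplicate, and overwrite cycle described above can be sketched with the standard library alone. This is an illustration, not the package's implementation (which presumably relies on pandas): the function name drop_duplicate_rows is hypothetical, and the sketch assumes the first line is a header and that a duplicate means an identical row.

```python
import csv
from pathlib import Path


def drop_duplicate_rows(file_path: Path) -> None:
    """Drop repeated rows from a TSV file, keeping the first occurrence."""
    with open(file_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return  # Empty file: nothing to deduplicate.

    header, body = rows[0], rows[1:]
    seen = set()
    unique = []
    for row in body:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Overwrite the same file, as the docstring describes.
    with open(file_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(unique)
```

Because the input file is overwritten in place, keep a copy if the original rows may be needed later.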

kg_bacdive.utils.pandas_utils.establish_transitive_relationship(file_path, subject_prefix, intermediate_prefix, predicate, object_prefix)

Establish transitive relationships between edges that share the same predicate.

For example, given the existing relations:
  1. A => predicate => B

  2. B => predicate => C

this function adds the relation A => predicate => C.

Parameters:
  • file_path (Path) – Filepath of the edge file.

  • subject_prefix (str) – Subject prefix (A in the example)

  • intermediate_prefix (str) – Intermediate prefix that connects the subject to object (B in the example).

  • predicate (str) – The common predicate between all relations.

  • object_prefix (str) – Object prefix (C in the example)

Return type:

DataFrame

Returns:

Core dataframe with additional deduced rows.
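
The deduction step can be illustrated on plain tuples. This is a minimal sketch, not the package's code: it works on an in-memory list of (subject, predicate, object) edges rather than an edge file, the name add_transitive_edges is hypothetical, and it assumes that a prefix is matched with str.startswith on the node identifier.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def add_transitive_edges(
    edges: List[Edge],
    subject_prefix: str,
    intermediate_prefix: str,
    predicate: str,
    object_prefix: str,
) -> List[Edge]:
    """Return the input edges plus every deduced A => predicate => C edge."""
    # Index the B => predicate => C edges by their subject B.
    second_hop: Dict[str, List[str]] = {}
    for subj, pred, obj in edges:
        if (
            pred == predicate
            and subj.startswith(intermediate_prefix)
            and obj.startswith(object_prefix)
        ):
            second_hop.setdefault(subj, []).append(obj)

    existing = set(edges)
    deduced: List[Edge] = []
    # Follow each A => predicate => B edge through to the objects of B.
    for subj, pred, obj in edges:
        if pred == predicate and subj.startswith(subject_prefix) and obj in second_hop:
            for target in second_hop[obj]:
                new_edge = (subj, predicate, target)
                if new_edge not in existing:
                    existing.add(new_edge)
                    deduced.append(new_edge)
    return edges + deduced
```

With the placeholder edges ("A:1", "p", "B:1") and ("B:1", "p", "C:1"), calling add_transitive_edges(edges, "A:", "B:", "p", "C:") appends ("A:1", "p", "C:1"). Edges with a different predicate are left out of the deduction.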

kg_bacdive.utils.robot_utils module

Utility to implement ROBOT over ontology files.

kg_bacdive.utils.robot_utils.convert_to_json(path, ont)

Convert OWL to JSON using ROBOT and the subprocess library.

Parameters:
  • path (str) – Path to ROBOT and the input OWL files.

  • ont (str) – Name of the ontology to convert.

Returns:

None
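
As a rough sketch of what such a call might look like, the helper below only builds the argument list for ROBOT's convert command; it does not run it. The name build_convert_command and the file layout (a robot script plus <ont>.owl and <ont>.json all under path) are assumptions made for illustration.

```python
import os
from typing import List


def build_convert_command(path: str, ont: str) -> List[str]:
    """Build a `robot convert` argument list for an OWL-to-JSON conversion."""
    robot = os.path.join(path, "robot")  # Assumed location of the ROBOT script.
    return [
        robot,
        "convert",
        "--input", os.path.join(path, f"{ont}.owl"),
        "--output", os.path.join(path, f"{ont}.json"),
    ]
```

The resulting list could then be passed to subprocess.run(command, check=True), optionally with an env argument carrying the ROBOT environment variables.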

kg_bacdive.utils.robot_utils.extract_convert_to_json(path, ont_name, terms, mode)

Extract all children of the provided CURIE(s) and convert the result to JSON.

ROBOT Method options:

  • STAR: The STAR-module contains mainly the terms in the seed and the inter-relations between them (not necessarily sub- and super-classes).

  • TOP: The TOP-module contains mainly the terms in the seed, plus all their sub-classes and the inter-relations between them.

  • BOT: The BOT (or BOTTOM) module contains mainly the terms in the seed, plus all their super-classes and the inter-relations between them.

  • MIREOT: The MIREOT method preserves the hierarchy of the input ontology (subclass and subproperty relationships), but does not try to preserve the full set of logical entailments.

Parameters:
  • path (str) – Path of the file to be converted.

  • ont_name (str) – Name of the ontology.

  • terms (str) – Either a CURIE or the path to a file listing CURIEs.

  • mode (str) – ROBOT extraction method; one of the options listed above.

Returns:

None
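
One way to assemble such a call is to chain ROBOT's extract and convert commands in a single invocation. The sketch below only builds the argument list and covers the STAR, TOP, and BOT methods; MIREOT is left out because ROBOT takes its terms through different options (such as --lower-term). The function name, the assumption that robot is on the PATH, and the output file name <ont_name>_extract.json are all hypothetical.

```python
import os
from typing import List

SLME_METHODS = {"STAR", "TOP", "BOT"}


def build_extract_command(path: str, ont_name: str, terms: str, mode: str) -> List[str]:
    """Build a chained `robot extract ... convert` argument list."""
    method = mode.upper()
    if method not in SLME_METHODS:
        raise ValueError(f"This sketch only handles {sorted(SLME_METHODS)}, got {mode!r}")

    # A path to an existing file is treated as a term file, anything else as a CURIE.
    term_flag = "--term-file" if os.path.isfile(terms) else "--term"
    # Hypothetical output name, written next to the input file.
    output = os.path.join(os.path.dirname(path), f"{ont_name}_extract.json")
    return [
        "robot", "extract",
        "--method", method,
        "--input", path,
        term_flag, terms,
        "convert", "--output", output,
    ]
```

Chaining avoids writing an intermediate OWL file: the extracted module is handed directly to convert, which picks the JSON format from the output extension.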

kg_bacdive.utils.robot_utils.initialize_robot(path)

Initialize ROBOT with necessary configuration.

Parameters:

path (str) – Path to ROBOT files.

Return type:

list

Returns:

A list consisting of the ROBOT shell script name and the environment variables.
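
A minimal sketch of what such an initializer might return, assuming the script is named robot and sits directly under path. ROBOT_JAVA_ARGS is the environment variable ROBOT reads for JVM options; the heap size used here is an arbitrary placeholder, and the function name initialize_robot_sketch is hypothetical.

```python
import os
from typing import List


def initialize_robot_sketch(path: str) -> List:
    """Return [robot_script, env] for use with subprocess calls."""
    robot_script = os.path.join(path, "robot")  # Assumed script location.
    env = dict(os.environ)  # Copy, so the current process is not modified.
    env["ROBOT_JAVA_ARGS"] = "-Xmx8g"  # Placeholder JVM heap setting.
    return [robot_script, env]
```

The two items can be unpacked as robot_script, env = initialize_robot_sketch(path) and passed on to subprocess.run([robot_script, ...], env=env).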

kg_bacdive.utils.robot_utils.remove_convert_to_json(path, ont_name, terms)

Remove all children of the provided CURIE(s) and convert the result to JSON.

Parameters:
  • path (str) – Path of the file to be converted.

  • ont_name (str) – Name of the ontology.

  • terms (Union[List, Path]) – Either CURIE(s) or the path to a file listing CURIEs.

Returns:

None

Module contents

ROBOT utility.