kg_bacdive.utils package
Submodules
kg_bacdive.utils.ner_utils module
NLP utilities.
- kg_bacdive.utils.ner_utils.annotate(df, prefix, exclusion_list, outfile, llm=False)
Annotate the text in a DataFrame column with ontology terms using OAKlib, optionally assisted by an LLM (`llm=True`).
- Parameters:
df (DataFrame) – Input DataFrame.
prefix (str) – Ontology to be used.
exclusion_list (List) – Tokens that can be ignored.
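The matching idea behind `annotate` can be illustrated with a minimal, self-contained sketch. It assumes a plain dictionary mapping labels to CURIEs stands in for the ontology selected by `prefix`. It does not use OAKlib's API, and the function name `annotate_texts` and the `EX:` CURIEs are hypothetical. Tokens in `exclusion_list` are skipped, and labels only match on whole-word boundaries.

```python
import re
from typing import Dict, List, Tuple


def annotate_texts(
    texts: List[str],
    label_to_curie: Dict[str, str],
    exclusion_list: List[str],
) -> List[Tuple[str, str, str]]:
    """Return (text, matched_label, curie) triples for whole-word label matches.

    Hypothetical stand-in for an ontology annotator: `label_to_curie` plays the
    role of the ontology chosen by `prefix`; excluded tokens are never reported.
    """
    excluded = {token.lower() for token in exclusion_list}
    results = []
    for text in texts:
        lowered = text.lower()
        for label, curie in label_to_curie.items():
            if label.lower() in excluded:
                continue  # caller marked this token as uninformative
            # \b anchors stop "salt" from matching inside "basalt"
            if re.search(rf"\b{re.escape(label.lower())}\b", lowered):
                results.append((text, label, curie))
    return results
```

In the real function these triples would presumably be collected per DataFrame row and written to `outfile`. That step is omitted here.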
kg_bacdive.utils.pandas_utils module
Pandas utilities.
- kg_bacdive.utils.pandas_utils.drop_duplicates(file_path)
Read a TSV file, drop duplicate rows, and write the result back to the same file.
- Parameters:
file_path (Path) – Path to the TSV file.
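The following sketch shows the read, deduplicate, and overwrite cycle using only the standard library. The documented function presumably uses pandas for this. The helper name `drop_duplicates_tsv` and its row-count return value are hypothetical. It keeps the header row and the first occurrence of each data row, which matches pandas' default `drop_duplicates(keep="first")`.

```python
import csv
from pathlib import Path


def drop_duplicates_tsv(file_path: Path) -> int:
    """Deduplicate the data rows of a TSV file in place; return how many remain.

    Hypothetical helper: the header row is preserved and the first occurrence
    of each row wins, like pandas' default drop_duplicates(keep="first").
    """
    with open(file_path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        return 0
    header, body = rows[0], rows[1:]
    # dict.fromkeys preserves insertion order, so the first occurrence is kept
    unique = list(dict.fromkeys(tuple(row) for row in body))
    with open(file_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(unique)
    return len(unique)
```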
- kg_bacdive.utils.pandas_utils.establish_transitive_relationship(file_path, subject_prefix, intermediate_prefix, predicate, object_prefix)
Establish a transitive relationship across two edges that share the same predicate.
For example, given the existing relations:
A => predicate => B
B => predicate => C
this function adds the relation A => predicate => C.
- Parameters:
file_path (Path) – Path to the edge file.
subject_prefix (str) – Subject prefix (A in the example).
intermediate_prefix (str) – Intermediate prefix that connects the subject to the object (B in the example).
predicate (str) – The common predicate shared by all relations.
object_prefix (str) – Object prefix (C in the example).
- Return type:
DataFrame
- Returns:
The core DataFrame with the additional inferred rows.
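The inference step can be sketched on plain `(subject, predicate, object)` tuples instead of a DataFrame read from `file_path`. The tuple representation and the name `add_transitive_edges` are assumptions for illustration. The sketch indexes every `B => predicate => C` edge by `B`, then joins it against each `A => predicate => B` edge, and it skips inferred edges that already exist.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def add_transitive_edges(
    edges: List[Edge],
    subject_prefix: str,
    intermediate_prefix: str,
    predicate: str,
    object_prefix: str,
) -> List[Edge]:
    """Return `edges` plus every inferred A => predicate => C edge (hypothetical helper)."""
    # Index B -> [C, ...] for edges B => predicate => C
    b_to_c: Dict[str, List[str]] = {}
    for s, p, o in edges:
        if p == predicate and s.startswith(intermediate_prefix) and o.startswith(object_prefix):
            b_to_c.setdefault(s, []).append(o)

    existing = set(edges)
    inferred: List[Edge] = []
    # Join each A => predicate => B edge with the C nodes reachable from B
    for s, p, o in edges:
        if p == predicate and s.startswith(subject_prefix) and o.startswith(intermediate_prefix):
            for c in b_to_c.get(o, []):
                new_edge = (s, predicate, c)
                if new_edge not in existing:  # avoid duplicating known edges
                    existing.add(new_edge)
                    inferred.append(new_edge)
    return edges + inferred
```

The dictionary index makes this a single hash join, so the work grows with the number of edges rather than with the square of it.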
kg_bacdive.utils.robot_utils module
Utilities for running ROBOT on ontology files.
- kg_bacdive.utils.robot_utils.convert_to_json(path, ont)
Convert OWL to JSON using ROBOT and the subprocess library.
- Parameters:
path (str) – Path to ROBOT and the input OWL files.
ont (str) – Ontology name.
- Returns:
None
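The following is a minimal sketch of how a ROBOT conversion can be driven through `subprocess`. It assumes the JSON output is written next to the input as `<name>.json`. That naming and the helper names are hypothetical, not the package's actual layout. It relies on ROBOT's `convert` command with `--input`, `--format json`, and `--output`.

```python
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(robot: str, owl_file: Path) -> List[str]:
    """Build a `robot convert` call that writes <name>.json next to <name>.owl (hypothetical layout)."""
    json_file = owl_file.with_suffix(".json")
    return [robot, "convert", "--input", str(owl_file),
            "--format", "json", "--output", str(json_file)]


def convert_to_json_sketch(robot: str, owl_file: Path) -> None:
    """Run the conversion; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_convert_command(robot, owl_file), check=True)
```

Building the argument list separately from running it keeps the command testable without a ROBOT installation.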
- kg_bacdive.utils.robot_utils.extract_convert_to_json(path, ont_name, terms, mode)
Extract the provided CURIE(s), together with the related terms selected by `mode`, and convert the result to JSON.
ROBOT method options:
STAR: The STAR module contains mainly the terms in the seed and the inter-relations between them (not necessarily sub- and super-classes).
TOP: The TOP module contains mainly the terms in the seed, plus all their sub-classes and the inter-relations between them.
BOT: The BOT (or BOTTOM) module contains mainly the terms in the seed, plus all their super-classes and the inter-relations between them.
MIREOT: The MIREOT method preserves the hierarchy of the input ontology (subclass and subproperty relationships), but does not try to preserve the full set of logical entailments.
- Parameters:
path (str) – Path of the file to be converted.
ont_name (str) – Name of the ontology.
terms (str) – Either a CURIE or a file containing a list of CURIEs.
mode (str) – Method option, as listed above.
- Returns:
None
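The following sketch shows how the extract and convert steps can be assembled as ROBOT command lines. It chooses `--term-file` when `terms` names an existing file and `--term` otherwise. It covers only STAR, TOP, and BOT, which take terms through those options. MIREOT selects terms through different options, so the sketch rejects it. The `<stem>_extracted` file names and the helper name are hypothetical.

```python
from pathlib import Path
from typing import List

# Extraction methods that take their seed via --term / --term-file
SLME_METHODS = {"STAR", "TOP", "BOT"}


def build_extract_commands(robot: str, owl_file: Path, terms: str, mode: str) -> List[List[str]]:
    """Return [extract_cmd, convert_cmd]: ROBOT extract, then conversion of the module to JSON."""
    method = mode.upper()
    if method not in SLME_METHODS:
        # MIREOT uses different term options, so this sketch leaves it out
        raise ValueError(f"method not covered by this sketch: {mode}")
    # An existing file is treated as a list of CURIEs; any other string as a single CURIE
    term_option = "--term-file" if Path(terms).is_file() else "--term"
    # Hypothetical naming scheme: <stem>_extracted.owl and <stem>_extracted.json
    module_owl = owl_file.with_name(f"{owl_file.stem}_extracted.owl")
    module_json = module_owl.with_suffix(".json")
    extract_cmd = [
        robot, "extract", "--method", method,
        "--input", str(owl_file), term_option, terms,
        "--output", str(module_owl),
    ]
    convert_cmd = [
        robot, "convert", "--input", str(module_owl),
        "--format", "json", "--output", str(module_json),
    ]
    return [extract_cmd, convert_cmd]
```

Each returned command can then be run in order, for example with `subprocess.run(cmd, check=True)`.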
- kg_bacdive.utils.robot_utils.initialize_robot(path)
Initialize ROBOT with necessary configuration.
- Parameters:
path (str) – Path to the ROBOT files.
- Return type:
list
- Returns:
A list containing the ROBOT shell script name and the environment variables.
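The following sketch shows one way to produce such a `[script, env]` pair. It copies the current environment, puts the ROBOT directory first on `PATH`, and sets `ROBOT_JAVA_ARGS`, which ROBOT's launcher script reads for JVM options. The `-Xmx4G` default, the `robot` script name, and the helper name are assumptions for illustration.

```python
import os
from pathlib import Path
from typing import Dict, List, Union


def initialize_robot_sketch(path: str, java_args: str = "-Xmx4G") -> List[Union[str, Dict[str, str]]]:
    """Return [robot_script, env] for use with subprocess.run(..., env=env) (hypothetical helper)."""
    env = dict(os.environ)  # copy so the parent process environment is untouched
    # Put the ROBOT directory first on PATH so its script and jar are found
    env["PATH"] = os.pathsep.join([path, env.get("PATH", "")])
    # JVM options for ROBOT; the heap size here is an assumed default
    env["ROBOT_JAVA_ARGS"] = java_args
    robot_script = str(Path(path) / "robot")
    return [robot_script, env]
```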
- kg_bacdive.utils.robot_utils.remove_convert_to_json(path, ont_name, terms)
Remove all children of the provided CURIE(s) and convert the result to JSON.
- Parameters:
path (str) – Path of the file to be converted.
ont_name (str) – Name of the ontology.
terms (Union[List, Path]) – Either a list of CURIEs or a path to a file containing CURIEs.
- Returns:
None
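The following sketch shows how the `Union[List, Path]` argument can map onto ROBOT's `remove` options. A `Path` becomes `--term-file`, and each CURIE in a list becomes its own `--term`. The choice of `--select descendants`, which removes everything below the listed terms while keeping the terms themselves, is an assumption about what "children" means here. The helper name is hypothetical. The resulting OWL file would still need a JSON conversion step, as in `convert_to_json`.

```python
from pathlib import Path
from typing import List, Union


def build_remove_command(
    robot: str, owl_file: Path, terms: Union[List[str], Path], output: Path
) -> List[str]:
    """Build a `robot remove` call for a CURIE list or a CURIE file (hypothetical helper)."""
    cmd = [robot, "remove", "--input", str(owl_file)]
    if isinstance(terms, Path):
        cmd += ["--term-file", str(terms)]  # file with one CURIE per line
    else:
        for curie in terms:
            cmd += ["--term", curie]  # one --term per CURIE
    # Assumption: remove what lies below the seed terms, keep the seeds themselves
    cmd += ["--select", "descendants", "--output", str(output)]
    return cmd
```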
Module contents
ROBOT utility.