kg_bacdive.transform_utils.traits package

Submodules

kg_bacdive.transform_utils.traits.traits module

Transform the traits data from NCBI and GTDB.

class kg_bacdive.transform_utils.traits.traits.TraitsTransform(input_dir, output_dir, nlp=True)

Bases: Transform

Ingest traits dataset (NCBI/GTDB).

Ingests and transforms the file https://github.com/bacteria-archaea-traits/bacteria-archaea-traits/blob/master/output/condensed_traits_NCBI.csv, extracting the following columns:

  • tax_id

  • org_name

  • metabolism

  • pathways

  • shape

  • carbon_substrates

  • cell_shape

  • isolation_source
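The column selection described above can be sketched with the standard library alone. This is a minimal illustration, not the actual TraitsTransform code: the helper name extract_trait_columns and the sample row are hypothetical, and only the column names are taken from the list above.

```python
import csv
import io

# Column names as listed in the class description above.
TRAIT_COLUMNS = [
    "tax_id", "org_name", "metabolism", "pathways", "shape",
    "carbon_substrates", "cell_shape", "isolation_source",
]


def extract_trait_columns(handle, columns=TRAIT_COLUMNS):
    """Yield one dict per CSV row, keeping only the requested columns.

    Hypothetical helper: columns absent from the file's header are
    skipped instead of raising a KeyError.
    """
    reader = csv.DictReader(handle)
    present = [c for c in columns if c in (reader.fieldnames or [])]
    for row in reader:
        yield {c: row[c] for c in present}


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "tax_id,org_name,metabolism,gram_stain\n"
    "1,Example organism,aerobic,negative\n"
)
rows = list(extract_trait_columns(sample))
```

Here gram_stain is dropped because it is not in the requested list, and the requested columns that the sample lacks are ignored.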

The class also uses:

  • OAK, to run NLP via the ‘ner_utils’ module, and

  • ROBOT, via the ‘robot_utils’ module.

run(data_file=None)

Run the transform and perform the needed transformations on the trait data (NCBI/GTDB).

Parameters:

data_file (Union[Path, None, str]) – Input file name.
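A parameter typed Union[Path, None, str] is commonly normalized to a single Path before use. The sketch below shows one way to do that. The helper resolve_data_file and its default file name (taken from the URL above) are assumptions for illustration; how run() actually resolves data_file is not stated in this documentation.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_data_file(
    input_dir: Union[Path, str],
    data_file: Optional[Union[Path, str]] = None,
    default_name: str = "condensed_traits_NCBI.csv",  # assumed default
) -> Path:
    """Return the path of the input file (hypothetical helper).

    None falls back to default_name inside input_dir; a relative
    name is resolved against input_dir; an absolute path is kept.
    """
    base = Path(input_dir)
    if data_file is None:
        return base / default_name
    path = Path(data_file)
    return path if path.is_absolute() else base / path
```

Given the documented signatures, a call would presumably look like TraitsTransform(input_dir, output_dir).run(data_file="condensed_traits_NCBI.csv"), with data_file omitted to accept the default.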

Module contents

Traits transform.

class kg_bacdive.transform_utils.traits.TraitsTransform(input_dir, output_dir, nlp=True)

Bases: Transform

Same class as kg_bacdive.transform_utils.traits.traits.TraitsTransform, exposed at package level; see the description above.