Run parallel BLAST for a set of sequences

Usage

parallel_blast(
  asvs,
  db_path,
  out_file,
  out_RDS,
  num_threads,
  blast_type,
  total_cores,
  perc_id,
  perc_qcov_hsp,
  num_alignments,
  verbose = FALSE,
  env_name = "blast-env"
)

Arguments

asvs

Character vector of sequences to be searched.

db_path

Complete path to a formatted BLAST database.

out_file

Complete path to the output .csv file, in an existing folder.

out_RDS

Complete path to the output RDS file, in an existing folder.

num_threads

Number of threads to run BLAST on. Passed on to BLAST+ argument -num_threads.

blast_type

BLAST+ executable to be used for the search (e.g. "blastn").

total_cores

Total number of cores available to run BLAST in parallel. Check your maximum with future::availableCores().

perc_id

Lowest identity percentage cutoff. Passed on to BLAST+ -perc_identity.

perc_qcov_hsp

Lowest query coverage per HSP percentage cutoff. Passed on to BLAST+ -qcov_hsp_perc.

num_alignments

Number of alignments to retrieve from BLAST. Max = 6.

verbose

Should the internal condathis::run() command be shown?

env_name

Name of the conda environment used to run BLAST (default "blast-env").

Value

A tibble with the BLAST tabular output.

Examples

if (FALSE) { # \dontrun{
blast_res <- BLASTr::parallel_blast(
  asvs = ASVs_test, # vector of sequences to be searched
  db_path = "/data/databases/nt/nt", # path to a formatted blast database
  out_file = NULL, # path to a .csv file to be created with results (in an existing folder)
  out_RDS = NULL, # path to a .RDS file to be created with results (in an existing folder)
  perc_id = 80, # minimum identity percentage cutoff
  perc_qcov_hsp = 80, # minimum percentage coverage of query sequence by subject sequence cutoff
  num_threads = 1, # number of threads/cores to run each blast on
  total_cores = 8, # total number of threads/cores to allocate across all blast searches
  # maximum number of alignments/matches to retrieve results for each query sequence
  num_alignments = 3,
  blast_type = "blastn" # blast search engine to use
)
} # }
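
The relationship between total_cores and num_threads can be sketched as follows. This is a minimal illustration, not part of BLASTr: plan_blast_cores() is a hypothetical helper, and it assumes that each search occupies num_threads cores, so that roughly total_cores / num_threads searches run at the same time. Only future::availableCores() is taken from the documentation above.

```r
# Hypothetical helper (not exported by BLASTr): split a core budget between
# concurrent searches and threads per search.
plan_blast_cores <- function(total_cores, num_threads = 1L) {
  stopifnot(
    total_cores >= 1,
    num_threads >= 1,
    num_threads <= total_cores
  )
  list(
    total_cores = as.integer(total_cores),
    num_threads = as.integer(num_threads),
    # Assumption: each search uses num_threads cores, so this many fit at once
    concurrent_searches = as.integer(total_cores %/% num_threads)
  )
}

# Query the cores available to this R session, as the total_cores entry suggests
available <- as.integer(future::availableCores())

# Leave one core free for the R session itself when more than one is available
budget <- plan_blast_cores(
  total_cores = max(1L, available - 1L),
  num_threads = 1L
)
```

The resulting budget$total_cores and budget$num_threads could then be passed as the total_cores and num_threads arguments of parallel_blast().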