% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vs_uchime_denovo.R
\name{vs_uchime_denovo}
\alias{vs_uchime_denovo}
\alias{uchime_denovo}
\alias{chimera}
\title{Detect chimeras without external references (i.e. de novo)}
\usage{
vs_uchime_denovo(
  fasta_input,
  nonchimeras = NULL,
  chimeras = NULL,
  sizein = TRUE,
  sizeout = TRUE,
  relabel = NULL,
  relabel_sha1 = FALSE,
  fasta_width = 0,
  sample = NULL,
  log_file = NULL,
  vsearch_options = NULL,
  tmpdir = NULL
)
}
\arguments{
\item{fasta_input}{(Required). A FASTA file path or a FASTA object with reads.
If a tibble is provided, any columns in addition to \code{Header} and
\code{Sequence} will be preserved in the output. See \emph{Details}.}

\item{nonchimeras}{(Optional). Name of the FASTA output file for the
non-chimeric sequences. If \code{NULL} (default), no output is written to
file.}

\item{chimeras}{(Optional). Name of the FASTA output file for the chimeric
sequences. If \code{NULL} (default), no output is written to file.}

\item{sizein}{(Optional). If \code{TRUE} (default), abundance annotations
present in sequence headers are taken into account.}

\item{sizeout}{(Optional). If \code{TRUE} (default), abundance annotations
are added to FASTA headers.}

\item{relabel}{(Optional). Relabel sequences using the given prefix and a
ticker to construct new headers. Defaults to \code{NULL}.}

\item{relabel_sha1}{(Optional). If \code{TRUE}, relabel sequences using the
SHA1 message digest algorithm. Defaults to \code{FALSE}.}

\item{fasta_width}{(Optional). Number of characters per line in the output
FASTA file. Defaults to \code{0}, which eliminates wrapping.}

\item{sample}{(Optional). Add the given sample identifier string to sequence
headers. For instance, if the given string is "ABC", the text ";sample=ABC"
will be added to the header. If \code{NULL} (default), no identifier is added.}

\item{log_file}{(Optional). Name of the log file to capture messages from
\code{VSEARCH}. If \code{NULL} (default), no log file is created.}

\item{vsearch_options}{(Optional). Additional arguments to pass to
\code{VSEARCH}. Defaults to \code{NULL}. See \emph{Details}.}

\item{tmpdir}{(Optional). Path to the directory where temporary files should
be written when tables are used as input or output. Defaults to
\code{NULL}, which resolves to the session-specific temporary directory
(\code{tempdir()}).}
}
\value{
A tibble or \code{NULL}.

If \code{nonchimeras} and \code{chimeras} are specified, the sequences
resulting from chimera detection are written directly to the specified files
in FASTA format, and no tibbles are returned.

If \code{nonchimeras} and \code{chimeras} are \code{NULL}, a FASTA object
containing non-chimeric sequences is returned. This output tibble will
include any additional columns that were present in the \code{fasta_input}
tibble. An attribute named \code{"chimeras"} will contain a tibble of the
chimeric sequences, also with the additional columns preserved.

Additionally, the returned tibble (when applicable) has an attribute
\code{"statistics"} containing a tibble with chimera detection statistics.

The statistics tibble has the following columns:
\itemize{
  \item \code{num_nucleotides}: Total number of nucleotides used as input
  for chimera detection.
  \item \code{num_sequences}: Total number of sequences used as input for
  chimera detection.
  \item \code{min_length_input_seq}: Length of the shortest sequence used
  as input for chimera detection.
  \item \code{max_length_input_seq}: Length of the longest sequence used as
  input for chimera detection.
  \item \code{avg_length_input_seq}: Average length of the sequences used as
  input for chimera detection.
  \item \code{num_non_chimeras}: Number of non-chimeric sequences.
  \item \code{num_chimeras}: Number of chimeric sequences.
  \item \code{input}: Name of the input file/object for the chimera
  detection.
}
}
\description{
\code{vs_uchime_denovo} detects chimeras in FASTA sequences using
\code{VSEARCH}'s \code{uchime_denovo} algorithm. The function automatically
sorts sequences by decreasing abundance to enhance chimera detection
accuracy.
}
\details{
Chimeras in the input FASTA sequences are detected using \code{VSEARCH}'s
\code{uchime_denovo}. In de novo mode, the input FASTA file/object must
contain abundance annotations (i.e. a pattern [;]size=integer[;] in the
header). Input order matters for chimera detection, so it is recommended to
sort sequences by decreasing abundance.

\code{fasta_input} can be either a FASTA file or a FASTA object. FASTA objects
are tibbles that contain the columns \code{Header} and \code{Sequence}; see
\code{\link[microseq]{readFasta}}.

When providing a tibble as \code{fasta_input}, you can include additional
columns with metadata (e.g., OTU IDs, sample origins). The function will
preserve these columns by joining them back to the results based on the
DNA sequence. This allows you to keep your metadata associated with your
sequences throughout the chimera detection process.

If \code{nonchimeras} and \code{chimeras} are specified, resulting
non-chimeric and chimeric sequences are written to these files in FASTA
format.

If \code{nonchimeras} and \code{chimeras} are \code{NULL}, results are
returned as FASTA objects.

\code{nonchimeras} and \code{chimeras} must either both be specified or both
be \code{NULL}.

\code{vsearch_options} allows users to pass additional command-line arguments
to \code{VSEARCH} that are not directly supported by this function. Refer to
the \code{VSEARCH} manual for more details.
}
\examples{
\dontrun{
# Define arguments
fasta_input <- file.path(path.package("Rsearch"), "extdata", "small_R1.fq")
nonchimeras <- "nonchimeras.fa"
chimeras <- "chimeras.fa"

# Detect chimeras with default parameters and write FASTA files
vs_uchime_denovo(fasta_input = fasta_input,
                 nonchimeras = nonchimeras,
                 chimeras = chimeras)
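
# A hedged sketch of the optional header and logging arguments listed in the
# usage section. The output file names, the prefix OTU and the sample
# identifier ABC are hypothetical values chosen only for illustration.
vs_uchime_denovo(fasta_input = fasta_input,
                 nonchimeras = "nonchimeras_relabeled.fa",
                 chimeras = "chimeras_relabeled.fa",
                 relabel = "OTU",
                 sample = "ABC",
                 log_file = "uchime_denovo.log")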

# Detect chimeras with default parameters and return a FASTA tibble
nonchimeras.tbl <- vs_uchime_denovo(fasta_input = fasta_input,
                                    nonchimeras = NULL,
                                    chimeras = NULL)

# Get chimeras tibble
chimeras.tbl <- attr(nonchimeras.tbl, "chimeras")

# Get statistics tibble
statistics.tbl <- attr(nonchimeras.tbl, "statistics")
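
# A minimal sketch of tibble input with a metadata column, assuming the
# tibble package is installed. The headers, sequences and the otu_id column
# are hypothetical toy values; real input would hold many more reads.
# Each header carries the ;size=integer annotation needed in de novo mode.
fasta.tbl <- tibble::tibble(
  Header = c("seq1;size=10", "seq2;size=3"),
  Sequence = c("ACGTACGTACGTACGT", "ACGTTTGTACGAACGT"),
  otu_id = c("OTU_1", "OTU_2")
)

# Per the Details section, extra columns such as otu_id should be preserved
# in the returned tibble and in its chimeras attribute
res.tbl <- vs_uchime_denovo(fasta_input = fasta.tbl)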
}

}
\references{
\url{https://github.com/torognes/vsearch}
}
