% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/primersearch.R
\name{primersearch_raw}
\alias{primersearch_raw}
\title{Use EMBOSS primersearch for in silico PCR}
\usage{
primersearch_raw(input = NULL, file = NULL, forward, reverse, mismatch = 5)
}
\arguments{
\item{input}{(\code{character}) One of the following: 
\describe{
  \item{A character vector of sequences}{See the example below for what this
  looks like. The parser \code{\link{read_fasta}} produces output like this.}
  \item{A list of character vectors}{Each vector should have one base per element.}
  \item{A "DNAbin" object}{This is the result of parsers like
  \code{\link[ape]{read.FASTA}}.}
  \item{A list of "SeqFastadna" objects}{This is the result of parsers like
  \code{\link[seqinr]{read.fasta}}.}
  Either "input" or "file" must be supplied but not both.
}}

\item{file}{The path to a FASTA file containing sequences to use. Either
"input" or "file" must be supplied but not both.}

\item{forward}{(\code{character} of length 1) The forward primer sequence}

\item{reverse}{(\code{character} of length 1) The reverse primer sequence}

\item{mismatch}{An integer vector of length 1. The percentage of mismatches allowed.}
}
\value{
A table with one row per predicted amplicon with the following info:

\preformatted{
           (f_primer)
   5' AAGTACCTTAACGGAATTATAG ->        (r_primer)
                               <- TAAGCAAAGCATCCACCTCG 5'
5' ...AAGTACCTTAACGGAATTATAG......ATTCGTTTCGTAGGTGGAGC... 3'
      ^                    ^      ^                  ^
   f_start              f_end   r_rtart             r_end
     
      |--------------------||----||------------------|
             f_match       amplicon       r_match  
      |----------------------------------------------|
                           product
                           
  
f_mismatch: The number of mismatches on the forward primer
r_mismatch: The number of mismatches on the reverse primer
input: The index of the input sequence
}
}
\description{
A pair of primers are aligned against a set of sequences.
The location of the best hits, quality of match, and predicted amplicons are returned.
Requires the EMBOSS tool kit (\url{https://emboss.sourceforge.net/}) to be installed.
}
\details{
It can be confusing how the primer sequence relates to the binding sites on a
reference database sequence. A simplified diagram can help. For example, if
the top strand below (5' -> 3') is the database sequence, the forward primer
has the same sequence as the target region, since it will bind to the other
strand (3' -> 5') during PCR and extend on the 3' end. However, the reverse
primer must bind to the database strand, so it will have to be the complement
of the reference sequence. It also has to be reversed to make it in the
standard 5' -> 3' orientation. Therefore, the reverse primer must be the
reverse complement of its binding site on the reference sequence.
\preformatted{
Primer 1: 5' AAGTACCTTAACGGAATTATAG 3'
Primer 2: 5' GCTCCACCTACGAAACGAAT   3'
 
                               <- TAAGCAAAGCATCCACCTCG 5'
5' ...AAGTACCTTAACGGAATTATAG......ATTCGTTTCGTAGGTGGAGC... 3'

3' ...TTCATGGAATTGCCTTAATATC......TAAGCAAAGCATCCACCTCG... 5'
   5' AAGTACCTTAACGGAATTATAG ->
}
However, a database might have either the top or the bottom strand as a
reference sequence. Since one implies the sequence of the other, either is
valid, but this is another source of confusion. If we take the diagram above
and rotate it 180 degrees, it would mean the same thing, but which primer we would
want to call "forward" and which we would want to call "reverse" would
change. Databases of a single locus (e.g. Greengenes) will likely have a
convention for which strand will be present, so relative to this convention,
there is a distinct "forward" and "reverse". However, computers dont know
about this convention, so the "forward" primer is whichever primer has the
same sequence as its binding region in the database (as opposed to the
reverse complement). For this reason, primersearch will redefine which primer
is "forward" and which is "reverse" based on how it binds the reference
sequence. See the example code for a demonstration of this.
}
\section{Installing EMBOSS}{


The command-line tool "primersearch" from the EMBOSS tool kit is needed to use this function.
How you install EMBOSS will depend on your operating system:

\strong{Linux:}

Open up a terminal and type:

\code{sudo apt-get install emboss}

\strong{Mac OSX:}

The easiest way to install EMBOSS on OSX is to use \href{https://brew.sh/}{homebrew}.
After installing homebrew, open up a terminal and type:

\code{brew install homebrew/science/emboss}

\strong{Windows:}

There is an installer for Windows here:

ftp://emboss.open-bio.org/pub/EMBOSS/windows/mEMBOSS-6.5.0.0-setup.exe
}

\examples{
\dontrun{

### Dummy test data set ###

primer_1_site <- "AAGTACCTTAACGGAATTATAG"
primer_2_site <- "ATTCGTTTCGTAGGTGGAGC"
amplicon <- "NNNAGTGGATAGATAGGGGTTCTGTGGCGTTTGGGAATTAAAGATTAGAGANNN"
seq_1 <- paste0("AA", primer_1_site, amplicon, primer_2_site, "AAAA")
seq_2 <- rev_comp(seq_1)
f_primer <- "ACGTACCTTAACGGAATTATAG" # Note the "C" mismatch at position 2
r_primer <- rev_comp(primer_2_site)
seqs <- c(a = seq_1, b = seq_2)

result <- primersearch_raw(seqs, forward = f_primer, reverse = r_primer)


### Real data set ###

# Get example FASTA file
fasta_path <- system.file(file.path("extdata", "silva_subset.fa"),
                          package = "metacoder")

# Parse the FASTA file as a taxmap object
obj <- parse_silva_fasta(file = fasta_path)

# Simulate PCR with primersearch
pcr_result <- primersearch_raw(obj$data$tax_data$silva_seq, 
                               forward = c("U519F" = "CAGYMGCCRCGGKAAHACC"),
                               reverse = c("Arch806R" = "GGACTACNSGGGTMTCTAAT"),
                               mismatch = 10)

# Add result to input table 
#  NOTE: We want to add a function to handle running pcr on a
#        taxmap object directly, but we are still trying to figure out
#        the best way to implement it. For now, do the following:
obj$data$pcr <- pcr_result
obj$data$pcr$taxon_id <- obj$data$tax_data$taxon_id[pcr_result$input]

# Visualize which taxa were amplified
#  This work because only amplicons are returned by `primersearch`
n_amplified <- unlist(obj$obs_apply("pcr",
                                    function(x) length(unique(x)),
                                    value = "input"))
prop_amped <- n_amplified / obj$n_obs()
heat_tree(obj,
          node_label = taxon_names, 
          node_color = prop_amped, 
          node_color_range = c("grey", "red", "purple", "green"),
          node_color_trans = "linear",
          node_color_axis_label = "Proportion amplified",
          node_size = n_obs,
          node_size_axis_label = "Number of sequences",
          layout = "da", 
          initial_layout = "re")

}

}
