% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/semnet.r
\name{semnet_window}
\alias{semnet_window}
\title{Create a semantic network based on the co-occurrence of tokens in token windows}
\usage{
semnet_window(
  tc,
  feature = "token",
  measure = c("con_prob", "cosine", "count_directed", "count_undirected", "chi2"),
  context_level = c("document", "sentence"),
  window.size = 10,
  direction = "<>",
  backbone = F,
  n.batches = 5,
  matrix_mode = c("positionXwindow", "windowXwindow")
)
}
\arguments{
\item{tc}{A tCorpus or a featureHits object (i.e., the result of \code{search_features})}

\item{feature}{The name of the feature column}

\item{measure}{The similarity measure. Currently supports "con_prob" (conditional probability),
"cosine" (cosine similarity), "count_directed" (i.e., the number of co-occurrences), "count_undirected"
(same as count_directed, but returned as an undirected network) and "chi2" (chi-square score).}

\item{context_level}{Determines whether features need to co-occur within the same "document" or the same "sentence".}

\item{window.size}{The token distance within which features are considered to co-occur.}

\item{direction}{Determines whether co-occurrence is symmetrical ("<>") or takes the order of tokens
into account. If direction is '<', then the from/x feature needs to occur before the
to/y feature. If direction is '>', it needs to occur after the to/y feature.}

\item{backbone}{If TRUE, add an edge attribute for the backbone alpha.}

\item{n.batches}{To limit memory use, the calculation is divided into batches. This parameter controls
the number of batches.}

\item{matrix_mode}{There are two approaches for calculating window co-occurrence (see details). By
default the "positionXwindow" mode is used, but "windowXwindow" is optional because it might
be favourable for some uses, and might make more sense for cosine similarity.}
}
\value{
An igraph graph in which nodes are features and edges represent similarity scores.
}
\description{
This function calculates the co-occurrence of features and returns a network/graph
in the igraph format, where nodes are tokens and edges represent the similarity/adjacency of tokens.
Co-occurrence is calculated based on how often two tokens co-occur within a given token distance.

If a featureHits object is given as input, query hits can span multiple positions (e.g., terms
connected with AND statements or word proximity), which biases the raw counts. For the count_* measures,
only the first position of each query hit is therefore used.
}
\details{
There are two approaches for calculating window co-occurrence.
One is to measure how often a feature occurs within a given token window of another feature. This
can be computed by taking the inner products between the columns of a matrix that records the
exact positions of features and a matrix that records their occurrence windows.
We refer to this as the "positionXwindow" mode. Alternatively, we can measure how
much the windows of features overlap, for which we take the inner products between the columns of
two window matrices. We call this the "windowXwindow" mode. The positionXwindow mode has the advantage
of being easy to interpret (e.g., how likely is feature "Y" to occur within 10
tokens of feature "X"?). The windowXwindow mode, on the other hand, has the useful property
that similarity is stronger if tokens co-occur more closely together
(since their windows then overlap more), but this only works well for similarity measures that
normalize the similarity (e.g., cosine). By default, the positionXwindow mode is used. The windowXwindow
mode can be selected with \code{matrix_mode}, and for cosine similarity it might actually make more sense.
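
To make the two modes concrete, the base R sketch below builds both matrices for a single five-token document with a window size of 1. It is a simplified illustration under stated assumptions, not the package's implementation: the helper \code{window_matrices} is hypothetical, each window includes the token's own position, all tokens belong to one document, and \code{direction} is ignored. Dividing each row by the frequency of the from/x feature is an assumed reading of the "con_prob" measure, and the cosine is computed on the windowXwindow matrix.
\preformatted{
## Hypothetical helper (not part of the package): binary position and
## window matrices for the tokens of a single document
window_matrices <- function(tokens, window.size) {
  features <- sort(unique(tokens))
  n <- length(tokens)
  ## P[i, f] = 1 if the token at position i is feature f
  P <- matrix(0, nrow = n, ncol = length(features),
              dimnames = list(NULL, features))
  P[cbind(seq_len(n), match(tokens, features))] <- 1
  ## W[i, f] = 1 if feature f occurs within window.size tokens of position i
  ## (assumption: the window includes position i itself)
  W <- P
  for (i in seq_len(n)) {
    nearby <- max(1, i - window.size):min(n, i + window.size)
    W[i, ] <- as.numeric(colSums(P[nearby, , drop = FALSE]) > 0)
  }
  list(P = P, W = W)
}

m <- window_matrices(c("A", "B", "C", "A", "D"), window.size = 1)

## positionXwindow: [x, y] is the number of occurrences of x that have
## a y within window.size tokens
pos_x_win <- crossprod(m$P, m$W)
## windowXwindow: [x, y] is the number of positions covered by
## the windows of both x and y
win_x_win <- crossprod(m$W, m$W)

## Assumed conditional probability: divide each row by the frequency of x,
## which is on the diagonal because every window includes its own position
con_prob <- pos_x_win / diag(pos_x_win)
## Cosine similarity of the window vectors (windowXwindow mode)
cos_sim <- win_x_win / sqrt(outer(diag(win_x_win), diag(win_x_win)))
}
Because C and D are two tokens apart, they do not co-occur in the positionXwindow matrix (\code{pos_x_win["C", "D"]} is 0), while their windows still share one position (\code{win_x_win["C", "D"]} is 1). The conditional probabilities are asymmetric: the single B occurs next to an A (\code{con_prob["B", "A"]} is 1), but only one of the two occurrences of A has a B next to it (\code{con_prob["A", "B"]} is 0.5).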
}
\examples{
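## Four small documents, with sentence splitting enabled
text = c('A B C', 'D E F. G H I', 'A D', 'GGG')
tc = create_tcorpus(text, doc_id = c('a','b','c','d'), split_sentences = TRUE)

## Network of tokens that occur within 1 token of each other (default measure: "con_prob")
g = semnet_window(tc, 'token', window.size = 1)
g
## Inspect the edges as a data.frame
igraph::as_data_frame(g)
\donttest{plot_semnet(g)}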
}
