quanteda 0.7.2
==============
* phrasetotokens works with dictionaries and collocations, to transform
  multi-word expressions into single tokens in texts or corpora

* dictionaries now redefined as S4 classes

* improvements to collocations(), now does not include tokens that are separated 
  by punctuation

* created tokenizeOnly*() functions, for testing tokenizing separately from cleaning,
  and a cleanC(), where both new separate functions are implemented in C

* tokenize() now has a new option, cpp=TRUE, to use a C++ tokenizer and cleaner, resulting
  in much faster text tokenization and cleaning, including that used in dfm()

* textmodel_wordfish now implemented entirely in C for speed.  No std errors yet but 
  coming soon.  No predict method currently working either.

* ie2010Corpus, and exampleString now moved into quanteda (formerly were only in 
  quantedaData because of non-ASCII characters in each - solved with native2ascii
  and \uXXXX encodings).

* All dependencies, even conditional, to the quantedaData and austin packages
  have been removed.

quanteda 0.7.1
==============
Many major changes to the syntax in this version.

* trimdfm, flatten.dictionary, the textfile functions, dictionary converters
  are all gone from the NAMESPACE

* formals changed a bit in clean(), kwic().  

* compoundWords() -> phrasetotoken()

* Cleaned up minor issues in documentation.

* countSyllables data object renamed to englishSyllables.Rdata, and function
  renamed to syllables().

* stopwordsGet() changed to stopwords().  stopwordsRemove() changed to 
  removeFeatures().

* new dictionary() constructor function that also does import and conversion,
  replacing old readWStatdict and readLIWCdict functions.

* one function to read in text files, called `textsource`, that does the 
  work for different file types based on the filename extension, and works
  also for wildcard expressions (that can link to directories for example)


quanteda 0.7.0
==============
* dfm now sparse by default, implemented as subclasses of the 
  Matrix package.  Option dfm(..., matrixType="sparse") is now
  the default, although matrixType="dense" will still produce the
  old S3-class dfm based on a regular matrix, and all dfm methods
  will still work with this object.

* Improvements to: weight(), print() for dfms.

* New methods for dfms: docfreq(), weight(), summary(), as.matrix(), 
  as.data.frame.


quanteda 0.6.6
==============
* No more depends, all done through imports.  Passes clean check.
  The start of our reliance more on the master branch rather than
  having merges from dev to master happen only once in a blue moon.

* bigrams in dfm() when bigrams=TRUE and ignoredFeatures=<something>
  now removed if any bigram contains an ignoredFeature

* stopwordsRemove() now defined for sparse dfms and for collocations.

* stopwordsRemove() now requires an explicit stopwords=<char> argument,
  to emphasize the user's responsibility for applying stopwords.


quanteda 0.6.5
==============
* New engine for dfm now implemented as standard, using data.table and
  Matrix for fast, efficient (sparse) matrixes.

* Added trigram collocations (n=3) to collocations().

* Improvements to clean(): Minor fixes to clean() so that removeDigits=TRUE 
  removes €10bn entirely and not just the €10. clean() now removes http and 
  https URLs by default, although does not 
  preserve them (yet).  clean also handles numbers better, to remove
  1,000,000 and 3.14159 if removeDigits=TRUE but not crazy8 or 4sure.

* dfm works for documents that contain no features, including for
  dictionary counts. Thanks to Kevin Munger for catching this.

quanteda 0.6.4
==============
* first cut at REST APIs for Twitter and Facebook

* some minor improvements to sentence segmentation

* improvements to package dependencies and imports - but this is ongoing!

* Added more functions to dfms, getting there...

* Added the ability to segment a corpus on tags (e.g. ##TAG1 text text, ##TAG2) 
  and have the document split using the tags as a delimiter and the tag then
  added to the corpus as a docvar.


quanteda 0.6.3
==============

* added textmodel_lda support, including LDA, CTM, and STM.  Added a 
  converter dfm2stmformat() between dfm and stm's input format.

* as.dfm works now for data.frame objects

* added Arabic to list of stopwords.  (Still working on a stemmer for Arabic.)

quanteda 0.6.2
==============

* The first appearance of dfms(), to create a sparse Matrix using the 
  Matrix package.  Eventually this will become the default format for 
  all but small dfms.  Not only is this far more efficient, it is also
  much faster.

* Minor speed gains for clean() -- but still much more work to be done
  with clean().

quanteda 0.6.1
==============

* started textmodel_wordfish, textmodel_ca.  textmodel_wordfish takes an mcmc argument
  that calls JAGS wordfish.

* now depends on ca, austin rather than importing them

* dfm subsetting with [,] now works

* docnames()[], []<-, docvars()[] and []<- now work correctly


quanteda 0.6.0 
==============

* Added textmodel for scaling and prediction methods, including for starters,
  wordscores and naivebayes class models.  LIKELY TO BE BUGGY AND QUIRKY FOR A WHILE.

* Added smoothdfm() and weight() methods for dfms.

* Fixed a bug in segmentSentence().


quanteda 0.5.8 
==============

Classification and scaling methods
----------------------------------

* New dfm methods for fitmodel(), predict(), and specific model fitting
  and prediction methods called by these, for classification and scaling 
  of different "textmodel" types, such as wordscores and Naive Bayes
  (for starters).

quanteda 0.5.7
==============

* added compoundWords() to turn space-delimited phrases into single "tokens".
  Works with dfm(, dictionary=) if the text has been pre-processed with 
  compoundWords() and the dictionary joins phrases with the connector ("_").
  May add this functionality to be more automatic in future versions.

* new keep argument for trimdfm() now takes a regular expression for which 
  feature labels to retain.  New defaults for minDoc and minCount (1 each).

* added nfeature() method for dfm objects.

New arguments for dfm()
-----------------------

* thesaurus: works to record equivalency classes
  as lists of words or regular expressions for a given key/label.

* keep: regular expression pattern match for features to keep


quanteda 0.5.6
==============

* added readLIWCdict() to read LIWC-formatted dictionaries

* fixed a "bug"/feature in readWStatDict() that eliminated wildcards (and all other
  punctuation marks) - now only converts to lower.

* improved clean() functions to better handle Twitter, punctuation, and 
  removing extra whitespace

quanteda 0.5.5
==============

* fixed broken dictionary option in dfm()

* fixed a bug in dfm() that was preventing clean() options from being passed through

* added Dice and point-wise mutual information as association measures for
  collocations()

* added: similarity() to implement similarity measures for documents or features
  as vector representations

* begun: implementing dfm resample methods, but this will need more time to work.  
  (Solution: a three way table where the third dim is the resampled text.)

* added is.resample() for dfm and corpus objects

* added Twitter functions: getTweets() performs a REST search through twitteR,
  corpus.twitter creates a corpus object with test and docvars form each tweet
  (operational but needs work)

* added various resample functions, including making dfm a multi-dimensional object
  when created from a resampled corpus and dfm(, bootstrap=TRUE).

* modified the print.dfm() method.


quanteda 0.5.4 
==============

* updated corpus.directory to allow specification of the file extension mask

* updated docvars<- and metadoc<- to take the docvar names from the assigned data.frame if
  field is omitted. 

* added field to docvars()

* enc argument in corpus() methods now actually converts from enc to "UTF-8"

* started working on clean to give it exceptions for @ # _ for twitter text and
  to allow preservation of underscores used in bigrams/collocations.

* Added: a `+` method for corpus objects, to combine a corpus using this operator.

* Changed and fixed: collocations(), which was not only fatally slow and inefficient,
  but also wrong.  Now is much faster and O(n) because it uses data.table and vector 
  operations only.

* Added: resample() for corpus texts.

quanteda 0.5.3
==============

* added statLexdiv() to compute the lexical diversity of texts from a dfm.

* minor bug fixes; update to print.corpus() output messages.

* added a wrapper function for SnowballC::wordStem, called wordstem(), so that
  this can be imported without loading the whole package.


quanteda 0.5.2
==============

* Added a corpus constructor method for the VCorpus class object from the tm package.

* added zipfiles() to unzip a directory of text files from disk or a URL, for easy
  import into a corpus using corpus.directory(zipfiles())


quanteda 0.5.1
==============

* Fixed all the remaining issues causing warnings in R CMD CHECK, now all are fixed.  
  Mostly these related to documentation.

* Fixed corpus.directory to better implementing naming of docvars, if found.

* Moved twitter.R to the R_NEEDFIXING until it can be made to pass tests.  Apparently
  setup_twitter_oauth() is deprecated in the latest version of the twitteR package.

quanteda 0.5.0
==============

Lots of new functions
---------------------

* plot.dfm method for producing word clouds from dfm objects

* print.dfm, print.corpus, and summary.corpus methods now defined

* new accessor functions defined, such as docnames(), settings(), docvars(),
  metadoc(), metacorpus(), encoding(), and language()

* replacement functions defined that correspond to most of the above
  accessor functions, e.g. encoding(mycorpus) <- "UTF-8"

* segment(x, to=c("tokens", "sentences", "paragraphs", "other", ...) now
  provides an easy and powerful method for segmenting a corpus by units
  other than just tokens

* a settings() function has been added to manage settings that would commonly govern
  how texts are converted for processing, so that these can be preserved in a corpus
  and applied to operations that are relevant.  These settings also propagate to a
  dfm for both replication purposes and to govern operations for which they would be
  relevant, when applied to a dfm.

Old functions vastly improved
-----------------------------

* better ways now exist to manage corpus internals, such as through the 
  accessor functions, rather than trying to access the internal structure of 
  the corpus directly.

* basic functions such as tokenize(), clean(), etc are now faster, neater, and
  operate generally on vectors and return consistent object types

Better object and class design
------------------------------

* the corpus object has been redesigned with more flexible components, including
  a settings list, better corpus-level metadata, and smarter implementation of 
  document-level attributes including user-defined variables (docvars) and document-
  level meta-data (metadoc)

* the dfm now has a proper class definition, including additional attributes that 
  hold the settings used to produce the dfm.

* all important functions are now defined as methods for classes of built-in (e.g.
  character) objects, or quanteda objects such as a corpus or dfm.  Lots of functions
  operate on both, for instance dfm.corpus(x) and dfm.character(x).

more complete documentation
---------------------------

* all functions are now documented and have working examples

* quanteda.pdf provides a pdf version of the function documentation in one easy-to-access
  document

