Collocations
Compute significant bigrams and trigrams.
Inputs
Corpus: A collection of documents.
Outputs
Table: A list of bigrams or trigrams.
Collocations finds words that frequently co-occur in a corpus. It lists bigrams or trigrams ranked by their score.
Settings: choose whether to observe bigrams (sets of two co-occurring words) or trigrams (sets of three co-occurring words). Set the frequency threshold to remove n-grams whose frequency is lower than the threshold.
Scoring method:
Mi Like
Poisson Stirling
Raw Frequency
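To make the settings concrete, here is a minimal sketch in plain Python of what counting n-grams, applying a frequency threshold, and ranking by raw frequency could look like. It is not the widget's implementation: the function names, the toy corpus, and the definition of raw frequency as an n-gram's share of all counted n-grams are assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(documents, n=2):
    """Count n-grams (tuples of n adjacent tokens) within each document."""
    counts = Counter()
    for tokens in documents:
        # Zipping n shifted views of the token list yields adjacent n-tuples.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


def filter_by_frequency(counts, threshold):
    """Drop n-grams whose frequency is lower than the threshold."""
    return {ngram: c for ngram, c in counts.items() if c >= threshold}


def raw_frequency(counts):
    """Score each n-gram by its share of all counted n-grams (assumed definition)."""
    total = sum(counts.values())
    return {ngram: c / total for ngram, c in counts.items()}


# Hypothetical, already tokenized toy corpus.
documents = [
    ["the", "old", "king", "saw", "the", "old", "wolf"],
    ["the", "old", "king", "slept"],
]

bigrams = count_ngrams(documents, n=2)
kept = filter_by_frequency(bigrams, threshold=2)
scores = raw_frequency(bigrams)

# Rank the surviving bigrams from highest to lowest score.
ranked = sorted(kept, key=lambda ngram: scores[ngram], reverse=True)
```

Passing n=3 to count_ngrams gives trigrams instead of bigrams. Other scoring methods would replace raw_frequency while the counting and filtering steps stay the same.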
Example
Collocations is mostly intended for data exploration. Here, we show how to observe bigrams that occur more than five times in the corpus. The bigrams are scored using the Pointwise Mutual Information statistic.
We load the grimm-tales-selected data in the Corpus widget and send it to Collocations.
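For readers who want to see what the score in this example measures, the following is a rough sketch of Pointwise Mutual Information for bigrams, computed as log2 of P(w1, w2) / (P(w1) P(w2)) with probabilities estimated from counts. The function name bigram_pmi, the toy token list, and the choice to estimate probabilities from a single flat token sequence are assumptions for illustration. The widget may estimate these quantities differently.

```python
import math
from collections import Counter


def bigram_pmi(tokens, threshold=1):
    """Return {bigram: PMI} for bigrams whose count is at least the threshold."""
    word_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        if count < threshold:
            continue  # remove bigrams with frequency lower than the threshold
        p_pair = count / n_bigrams
        p_w1 = word_counts[w1] / n_words
        p_w2 = word_counts[w2] / n_words
        # PMI compares the observed co-occurrence with what independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy text, already lowercased and tokenized.
tokens = "hansel and gretel walked and hansel and gretel ate bread".split()

frequent = bigram_pmi(tokens, threshold=2)
everything = bigram_pmi(tokens, threshold=1)
```

In this toy text, the pair that occurs only once ("ate", "bread") gets a higher PMI than the repeated pairs. PMI tends to favor rare pairs, which is one reason to combine it with a frequency threshold.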
References
Manning, Christopher, and Hinrich Schütze. 1999. Collocations. Available at: https://nlp.stanford.edu/fsnlp/promo/colloc.pdf