Preprocessor

class orangecontrib.text.preprocess.Preprocessor(incl_punct=False, lowercase=True, stop_words='english', trans=None, min_df=1)

Holds preprocessing flags and related settings for stop-word removal, lowercasing, text morphing, etc. (the options are set via the Preprocess widget).

Preprocessor.__init__(incl_punct=False, lowercase=True, stop_words='english', trans=None, min_df=1)
Parameters:
  • incl_punct (boolean) – Determines whether the tokenizer should include punctuation in the tokens.
  • lowercase (boolean) – If set, transform the tokens to lower case before returning them.
  • stop_words ('english', list or None) – Determines whether stop words are removed: 'english' removes English stop words, None removes none, and a list supplies the custom stop words to remove.
  • trans – An optional preprocessor object that performs a morphological transformation (e.g. stemming or lemmatization) on the tokens before returning them.
Returns:
  • An orangecontrib.text.preprocess.Preprocessor instance.
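To make the documented flags concrete, here is a minimal plain-Python sketch of how incl_punct, lowercase, stop_words and trans could shape a token stream. Everything in it is hypothetical: `SketchPreprocessor`, its `tokenize` method, the regex tokenizer, the tiny `ENGLISH_STOP_WORDS` set and the order of the steps are assumptions for illustration, not the orangecontrib.text implementation, and `min_df` is not modelled.

```python
import re
from typing import Callable, Iterable, List, Optional, Union

# Hypothetical, deliberately tiny stand-in for an English stop-word list.
ENGLISH_STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


class SketchPreprocessor:
    """Hypothetical illustration of the documented flags; NOT the library's class."""

    def __init__(self, incl_punct: bool = False, lowercase: bool = True,
                 stop_words: Union[str, Iterable[str], None] = "english",
                 trans: Optional[Callable[[str], str]] = None) -> None:
        self.incl_punct = incl_punct
        self.lowercase = lowercase
        # 'english' -> built-in set, None -> remove nothing, list -> custom words.
        if stop_words == "english":
            self.stop_words = set(ENGLISH_STOP_WORDS)
        elif stop_words is None:
            self.stop_words = set()
        else:
            self.stop_words = {w.lower() for w in stop_words}
        # trans is modelled here as a plain per-token callable (an assumption).
        self.trans = trans

    def tokenize(self, text: str) -> List[str]:
        # Word runs, plus single punctuation marks when incl_punct is set.
        pattern = r"\w+|[^\w\s]" if self.incl_punct else r"\w+"
        tokens = re.findall(pattern, text)
        if self.lowercase:
            tokens = [t.lower() for t in tokens]
        # Stop-word matching is case-insensitive in this sketch.
        tokens = [t for t in tokens if t.lower() not in self.stop_words]
        if self.trans is not None:
            tokens = [self.trans(t) for t in tokens]
        return tokens


if __name__ == "__main__":
    print(SketchPreprocessor().tokenize("The cat, and the hat!"))  # ['cat', 'hat']
    print(SketchPreprocessor(incl_punct=True, stop_words=None)
          .tokenize("The cat, and the hat!"))
    # A crude suffix-stripper standing in for a real stemmer.
    print(SketchPreprocessor(trans=lambda t: t.rstrip("s")).tokenize("Cats and dogs"))
```

With the defaults, punctuation is dropped, tokens are lowercased and English stop words are filtered out; passing `stop_words=None` keeps every token, and a list such as `["cat"]` swaps in a custom stop-word set.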