Preprocessor

class orangecontrib.text.preprocess.Preprocessor(incl_punct=False, lowercase=True, stop_words='english', trans=None, min_df=1): Holds pre-processing flags and related settings for stop-word removal, lowercasing, text morphing, etc. (the options are set via the Preprocess widget).

Preprocessor.__init__(incl_punct=False, lowercase=True, stop_words='english', trans=None, min_df=1)

Parameters:

incl_punct (boolean) – Determines whether the tokenizer should include punctuation in the tokens.
lowercase (boolean) – If set, transform the tokens to lowercase before returning them.
stop_words ('english' or list or None) – Determines whether stop words should ('english') or should not (None) be removed. If this is a list, it should contain the stop words to remove.
trans – An optional pre-processor object that performs a morphological transformation (for example, stemming or lemmatization) on the tokens before returning them.

Returns:

orangecontrib.text.preprocess.Preprocessor
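To make the effect of the flags listed above concrete, here is a minimal, self-contained sketch. It is not the add-on's implementation: the class PreprocessorSketch, the tiny ENGLISH_STOP_WORDS set, the helper strip_trailing_s, and the order of the steps (punctuation filter, then lowercasing, then stop-word removal, then the trans step) are all hypothetical choices made for illustration. It also assumes trans behaves like a plain callable from token to token, and it leaves out min_df because this section does not describe it.

```python
import string
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Union

# Hypothetical stand-in for a real English stop-word list; only a few words.
ENGLISH_STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "is", "to", "in"})


@dataclass
class PreprocessorSketch:
    """Hypothetical mirror of the documented flags, not the add-on's class."""
    incl_punct: bool = False
    lowercase: bool = True
    stop_words: Union[str, Sequence[str], None] = "english"
    trans: Optional[Callable[[str], str]] = None  # assumption: token -> token

    def _stop_set(self) -> frozenset:
        # 'english' -> built-in list, None -> remove nothing, list -> user words
        if self.stop_words == "english":
            return ENGLISH_STOP_WORDS
        if self.stop_words is None:
            return frozenset()
        return frozenset(self.stop_words)

    def process(self, tokens: Iterable[str]) -> List[str]:
        stop = self._stop_set()
        out = []
        for tok in tokens:
            # Drop punctuation-only tokens unless incl_punct is set.
            if not self.incl_punct and all(ch in string.punctuation for ch in tok):
                continue
            if self.lowercase:
                tok = tok.lower()
            # Stop words are matched after lowercasing (assumed order).
            if tok in stop:
                continue
            # Optional morphological transformation, applied last.
            if self.trans is not None:
                tok = self.trans(tok)
            out.append(tok)
        return out


def strip_trailing_s(tok: str) -> str:
    """Hypothetical toy 'stemmer' that only removes a final 's'."""
    return tok[:-1] if tok.endswith("s") else tok


tokens = ["The", "cats", "sat", "on", "the", "mat", "."]
print(PreprocessorSketch().process(tokens))
# ['cats', 'sat', 'on', 'mat']
print(PreprocessorSketch(incl_punct=True, stop_words=None, trans=strip_trailing_s).process(tokens))
# ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']
```

With the defaults, punctuation and the (toy) English stop words are dropped. Passing stop_words=None keeps every word, and trans is applied to each surviving token.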