Preprocessor
class orangecontrib.text.preprocess.Preprocessor(incl_punct=False, lowercase=True, stop_words='english', trans=None, min_df=1)
Holds pre-processing flags and related settings, such as stop word removal, lowercasing, and morphological transformation of tokens. The options are set via the Preprocess widget.
Preprocessor.__init__(incl_punct=False, lowercase=True, stop_words='english', trans=None, min_df=1)
Parameters:
- incl_punct (boolean) – Determines whether the tokenizer should include punctuation in the tokens.
- lowercase (boolean) – If set, transforms the tokens to lower case before returning them.
- stop_words ('english', list, or None) – Determines whether stop words should ('english') or should not (None) be removed. If a list is given, it should contain the stop words to remove.
- trans – An optional pre-processor object that performs the morphological transformation on the tokens before returning them.
Returns: orangecontrib.text.preprocess.Preprocessor
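To make the effect of the flags concrete, here is a minimal pure-Python sketch of what each documented parameter means. It is not the library's implementation: `sketch_preprocess`, `DEMO_ENGLISH_STOP_WORDS`, the regex-based tokenizer, the order of the steps, and the modelling of `trans` as a plain callable are all assumptions made for illustration. `min_df` is left out because its behaviour is not described above.

```python
import re

# Tiny illustrative stop word list (assumption); a real 'english' list is far larger.
DEMO_ENGLISH_STOP_WORDS = {"the", "a", "an", "is", "of", "and"}


def sketch_preprocess(text, incl_punct=False, lowercase=True,
                      stop_words="english", trans=None):
    """Hypothetical stand-in that mimics the documented flags on one string."""
    # incl_punct: keep punctuation marks as separate tokens, or drop them.
    pattern = r"\w+|[^\w\s]" if incl_punct else r"\w+"
    tokens = re.findall(pattern, text)

    # lowercase: fold every token to lower case.
    if lowercase:
        tokens = [t.lower() for t in tokens]

    # stop_words: 'english' selects the demo list, a list is used as given,
    # and None disables stop word removal. Matching is exact (case-sensitive).
    if stop_words == "english":
        stops = DEMO_ENGLISH_STOP_WORDS
    elif stop_words is None:
        stops = set()
    else:
        stops = set(stop_words)
    tokens = [t for t in tokens if t not in stops]

    # trans: optional per-token transformation, modelled here as a callable.
    if trans is not None:
        tokens = [trans(t) for t in tokens]
    return tokens


text = "The cat is chasing the Mice!"
print(sketch_preprocess(text))
# ['cat', 'chasing', 'mice']
print(sketch_preprocess(text, incl_punct=True, stop_words=None))
# ['the', 'cat', 'is', 'chasing', 'the', 'mice', '!']
```

In this sketch the defaults mirror the documented signature: punctuation is dropped, tokens are lowercased, and English stop words are removed. Passing a list as `stop_words` replaces the built-in list rather than extending it, which is one reading of the parameter description above.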