Tag
A module for tagging Corpus instances.
This module provides a default pos_tagger that can be used for part-of-speech (POS) tagging of an English corpus:
>>> from orangecontrib.text.corpus import Corpus
>>> from orangecontrib.text.tag import pos_tagger
>>> corpus = Corpus.from_file('deerwester.tab')
>>> tagged_corpus = pos_tagger.tag_corpus(corpus)
>>> tagged_corpus.pos_tags[0] # you can use `pos_tags` attribute to access tags directly
['JJ', 'NN', 'NN', 'IN', 'NN', 'NN', 'NN', 'NNS']
>>> next(tagged_corpus.ngrams_iterator(include_postags=True)) # or `ngrams_iterator` to iterate over documents
['human_JJ', 'machine_NN', 'interface_NN', 'for_IN', 'lab_NN', 'abc_NN', 'computer_NN', 'applications_NNS']
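The relationship between the two outputs above can be illustrated with a small standalone sketch. It does not use Orange or NLTK; `join_tokens_with_tags` is a hypothetical helper written for this illustration, and the underscore separator is simply taken from the output shown above, not from the library's source.

```python
def join_tokens_with_tags(tokens, tags, sep="_"):
    """Pair each token with its POS tag, e.g. 'human' + 'JJ' -> 'human_JJ'."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    return [f"{token}{sep}{tag}" for token, tag in zip(tokens, tags)]


# Tokens and tags of the first document, copied from the example above.
tokens = ['human', 'machine', 'interface', 'for', 'lab', 'abc', 'computer', 'applications']
tags = ['JJ', 'NN', 'NN', 'IN', 'NN', 'NN', 'NN', 'NNS']

tagged = join_tokens_with_tags(tokens, tags)
print(tagged)
```

Each entry combines one token with the tag at the same position, which is why `pos_tags[0]` and the first item from `ngrams_iterator(include_postags=True)` have the same length.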
class orangecontrib.text.tag.POSTagger(tagger, name='POS Tagger')
    A class that wraps nltk.TaggerI and performs Corpus tagging.
    tag_corpus(corpus, **kwargs)
        Marks tokens of a corpus with POS tags.
        Parameters: corpus (orangecontrib.text.corpus.Corpus) – A corpus instance.
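The wrapping idea can be sketched without Orange or NLTK installed. In the sketch below, `LookupTagger` is a hypothetical stand-in that mimics the `tag` and `tag_sents` methods of the nltk.TaggerI interface, and `SimpleTagWrapper` with its `tag_documents` method is an invented illustration of the pattern, not the actual POSTagger implementation.

```python
class LookupTagger:
    """Hypothetical stand-in with the tag()/tag_sents() shape of nltk.TaggerI."""

    def __init__(self, lexicon, default="NN"):
        self.lexicon = lexicon
        self.default = default

    def tag(self, tokens):
        # Return (token, tag) pairs, falling back to the default tag.
        return [(token, self.lexicon.get(token, self.default)) for token in tokens]

    def tag_sents(self, sentences):
        return [self.tag(sentence) for sentence in sentences]


class SimpleTagWrapper:
    """Illustrative wrapper (an assumption, not Orange's POSTagger)."""

    def __init__(self, tagger, name="POS Tagger"):
        self.tagger = tagger
        self.name = name

    def tag_documents(self, documents):
        # Keep only the tags, one list per tokenized document.
        tagged = self.tagger.tag_sents(documents)
        return [[tag for _, tag in document] for document in tagged]


lexicon = {"human": "JJ", "for": "IN", "applications": "NNS"}
wrapper = SimpleTagWrapper(LookupTagger(lexicon))
documents = [["human", "machine", "interface"], ["for", "lab", "applications"]]
pos_tags = wrapper.tag_documents(documents)
print(pos_tags)
```

Because the wrapper only relies on `tag_sents`, any object exposing that method could in principle be swapped in; whether the real class accepts such objects is not stated in this reference.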
class orangecontrib.text.tag.StanfordPOSTagger(*args, **kwargs)
    classmethod check(path_to_model, path_to_jar)
        Checks whether provided path_to_model and path_to_jar are valid.
        Raises: ValueError – in case at least one of the paths is invalid.
        Notes
        Can raise an exception if Java Development Kit is not installed or not properly configured.
        Examples
        >>> try:
        ...     StanfordPOSTagger.check('path/to/model', 'path/to/stanford.jar')
        ... except ValueError as e:
        ...     print(e)
        Could not find stanford-postagger.jar jar file at path/to/stanford.jar
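A validation step of this kind can be approximated with a plain file-existence test. The sketch below is an assumption about the general pattern only: `check_paths` is a hypothetical function, its error messages are invented, and the real `check` may validate the paths differently (for example by involving Java, as the note above suggests).

```python
import os
import tempfile


def check_paths(path_to_model, path_to_jar):
    """Raise ValueError if either path does not point to an existing file."""
    for label, path in (("model", path_to_model), ("jar", path_to_jar)):
        if not os.path.isfile(path):
            raise ValueError(f"Could not find {label} file at {path}")


# Missing files: the first invalid path is reported.
message = None
try:
    check_paths("path/to/model", "path/to/stanford.jar")
except ValueError as error:
    message = str(error)
print(message)

# Existing files: the check passes silently.
with tempfile.TemporaryDirectory() as folder:
    model = os.path.join(folder, "model.tagger")
    jar = os.path.join(folder, "stanford-postagger.jar")
    for path in (model, jar):
        open(path, "w").close()
    check_paths(model, jar)
    passed = True
```

Running a check like this before constructing the tagger lets a caller report a configuration problem early instead of failing during tagging.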
-
classmethod