The Guardian

This module fetches data from The Guardian API.

To use it, first create a TheGuardianCredentials object:

>>> from orangecontrib.text.guardian import TheGuardianCredentials
>>> credentials = TheGuardianCredentials('<your-api-key>')

Then create a TheGuardianAPI object and use it to search:

>>> from orangecontrib.text.guardian import TheGuardianAPI
>>> api = TheGuardianAPI(credentials)
>>> corpus = api.search('Slovenia', max_documents=10)
>>> len(corpus)
10
class orangecontrib.text.guardian.TheGuardianCredentials(key)

The Guardian API credentials.

__init__(key)
Parameters: key (str) – The Guardian API key. Use 'test' for testing purposes.
valid

Check whether the given API key is valid.
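
A minimal sketch of how the valid check might be used to fail early, before any search is issued. The helper name ensure_valid is hypothetical and not part of orangecontrib.text; the code assumes only that a credentials object exposes a truthy or falsy valid attribute, as documented above.

```python
def ensure_valid(credentials):
    """Return `credentials` unchanged, or raise if the key is not valid.

    Hypothetical helper: it relies only on the documented `valid` attribute.
    """
    if not credentials.valid:
        raise ValueError("The Guardian API key was rejected; check the key.")
    return credentials


# Intended use (assumes orange3-text is installed and the API is reachable):
# from orangecontrib.text.guardian import TheGuardianAPI, TheGuardianCredentials
# api = TheGuardianAPI(ensure_valid(TheGuardianCredentials('<your-api-key>')))
```

Raising an exception here is one possible design; returning a boolean and letting the caller decide would work equally well.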

class orangecontrib.text.guardian.TheGuardianAPI(credentials, on_progress=None, should_break=None)
__init__(credentials, on_progress=None, should_break=None)
Parameters:
  • credentials (TheGuardianCredentials) – The Guardian credentials.
  • on_progress (callable) – Function for progress reporting.
  • should_break (callable) – Function for early stopping.
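
The two callables can be sketched as follows. This is an illustration under stated assumptions, not the library's own code: the documentation above does not say which arguments on_progress receives, so the reporter accepts any positional arguments, and should_break is assumed to be called without arguments and to return a bool. The names Deadline, progress_log, and report_progress are hypothetical.

```python
import time


class Deadline:
    """Callable that starts returning True once a time budget is used up."""

    def __init__(self, seconds):
        self._end = time.monotonic() + seconds

    def __call__(self):
        # Assumption: should_break is invoked with no arguments.
        return time.monotonic() >= self._end


progress_log = []


def report_progress(*args):
    """Record whatever progress values are passed in (signature is assumed)."""
    progress_log.append(args)


# Intended use (assumes orange3-text is installed and the API is reachable):
# api = TheGuardianAPI(credentials,
#                      on_progress=report_progress,
#                      should_break=Deadline(30))
```

A time budget is only one possible stopping rule; a flag set from a GUI "Cancel" button would fit the same should_break slot.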
search(query, from_date=None, to_date=None, max_documents=None, accumulate=False)

Search The Guardian API for articles.

Parameters:
  • query (str) – The query string used to search for articles.
  • from_date (str) – Search only for articles newer than the given date. The date should be in ISO format, e.g. '2016-12-31'.
  • to_date (str) – Search only for articles older than the given date. The date should be in ISO format, e.g. '2016-12-31'.
  • max_documents (int) – Maximum number of documents to retrieve. When not given, all documents are retrieved.
  • accumulate (bool) – A flag indicating whether to accumulate the results of multiple consecutive search calls.
Returns:

Corpus
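
Since from_date and to_date are expected as ISO-formatted strings, one way to build them is from datetime.date objects. The sketch below uses only the standard library; the helper name iso_date_range is hypothetical and not part of orangecontrib.text.

```python
from datetime import date, timedelta


def iso_date_range(end, days):
    """Return (from_date, to_date) as ISO strings for a window ending at `end`."""
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


from_date, to_date = iso_date_range(date(2016, 12, 31), 30)
# from_date is '2016-12-01' and to_date is '2016-12-31'

# Intended use (assumes orange3-text is installed and the API is reachable):
# corpus = api.search('Slovenia', from_date=from_date, to_date=to_date,
#                     max_documents=10)
```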