(require pydrnlp/trends)    package: pydrnlp
Core engine for the “Trends” tool.
= (let* ((this_module_revision 10)) (list this_module_revision (pydrnlp.language.revision)))
Tokenizes jsexpr. TODO: document this.
Tokens that do not satisfy should_use_token for this language are discarded.
The purpose of the “text” field is to provide an example of an actual use of the word, as the lemma FIXME, but some words (e.g. “DuFay”) shouldn’t be. (Also, some lemmas are strange, like “whatev”.)
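The filtering and lemma/text pairing described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the jsexpr shape is not documented here, so the output format (a list of dicts with "lemma" and "text" keys), the `StandInToken` class, and the `tokenize_sketch` function are all hypothetical. The attribute names `text`, `lemma_`, `pos_`, and `is_stop` mirror those on spaCy's `Token`, but no spaCy objects are used.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class StandInToken:
    """Hypothetical stand-in for a spaCy Token, with only the attributes used here."""
    text: str      # surface form as it appeared in the input
    lemma_: str    # lemma; may be normalized differently from the surface form
    pos_: str      # coarse part-of-speech tag, e.g. "NOUN"
    is_stop: bool  # whether the token is a stop word


def tokenize_sketch(
    tokens: Iterable[StandInToken],
    keep: Callable[[StandInToken], bool],
) -> list[dict[str, str]]:
    """Drop tokens rejected by `keep` and pair each lemma with its original text.

    `keep` plays the role of should_use_token (hypothetical wiring).
    """
    return [
        # "text" preserves the word as actually written, because the
        # lemma alone can lose information such as capitalization.
        {"lemma": tok.lemma_, "text": tok.text}
        for tok in tokens
        if keep(tok)
    ]
```

For example, with made-up tokens `StandInToken("DuFay", "dufay", "PROPN", False)` and `StandInToken("the", "the", "DET", True)` and a predicate that rejects stop words, only the first survives, and its "text" keeps the original capitalization.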
def should_use_token(token, *, lang)
Recognizes tokens that should be included in counts, with respect to lang, a spacy.language.Language instance.
Kinds of tokens that are excluded include:
• stop words; and
• tokens that have a “boring” part-of-speech tag.
Part-of-speech tags considered “boring” notably include "NUM" (numeral) and "SYM" (symbol). Currently, every part-of-speech tag is considered “boring” except "NOUN" and "PROPN" (common and proper nouns, respectively).
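The two exclusion rules can be expressed as a small predicate. This is a hedged sketch rather than the actual should_use_token: instead of a spacy.language.Language instance, it assumes the stop-word list is passed in directly as a set (for instance, spaCy exposes a language's default stop words as `nlp.Defaults.stop_words`). The names `StandInToken`, `should_use_token_sketch`, and `INTERESTING_POS` are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet

# Per the description above, only these tags are "interesting"; every
# other tag (including "NUM" and "SYM") counts as "boring".
INTERESTING_POS = frozenset({"NOUN", "PROPN"})


@dataclass(frozen=True)
class StandInToken:
    """Hypothetical stand-in for a spaCy Token."""
    text: str  # surface form
    pos_: str  # coarse part-of-speech tag


def should_use_token_sketch(token: StandInToken, *, stop_words: AbstractSet[str]) -> bool:
    """Return True if `token` should be counted.

    Assumption: `stop_words` replaces the Language instance that the
    real function receives as `lang`.
    """
    if token.text.lower() in stop_words:
        return False  # stop words are excluded
    return token.pos_ in INTERESTING_POS  # tokens with "boring" tags are excluded
```

Checking the stop-word rule first mirrors the order of the list above; since both rules must pass, the order does not change the result.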