nltk

Topics related to nltk:

Getting started with nltk

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
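As a rough illustration of the kind of pipeline these libraries support, the sketch below tokenizes a sentence, stems the tokens, and counts the stems. It assumes NLTK 3 is installed and deliberately sticks to components that need no downloaded corpora (wordpunct_tokenize, PorterStemmer, FreqDist). The sample sentence and variable names are made up for the example.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "Cats running; cat runs."  # made-up sample sentence

# Regex-based tokenizer: splits into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(text)

# Keep alphabetic tokens only and lowercase them before stemming.
words = [tok.lower() for tok in tokens if tok.isalpha()]

# Reduce each word to its Porter stem.
stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in words]

# FreqDist behaves like a Counter keyed by stem.
freq = FreqDist(stems)

print(tokens)
print(stems)
print(freq.most_common())
```

Other tokenizers (for example nltk.word_tokenize) and the stop word lists generally require downloading extra data with nltk.download() first, which is why they are left out of this sketch.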

The book

Natural Language Processing with Python provides a practical introduction to programming for language processing. Written by the creators of NLTK, it guides the reader through the fundamentals of writing Python programs, working with corpora, categorizing text, analyzing linguistic structure, and more. The book is being updated for Python 3 and NLTK 3. (The original Python 2 version is still available at http://nltk.org/book_1ed .)

Tokenizing

Stop Words

Stemming

Frequency Distributions

POS Tagging

Important points to note

  • The variable word is a list of tokens.
  • Even though each item i in the list word is a token, passing a single token (a bare string) to the tagger tags each letter of the word separately.
  • nltk.tag.pos_tag accepts a
    • list of tokens, which it then separates and tags element by element, or
    • list of strings
  • You cannot get the tag for a single word by passing it directly; instead, put the word in a one-element list.
  • POS tag
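The points above can be sketched as follows. To keep the example runnable without a downloaded tagger model, it uses nltk.tag.DefaultTagger (which assigns the same tag to every item) to show how a tagger iterates over its input. ensure_token_list and tag_single_word are hypothetical helpers introduced here, not part of NLTK, and tag_single_word assumes the pos_tag model has already been fetched with nltk.download(). Some recent NLTK releases may reject a bare string passed to pos_tag with an error rather than tagging its letters, so wrapping the word in a list is the safer habit either way.

```python
from nltk.tag import DefaultTagger, pos_tag


def ensure_token_list(word_or_tokens):
    """Hypothetical helper: wrap a bare string in a list so it is tagged as one token."""
    if isinstance(word_or_tokens, str):
        return [word_or_tokens]
    return list(word_or_tokens)


# DefaultTagger tags everything as "NN" and needs no model download,
# which makes the iteration behaviour easy to see.
demo_tagger = DefaultTagger("NN")

# A bare string is iterated character by character.
letters = demo_tagger.tag("dog")

# A one-element list yields one tagged token.
single = demo_tagger.tag(ensure_token_list("dog"))

print(letters)
print(single)


def tag_single_word(word):
    """Tag one word with pos_tag (assumes the tagger model is already downloaded)."""
    return pos_tag(ensure_token_list(word))
```

With the model installed, tag_single_word("dog") returns a list containing a single (word, tag) pair for the whole word.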