NLP IIT POS tagging

NLP POS-tagging (lecture by Pushpak Bhattacharyya)


  • NLP = Ambiguity Processing
    • Lexical Ambiguity: e.g. "dog" (noun vs. verb; animal vs. detestable person), resolved by context.
    • Structural Ambiguity
    • Semantic Ambiguity
    • Pragmatic Ambiguity
  • Main methodology
    • A: extract parts & features from the input
    • B: extract parts & features from the output that correspond to A
    • Learn a mapping between these parts and features
    • Apply the learned mapping to new situations (decoding)

POS tagging

  • POS tagging attaches to each word in a sentence a part-of-speech tag drawn from a given set of tags called the Tag-Set
  • A word can have multiple possible POS tags; the tagger must pick the right one in context (see the example below)
  • New examples keep breaking hand-written rules, so we need a robust, data-driven system.
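A quick illustration of the ambiguity point above, using NLTK's off-the-shelf tagger as a stand-in (a generic tool chosen for illustration, not the lecture's own system; the example sentences are my own):

```python
# Tag ambiguity illustrated with NLTK's pre-trained tagger. Requires: pip install nltk.
import nltk

nltk.download("averaged_perceptron_tagger", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)  # resource name in newer NLTK releases

for sent in ["The dog barked loudly .", "They dog him at every step ."]:
    print(nltk.pos_tag(sent.split()))
# "dog" should be a noun (NN) in the first sentence and a verb in the second;
# whether the tagger gets both right shows how hard contextual disambiguation is.
```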
  • Generative: HMM
    • Training: maximize the likelihood of the observations (word sequences)
    • Testing: search the hypothesis space for the best POS tag sequence
    • Generate candidate POS tag sequences and score them
    • The three classic HMM problems:
    • Given the observation sequence, find the best state (tag) sequence - Viterbi algorithm (see the sketch below)
    • Given the observation sequence, find its probability - forward/backward algorithm
    • Given the observation sequences, find the HMM parameters - Baum-Welch (EM) algorithm
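A minimal Viterbi decoding sketch for an HMM tagger, with a hand-made three-tag model; the probabilities and tiny vocabulary are made-up illustrations, not the lecture's data:

```python
# Viterbi decoding for an HMM POS tagger: a toy sketch with made-up probabilities.
import numpy as np

tags = ["NOUN", "VERB", "DET"]                      # hidden states
vocab = {"the": 0, "dog": 1, "barks": 2}            # observations

pi = np.array([0.2, 0.2, 0.6])                      # P(tag at position 0)
A = np.array([[0.3, 0.6, 0.1],                      # A[i, j] = P(tag_j | tag_i)
              [0.5, 0.2, 0.3],
              [0.9, 0.05, 0.05]])
B = np.array([[0.1, 0.6, 0.3],                      # B[i, w] = P(word_w | tag_i)
              [0.1, 0.2, 0.7],
              [0.8, 0.1, 0.1]])

def viterbi(words):
    obs = [vocab[w] for w in words]
    n, k = len(obs), len(tags)
    delta = np.zeros((n, k))            # best log-prob of any path ending in tag j at step t
    back = np.zeros((n, k), dtype=int)  # backpointers for path recovery
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, n):
        scores = delta[t - 1][:, None] + np.log(A)   # scores[i, j]: come from tag i, go to tag j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    # Follow backpointers from the best final tag.
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [tags[i] for i in reversed(path)]

print(viterbi(["the", "dog", "barks"]))   # -> ['DET', 'NOUN', 'VERB']
```

Decoding is linear in the sentence length (each position only looks at the previous one), which is the complexity point made at the end of these notes.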
  • Discriminative
    • Training: maximize the entropy of the probability distribution subject to constraints from the data
    • Testing: discriminate amongst hypotheses by scoring them
    • Models: MEMM, CRF (see the sketch below)
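A simplified discriminative sketch: a per-token maximum-entropy (logistic regression) classifier whose features include the previously predicted tag, i.e. a greedy MEMM-style tagger. The toy training data and feature set are assumptions for illustration; a full MEMM or CRF would also do global (Viterbi-style) decoding:

```python
# Greedy MEMM-style tagger: max-ent (logistic regression) over hand-crafted features,
# including the previously predicted tag. Toy data; a real system trains on a corpus.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

def features(words, i, prev_tag):
    return {
        "word": words[i].lower(),
        "suffix2": words[i][-2:],
        "prev_tag": prev_tag,            # Markov dependence on the previous tag (MEMM flavour)
        "is_first": i == 0,
    }

X, y = [], []
for sent in train:
    words = [w for w, _ in sent]
    prev = "<s>"
    for i, (_, tag) in enumerate(sent):
        X.append(features(words, i, prev))
        y.append(tag)
        prev = tag

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

def tag(words):
    prev, out = "<s>", []
    for i in range(len(words)):
        pred = clf.predict(vec.transform([features(words, i, prev)]))[0]
        out.append(pred)
        prev = pred                      # greedy decoding; a full MEMM/CRF would search
    return out

print(tag(["the", "cat", "barks"]))
```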
  • Classification
    • Training: minimize a loss function (total sum-of-squares error, or cross-entropy with softmax)
    • Testing: classify the input into one of the classes
    • SVM, neural net
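To make the loss functions concrete, here is a small NumPy sketch of softmax plus cross-entropy (and the squared-error alternative) for a single token over a tiny tag set; all numbers are illustrative:

```python
# Cross-entropy with softmax: the loss minimized when training a classifier-style tagger.
import numpy as np

tags = ["NOUN", "VERB", "DET"]
logits = np.array([2.0, 0.5, -1.0])      # raw scores for one token (made-up values)
gold = 0                                 # true tag index: NOUN

probs = np.exp(logits - logits.max())    # softmax (shifted for numerical stability)
probs /= probs.sum()

cross_entropy = -np.log(probs[gold])     # small when the gold tag gets high probability
sum_sq_error = ((probs - np.eye(len(tags))[gold]) ** 2).sum()  # squared-error alternative

print(dict(zip(tags, probs.round(3))),
      "CE:", round(float(cross_entropy), 3),
      "SSE:", round(float(sum_sq_error), 3))
```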
  • Deep Learning
    • Training: minimize the loss function
    • DL = feature discovery + classification
    • classify the POS tag while discovering the required features automatically
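A sketch of "feature discovery + classification": an embedding layer learns the word features and a feed-forward softmax layer classifies the tag. This window-based PyTorch tagger is an illustrative assumption (sizes, random stand-in data), not the lecture's exact architecture:

```python
# Feed-forward window tagger: an embedding layer discovers word features, a softmax
# layer classifies the POS tag. Toy sizes; a real model trains on a tagged corpus.
import torch
import torch.nn as nn

VOCAB, TAGS, EMB, WINDOW = 1000, 12, 50, 3   # assumed sizes for illustration

class WindowTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)              # learned word features
        self.ff = nn.Sequential(
            nn.Linear(WINDOW * EMB, 128), nn.ReLU(),
            nn.Linear(128, TAGS),                        # tag scores (logits)
        )

    def forward(self, window_ids):                       # (batch, WINDOW) word ids
        x = self.emb(window_ids).flatten(1)              # concatenate window embeddings
        return self.ff(x)

model = WindowTagger()
loss_fn = nn.CrossEntropyLoss()                          # cross-entropy / softmax loss
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random data standing in for a real batch.
ids = torch.randint(0, VOCAB, (32, WINDOW))
gold = torch.randint(0, TAGS, (32,))
loss = loss_fn(model(ids), gold)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```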
  • Deep Learning II - sequence to sequence
    • An RNN encoder (words -> hidden states) + an RNN decoder (hidden states -> tags)
    • Gets a richer representation of the sentence
    • Long-range dependencies
      • Represent the source sentence by the set of output vectors from the encoder
      • Replace the RNN encoder with an LSTM
    • An RNN can't see the future -> use a bidirectional RNN/LSTM (see the sketch below)
    • Due to the sequential nature of RNNs, parallelism is limited.
    • Compared to CNNs: CNNs parallelize well but can't model the whole history; they are more local.
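A minimal bidirectional LSTM tagger sketch in PyTorch, illustrating the encoder-over-the-sentence idea and the "use a bidirectional RNN/LSTM so each position sees both left and right context" point; the sizes and random inputs are illustrative assumptions:

```python
# BiLSTM tagger: an LSTM reads the sentence in both directions so each position's
# hidden state sees left and right context; a linear layer emits per-token tag scores.
import torch
import torch.nn as nn

VOCAB, TAGS, EMB, HID = 1000, 12, 50, 64   # assumed sizes for illustration

class BiLSTMTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * HID, TAGS)    # forward + backward states per token

    def forward(self, word_ids):               # (batch, seq_len) word ids
        h, _ = self.lstm(self.emb(word_ids))   # (batch, seq_len, 2*HID)
        return self.out(h)                     # (batch, seq_len, TAGS) tag logits

model = BiLSTMTagger()
word_ids = torch.randint(0, VOCAB, (1, 7))     # one 7-word sentence of random ids
logits = model(word_ids)
print(logits.argmax(-1))                       # predicted tag index for each token
```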
  • POS tagging vs MT:
    • Alignment: 1-to-1 in POS tagging vs. 1-to-none up to 1-to-many in MT
    • Word order may change in MT
    • One-to-many / many-to-one mappings occur in MT
  • Complexity:
    • POS tagging with an HMM: Viterbi decoding is linear in the sentence length
    • MT: the search space is exponential; beam search is used to approximate it
