edu.stanford.nlp.parser.lexparser
Class Options.LexOptions

java.lang.Object
  extended by edu.stanford.nlp.parser.lexparser.Options.LexOptions
All Implemented Interfaces:
Serializable
Enclosing class:
Options

public static class Options.LexOptions
extends Object
implements Serializable

See Also:
Serialized Form

Field Summary
 boolean flexiTag
           
 boolean smartMutation
          Smarter smoothing for rare words.
 int smoothInUnknownsThreshold
          Words more common than this are tagged with MLE P(t|w).
 int unknownPrefixSize
          For certain Lexicons, a certain number of word-initial letters are used to subclassify the unknown token.
 int unknownSuffixSize
          For certain Lexicons, a certain number of word-final letters are used to subclassify the unknown token.
 boolean useUnicodeType
          Make use of Unicode code point types in smoothing.
 int useUnknownWordSignatures
          Whether to use suffix and capitalization information for unknowns.
 String uwModel
          Model for unknown words that the lexicon should use.
 
Constructor Summary
Options.LexOptions()
           
 
Method Summary
 void readData(BufferedReader in)
           
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

useUnknownWordSignatures

public int useUnknownWordSignatures
Whether to use suffix and capitalization information for unknowns. Within the BaseLexicon model, the options have the following meanings:
  0: a single unknown token.
  1: uses suffix and capitalization.
  2: uses a variant (richer) form of signature. This is the recommended setting; the richer signatures in options 3 and 4 seem to have very marginal or no positive value.
  3: uses a richer form of signature that mimics the NER word type patterns.
  4: a variant of 2.
  5: another variant with more English-specific morphology (good for English unknowns).
  6-9: options for Arabic. 9 codes some patterns for numbers and derivational morphology, and also supports unknownPrefixSize and unknownSuffixSize.
For German, 0 means a single unknown token, and any non-zero value means to use the capitalization of the first letter and a suffix of length unknownSuffixSize.
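
The German behaviour described above (0 gives a single unknown token; non-zero adds first-letter capitalization and a suffix of length unknownSuffixSize) can be illustrated with a minimal sketch. This is not the parser's implementation: the class name, the method name, and the token strings "UNK", "-C" and "-c" are hypothetical and chosen only for illustration.

```java
// Illustrative sketch only; not the Stanford parser's actual code.
// The "UNK", "-C" and "-c" markers are hypothetical names.
class GermanStyleSignatureSketch {

    // Maps an unknown word to a coarse signature string.
    static String signature(String word, int useUnknownWordSignatures, int unknownSuffixSize) {
        // Mode 0: every unknown word collapses to one token.
        if (useUnknownWordSignatures == 0 || word.isEmpty()) {
            return "UNK";
        }
        StringBuilder sb = new StringBuilder("UNK");
        // Record whether the first letter is upper case.
        sb.append(Character.isUpperCase(word.charAt(0)) ? "-C" : "-c");
        // Append at most unknownSuffixSize word-final characters.
        int n = Math.min(unknownSuffixSize, word.length());
        if (n > 0) {
            sb.append('-').append(word.substring(word.length() - n));
        }
        return sb.toString();
    }
}
```

Under these assumptions, calling signature("Haus", 1, 2) groups the word with other capitalized unknowns ending in "us", while mode 0 ignores both cues.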


smoothInUnknownsThreshold

public int smoothInUnknownsThreshold
Words that occur more often than this threshold are tagged using the maximum likelihood estimate (MLE) of P(t|w). The default is 100. The smoothing is sufficiently slight that changing this value has little effect.
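
The role of the threshold can be sketched as follows. Only the MLE branch follows from the description above; the smoothed branch is an assumption, since this page does not specify the smoothing formula. The class name, the parameter names, and the pseudo-count interpolation are all hypothetical.

```java
// Illustrative sketch only; the smoothing formula below is an assumption,
// not the lexicon's documented behaviour.
class ThresholdTaggingSketch {

    // Estimates P(tag | word) from counts.
    // countTagWord: times the word was seen with this tag
    // countWord: times the word was seen at all
    // pTagGivenUnknown: hypothetical prior P(tag | unknown word)
    // smoothWeight: hypothetical number of pseudo-counts given to the prior
    static double tagGivenWord(int countTagWord, int countWord,
                               double pTagGivenUnknown,
                               int smoothInUnknownsThreshold,
                               double smoothWeight) {
        if (countWord > smoothInUnknownsThreshold) {
            // Frequent word: plain maximum likelihood estimate.
            return (double) countTagWord / countWord;
        }
        // Rare word: blend observed counts with the unknown-word prior (assumed form).
        return (countTagWord + smoothWeight * pTagGivenUnknown) / (countWord + smoothWeight);
    }
}
```

With a threshold of 100, a word seen 200 times is scored purely from its own counts, whereas a word seen twice is pulled toward the unknown-word distribution.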


smartMutation

public boolean smartMutation
Smarter smoothing for rare words.


useUnicodeType

public boolean useUnicodeType
Make use of Unicode code point types in smoothing.
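
As a rough illustration of what a code point type is, the sketch below reads the Unicode general category of a word's first code point with the standard Character.getType method. How the lexicon actually combines this information with smoothing is not stated on this page; the class name, the method names, and the "UNK-U" prefix are hypothetical.

```java
// Illustrative sketch only; shows how a Unicode code point type can be
// obtained in Java, not how the parser uses it.
class UnicodeTypeSketch {

    // Returns the Unicode general category of the first code point,
    // or Character.UNASSIGNED for an empty string.
    static int firstCodePointType(String word) {
        if (word.isEmpty()) {
            return Character.UNASSIGNED;
        }
        return Character.getType(word.codePointAt(0));
    }

    // Builds a hypothetical unknown-word class from that category.
    static String signature(String word) {
        return "UNK-U" + firstCodePointType(word);
    }
}
```

Such a category separates, for example, Latin upper-case letters, CJK ideographs, and digits without any language-specific rules.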


unknownSuffixSize

public int unknownSuffixSize
For certain Lexicons, a certain number of word-final letters are used to subclassify the unknown token. This gives the number of letters.


unknownPrefixSize

public int unknownPrefixSize
For certain Lexicons, a certain number of word-initial letters are used to subclassify the unknown token. This gives the number of letters.


uwModel

public String uwModel
Model for unknown words that the lexicon should use.


flexiTag

public boolean flexiTag


Constructor Detail

Options.LexOptions

public Options.LexOptions()


Method Detail

toString

public String toString()
Overrides:
toString in class Object

readData

public void readData(BufferedReader in)
              throws IOException
Throws:
IOException


Stanford NLP Group