Here is a list of resources available in
~cs674/project:
brill: directory for the Brill part-of-speech tagger
brown: directory for the Brown corpus (part-of-speech tagged)
muc4: directory for a small corpus annotated with
  part-of-speech information (text is from the MUC4 corpus)
xwn: executable for WordNet
  (be sure to set the environment variable WNSEARCHDIR to
  ~nlp/Archive/wordnet/dict before you run xwn)
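
If you would rather start xwn from a script than set WNSEARCHDIR by hand, the following is a minimal Python sketch of one way to do it. The helper names wordnet_env and launch_xwn are hypothetical, and the location ~cs674/project/xwn is an assumption based on the list above; adjust both to match what you find in the directory.

```python
import os
import subprocess

# Dictionary directory named in the list above.
WORDNET_DICT = "~nlp/Archive/wordnet/dict"
# Assumed location of the executable (not confirmed by the list above).
XWN_PATH = "~cs674/project/xwn"


def wordnet_env(dict_dir=WORDNET_DICT, base_env=None):
    """Return a copy of the environment with WNSEARCHDIR set.

    Hypothetical helper: base_env defaults to the current process
    environment, and the original mapping is left unmodified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # expanduser turns "~nlp/..." into an absolute path when that
    # account exists, and leaves the string unchanged otherwise.
    env["WNSEARCHDIR"] = os.path.expanduser(dict_dir)
    return env


def launch_xwn(xwn_path=XWN_PATH):
    """Start xwn as a child process with WNSEARCHDIR set (hypothetical helper)."""
    return subprocess.Popen([os.path.expanduser(xwn_path)], env=wordnet_env())
```

Calling launch_xwn() on a machine where both paths exist should start xwn with the variable already in place. Setting the variable only in the child's environment leaves your own shell settings untouched.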
Finally, this site contains a description of
the contents of the Penn Treebank II
collection of annotated text. A copy of the collection is available
here at Cornell, and you can use any part of it for your projects.
(Talk to Francis about how to access it.)