java.lang.Object
  cz.cvut.felk.newsgroup.preprocess.ModelBuilder
public class ModelBuilder
Helper class used to build a model from the set of training examples.
This class consumes files from the training set and produces a model as its output.
Field Summary | |
---|---|
private static int |
LIMIT
|
private Set<String> |
partialModel
Partially constructed model. |
Constructor Summary | |
---|---|
ModelBuilder()
|
Method Summary | |
---|---|
Model |
createModel()
|
void |
parseFile(String targetClass,
BufferedReader fileContent)
Parses the given file and extracts the information into the model. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static final int LIMIT
private Set<String> partialModel
Currently the partial model is a plain set of words. One way to improve it is to count the frequency of each word instead of only recording its presence.
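The frequency-counting idea could look roughly like the sketch below. It is only an illustration: the class `WordFrequencySketch`, the method `countWords` and the letter-based tokenizer are assumptions made for this example and are not part of `ModelBuilder`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the documented ModelBuilder API.
public class WordFrequencySketch {

    /** Counts how often each lower-cased word occurs in the given content. */
    public static Map<String, Integer> countWords(BufferedReader fileContent) throws IOException {
        Map<String, Integer> frequencies = new HashMap<>();
        String line;
        while ((line = fileContent.readLine()) != null) {
            // Assumed tokenizer: split on runs of non-letter characters.
            for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!token.isEmpty()) {
                    // Insert 1 for a new word, otherwise add 1 to the stored count.
                    frequencies.merge(token, 1, Integer::sum);
                }
            }
        }
        return frequencies;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
                new BufferedReader(new StringReader("The cat saw the other cat.\nThe end."));
        // Prints the word counts; the iteration order of a HashMap is unspecified.
        System.out.println(countWords(reader));
    }
}
```

Replacing the `Set<String>` with a `Map<String, Integer>` of this kind keeps the information the set already holds (the key set) while adding the counts.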
Constructor Detail |
---|
public ModelBuilder()
Method Detail |
---|
public void parseFile(String targetClass, BufferedReader fileContent) throws IOException
Currently the method only collects the set of words in a file, regardless of the newsgroup the file comes from. One possible improvement is to focus on words that have an "interesting" statistical distribution across the different newsgroups.
Another suggestion: you can use the natural language parser to extract the subject and verb from each sentence. See [Project Home]/lib/stanford-parser/ParserDemo.java for inspiration.
Parameters:
targetClass -
fileContent -
Throws:
IOException
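As a rough illustration of the suggestion in `parseFile` to favour words with an "interesting" distribution among newsgroups, the sketch below tracks word counts per `targetClass` and scores each word by the share of its occurrences that fall into its most frequent newsgroup. The class `ClassDistributionSketch`, its methods and the scoring rule are assumptions chosen for this example, not the project's method; the newsgroup names in `main` are only sample values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the documented ModelBuilder API.
public class ClassDistributionSketch {

    // word -> (newsgroup -> number of occurrences)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Records one occurrence of a word seen in a file of the given newsgroup. */
    public void add(String targetClass, String word) {
        counts.computeIfAbsent(word, w -> new HashMap<>())
              .merge(targetClass, 1, Integer::sum);
    }

    /**
     * Returns the share of the word's occurrences that fall into its most
     * frequent newsgroup, or 0.0 if the word has never been seen.
     */
    public double concentration(String word) {
        Map<String, Integer> perClass = counts.get(word);
        if (perClass == null) {
            return 0.0;
        }
        int total = 0;
        int max = 0;
        for (int count : perClass.values()) {
            total += count;
            max = Math.max(max, count);
        }
        return (double) max / total;
    }

    public static void main(String[] args) {
        ClassDistributionSketch sketch = new ClassDistributionSketch();
        sketch.add("rec.autos", "engine");
        sketch.add("rec.autos", "engine");
        sketch.add("sci.space", "engine");
        sketch.add("rec.autos", "the");
        sketch.add("sci.space", "the");
        // "engine" is concentrated in one group; "the" is spread evenly.
        System.out.println(sketch.concentration("engine") > sketch.concentration("the"));
    }
}
```

A word seen only once always scores 1.0 under this rule, so a real implementation would probably combine the score with a minimum occurrence count before treating a word as informative.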
public Model createModel()