OnlineLearner Tutorial
The OnlineTextClassifierLearner lets you add documents to a learner by passing in a document string rather than a document span. Calling getTextClassifier() on it returns a TextClassifier, which gives you the score of a document string rather than a document span.
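The add-then-score flow described above can be sketched in a few lines. This is a self-contained illustration, not MinorThird code: the two interfaces are trimmed stand-ins that copy only addDocument, getTextClassifier, and score from the API in this tutorial, and OnlineTextLearner, WordCountLearner, and the sample documents are hypothetical names and data invented for this example. With MinorThird on the classpath you would presumably use OnlineBinaryTextClassifierLearner in place of the toy learner.

```java
import java.util.HashMap;
import java.util.Map;

public class OnlineTextDemo {

    /** Stand-in for MinorThird's TextClassifier: scores a raw document string. */
    interface TextClassifier {
        double score(String text);
    }

    /** Trimmed stand-in for OnlineTextClassifierLearner (only two of its methods). */
    interface OnlineTextLearner {
        void addDocument(String label, String text);
        TextClassifier getTextClassifier();
    }

    /** Hypothetical toy learner: each word gets +1 per positive document and -1 per negative one. */
    static class WordCountLearner implements OnlineTextLearner {
        private final String positiveLabel;
        private final Map<String, Integer> weights = new HashMap<>();

        WordCountLearner(String positiveLabel) {
            this.positiveLabel = positiveLabel;
        }

        @Override
        public void addDocument(String label, String text) {
            int delta = label.equals(positiveLabel) ? 1 : -1;
            for (String word : text.toLowerCase().split("\\s+")) {
                weights.merge(word, delta, Integer::sum);
            }
        }

        @Override
        public TextClassifier getTextClassifier() {
            // Snapshot the weights so later addDocument calls do not change this classifier.
            Map<String, Integer> snapshot = new HashMap<>(weights);
            return text -> {
                double total = 0.0;
                for (String word : text.toLowerCase().split("\\s+")) {
                    total += snapshot.getOrDefault(word, 0);
                }
                return total;
            };
        }
    }

    public static void main(String[] args) {
        OnlineTextLearner learner = new WordCountLearner("fun");
        // Documents go in as plain strings with a label, not as spans.
        learner.addDocument("fun", "great party games tonight");
        learner.addDocument("NOTfun", "quarterly tax forms due");

        TextClassifier classifier = learner.getTextClassifier();
        System.out.println(classifier.score("party games")); // prints 2.0
        System.out.println(classifier.score("tax forms"));   // prints -2.0
    }
}
```

The point of the sketch is the shape of the calls: the learner takes (label, text) pairs as strings, and the classifier it hands back scores strings directly.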
OnlineTextClassifierLearner API:
public interface OnlineTextClassifierLearner{
/** Provide document string with a label and add to the learner */
public void addDocument(String label, String text);
/** Returns the TextClassifier */
public TextClassifier getTextClassifier();
/** Returns the Classifier */
public Classifier getClassifier();
/** Tells the learner that no more examples are coming */
public void completeTraining();
/** Erases all previous data from the learner */
public void reset();
/** Returns an array of SpanTypes that can be added to the learner */
public String[] getTypes();
/** Returns an annotated copy of TextLabels */
public TextLabels annotatedCopy(TextLabels labels);
}
Currently, OnlineBinaryTextClassifierLearner is the only implementation of OnlineTextClassifierLearner. OnlineBinaryTextClassifierLearner constructors:
/** Accepts an OnlineLearner and a SpanType, with no previously labeled data */
public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType)
/** Accepts an OnlineLearner, a SpanType, and labeled data to add to the learner */
public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType, TextLabels labeledData)
/** Accepts an OnlineLearner, a SpanType, labeled data to add to the learner, and a SpanFeatureExtractor */
public OnlineBinaryTextClassifierLearner(OnlineClassifierLearner learner, String spanType, TextLabels labeledData, SpanFeatureExtractor fe)
TextClassifier API:
public interface TextClassifier{
/** Returns the weight for a string being in the positive class */
public double score(String text);
}
MinorThird has a built-in test class for these OnlineText classes in ui/OnlineLearner. This class accepts most of the same variables as TrainClassifier. To see all options, run:
$ java -Xmx500M edu.cmu.minorthird.ui.OnlineLearner -help
These two options are required:
- -labels REPOSITORY_KEY: the data you would like to label and add to the learner
- -spanType SPAN_TYPE: the SpanType you would like to label
Note: SPAN_TYPE will appear in a pull-down list in the GUI only if you specify -labeledData with the same SpanType; otherwise, you must specify this variable on the command line.
You must also specify one of the following:
- -learner: the OnlineLearner you would like to use
- -loadFrom: load a previously saved TextLearner
Optional variables:
- -labeledData: previously labeled data that you would like to add to the learner before labeling more data
- -fe: the feature extractor you would like to use
Let's try an example:
$ java -Xmx500M edu.cmu.minorthird.ui.OnlineLearner -unlabeledData sample3.unlabeled -labels sample3.train -spanType fun -learner "new NaiveBayes()" -gui
A window much like the other GUI windows should appear. From there you can edit any other parameters you would like to change. Once you are satisfied with your options, press Start Task. A window that looks like this should appear:

Note: you will have to expand the window to see the labels on all the buttons.
Note: you will be able to highlight MinorThird's prediction as soon as the window pops up, but you will only be able to compare to the actual label (in this case fun) once you have labeled one of the documents.
You will notice that -choose label- is a pull-down menu where you can select either the SpanType you trained on or NOT the SpanType you trained on. In this case the menu has the items fun and NOTfun.
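The naming of the two menu items can be illustrated with a short sketch. BinaryLabels and its two methods are hypothetical helpers written for this example, not part of MinorThird. The zero threshold in predictedLabel is an assumption: the tutorial only says that score returns the weight for the positive class, so MinorThird's actual decision rule may differ.

```java
public class BinaryLabels {

    /** The two labels offered for a binary task: the span type and its "NOT" counterpart. */
    static String[] labelsFor(String spanType) {
        return new String[] { spanType, "NOT" + spanType };
    }

    /** Assumption: a score above zero is read as the positive class. */
    static String predictedLabel(String spanType, double score) {
        return score > 0 ? spanType : "NOT" + spanType;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", labelsFor("fun"))); // prints fun, NOTfun
        System.out.println(predictedLabel("fun", 1.5));          // prints fun
        System.out.println(predictedLabel("fun", -0.3));         // prints NOTfun
    }
}
```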
Here is a summary of each of the buttons on the bottom of the window:
- Up: scroll up one document
- Down: scroll down one document
- -choose label-: label the current document as one of the items in the pull-down menu
- Add Doc(s): add all labeled documents to the classifier. Note that once documents are added, their labels cannot be changed
- Show Classifier: pops up a window showing the classifier for all trained data
- Save TextLearner: saves the TextLearner, including all new data added
- Reset: erase all previous examples and reset the classifier
- Complete Training: let the classifier know there will be no new examples
- Save: save the labels you have added to DIRECTORY_NAME.labels