org.knowceans.corpus
Interface ISplitCorpus

All Known Implementing Classes:
LabelNumCorpus, NumCorpus

public interface ISplitCorpus

ISplitCorpus allows a corpus to be resized and split into cross-validation data sets.

Author:
gregor

Method Summary
 int[][] getOrigDocIds()
          get the original ids of documents
 ICorpus getTestCorpus()
          returns the test corpus of the last split; to be called after split()
 ICorpus getTrainCorpus()
          returns the training corpus of the last split; to be called after split()
 void split(int order, int split, java.util.Random rand)
          splits two child corpora (training and test) off the original corpus using partitions of size 1/order; the original corpus itself is left unchanged (except for storing the splits).
 

Method Detail

split

void split(int order,
           int split,
           java.util.Random rand)
splits two child corpora (training and test) off the original corpus using partitions of size 1/order; the original corpus itself is left unchanged (except for storing the splits). After calling this method, the corpora can be retrieved with getTrainCorpus() and getTestCorpus().

Parameters:
order - number of partitions
split - 0-based index of the split (partition) to return
rand - random source (null for reusing existing splits)
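
The following is a minimal sketch of how such a split could partition document ids, not the library's actual implementation. The class SplitSketch, the method splitIds and the parameter numDocs are hypothetical names introduced for illustration. The sketch assumes that documents are shuffled with the given random source, that partition number split becomes the test set, and that the remaining documents form the training set. It does not model the rand == null case (reusing existing splits).

```java
import java.util.Random;

/** Hypothetical illustration only; not part of org.knowceans.corpus. */
public class SplitSketch {

    /**
     * Returns {trainIds, testIds} for partition `split` (0-based) out of
     * `order` partitions over documents 0..numDocs-1.
     * Assumption: the test set is one partition, the training set is the rest.
     */
    public static int[][] splitIds(int numDocs, int order, int split, Random rand) {
        // Start from the identity permutation 0..numDocs-1.
        int[] perm = new int[numDocs];
        for (int i = 0; i < numDocs; i++) {
            perm[i] = i;
        }
        // Fisher-Yates shuffle driven by the supplied random source.
        for (int i = numDocs - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            int tmp = perm[i];
            perm[i] = perm[j];
            perm[j] = tmp;
        }
        // Boundaries of partition `split` within the shuffled order.
        int from = (int) ((long) numDocs * split / order);
        int to = (int) ((long) numDocs * (split + 1) / order);

        int[] test = new int[to - from];
        int[] train = new int[numDocs - test.length];
        int t = 0;
        int r = 0;
        for (int i = 0; i < numDocs; i++) {
            if (i >= from && i < to) {
                test[t++] = perm[i];
            } else {
                train[r++] = perm[i];
            }
        }
        return new int[][] { train, test };
    }

    public static void main(String[] args) {
        int[][] ids = splitIds(10, 5, 0, new Random(42));
        // prints "train: 8, test: 2"
        System.out.println("train: " + ids[0].length + ", test: " + ids[1].length);
    }
}
```

With order = 5, each partition holds 1/5 of the documents, so under these assumptions the test set has 2 of 10 documents and the training set has the other 8.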

getTrainCorpus

ICorpus getTrainCorpus()
to be called after split()

Returns:
the training corpus according to the last splitting operation

getTestCorpus

ICorpus getTestCorpus()
to be called after split()

Returns:
the test corpus according to the last splitting operation

getOrigDocIds

int[][] getOrigDocIds()
get the original ids of documents

Returns:
a two-element array: [ids of training documents, ids of test documents]
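
As a usage illustration, the sketch below runs one cross-validation pass over all folds. It passes the random source only for the first fold and null afterwards, following the "null for reusing existing splits" note on the rand parameter. To stay self-contained it does not use the real library: Splittable is a local stand-in that mirrors the split and getOrigDocIds signatures documented here, and ToyCorpus is a hypothetical in-memory implementation whose shuffling and partitioning behavior is an assumption, not the behavior of NumCorpus or LabelNumCorpus.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration only; not part of org.knowceans.corpus. */
public class CrossValidationSketch {

    /** Local stand-in mirroring two of the ISplitCorpus method signatures. */
    public interface Splittable {
        void split(int order, int split, Random rand);
        int[][] getOrigDocIds();
    }

    /** Toy corpus over document ids 0..numDocs-1 (assumed behavior). */
    public static class ToyCorpus implements Splittable {
        private final int numDocs;
        private List<Integer> perm;   // stored shuffle, reused when rand == null
        private int[][] ids;          // {train ids, test ids} of the last split

        public ToyCorpus(int numDocs) {
            this.numDocs = numDocs;
        }

        @Override
        public void split(int order, int split, Random rand) {
            if (rand != null) {
                // A random source was given: draw a fresh shuffle and store it.
                perm = new ArrayList<>();
                for (int i = 0; i < numDocs; i++) {
                    perm.add(i);
                }
                Collections.shuffle(perm, rand);
            } else if (perm == null) {
                throw new IllegalStateException("no existing split to reuse");
            }
            int from = numDocs * split / order;
            int to = numDocs * (split + 1) / order;
            // Training set: everything outside partition [from, to).
            List<Integer> train = new ArrayList<>(perm.subList(0, from));
            train.addAll(perm.subList(to, numDocs));
            ids = new int[][] { toArray(train), toArray(perm.subList(from, to)) };
        }

        @Override
        public int[][] getOrigDocIds() {
            return ids;
        }

        private static int[] toArray(List<Integer> list) {
            return list.stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** Visits every fold once and returns the test-set size of each fold. */
    public static int[] testSizesPerFold(Splittable corpus, int order, Random rand) {
        int[] sizes = new int[order];
        for (int fold = 0; fold < order; fold++) {
            // Shuffle once on the first fold, then reuse the stored partitions.
            corpus.split(order, fold, fold == 0 ? rand : null);
            sizes[fold] = corpus.getOrigDocIds()[1].length;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = testSizesPerFold(new ToyCorpus(10), 5, new Random(1));
        System.out.println(Arrays.toString(sizes)); // prints "[2, 2, 2, 2, 2]"
    }
}
```

Reusing the stored partitions across folds is what makes the test sets disjoint in this toy version, so each document appears in exactly one test set over the full pass.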