java.lang.Object
  org.knowceans.corpus.NumCorpus
    org.knowceans.corpus.LabelNumCorpus
public class LabelNumCorpus
Represents a corpus of documents, using numerical data only.
Field Summary | |
---|---|
static java.lang.String[] |
EXTENSIONS
|
Fields inherited from interface org.knowceans.corpus.ILabelCorpus |
---|
LAUTHORS, LCATEGORIES, LDOCS, LREFERENCES, LTAGS, LTERMS, LVOLS, LYEARS |
Constructor Summary | |
---|---|
LabelNumCorpus()
|
|
LabelNumCorpus(NumCorpus corp)
create label corpus from standard one |
|
LabelNumCorpus(java.lang.String dataFilebase)
|
|
LabelNumCorpus(java.lang.String dataFilebase, boolean parmode)
|
|
LabelNumCorpus(java.lang.String dataFilebase, int readlimit, boolean parmode)
|
Method Summary | |
---|---|
int[][] |
getDocLabels(int kind)
loads and returns the document labels of given kind |
int |
getLabelsMaxN(int kind)
return the maximum number of labels in any document |
int |
getLabelsV(int kind)
get the number of distinct labels in the label field |
int |
getLabelsW(int kind)
get the number of tokens in the label field |
static void |
main(java.lang.String[] args)
test corpus reading and splitting |
void |
split(int order, int split, java.util.Random rand)
splits two child corpora of size 1/nsplit off the original corpus, which itself is left unchanged (except for storing the splits). |
void |
write(java.lang.String pathbase)
write the corpus to a file. |
Methods inherited from class org.knowceans.corpus.NumCorpus |
---|
getDoc, getDocParBounds, getDocs, getDocTermsFreqs, getDocWordParBounds, getDocWords, getDocWords, getNumDocs, getNumTerms, getNumTerms, getNumWords, getNumWords, getOrigDocIds, getTestCorpus, getTrainCorpus, mergeDocPars, read, reduce, setDoc, setDocs, toString |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Methods inherited from interface org.knowceans.corpus.ICorpus |
---|
getDocWords, getDocWords, getNumDocs, getNumTerms, getNumWords |
Field Detail |
---|
public static final java.lang.String[] EXTENSIONS
Constructor Detail |
---|
public LabelNumCorpus()
public LabelNumCorpus(java.lang.String dataFilebase)
Parameters:
dataFilebase - (filename without extension)
public LabelNumCorpus(java.lang.String dataFilebase, boolean parmode)
Parameters:
dataFilebase - (filename without extension)
parmode - if true read paragraph corpus
public LabelNumCorpus(java.lang.String dataFilebase, int readlimit, boolean parmode)
Parameters:
dataFilebase - (filename without extension)
readlimit - number of docs to reduce corpus when reading (-1 = unlimited)
parmode - if true read paragraph corpus
public LabelNumCorpus(NumCorpus corp)
create label corpus from standard one
Parameters:
corp -
Method Detail |
---|
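public int[][] getDocLabels(int kind)
loads and returns the document labels of given kind
Specified by:
getDocLabels in interface ILabelCorpus
Parameters:
kind - of labels
public int getLabelsMaxN(int kind)
return the maximum number of labels in any document
Parameters:
kind -
public int getLabelsW(int kind)
Description copied from interface: ILabelCorpus
get the number of tokens in the label field
Specified by:
getLabelsW in interface ILabelCorpus
public int getLabelsV(int kind)
Description copied from interface: ILabelCorpus
get the number of distinct labels in the label field
Specified by:
getLabelsV in interface ILabelCorpus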
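The three label statistics above can be illustrated on a plain int[][] array. The following is a minimal, self-contained sketch and not the library's implementation: it assumes that getDocLabels(kind) yields one int[] of label ids per document, and the class LabelStatsSketch and its method names are hypothetical. In particular, the real getLabelsV may derive its count differently (for example from a label vocabulary file).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not part of org.knowceans.corpus. */
final class LabelStatsSketch {

    /** Largest number of labels attached to any single document (cf. getLabelsMaxN). */
    static int maxLabelsPerDoc(int[][] docLabels) {
        int max = 0;
        for (int[] labels : docLabels) {
            max = Math.max(max, labels.length);
        }
        return max;
    }

    /** Total number of label tokens over all documents (cf. getLabelsW). */
    static int labelTokens(int[][] docLabels) {
        int total = 0;
        for (int[] labels : docLabels) {
            total += labels.length;
        }
        return total;
    }

    /** Number of distinct label ids that occur at least once (cf. getLabelsV). */
    static int distinctLabels(int[][] docLabels) {
        Set<Integer> seen = new HashSet<>();
        for (int[] labels : docLabels) {
            for (int label : labels) {
                seen.add(label);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Assumed layout: docLabels[m] holds the label ids of document m.
        int[][] docLabels = { {0, 2}, {}, {2, 5, 7} };
        System.out.println("maxN = " + maxLabelsPerDoc(docLabels));
        System.out.println("W    = " + labelTokens(docLabels));
        System.out.println("V    = " + distinctLabels(docLabels));
    }
}
```

With a loaded corpus, the same helpers could be applied to the array returned by getDocLabels(kind), where kind is one of the ILabelCorpus constants such as LAUTHORS or LCATEGORIES.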
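public void split(int order, int split, java.util.Random rand)
Description copied from class: NumCorpus
splits two child corpora of size 1/nsplit off the original corpus, which itself is left unchanged (except for storing the splits).
Specified by:
split in interface ISplitCorpus
Overrides:
split in class NumCorpus
Parameters:
order - number of partitions
split - 0-based split of corpus returned
rand - random source (null for reusing existing splits)
public void write(java.lang.String pathbase) throws java.io.IOException
Description copied from class: NumCorpus
write the corpus to a file.
Overrides:
write in class NumCorpus
Throws:
java.io.IOException
public static void main(java.lang.String[] args)
test corpus reading and splitting
Parameters:
args -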
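The parameters of split(order, split, rand) suggest a fold-style partition of the document ids. The sketch below shows one way such a partition could work; it is an assumption for illustration, not the code behind NumCorpus.split. It assumes the documents are shuffled with rand, cut into order partitions, and that partition number split becomes the test set while the rest becomes the training set. The class SplitSketch is hypothetical, and the documented "null for reusing existing splits" behaviour is not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of a fold-style corpus split; not the library code. */
final class SplitSketch {

    /**
     * Partitions document ids 0..numDocs-1 into a training and a test set.
     *
     * @param numDocs number of documents in the corpus
     * @param order   number of partitions
     * @param split   0-based index of the partition used as the test set
     * @param rand    random source (must be non-null in this sketch)
     * @return array {trainIds, testIds}
     */
    static int[][] split(int numDocs, int order, int split, Random rand) {
        if (rand == null) {
            throw new IllegalArgumentException("this sketch needs a random source");
        }
        if (order < 1 || split < 0 || split >= order) {
            throw new IllegalArgumentException("need 0 <= split < order");
        }
        // Shuffle the document ids so that partitions are random.
        List<Integer> perm = new ArrayList<>();
        for (int m = 0; m < numDocs; m++) {
            perm.add(m);
        }
        Collections.shuffle(perm, rand);

        // Boundaries of the chosen partition within the shuffled order.
        int from = (int) ((long) numDocs * split / order);
        int to = (int) ((long) numDocs * (split + 1) / order);

        int[] test = new int[to - from];
        int[] train = new int[numDocs - test.length];
        int testPos = 0;
        int trainPos = 0;
        for (int i = 0; i < numDocs; i++) {
            if (i >= from && i < to) {
                test[testPos++] = perm.get(i);
            } else {
                train[trainPos++] = perm.get(i);
            }
        }
        return new int[][] { train, test };
    }

    public static void main(String[] args) {
        int[][] parts = split(10, 5, 0, new Random(42));
        System.out.println("train docs: " + parts[0].length);
        System.out.println("test docs:  " + parts[1].length);
    }
}
```

In the library itself, the resulting child corpora are presumably retrieved through the inherited getTrainCorpus and getTestCorpus methods after split has been called.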