org.knowceans.lda
Class Corpus
java.lang.Object
org.knowceans.lda.Corpus
public class Corpus
- extends java.lang.Object
Represents a corpus of documents.
lda-c reference: struct corpus in lda.h and function in lda-data.c.
- Author:
- heinrich
Constructor Summary
Corpus(java.lang.String dataFilename)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Corpus
public Corpus(java.lang.String dataFilename)
read
public void read(java.lang.String dataFilename)
- Reads a file in "pseudo-SVMlight" format. TODO: make robust against
irregular whitespace (duplicate spaces).
- Parameters:
dataFilename - name of the data file to read
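The TODO above concerns irregular whitespace. The following is a minimal sketch of one way to parse a single line tolerantly. It is not the code used by read. It assumes the line layout documented for lda-c input files, `[M] [term_1]:[count] ... [term_M]:[count]`, where M is the number of unique terms in the document. The class SvmLightLineParser and its nested ParsedDoc type are hypothetical names introduced for illustration and are not part of org.knowceans.lda.

```java
/** Hypothetical helper for illustration; not part of org.knowceans.lda. */
public class SvmLightLineParser {

    /** Parsed form of one document line: parallel arrays of term ids and counts. */
    public static final class ParsedDoc {
        public final int[] terms;   // term ids, in file order
        public final int[] counts;  // occurrence count for each term id
        public final int total;     // sum of all counts

        ParsedDoc(int[] terms, int[] counts) {
            this.terms = terms;
            this.counts = counts;
            int sum = 0;
            for (int c : counts) {
                sum += c;
            }
            this.total = sum;
        }
    }

    /**
     * Parses one line of the assumed layout "M term:count term:count ...".
     * Leading, trailing, and repeated spaces or tabs are tolerated.
     */
    public static ParsedDoc parseLine(String line) {
        // trim() drops outer whitespace; "\\s+" treats any run of
        // spaces or tabs as a single separator.
        String[] fields = line.trim().split("\\s+");
        int length = Integer.parseInt(fields[0]);
        if (fields.length - 1 != length) {
            throw new IllegalArgumentException(
                "expected " + length + " term:count pairs, found " + (fields.length - 1));
        }
        int[] terms = new int[length];
        int[] counts = new int[length];
        for (int i = 0; i < length; i++) {
            String[] pair = fields[i + 1].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("malformed pair: " + fields[i + 1]);
            }
            terms[i] = Integer.parseInt(pair[0]);
            counts[i] = Integer.parseInt(pair[1]);
        }
        return new ParsedDoc(terms, counts);
    }
}
```

Under these assumptions, calling SvmLightLineParser.parseLine on a line with doubled spaces or a stray tab yields the same arrays as the cleanly spaced line. Splitting on a single space instead would produce empty fields and fail on such input.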
getDocs
public Document[] getDocs()
- Returns:
the documents in this corpus
getDoc
public Document getDoc(int index)
- Parameters:
index - position of the document in the corpus
- Returns:
the document at the given index
setDoc
public void setDoc(int index,
Document doc)
- Parameters:
index - position at which to store the document
doc - the document to store
getNumDocs
public int getNumDocs()
- Returns:
the number of documents in this corpus
getNumTerms
public int getNumTerms()
- Returns:
the number of terms in this corpus
setDocs
public void setDocs(Document[] documents)
- Parameters:
documents - the documents to use for this corpus
setNumDocs
public void setNumDocs(int i)
- Parameters:
i - the number of documents
setNumTerms
public void setNumTerms(int i)
- Parameters:
i - the number of terms
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object