org.knowceans.lda
Class Corpus

java.lang.Object
  extended by org.knowceans.lda.Corpus

public class Corpus
extends java.lang.Object

Represents a corpus of documents.

lda-c reference: struct corpus in lda.h and function in lda-data.c.

Author:
heinrich

Constructor Summary
Corpus(java.lang.String dataFilename)
           
 
Method Summary
 Document getDoc(int index)
           
 Document[] getDocs()
           
 int getNumDocs()
           
 int getNumTerms()
           
 void read(java.lang.String dataFilename)
          Reads a file in "pseudo-SVMlight" format.
 void setDoc(int index, Document doc)
           
 void setDocs(Document[] documents)
           
 void setNumDocs(int i)
           
 void setNumTerms(int i)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Corpus

public Corpus(java.lang.String dataFilename)
Method Detail

read

public void read(java.lang.String dataFilename)
Reads a file in "pseudo-SVMlight" format. TODO: make parsing robust against irregular whitespace (e.g. duplicate spaces).

Parameters:
dataFilename - name of the data file to read
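
Example (a sketch only, not the code used by this class): the whitespace TODO above could be addressed by splitting each line on runs of whitespace. The line layout "N term:count term:count ..." is an assumption taken from the lda-c data convention, where N is the number of term:count pairs on the line. The class PseudoSvmLightLine and its fields are hypothetical and are not part of org.knowceans.lda.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of org.knowceans.lda. */
final class PseudoSvmLightLine {
    final int[] terms;   // term indices, in file order
    final int[] counts;  // occurrence count for each term

    private PseudoSvmLightLine(int[] terms, int[] counts) {
        this.terms = terms;
        this.counts = counts;
    }

    /**
     * Parses one line of the assumed form "N term:count term:count ...".
     * trim() plus the \s+ pattern absorbs leading, trailing and duplicate
     * whitespace (spaces or tabs) between fields.
     */
    static PseudoSvmLightLine parse(String line) {
        String[] fields = line.trim().split("\\s+");
        int declared = Integer.parseInt(fields[0]);
        if (declared != fields.length - 1) {
            throw new IllegalArgumentException(
                "declared " + declared + " pairs, found " + (fields.length - 1));
        }
        int[] terms = new int[declared];
        int[] counts = new int[declared];
        for (int i = 0; i < declared; i++) {
            String[] pair = fields[i + 1].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("malformed pair: " + fields[i + 1]);
            }
            terms[i] = Integer.parseInt(pair[0]);
            counts[i] = Integer.parseInt(pair[1]);
        }
        return new PseudoSvmLightLine(terms, counts);
    }

    public static void main(String[] args) {
        // Leading spaces, a run of spaces, a tab and a trailing space are all tolerated.
        PseudoSvmLightLine doc = parse("  3 0:2   4:1\t7:5 ");
        System.out.println(Arrays.toString(doc.terms));   // prints [0, 4, 7]
        System.out.println(Arrays.toString(doc.counts));  // prints [2, 1, 5]
    }
}
```

The sketch rejects a line whose declared pair count does not match the number of pairs found. Whether read performs the same check is not stated here.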

getDocs

public Document[] getDocs()
Returns:
the documents in this corpus

getDoc

public Document getDoc(int index)
Parameters:
index - position of the document in the corpus
Returns:
the document at the given index

setDoc

public void setDoc(int index,
                   Document doc)
Parameters:
index - position in the corpus at which to store the document
doc - the document to store

getNumDocs

public int getNumDocs()
Returns:
the number of documents in the corpus

getNumTerms

public int getNumTerms()
Returns:
the number of terms in the corpus

setDocs

public void setDocs(Document[] documents)
Parameters:
documents - the documents that make up this corpus

setNumDocs

public void setNumDocs(int i)
Parameters:
i - the number of documents

setNumTerms

public void setNumTerms(int i)
Parameters:
i - the number of terms

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object