Serialized Form


Package org.knowceans.corpus.analysis

Class org.knowceans.corpus.analysis.VariationOfInformationAnalyser extends java.lang.Object implements Serializable

serialVersionUID: 1L

Serialized Fields

docCategories

int[][] docCategories
docCategories sparse matrix (will be )


catDocuments

org.knowceans.map.HashMultiMap<X,Y> catDocuments
sparse transpose of docCategories
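catDocuments is described as the sparse transpose of docCategories. The following is a minimal sketch of that inversion. It assumes that docCategories[m] lists the category ids of document m. It uses java.util.HashMap with TreeSet values as a stand-in for org.knowceans.map.HashMultiMap, whose API is not documented here. The class name CategoryTranspose is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of the knowceans API.
final class CategoryTranspose {

    // Inverts document -> categories into category -> documents.
    // Assumption: docCategories[m] holds the category ids of document m;
    // a null row is treated as a document without categories.
    static Map<Integer, TreeSet<Integer>> transpose(int[][] docCategories) {
        Map<Integer, TreeSet<Integer>> catDocuments = new HashMap<>();
        for (int m = 0; m < docCategories.length; m++) {
            if (docCategories[m] == null) {
                continue;
            }
            for (int cat : docCategories[m]) {
                catDocuments.computeIfAbsent(cat, k -> new TreeSet<>()).add(m);
            }
        }
        return catDocuments;
    }

    public static void main(String[] args) {
        int[][] docCategories = {{0, 2}, {2}, {1, 2}};
        System.out.println(transpose(docCategories));
    }
}
```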


nCats

int nCats
number of categories


nDocs

int nDocs
number of documents


nValidDocs

int nValidDocs
number of valid documents


nTopics

int nTopics
number of topics


theta

double[][] theta
the document--topic associations (theta)


iptc

IptcCategories iptc
IPTC codes


outfile

java.lang.String outfile

comment

java.lang.String comment

sumPCatDoc

double sumPCatDoc

hierup

boolean hierup

hierdown

boolean hierdown

includeunknown

boolean includeunknown
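The class name refers to the variation of information, VI(X;Y) = H(X) + H(Y) - 2 I(X;Y) = H(X|Y) + H(Y|X). This is a distance between two clusterings, here presumably category labels and topic associations. The sketch below computes VI from a joint probability table. It illustrates the measure generically and is not the analyser's own code. It assumes that the joint table is already normalised and that natural logarithms are used; the class name is hypothetical.

```java
// Hypothetical helper illustrating the measure, not the analyser's implementation.
final class VariationOfInformation {

    // VI(X;Y) = -sum_ij p_ij [log(p_ij / p_i) + log(p_ij / p_j)], in nats.
    // Assumption: joint[i][j] is a normalised joint distribution (entries sum to 1).
    static double vi(double[][] joint) {
        int nx = joint.length;
        int ny = joint[0].length;
        double[] px = new double[nx];
        double[] py = new double[ny];
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                px[i] += joint[i][j];
                py[j] += joint[i][j];
            }
        }
        double vi = 0.0;
        for (int i = 0; i < nx; i++) {
            for (int j = 0; j < ny; j++) {
                double p = joint[i][j];
                if (p > 0) { // 0 log 0 is taken as 0
                    vi -= p * (Math.log(p / px[i]) + Math.log(p / py[j]));
                }
            }
        }
        return vi;
    }

    public static void main(String[] args) {
        double[][] joint = {{0.25, 0.25}, {0.25, 0.25}};
        System.out.println(vi(joint));
    }
}
```

Identical clusterings give VI = 0, and two independent, uniform binary clusterings give 2 ln 2.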

Package org.knowceans.corpus.parsers.dpa

Class org.knowceans.corpus.parsers.dpa.IptcCategories extends java.lang.Object implements Serializable

serialVersionUID: 1L

Serialized Fields

iptcMap

java.util.HashMap<K,V> iptcMap

iptcIndex

java.util.Vector<E> iptcIndex

Package org.knowceans.dirichlet.atm

Class org.knowceans.dirichlet.atm.AtmGibbsQuerySampler extends AtmGibbsSampler implements Serializable

serialVersionUID: -1772624211078485094L

Serialized Fields

atmstateq

AtmMarkovState atmstateq
atmstateq contains the query documents. Its nw and nwsum fields are shared with the corpus state.


stateSave

AtmMarkovState stateSave
stateSave contains the saved Markov state (after initially loading the state). It is a complete copy of the


thetasumq

double[][] thetasumq

Class org.knowceans.dirichlet.atm.AtmGibbsSampler extends LdaGibbsSampler implements Serializable

serialVersionUID: 1L

Serialized Fields

atmstate

AtmMarkovState atmstate
State variables of the author-topic (ATM) Gibbs sampler. The contract is that this reference points to the same object as LdaGibbsSampler.state (declared as LdaMarkovState).

Class org.knowceans.dirichlet.atm.AtmMarkovState extends LdaMarkovState implements Serializable

serialVersionUID: 1L

Serialized Fields

ad

int[][] ad
document authors [M][]


x

int[][] x
author assignment x[m][n] for each word


A

int A
total number of authors
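In the author-topic model, ad[m] lists the authors of document m, and x[m][n] records the author to which word n was attributed. The sketch below is a minimal consistency check that counts words per author and verifies that each assignment is one of the document's authors. It assumes that x stores global author ids in [0, A) rather than positions within ad[m], which is not confirmed by the field docs. The class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for inspecting a deserialised AtmMarkovState-like layout.
final class AuthorAssignments {

    // Counts words attributed to each author and checks that every
    // assignment x[m][n] is one of the authors listed in ad[m].
    // Assumption: x holds global author ids in [0, A).
    static int[] wordsPerAuthor(int[][] ad, int[][] x, int A) {
        int[] counts = new int[A];
        for (int m = 0; m < x.length; m++) {
            for (int author : x[m]) {
                if (Arrays.stream(ad[m]).noneMatch(a -> a == author)) {
                    throw new IllegalStateException(
                        "word in document " + m + " assigned to non-author " + author);
                }
                counts[author]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] ad = {{0, 1}, {1}};
        int[][] x = {{0, 1, 1}, {1}};
        System.out.println(Arrays.toString(wordsPerAuthor(ad, x, 2)));
    }
}
```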


Package org.knowceans.dirichlet.lda

Class org.knowceans.dirichlet.lda.ExtLdaConfiguration extends LdaConfiguration implements Serializable

serialVersionUID: 1451977000310757825L

Serialized Fields

hyperMethod

int hyperMethod

backupInterval

int backupInterval

interSamples

int[] interSamples

corpusbase

java.lang.String corpusbase

outfilebase

java.lang.String outfilebase

interSave

boolean interSave

interTopics

boolean interTopics

forQuery

boolean forQuery

doNative

boolean doNative

Class org.knowceans.dirichlet.lda.LdaConfiguration extends java.lang.Object implements Serializable

serialVersionUID: -812799395305637798L

Serialized Fields

K

int K
number of topics


alpha

double alpha
Dirichlet parameter (document--topic associations)


beta

double beta
Dirichlet parameter (topic--term associations)


burnIn

int burnIn
burn-in period


iterations

int iterations
maximum number of iterations


sampleLag

int sampleLag
sample lag (if -1, only one sample is taken)


thinInterval

int thinInterval
sampling lag (?)
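In a typical LDA Gibbs sampler, burnIn, iterations, and sampleLag interact as follows. Once the burn-in period is over, a sample is added to the statistics every sampleLag iterations. A lag of -1 means that only the final state is used. The sketch below lists the iterations at which samples would be taken under those assumptions. It mirrors a common sampler pattern rather than confirmed behaviour of this class, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a burn-in / sample-lag schedule.
final class SampleSchedule {

    // Returns the (0-based) iterations at which a sample is taken.
    // Assumption: samples are taken when i > burnIn and i % sampleLag == 0;
    // sampleLag <= 0 means a single sample from the final iteration.
    static List<Integer> sampleIterations(int burnIn, int iterations, int sampleLag) {
        List<Integer> taken = new ArrayList<>();
        if (sampleLag <= 0) {
            taken.add(iterations - 1);
            return taken;
        }
        for (int i = 0; i < iterations; i++) {
            if (i > burnIn && i % sampleLag == 0) {
                taken.add(i);
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(sampleIterations(100, 200, 25));
    }
}
```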

Class org.knowceans.dirichlet.lda.LdaGibbsQuerySampler extends LdaGibbsSampler implements Serializable

serialVersionUID: -1772624211078485094L

Serialized Fields

stateq

LdaMarkovState stateq
stateq contains the query documents. Its nw and nwsum fields are shared with the corpus state.


stateSave

LdaMarkovState stateSave
stateSave contains the saved Markov state (after initially loading the state). It is a complete copy of the


thetasumq

double[][] thetasumq

Class org.knowceans.dirichlet.lda.LdaGibbsSampler extends java.lang.Object implements Serializable

serialVersionUID: 1L

Serialized Fields

conf

ExtLdaConfiguration conf
Configuration object with the current parameters.


state

LdaMarkovState state
State variables of the LDA Gibbs sampler.


thetasum

double[][] thetasum
cumulative statistics of theta


phisum

double[][] phisum
cumulative statistics of phi


backupIteration

int backupIteration
iteration in the last backup


numstats

int numstats
number of samples accumulated in thetasum and phisum
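thetasum and phisum accumulate per-sample estimates. Assuming numstats counts how many samples were added, the averaged estimates are thetasum / numstats and phisum / numstats. The sketch below shows one per-sample update. It assumes the standard collapsed Gibbs estimators theta[m][k] = (nd[m][k] + alpha) / (ndsum[m] + K alpha) and phi[k][t] = (nw[k][t] + beta) / (nwsum[k] + V beta), built from the counts described under LdaMarkovState. The class name is hypothetical.

```java
// Hypothetical accumulator mirroring the thetasum / phisum / numstats fields.
final class ThetaPhiStats {
    final double[][] thetasum;
    final double[][] phisum;
    int numstats;

    ThetaPhiStats(int M, int K, int V) {
        thetasum = new double[M][K];
        phisum = new double[K][V];
    }

    // Adds the estimates of the current Markov state to the running sums.
    void update(int[][] nd, int[] ndsum, int[][] nw, int[] nwsum, double alpha, double beta) {
        int K = phisum.length;
        int V = phisum[0].length;
        for (int m = 0; m < thetasum.length; m++) {
            for (int k = 0; k < K; k++) {
                thetasum[m][k] += (nd[m][k] + alpha) / (ndsum[m] + K * alpha);
            }
        }
        for (int k = 0; k < K; k++) {
            for (int t = 0; t < V; t++) {
                phisum[k][t] += (nw[k][t] + beta) / (nwsum[k] + V * beta);
            }
        }
        numstats++;
    }

    double[][] theta() { return average(thetasum); }

    double[][] phi() { return average(phisum); }

    // Divides accumulated sums by the number of samples taken so far.
    private double[][] average(double[][] sum) {
        if (numstats == 0) {
            throw new IllegalStateException("no samples accumulated yet");
        }
        double[][] avg = new double[sum.length][];
        for (int i = 0; i < sum.length; i++) {
            avg[i] = sum[i].clone();
            for (int j = 0; j < avg[i].length; j++) {
                avg[i][j] /= numstats;
            }
        }
        return avg;
    }
}
```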


dispcol

int dispcol

rand

java.util.Random rand
random number generator

Class org.knowceans.dirichlet.lda.LdaMarkovState extends java.lang.Object implements Serializable

serialVersionUID: 3380802444939432054L

Serialized Fields

w

int[][] w
w_m,n: word vectors of the corpus.


V

int V
V: size of vocabulary.


z

int[][] z
z_m,n: topic assignment of each word (i.e., each term occurrence).


nw

int[][] nw
n_k,t: nw[k][t] number of instances of term t assigned to topic k. In subclasses, this can be generalised as the number of associations between topics and latent-semantic "minor" items (words, links, recommendations).


nd

int[][] nd
n_m,k: nd[m][k] number of words (not terms!) in document m assigned to topic k. Summed over k, this equals the document length. In subclasses, this can be generalised as the number of associations between latent-semantic "major" items (documents, authors, searchers, recommenders) and topics.


nwsum

int[] nwsum
n_k: nwsum[k] total number of words (not terms!) assigned to topic k.


ndsum

int[] ndsum
n_m: ndsum[m] total number of words in document m.
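The counts nw, nd, nwsum and ndsum can all be derived from the word vectors w and the assignments z. The sketch below rebuilds them, which may be useful as a consistency check after deserialising a state. It assumes that terms are indexed in [0, V) and topics in [0, K). The class name is hypothetical.

```java
// Hypothetical helper that recomputes LdaMarkovState-style counts from w and z.
final class LdaCounts {
    final int[][] nw;   // nw[k][t]: occurrences of term t assigned to topic k
    final int[][] nd;   // nd[m][k]: words in document m assigned to topic k
    final int[] nwsum;  // nwsum[k]: words assigned to topic k
    final int[] ndsum;  // ndsum[m]: words in document m

    // Rebuilds all count arrays from w[m][n] (term ids) and z[m][n] (topic ids).
    LdaCounts(int[][] w, int[][] z, int K, int V) {
        int M = w.length;
        nw = new int[K][V];
        nd = new int[M][K];
        nwsum = new int[K];
        ndsum = new int[M];
        for (int m = 0; m < M; m++) {
            for (int n = 0; n < w[m].length; n++) {
                int t = w[m][n];
                int k = z[m][n];
                nw[k][t]++;
                nd[m][k]++;
                nwsum[k]++;
            }
            ndsum[m] = w[m].length; // equals the sum of nd[m][k] over k
        }
    }
}
```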


Package org.knowceans.dirichlet.sandbox

Class org.knowceans.dirichlet.sandbox.LdaGibbsSamplerDpa extends java.lang.Object implements Serializable

serialVersionUID: 1L

Serialized Fields

documents

int[][] documents
Deprecated. 
document data (term lists)


V

int V
Deprecated. 
vocabulary size


K

int K
Deprecated. 
number of topics


alpha

double alpha
Deprecated. 
Dirichlet parameter (document--topic associations)


beta

double beta
Deprecated. 
Dirichlet parameter (topic--term associations)


z

int[][] z
Deprecated. 
topic assignments for each word.


nw

int[][] nw
Deprecated. 
nw[k][j] number of instances of term j assigned to topic k.


nd

int[][] nd
Deprecated. 
nd[d][k] number of words in document d assigned to topic k.


nwsum

int[] nwsum
Deprecated. 
nwsum[k] total number of words assigned to topic k.


ndsum

int[] ndsum
Deprecated. 
ndsum[d] total number of words in document d.


thetasum

double[][] thetasum
Deprecated. 
cumulative statistics of theta


phisum

double[][] phisum
Deprecated. 
cumulative statistics of phi


numstats

int numstats
Deprecated. 
size of statistics


interSamples

int[] interSamples
Deprecated. 
iterations at which intermediate (single) samples are taken.


THIN_INTERVAL

int THIN_INTERVAL
Deprecated. 
sampling lag (?)


BURN_IN

int BURN_IN
Deprecated. 
burn-in period


ITERATIONS

int ITERATIONS
Deprecated. 
maximum number of iterations


SAMPLE_LAG

int SAMPLE_LAG
Deprecated. 
sample lag (if -1, only one sample is taken)


dispcol

int dispcol
Deprecated. 

corpusname

java.lang.String corpusname
Deprecated. 

outfilename

java.lang.String outfilename
Deprecated. 

messagetext

java.lang.String messagetext
Deprecated. 

messageheader

java.lang.String messageheader
Deprecated. 

messagerecipients

java.lang.String[] messagerecipients
Deprecated. 

interSave

boolean interSave
Deprecated. 

analyser

VariationOfInformationAnalyser analyser
Deprecated. 

interTopics

boolean interTopics
Deprecated. 

backupIteration

int backupIteration
Deprecated. 
iteration in the last backup


timeElapsed

long timeElapsed
Deprecated. 

backupInterval

int backupInterval
Deprecated. 

t0

long t0
Deprecated. 

Class org.knowceans.dirichlet.sandbox.LdaGibbsSamplerHyper extends LdaGibbsSampler implements Serializable

serialVersionUID: 1L

Serialized Fields

alphaSampler

InvGammaArms alphaSampler

betaSampler

InvGammaArms betaSampler

alphaParams

int[] alphaParams

betaParams

int[] betaParams

state

LdaMarkovStateHyper state

Class org.knowceans.dirichlet.sandbox.LdaMarkovStateHyper extends LdaMarkovState implements Serializable

Serialized Fields

valpha

double[] valpha
K hyperparameters alpha trained from observations, not used for a priori alpha.


vbeta

double[] vbeta
V hyperparameters beta trained from observations; not used for a priori beta.


alpha

double alpha
symmetric hyperparameter alpha or sum of valpha trained from observations, not used for a priori alpha.


beta

double beta
symmetric hyperparameter beta or sum of vbeta trained from observations; not used for a priori beta.


Package org.knowceans.sandbox

Class org.knowceans.sandbox.ListPrinter extends java.util.Vector<E> implements Serializable

serialVersionUID: 3256438097242765105L

Serialized Fields

linewidth

int linewidth

count

int count

entryStart

java.lang.String entryStart

entryEnd

java.lang.String entryEnd

entrySeparator

java.lang.String entrySeparator

listStart

java.lang.String listStart

listEnd

java.lang.String listEnd

Package org.knowceans.sandbox.hlda

Class org.knowceans.sandbox.hlda.HldaGibbsSampler extends LdaGibbsSampler implements Serializable

Serialized Fields

c

NestedCrpNode[][] c
c[m][ell] is the restaurant corresponding to the ell'th topic for document m. (Points at nodes in the hierarchy ncrp.)


ncrp

NestedCrpNode ncrp
the nested CRP structure into which c points


M

int M
number of documents


ncw

int[][][] ncw
ncw[ell][m][v] number of times word v in document m was assigned to topic ell. The document index is needed because topics are indexed by CRP tree nodes and require document-specific counts. TODO: also store the totals here; they are currently kept non-sparsely in the tree nodes.


ncwsum

int[][] ncwsum
ncwsum[ell][m] number of words in document m assigned to topic ell
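ncwsum is the marginal of ncw over the vocabulary: ncwsum[ell][m] is the sum over v of ncw[ell][m][v]. The sketch below computes that relation. It assumes that ncw is a dense [levels][documents][vocabulary] array, as its type suggests. The class name is hypothetical.

```java
// Hypothetical helper showing how ncwsum relates to ncw.
final class HldaCounts {

    // ncwsum[ell][m] = sum over v of ncw[ell][m][v]
    static int[][] marginal(int[][][] ncw) {
        int[][] ncwsum = new int[ncw.length][];
        for (int ell = 0; ell < ncw.length; ell++) {
            ncwsum[ell] = new int[ncw[ell].length];
            for (int m = 0; m < ncw[ell].length; m++) {
                for (int count : ncw[ell][m]) {
                    ncwsum[ell][m] += count;
                }
            }
        }
        return ncwsum;
    }

    public static void main(String[] args) {
        int[][][] ncw = {{{1, 2}, {0, 3}}, {{4, 0}, {1, 1}}};
        System.out.println(java.util.Arrays.deepToString(marginal(ncw)));
    }
}
```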


Package org.knowceans.sandbox.ilda

Class org.knowceans.sandbox.ilda.IldaGibbsSampler extends LdaGibbsSampler implements Serializable

serialVersionUID: 1L

Serialized Fields

growstep

int growstep
increment by which arrays are grown


nwunrep

int[] nwunrep
number of times a term is unrepresented by current topics


ndunrep

int[] ndunrep
number of unrepresented words in a document


nwsumunrep

int nwsumunrep
total number of unrepresented words
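growstep suggests that count arrays such as nwunrep (per term) and ndunrep (per document) are enlarged in fixed increments rather than reallocated for every new entry. The sketch below shows that pattern using java.util.Arrays.copyOf. It assumes that new slots start at zero, and the helper name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper illustrating growth of a count array by a fixed step.
final class GrowableCounts {

    // Ensures that index is addressable, growing the array in multiples of growstep.
    // Assumption: new slots start at zero (Arrays.copyOf pads int arrays with 0).
    static int[] ensureCapacity(int[] counts, int index, int growstep) {
        if (growstep <= 0) {
            throw new IllegalArgumentException("growstep must be positive");
        }
        if (index < counts.length) {
            return counts;
        }
        int newLength = counts.length;
        while (newLength <= index) {
            newLength += growstep;
        }
        return Arrays.copyOf(counts, newLength);
    }

    public static void main(String[] args) {
        int[] nwunrep = {5, 6};
        nwunrep = ensureCapacity(nwunrep, 4, 3);
        nwunrep[4]++;
        System.out.println(Arrays.toString(nwunrep));
    }
}
```

Growing by a fixed step keeps the number of reallocations lower than growing by one slot at a time. A doubling strategy would amortise better for large vocabularies.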