|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Object org.knowceans.dirichlet.lda.LdaGibbsSampler org.knowceans.sandbox.hlda.HldaGibbsSampler
public class HldaGibbsSampler
Gibbs sampler for estimating the best assignments of topics for words and documents in a corpus, using a hierarchical approach with a Chinese restaurant process prior. The algorithm is introduced in Blei et al. paper "Hierarchical Topic Models and the Nested Chinese Restaurant Process" by David M. Blei, Thomas L. Griffiths, Michael I. Jordan and Joshua B. Tenenbaum (2004).
Field Summary | |
---|---|
(package private) NestedCrpNode[][] |
c
c[m][ell] is the restaurant corresponding to the ell'th topic for document m. |
(package private) int |
M
number of documents |
(package private) NestedCrpNode |
ncrp
the nested CRP structure into which c points |
(package private) int[][][] |
ncw
ncw[ell][m][v] number of times topic ell was assigned word v in document m (need doc index because topics are indexed from the CRP tree nodes and need document-specific counts). |
(package private) int[][] |
ncwsum
ncwsum[ell][m] number of words from m assigned to topic ell |
Fields inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler |
---|
backupIteration, conf, dispcol, numstats, phisum, rand, state, thetasum |
Constructor Summary | |
---|---|
HldaGibbsSampler(int[][] documents,
int V)
Initialise the Gibbs sampler with data. |
Method Summary | |
---|---|
NestedCrpNode |
getNcrp()
Retrieve estimated nested topic structure. |
private void |
gibbs(int L,
double alpha,
double eta,
double gamma)
Main method: Select initial state. |
void |
initialState(int L,
double gamma)
Initialisation: LDA: Must start with an assignment of observations to topics. |
static void |
main(java.lang.String[] args)
|
protected NestedCrpNode[] |
sampleCrpFullConditional(int m)
Sample a tree path c_m from the full conditional distribution: p(c_m | c_-m, w_-m, z) \propto p(w_m | c, w_-m, z) p(c_m | c_-m), where p(c_m | c_-m) = m_i / (gamma + m - 1) for occupied tables and = gamma / (gamma + m - 1) for unoccupied tables is the CRP prior. |
protected void |
updateParams()
Add to the statistics the values of theta and phi for the current state. |
Methods inherited from class org.knowceans.dirichlet.lda.LdaGibbsSampler |
---|
getPhi, getState, getTheta, gibbs, gibbs, gibbsHeap, gibbsHeap, initialState, load, output, run, sampleCorpus, sampleLdaFullConditional, save, saveState, updatePhi, updateTheta, writeParameters |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
NestedCrpNode[][] c
NestedCrpNode ncrp
int M
int[][][] ncw
int[][] ncwsum
Constructor Detail |
---|
public HldaGibbsSampler(int[][] documents, int V)
documents
- document contentV
- vocabulary sizeMethod Detail |
---|
public void initialState(int L, double gamma)
L
- number of topics (same as L in super)private void gibbs(int L, double alpha, double eta, double gamma)
"Conceptually, we divide the Gibbs sampler into two parts. First, given the current state of the CRP, we sample the z_m,n variables of the underlying LDA model following the [LdaGibbsSampler] algorithm [...]. Second, given the values of the LDA hidden variables, we sample the cm;` variables which are associated with the CRP prior." [HLDA paper]
L
- number of topicsalpha
- hyperprior parameter for document--topic associationeta
- hyperprior parameter for topic--term association (= Griffiths'
beta when phi is used for Blei's beta matrix)gamma
- CRP concentration parameterprotected NestedCrpNode[] sampleCrpFullConditional(int m)
p(w_m | c, w_-m, z) = Prod_ell=1..L (Gamma(nsum_-m[ c[m][ell] ] + V * beta)) / (Prod_w Gamma(n_-m[ c[m][ell] ][w] + beta)) * (Prod_w Gamma(n_-m[ c[m][ell] ][w] + n_m[ c[m][ell] ][w] + beta)) / (Gamma(nsum_-m[ c[m][ell] ] + nsum_m[ c[m][ell] ] + V * beta)) *
m
- documentn
- wordprotected void updateParams()
updateParams
in class LdaGibbsSampler
public NestedCrpNode getNcrp()
public static void main(java.lang.String[] args)
|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |