java.lang.Object
  org.knowceans.topics.simple.IldaGibbs

public class IldaGibbs
LDA Gibbs sampler with nonparametric prior (HDP):

(m,k | alpha * tau | gamma), k->inf, (k,t | beta)

using the direct assignment sampler of Teh et al. (2006), built on the modular parametric LDA sampler first published by Griffiths (2002) and explained in Heinrich (2005). For the original LDA paper, see Blei et al. (2002).
The general idea is to retain as much as possible of the standard LDA Gibbs sampler. This is achieved by alternating between sampling the finite case with K + 1 topics and resampling the topic weights, taking into account the current assignments of data items to topics and pruning or expanding the topic set accordingly.
The implementation aims for the (subjectively) best trade-off between simplicity and fidelity to the JASA paper (Teh et al. 2006); therefore only the direct assignment method is used.
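The per-token direct-assignment step described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual implementation: all names (tau, nmk, nkt, nk) are invented for this sketch and are not the fields of IldaGibbs.

```java
import java.util.Random;

// Sketch of the direct-assignment step: for one token with term index t,
// compute K + 1 unnormalised weights -- the K existing topics plus one
// pseudo-topic whose mass alpha * tau[K] represents spawning a new topic
// from the root DP -- and draw from the resulting discrete distribution.
// All variable names are illustrative, not the fields of IldaGibbs.
public class DirectAssignmentSketch {

    /** draw an index in [0, p.length) with probability proportional to p */
    static int sampleDiscrete(double[] p, Random rand) {
        double sum = 0;
        for (double x : p) sum += x;
        double u = rand.nextDouble() * sum;
        for (int k = 0; k < p.length; k++) {
            u -= p[k];
            if (u <= 0) return k;
        }
        return p.length - 1;
    }

    public static void main(String[] args) {
        int K = 3, V = 5, t = 2;             // topics, vocabulary size, current term
        double alpha = 1.0, beta = 0.1;
        double[] tau = {0.4, 0.3, 0.2, 0.1}; // root weights; tau[K] is the new-topic mass
        int[] nmk = {5, 2, 0};               // doc-topic counts of the current document
        int[][] nkt = {{3, 1, 0, 1, 0}, {0, 2, 0, 0, 0}, {0, 0, 0, 0, 0}};
        int[] nk = {5, 2, 0};                // topic totals

        double[] p = new double[K + 1];
        for (int k = 0; k < K; k++)
            p[k] = (nmk[k] + alpha * tau[k]) * (nkt[k][t] + beta) / (nk[k] + V * beta);
        p[K] = alpha * tau[K] / V;           // new topic: base measure uniform over V terms

        int k = sampleDiscrete(p, new Random(42));
        System.out.println(k == K ? "spawn new topic" : "assign to topic " + k);
    }
}
```

If the pseudo-topic K is drawn, the sampler grows the count arrays and splits the new-topic weight via stick breaking; conversely, a topic whose counts drop to zero is pruned, which is exactly the expand/prune alternation described above.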
The implementation uses lists instead of primitive arrays; for performance reasons, this may be changed to allocate fixed-size arrays with a bound Kmax, similar to a truncated DP.
Caveats: (1) performance is not a core criterion, and OOP encapsulation is ignored for compactness' sake; (2) the code still uses the likelihood function of plain LDA, without the hyperparameter terms.
LICENSE: GPL3, see: http://www.gnu.org/licenses/gpl-3.0.html
References:

D.M. Blei, A.Y. Ng, M.I. Jordan. Latent Dirichlet Allocation. NIPS, 2002.
T. Griffiths. Gibbs sampling in the generative model of Latent Dirichlet Allocation. TR, 2002. www-psych.stanford.edu/~gruffydd/cogsci02/lda.ps
G. Heinrich. Parameter estimation for text analysis. TR, 2009. www.arbylon.net/publications/textest2.pdf
G. Heinrich. "Infinite LDA" -- implementing the HDP with minimum code complexity. TN2011/1. www.arbylon.net/publications/ilda.pdf
Y.W. Teh, M.I. Jordan, M.J. Beal, D.M. Blei. Hierarchical Dirichlet Processes. JASA, 101:1566-1581, 2006.
Field Summary

int ppstep
    step to increase the sampling array
Constructor Summary

IldaGibbs(int[][] w, int[][] wq, int K, int V, double alpha, double beta, double gamma, java.util.Random rand)
    parametrise Gibbs sampler
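The token parameters w and wq are ragged arrays of term indices, one row per document. A minimal sketch of building such an array from tokenised text; the documents, the class name and the helper encode are invented for illustration:

```java
import java.util.*;

// Hypothetical illustration of the corpus format expected by the constructor:
// w[m][n] is the term index of token n in document m (a ragged int[][]).
public class CorpusSketch {

    /** encode documents as term indices, assigning the next free index to unseen terms */
    static int[][] encode(String[][] docs, Map<String, Integer> vocab) {
        int[][] w = new int[docs.length][];
        for (int m = 0; m < docs.length; m++) {
            w[m] = new int[docs[m].length];
            for (int n = 0; n < docs[m].length; n++) {
                // reading vocab.size() inside the mapping function is safe:
                // computeIfAbsent only inserts after the function returns
                w[m][n] = vocab.computeIfAbsent(docs[m][n], s -> vocab.size());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        String[][] docs = {
            {"topic", "model", "topic"},
            {"gibbs", "sampler", "model"}
        };
        int[][] w = encode(docs, vocab);
        int V = vocab.size();                       // the constructor's V
        System.out.println("V = " + V);             // V = 4
        System.out.println(Arrays.deepToString(w)); // [[0, 1, 0], [2, 3, 1]]
    }
}
```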
Method Summary

void init()
    initialise Markov chain
void initq()
    initialise Markov chain for querying
static void main(java.lang.String[] args)
    test driver for mixture network Gibbs sampler
void packTopics()
    reorders topics so that no gaps exist in the count arrays and topics are ordered with their counts descending
double ppx()
void run(int niter)
    run Gibbs sampler
void runq(int niter)
    query Gibbs sampler
java.lang.String toString()
    assemble a string of overview information
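The reordering that packTopics describes can be sketched as a relabelling of topic ids: sort topics by descending count and drop empty ones, so the count arrays stay gap-free. A hypothetical stand-alone version; pack and nk are illustrative names, not the actual implementation:

```java
import java.util.*;

// Sketch of topic packing: given per-topic totals nk, return a mapping
// newId such that old topic k becomes newId[k], topics are ordered by
// descending count, and empty topics (count 0) map to -1 (pruned).
public class PackTopicsSketch {

    static int[] pack(int[] nk) {
        Integer[] order = new Integer[nk.length];
        for (int k = 0; k < nk.length; k++) order[k] = k;
        // sort topic ids by descending count
        Arrays.sort(order, (a, b) -> nk[b] - nk[a]);
        int[] newId = new int[nk.length];
        Arrays.fill(newId, -1);
        int next = 0;
        for (int k : order)
            if (nk[k] > 0) newId[k] = next++;
        return newId;
    }

    public static void main(String[] args) {
        int[] nk = {0, 7, 3, 0, 5};
        // topic 1 (count 7) -> 0, topic 4 (count 5) -> 1, topic 2 (count 3) -> 2,
        // empty topics 0 and 3 are pruned
        System.out.println(Arrays.toString(pack(nk))); // [-1, 0, 2, -1, 1]
    }
}
```

In the actual sampler, the same permutation would also be applied to the doc-topic and topic-term count arrays and to the topic weights, so all structures stay consistent.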
Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Field Detail

public final int ppstep
    step to increase the sampling array
Constructor Detail

public IldaGibbs(int[][] w, int[][] wq, int K, int V, double alpha, double beta, double gamma, java.util.Random rand)

    parametrise Gibbs sampler

    Parameters:
        w - word tokens
        wq - word tokens (testing)
        K - initial number of topics: may be 0 if gamma > 0
        V - number of terms
        alpha - node A precision (document DP)
        beta - node B hyperparameter
        gamma - node A precision (root DP), 0 for fixed K: plain LDA
        rand - random number generator

Method Detail
public static void main(java.lang.String[] args)

    test driver for mixture network Gibbs sampler

    Parameters:
        args -

public void init()

    initialise Markov chain

    Specified by:
        init in interface ISimpleGibbs

public void initq()

    initialise Markov chain for querying

    Specified by:
        initq in interface ISimpleQueryGibbs

public void run(int niter)

    run Gibbs sampler

    Specified by:
        run in interface ISimpleGibbs
    Parameters:
        niter - number of Gibbs iterations

public void runq(int niter)

    query Gibbs sampler

    Specified by:
        runq in interface ISimpleQueryGibbs
    Parameters:
        niter - number of Gibbs iterations

public void packTopics()

    reorders topics so that no gaps exist in the count arrays and topics are ordered with their counts descending

public double ppx()

    Specified by:
        ppx in interface ISimplePpx

public java.lang.String toString()

    assemble a string of overview information

    Overrides:
        toString in class java.lang.Object
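The interface name ISimplePpx suggests ppx() reports a test-set perplexity. A common definition, sketched here under that assumption; the class name, the helper perplexity and the theta/phi parameters are illustrative, not part of IldaGibbs:

```java
// Sketch of test-set perplexity: exp of the negative per-token log
// likelihood, where each token probability is the topic mixture
// sum_k theta[m][k] * phi[k][w[m][n]].
public class PerplexitySketch {

    static double perplexity(int[][] w, double[][] theta, double[][] phi) {
        double loglik = 0;
        int ntokens = 0;
        for (int m = 0; m < w.length; m++) {
            for (int n = 0; n < w[m].length; n++) {
                double p = 0;
                for (int k = 0; k < phi.length; k++)
                    p += theta[m][k] * phi[k][w[m][n]];
                loglik += Math.log(p);
                ntokens++;
            }
        }
        return Math.exp(-loglik / ntokens);
    }

    public static void main(String[] args) {
        // one document, uniform model over V = 4 terms: perplexity is 4
        int[][] w = {{0, 1, 2, 3}};
        double[][] theta = {{0.5, 0.5}};
        double[][] phi = {{0.25, 0.25, 0.25, 0.25}, {0.25, 0.25, 0.25, 0.25}};
        System.out.println(perplexity(w, theta, phi)); // approximately 4.0
    }
}
```

Lower values are better; a model no better than uniform scores V, which makes the measure easy to sanity-check as above.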