java.lang.Object org.knowceans.dirichlet.sandbox.LdaGibbsSamplerDpa
public class LdaGibbsSamplerDpa
Gibbs sampler for estimating the best assignments of topics for words and documents in a corpus. The algorithm is introduced in Tom Griffiths' paper "Gibbs sampling in the generative model of Latent Dirichlet Allocation" (2002).
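As a rough illustration of the technique this class is named for, the following is a minimal, self-contained sketch of collapsed Gibbs sampling for LDA. It is not the source of LdaGibbsSamplerDpa: the class name `FullConditionalSketch`, the helper `add`, the `sweep` method, the seed parameter and the index order `nw[t][k]` are all assumptions made for the example. The count arrays are named after the documented fields, and the sampling weight is the standard form (n_w + beta)/(n_k + V * beta) * (n_d + alpha)/(n_d,. + K * alpha), computed with the current word removed from the counts.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of collapsed Gibbs sampling for LDA (not the library code). */
final class FullConditionalSketch {
    final int[][] documents; // documents[m][n]: term index of word n in document m
    final int V, K;          // vocabulary size, number of topics
    final double alpha, beta;
    final int[][] z;         // z[m][n]: topic currently assigned to word n of document m
    final int[][] nw;        // nw[t][k]: count of term t assigned to topic k (index order assumed)
    final int[][] nd;        // nd[m][k]: words in document m assigned to topic k
    final int[] nwsum;       // nwsum[k]: words assigned to topic k
    final int[] ndsum;       // ndsum[m]: words in document m
    final Random rng;

    FullConditionalSketch(int[][] documents, int V, int K,
                          double alpha, double beta, long seed) {
        this.documents = documents;
        this.V = V;
        this.K = K;
        this.alpha = alpha;
        this.beta = beta;
        this.rng = new Random(seed);
        nw = new int[V][K];
        nd = new int[documents.length][K];
        nwsum = new int[K];
        ndsum = new int[documents.length];
        z = new int[documents.length][];
        // Initial state: assign every word to a uniformly random topic.
        for (int m = 0; m < documents.length; m++) {
            z[m] = new int[documents[m].length];
            for (int n = 0; n < documents[m].length; n++) {
                z[m][n] = rng.nextInt(K);
                add(m, documents[m][n], z[m][n], +1);
            }
        }
    }

    /** Adds delta (+1 or -1) to every count touched by one word assignment. */
    private void add(int m, int t, int k, int delta) {
        nw[t][k] += delta;
        nd[m][k] += delta;
        nwsum[k] += delta;
        ndsum[m] += delta;
    }

    /** Resamples the topic of word n in document m and returns the new topic. */
    int sample(int m, int n) {
        int t = documents[m][n];
        add(m, t, z[m][n], -1); // remove the current assignment from the counts
        double[] cdf = new double[K];
        double total = 0.0;
        for (int k = 0; k < K; k++) {
            total += (nw[t][k] + beta) / (nwsum[k] + V * beta)
                   * (nd[m][k] + alpha) / (ndsum[m] + K * alpha);
            cdf[k] = total; // unnormalised cumulative weights
        }
        double u = rng.nextDouble() * total;
        int topic = 0;
        while (topic < K - 1 && u >= cdf[topic]) {
            topic++;
        }
        z[m][n] = topic;
        add(m, t, topic, +1); // record the new assignment
        return topic;
    }

    /** One pass over every word of every document. */
    void sweep() {
        for (int m = 0; m < documents.length; m++) {
            for (int n = 0; n < documents[m].length; n++) {
                sample(m, n);
            }
        }
    }

    public static void main(String[] args) {
        int[][] docs = {{0, 1, 2, 1}, {2, 3, 3, 0}};
        FullConditionalSketch s = new FullConditionalSketch(docs, 4, 2, 0.5, 0.1, 42L);
        for (int i = 0; i < 50; i++) {
            s.sweep();
        }
        System.out.println(Arrays.deepToString(s.z));
    }
}
```

Removing the word before computing the weights and adding it back afterwards keeps all four count arrays consistent with `z` at every step, which is the invariant the sampler relies on.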
Field Summary | |
---|---|
protected double | alpha: Deprecated. Dirichlet parameter (document--topic associations) |
protected VariationOfInformationAnalyser | analyser: Deprecated. |
protected int | backupInterval: Deprecated. |
protected int | backupIteration: Deprecated. iteration in the last backup |
protected double | beta: Deprecated. Dirichlet parameter (topic--term associations) |
protected int | BURN_IN: Deprecated. burn-in period |
protected java.lang.String | corpusname: Deprecated. |
protected int | dispcol: Deprecated. |
protected int[][] | documents: Deprecated. document data (term lists) |
protected int[] | interSamples: Deprecated. iterations at which intermediate (single) samples are taken. |
protected boolean | interSave: Deprecated. |
protected boolean | interTopics: Deprecated. |
protected int | ITERATIONS: Deprecated. max iterations |
protected int | K: Deprecated. number of topics |
protected java.lang.String | messageheader: Deprecated. |
protected java.lang.String[] | messagerecipients: Deprecated. |
protected java.lang.String | messagetext: Deprecated. |
protected int[][] | nd: Deprecated. nd[d][k] number of words in document d assigned to topic k. |
protected int[] | ndsum: Deprecated. ndsum[d] total number of words in document d. |
protected int | numstats: Deprecated. size of statistics |
protected int[][] | nw: Deprecated. cwt[k][j] number of instances of word j (term?) |
protected int[] | nwsum: Deprecated. nwsum[k] total number of words assigned to topic k. |
protected java.lang.String | outfilename: Deprecated. |
protected double[][] | phisum: Deprecated. cumulative statistics of phi |
protected int | SAMPLE_LAG: Deprecated. sample lag (if -1 only one sample taken) |
private static long | serialVersionUID: Deprecated. |
protected long | t0: Deprecated. |
protected double[][] | thetasum: Deprecated. cumulative statistics of theta |
protected int | THIN_INTERVAL: Deprecated. sampling lag (?) |
protected long | timeElapsed: Deprecated. |
protected int | V: Deprecated. vocabulary size |
protected int[][] | z: Deprecated. topic assignments for each word. |
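The ITERATIONS, BURN_IN and SAMPLE_LAG fields suggest a sampling schedule: discard the early iterations, then read out statistics every few iterations, or only once when the lag is -1. The sketch below shows one plausible rule. The class `SampleScheduleSketch`, its method names and the exact condition (`i > burnIn && i % sampleLag == 0`) are assumptions for illustration; this page does not state the rule the sampler actually applies.

```java
/** Hypothetical sketch of a burn-in / sample-lag schedule (assumed rule, not the library code). */
final class SampleScheduleSketch {

    /**
     * Returns true if statistics would be collected after iteration i (0-based).
     * Assumption: with a non-positive lag a single sample is taken at the last iteration.
     */
    static boolean collectAt(int i, int iterations, int burnIn, int sampleLag) {
        if (sampleLag <= 0) {
            return i == iterations - 1;
        }
        return i > burnIn && i % sampleLag == 0;
    }

    /** Counts how many samples the schedule yields over a full run. */
    static int countSamples(int iterations, int burnIn, int sampleLag) {
        int count = 0;
        for (int i = 0; i < iterations; i++) {
            if (collectAt(i, iterations, burnIn, sampleLag)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 100 iterations, 20 burn-in iterations, one sample every 10 iterations.
        System.out.println(countSamples(100, 20, 10));
        // Lag of -1: a single sample at the end.
        System.out.println(countSamples(100, 20, -1));
    }
}
```

Under this assumed rule, a larger lag trades fewer accumulated samples for lower correlation between consecutive samples.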
Constructor Summary | |
---|---|
LdaGibbsSamplerDpa(int[][] documents, int V): Deprecated. Initialise the Gibbs sampler with data. |
Method Summary | |
---|---|
protected void | configureMessaging(java.lang.String header, java.lang.String text, java.lang.String[] recipients): Deprecated. configure the sampler for messaging |
protected void | configureOutput(java.lang.String corpusname, java.lang.String outfilename, int backupInterval, int[] interSamples, VariationOfInformationAnalyser analyser, boolean interSave, boolean interTopics): Deprecated. configure the sampler output |
void | configureSampler(int iterations, int burnIn, int thinInterval, int sampleLag, int K, double alpha, double beta): Deprecated. Configure the Gibbs sampler |
private VariationOfInformationAnalyser.DistMetric | distance(java.lang.String outname): Deprecated. perform a distance calculation on the estimated results |
double | getAlpha(): Deprecated. |
double | getBeta(): Deprecated. |
int[][] | getDocuments(): Deprecated. |
int | getK(): Deprecated. |
double[][] | getPhi(): Deprecated. Retrieve estimated topic--word associations. |
double[][] | getTheta(): Deprecated. Retrieve estimated document--topic associations. |
protected long | getTimer(): Deprecated. get the current value of the timer. |
int | getV(): Deprecated. |
int[][] | getZ(): Deprecated. |
private void | gibbs(): Deprecated. Main method: Select initial state ? |
void | initialState(): Deprecated. Initialisation: Must start with an assignment of observations to topics ? |
static java.lang.Object | load(java.lang.String filename): Deprecated. read object from the stream |
static void | main(java.lang.String[] args): Deprecated. |
protected void | output(java.lang.String analysisfile, java.lang.String addheader, java.lang.String addmessage): Deprecated. Calculate distance (if doDist) and replace all occurrences of $@ and $# in strings by the complete distance information and the distance value only, respectively. |
protected void | sampleCorpus(): Deprecated. sample once through the corpus. |
protected int | sampleLdaFullConditional(int m, int n): Deprecated. Sample a topic z_i from the full conditional distribution: p(z_i = j \| z_-i, w) = (n_-i,j(w_i) + beta)/(n_-i,j(.) + W * beta) * (n_-i,j(d_i) + alpha)/(n_-i,.(d_i) + K * alpha) |
void | save(java.lang.String filename): Deprecated. Object stream only for testing. |
void | setAlpha(double alpha): Deprecated. |
void | setBeta(double beta): Deprecated. |
protected void | startTimer(long offset): Deprecated. start timer with an initial offset. |
protected void | updateParams(): Deprecated. Add to the statistics the values of theta and phi for the current state. |
protected static void | writeParameters(java.lang.String file, java.lang.String corpusname, int k, double alpha, double beta, int m, int v, int w, long duration, int iterations, int samplelag, int burnin, org.knowceans.util.Arguments a): Deprecated. write statistics of the current run to a text file for later review |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static final long serialVersionUID
protected int[][] documents
protected int V
protected int K
protected double alpha
protected double beta
protected int[][] z
protected int[][] nw
protected int[][] nd
protected int[] nwsum
protected int[] ndsum
protected double[][] thetasum
protected double[][] phisum
protected int numstats
protected int[] interSamples
protected int THIN_INTERVAL
protected int BURN_IN
protected int ITERATIONS
protected int SAMPLE_LAG
protected int dispcol
protected java.lang.String corpusname
protected java.lang.String outfilename
protected java.lang.String messagetext
protected java.lang.String messageheader
protected java.lang.String[] messagerecipients
protected boolean interSave
protected VariationOfInformationAnalyser analyser
protected boolean interTopics
protected int backupIteration
protected long timeElapsed
protected int backupInterval
protected long t0
Constructor Detail |
---|
public LdaGibbsSamplerDpa(int[][] documents, int V)
V
- vocabulary size
data
-
Method Detail |
---|
public void initialState()
private void gibbs()
protected void sampleCorpus()
protected int sampleLdaFullConditional(int m, int n)
m
- document
n
- word
protected void updateParams()
public double[][] getTheta()
public double[][] getPhi()
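For orientation, document--topic and topic--term estimates are commonly read off the count arrays by smoothing them with the Dirichlet parameters. The sketch below shows that standard point estimate for a single sampler state. It is an assumption that getTheta() and getPhi() follow this form; the class `ThetaPhiSketch`, its method names and the `nw[t][k]` index order are hypothetical.

```java
/** Hypothetical sketch of the usual smoothed estimates for theta and phi (not the library code). */
final class ThetaPhiSketch {

    /** theta[d][k] = (nd[d][k] + alpha) / (ndsum[d] + K * alpha) */
    static double[][] theta(int[][] nd, int[] ndsum, double alpha) {
        int K = nd[0].length;
        double[][] theta = new double[nd.length][K];
        for (int d = 0; d < nd.length; d++) {
            for (int k = 0; k < K; k++) {
                theta[d][k] = (nd[d][k] + alpha) / (ndsum[d] + K * alpha);
            }
        }
        return theta;
    }

    /** phi[k][t] = (nw[t][k] + beta) / (nwsum[k] + V * beta), with nw indexed as nw[t][k] (assumed). */
    static double[][] phi(int[][] nw, int[] nwsum, double beta) {
        int V = nw.length;
        int K = nwsum.length;
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++) {
            for (int t = 0; t < V; t++) {
                phi[k][t] = (nw[t][k] + beta) / (nwsum[k] + V * beta);
            }
        }
        return phi;
    }

    public static void main(String[] args) {
        int[][] nd = {{3, 1}};
        int[] ndsum = {4};
        System.out.println(java.util.Arrays.toString(theta(nd, ndsum, 0.5)[0]));
    }
}
```

Each row of either result sums to one, so rows can be read directly as probability distributions. When several samples are taken, such estimates would typically be accumulated and divided by the number of samples, which is what the thetasum, phisum and numstats fields appear to be for.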
public void configureSampler(int iterations, int burnIn, int thinInterval, int sampleLag, int K, double alpha, double beta)
iterations
- number of total iterations
burnIn
- number of burn-in iterations
thinInterval
- update statistics interval
sampleLag
- sample interval (-1 for just one sample at the end)
K
- number of topics
alpha
- symmetric prior parameter on document--topic associations
beta
- symmetric prior parameter on topic--term associations
protected void configureOutput(java.lang.String corpusname, java.lang.String outfilename, int backupInterval, int[] interSamples, VariationOfInformationAnalyser analyser, boolean interSave, boolean interTopics)
corpusname
-
outfilename
-
backupInterval
-
interSamples
-
analyser
-
interSave3
-
interSave
-
protected void configureMessaging(java.lang.String header, java.lang.String text, java.lang.String[] recipients)
header
-
text
-
recipients
-
public static void main(java.lang.String[] args)
protected void startTimer(long offset)
offset
-
protected long getTimer()
protected void output(java.lang.String analysisfile, java.lang.String addheader, java.lang.String addmessage)
addmessage
-
doDist
-
public void save(java.lang.String filename)
public static java.lang.Object load(java.lang.String filename)
filename
-
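The save and load methods are described as using an object stream. A minimal sketch of that pattern with the standard java.io classes follows. The class `ObjectStreamSketch` and its static helpers are hypothetical, and wrapping checked exceptions in unchecked ones is a choice made for this example rather than documented behaviour of this class.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical sketch of saving and loading an object via Java serialization. */
final class ObjectStreamSketch {

    /** Writes obj to filename; the stream is closed by try-with-resources. */
    static void save(Serializable obj, String filename) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(filename)))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back whatever object was written to filename; the caller casts the result. */
    static Object load(String filename) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(filename)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of stored object not on classpath", e);
        }
    }

    public static void main(String[] args) {
        String file = System.getProperty("java.io.tmpdir") + "/object-stream-sketch.ser";
        save(new int[][] {{0, 1}, {2}}, file);
        int[][] restored = (int[][]) load(file);
        System.out.println(java.util.Arrays.deepToString(restored));
    }
}
```

Because load returns java.lang.Object, the caller has to cast the result to the expected type, and the stored class must be Serializable with a compatible serialVersionUID.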
private VariationOfInformationAnalyser.DistMetric distance(java.lang.String outname)
protected static void writeParameters(java.lang.String file, java.lang.String corpusname, int k, double alpha, double beta, int m, int v, int w, long duration, int iterations, int samplelag, int burnin, org.knowceans.util.Arguments a)
file
-
corpusname
-
k
- topics
alpha
- hyperparameter
beta
- hyperparameter
m
- doc count
v
- vocabulary/term count
w
- word count
duration
- training duration
iterations
- no. of total iterations
samplelag
- sampling lag
burnin
- burnin samples
a
- Arguments object
public final double getAlpha()
public final void setAlpha(double alpha)
public final double getBeta()
public final void setBeta(double beta)
public final int[][] getDocuments()
public final int getK()
public final int getV()
public final int[][] getZ()