java.lang.Object
  org.knowceans.dirichlet.lsam.LsamGibbsSampler
public class LsamGibbsSampler
Gibbs sampler for estimating the best assignments of topics for words and actors in a corpus with co-authoring information and personalised queries. The algorithm is based on Mark Steyvers et al. "Probabilistic Author-Topic Models for Information Discovery" (2004) and will be shown in G. H. "Probabilistic associative browsing for virtual expert communities" (2005).
TODO: What happens if one author is not searching for anything?
TODO: How can demand be weighted other than within the bounds of pdf normalisation (introduce a null topic by filling with 0 terms? use p(a|z)?)
TODO: Weight the influence of relations in phi.
Field Summary | |
---|---|
(package private) int | A - total number of actors |
(package private) int[][] | actors - actor data (co-author lists and queriers) |
(package private) double | alpha - Dirichlet parameter (actor--topic associations) |
(package private) double | beta - Dirichlet parameter (topic--term associations) |
private static int | BURN_IN - burn-in period |
(package private) int[][][] | cat - cat[r][i][j]: number of times actor i is assigned to topic j in relation r |
(package private) int[][] | catsum - catsum[r][i]: total number of word (=topic) assignments to actor i in relation r |
(package private) int[][] | cwt - cwt[i][j]: number of instances of word i (term) assigned to topic j |
(package private) int[] | cwtsum - cwtsum[j]: total number of words assigned to topic j |
private static int | dispcol |
private static int | ITERATIONS - maximum number of iterations |
(package private) int | K - number of topics |
(package private) static java.text.NumberFormat | lnf |
(package private) int[][] | media - document data (term lists) |
(package private) int | numstats - size of statistics |
(package private) double[][] | phisum - cumulative statistics of phi |
(package private) double[][][] | psisum - cumulative statistics of psi (one per relation) |
(package private) int | R - total number of relation types |
static org.knowceans.util.CokusRandom | rand |
(package private) int[] | relations - relation that each medium maps to |
private static int | SAMPLE_LAG - sample lag (if <= 0, only one sample is taken at the end) |
(package private) static java.lang.String[] | shades |
private static int | THIN_INTERVAL - sampling lag (?) |
(package private) int | V - vocabulary size |
(package private) int[][] | x - actor assignments for each word |
(package private) int[][] | z - topic assignments for each word |
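The count fields above (cwt, cwtsum, cat, catsum) have to stay consistent with each other whenever a word's actor or topic assignment changes. The following is a minimal sketch of that bookkeeping, assuming the index orders given in the field descriptions; the class name CountBookkeepingSketch and the update method are hypothetical and are not part of LsamGibbsSampler.

```java
import java.util.Arrays;

/** Hypothetical illustration of the count arrays; not the class's actual code. */
final class CountBookkeepingSketch {
    final int[][] cwt;    // cwt[term][topic]
    final int[] cwtsum;   // cwtsum[topic]
    final int[][][] cat;  // cat[relation][actor][topic]
    final int[][] catsum; // catsum[relation][actor]

    CountBookkeepingSketch(int V, int K, int A, int R) {
        cwt = new int[V][K];
        cwtsum = new int[K];
        cat = new int[R][A][K];
        catsum = new int[R][A];
    }

    /** Adds (delta = +1) or removes (delta = -1) one word token's assignment. */
    void update(int term, int topic, int actor, int relation, int delta) {
        cwt[term][topic] += delta;
        cwtsum[topic] += delta;
        cat[relation][actor][topic] += delta;
        catsum[relation][actor] += delta;
    }

    public static void main(String[] args) {
        CountBookkeepingSketch c = new CountBookkeepingSketch(5, 2, 3, 2);
        c.update(4, 1, 0, 1, +1); // assign a token of term 4 to topic 1, actor 0, relation 1
        c.update(4, 1, 0, 1, -1); // withdraw it again before resampling
        System.out.println(Arrays.toString(c.cwtsum));
    }
}
```

In a Gibbs sweep, a token's current assignment would typically be removed with delta = -1 before resampling and the new assignment added with delta = +1 afterwards.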
Constructor Summary | |
---|---|
LsamGibbsSampler(int[][] documents, int[][] actors, int[] relations, int V, int A, int R) - Initialise the Gibbs sampler with data. |
Method Summary | |
---|---|
void | configure(int iterations, int burnIn, int thinInterval, int sampleLag) - Configure the Gibbs sampler. |
double[][] | getPhi() - Retrieve estimated topic--word associations. |
double[][][] | getPsi() - Retrieve estimated actor--topic associations, one matrix for each relation. |
private void | gibbs(int K, double alpha, double beta) - Main method: Select initial state ? |
void | initialState(int K) - Initialisation: Must start with an assignment of observations to topics ? |
private static double | kl(double[] p, double[] q) - KL-divergence between distributions p and q. |
static void | main(java.lang.String[] args) - Driver with example data. |
private static double[][] | matchActors(double[][][] psi, int p, int q) - Build matrix of actor--actor matches according to the Kullback-Leibler divergence, M_ij = KL(demand_i \|\| supply_j). |
private void | sampleFullConditional(int m, int n) - Sample an actor--topic pair (x_i, z_i) from the full conditional distribution: p(x_i = q, z_i = j \| z_-i, x_-i, w, a_d, r_d) ∝ (cwt_mj + beta)/(cwtsum_j + V * beta) * (cat_qjr + alpha)/(catsum_qr + K * alpha), where V is the vocabulary size. |
static java.lang.String | shadeDouble(double d, double max) - Create a string representation whose gray value appears as an indicator of magnitude, cf. |
static void | test(java.lang.String[] args) - Test driver with a small synthetic data set. |
private void | updateParams() - Add to the statistics the values of psi and phi for the current state. |
private static void | writeParameters(java.lang.String file, java.lang.String corpusname, int k, double alpha, double beta, int x, int m, int v, int w, long duration, int iterations, int samplelag, int burnin, org.knowceans.util.Arguments a) - Write statistics of the current run to a text file for later review. |
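The full conditional given for sampleFullConditional above can be illustrated with a small, self-contained sketch. It is not the class's implementation: the class name FullConditionalSketch, the candidates array (the actors eligible for the word, e.g. the co-authors of its document) and the use of java.util.Random in place of CokusRandom are all assumptions made for illustration. The counts are assumed to exclude the word currently being resampled, and the two ratios are treated as unnormalised weights.

```java
import java.util.Random;

/** Hypothetical sketch of drawing an (actor, topic) pair from unnormalised weights. */
final class FullConditionalSketch {
    /**
     * Returns {actor, topic}, drawn with probability proportional to
     * (cwt[term][j] + beta) / (cwtsum[j] + V * beta)
     *   * (cat[r][q][j] + alpha) / (catsum[r][q] + K * alpha)
     * over all candidate actors q and topics j.
     */
    static int[] sample(int term, int r, int[] candidates, int[][] cwt, int[] cwtsum,
                        int[][][] cat, int[][] catsum, double alpha, double beta, Random rand) {
        int V = cwt.length;    // vocabulary size
        int K = cwtsum.length; // number of topics
        double[] cumulative = new double[candidates.length * K];
        double total = 0.0;
        for (int c = 0; c < candidates.length; c++) {
            int q = candidates[c];
            for (int j = 0; j < K; j++) {
                double termPart = (cwt[term][j] + beta) / (cwtsum[j] + V * beta);
                double actorPart = (cat[r][q][j] + alpha) / (catsum[r][q] + K * alpha);
                total += termPart * actorPart;
                cumulative[c * K + j] = total;
            }
        }
        // Inverse-CDF draw: first cell whose cumulative weight exceeds u.
        double u = rand.nextDouble() * total;
        int idx = 0;
        while (idx < cumulative.length - 1 && cumulative[idx] <= u) {
            idx++;
        }
        return new int[] {candidates[idx / K], idx % K};
    }

    public static void main(String[] args) {
        int[][] cwt = {{3, 0}, {1, 2}};     // V = 2 terms, K = 2 topics
        int[] cwtsum = {4, 2};
        int[][][] cat = {{{2, 1}, {2, 1}}}; // R = 1 relation, A = 2 actors
        int[][] catsum = {{3, 3}};
        int[] pair = sample(0, 0, new int[] {0, 1}, cwt, cwtsum, cat, catsum, 0.5, 0.1, new Random(1));
        System.out.println("actor=" + pair[0] + " topic=" + pair[1]);
    }
}
```

Sampling over the joint (actor, topic) grid in a single draw keeps the pair consistent; drawing the actor and the topic separately would need the corresponding marginal and conditional weights.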
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
int[][] media
int[][] actors
int[] relations
int V
int A
int R
int K
double alpha
double beta
int[][] z
int[][] x
int[][] cwt
int[][][] cat
int[] cwtsum
int[][] catsum
int numstats
double[][][] psisum
double[][] phisum
private static int THIN_INTERVAL
private static int BURN_IN
private static int ITERATIONS
private static int SAMPLE_LAG
private static int dispcol
public static org.knowceans.util.CokusRandom rand
static java.lang.String[] shades
static java.text.NumberFormat lnf
Constructor Detail |
---|
public LsamGibbsSampler(int[][] documents, int[][] actors, int[] relations, int V, int A, int R)
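media - document data (term lists), passed as the documents argument
actors - actor data (co-author lists and queriers)
relations - relation that each medium maps to
V - vocabulary size
A - total number of actors
R - total number of relation types
Method Detail |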
---|
public void initialState(int K)
K - number of topics
private void gibbs(int K, double alpha, double beta)
K - number of topics
alpha - symmetric prior parameter on actor--topic associations
beta - symmetric prior parameter on topic--term associations
private void sampleFullConditional(int m, int n)
m - document
n - word
private void updateParams()
public double[][][] getPsi()
public double[][] getPhi()
IDEA: Here only the supply relation could be considered by cwtsum[r][]
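How getPhi() arrives at its estimate is not spelled out on this page. A common approach for samplers of this kind, shown here only as a sketch, is to compute a point estimate of phi from the current counts and, when several samples were accumulated in phisum, to divide by numstats. The class PhiEstimateSketch, its method names and the phi[topic][term] layout are assumptions.

```java
/** Hypothetical sketch of estimating phi from counts or from accumulated statistics. */
final class PhiEstimateSketch {
    /** Point estimate phi[k][w] = (cwt[w][k] + beta) / (cwtsum[k] + V * beta). */
    static double[][] phiFromCounts(int[][] cwt, int[] cwtsum, double beta) {
        int V = cwt.length;
        int K = cwtsum.length;
        double[][] phi = new double[K][V];
        for (int k = 0; k < K; k++) {
            for (int w = 0; w < V; w++) {
                phi[k][w] = (cwt[w][k] + beta) / (cwtsum[k] + V * beta);
            }
        }
        return phi;
    }

    /** Averages accumulated per-sample estimates: phi = phisum / numstats. */
    static double[][] average(double[][] phisum, int numstats) {
        double[][] phi = new double[phisum.length][];
        for (int k = 0; k < phisum.length; k++) {
            phi[k] = new double[phisum[k].length];
            for (int w = 0; w < phisum[k].length; w++) {
                phi[k][w] = phisum[k][w] / numstats;
            }
        }
        return phi;
    }

    public static void main(String[] args) {
        double[][] phi = phiFromCounts(new int[][] {{2, 0}, {1, 3}}, new int[] {3, 3}, 0.5);
        System.out.println(java.util.Arrays.deepToString(phi));
    }
}
```

Each row of the point estimate sums to one because the denominator equals the sum of the smoothed counts over the vocabulary.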
public void configure(int iterations, int burnIn, int thinInterval, int sampleLag)
iterations - total number of iterations
burnIn - number of burn-in iterations
thinInterval - interval at which statistics are updated
sampleLag - sample interval (-1 for just one sample at the end)
public static void main(java.lang.String[] args)
args -
private static void writeParameters(java.lang.String file, java.lang.String corpusname, int k, double alpha, double beta, int x, int m, int v, int w, long duration, int iterations, int samplelag, int burnin, org.knowceans.util.Arguments a)
file -
corpusname -
k - number of topics
alpha - hyperparameter
beta - hyperparameter
x - actor count
m - document count
v - vocabulary/term count
w - word count
duration - training duration
iterations - total number of iterations
samplelag - sampling lag
burnin - number of burn-in samples
a - Arguments object
public static void test(java.lang.String[] args)
args -
private static double[][] matchActors(double[][][] psi, int p, int q)
psi - matrix[][][] of actor--topic associations, indexed by relation in the first index
p - relation of left KL argument
q - relation of right KL argument
private static double kl(double[] p, double[] q)
p - discrete pdf
q - discrete pdf (q.length = p.length)
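The two helpers kl and matchActors documented above can be sketched together as follows. This assumes the natural logarithm, the convention that terms with p[k] = 0 contribute nothing, and a psi[relation][actor][topic] layout; none of these details are confirmed by this page, and the class name ActorMatchSketch is hypothetical.

```java
/** Hypothetical sketch of KL-based actor matching. */
final class ActorMatchSketch {
    /** KL(p || q) = sum_k p[k] * ln(p[k] / q[k]); terms with p[k] == 0 are skipped. */
    static double kl(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("p and q must have the same length");
        }
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            if (p[k] > 0.0) {
                sum += p[k] * Math.log(p[k] / q[k]); // infinite if q[k] == 0
            }
        }
        return sum;
    }

    /** M[i][j] = KL(psi[p][i] || psi[q][j]) for every actor pair (i, j). */
    static double[][] matchActors(double[][][] psi, int p, int q) {
        double[][] m = new double[psi[p].length][psi[q].length];
        for (int i = 0; i < psi[p].length; i++) {
            for (int j = 0; j < psi[q].length; j++) {
                m[i][j] = kl(psi[p][i], psi[q][j]);
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][][] psi = {
            {{0.5, 0.5}, {0.9, 0.1}},    // relation 0, e.g. demand
            {{0.5, 0.5}, {0.25, 0.75}}   // relation 1, e.g. supply
        };
        System.out.println(java.util.Arrays.deepToString(matchActors(psi, 0, 1)));
    }
}
```

Because KL divergence is asymmetric, swapping p and q generally gives a different matrix; smaller entries indicate a closer match between the left actor's distribution and the right actor's.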
public static java.lang.String shadeDouble(double d, double max)
d - value
max - maximum value