java.lang.Object
  org.knowceans.corpus.analysis.LdaAmqDistance
public class LdaAmqDistance
LdaAmqDistance analyses the distance between the topics extracted by an LDA model and those of an LS-AMQ, effectively measuring the influence of authorship and querying information on the topic distributions.
In contrast to TopicCorrelationAnalyser, which compares clusterings of documents, LdaAmqDistance uses terms, which are the entities common to the two approaches, LDA and AMQ.
Field Summary | |
---|---|
(package private) double[][] | amqphi - the AMQ word--topic associations |
private java.lang.String | comment |
(package private) double[][] | ldaphi - the LDA word--topic associations |
(package private) static double | log2 - base of the logarithm |
(package private) int | nAmqTopics - number of categories |
(package private) int | nLdaTopics - number of LDA topics |
(package private) int | nTerms - number of terms |
private java.lang.String | outfile |
private double | sumPCatDoc |
Constructor Summary | |
---|---|
LdaAmqDistance(java.lang.String ldaphifile, java.lang.String amqphifile, java.lang.String outfile, java.lang.String comment) |
Method Summary | |
---|---|
(package private) double | entropy(double[] p) - Entropy of the distribution. |
static void | main(java.lang.String[] args) |
double | metric(double[][] ldaphi, double[][] amqphi) - Variation of Information metric for a priori and a posteriori relationships (Meila 2003). |
private double | mutualInfo(double[] pv, double[] pw, double[][] pzv, double[][] pzw) - Calculates the mutual information of the two clusterings, given the joint distribution. |
double | mylog(double arg) |
double[] | pItem(double[][] pzw) - Averaged distributions n_z * sum_z p(v=s\|z). |
private double | pJoint(double[][] pzv, double[][] pzw, int v, int w) - Calculates the joint probability for the two clusterings. |
private void | run() |
double | sum(double[] v) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
static double log2
int nTerms
int nLdaTopics
int nAmqTopics
double[][] ldaphi
double[][] amqphi
private java.lang.String outfile
private java.lang.String comment
private double sumPCatDoc
Constructor Detail |
---|
public LdaAmqDistance(java.lang.String ldaphifile, java.lang.String amqphifile, java.lang.String outfile, java.lang.String comment)
Parameters:
comment -
includeunknown2 -
hierup2 -
Method Detail |
---|
public static void main(java.lang.String[] args)
private void run()
public double metric(double[][] ldaphi, double[][] amqphi)
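Computes the Variation of Information distance
D(X, Y) = H(X) + H(Y) - 2 I(X, Y)
with the entropy H(X) = - sum_x p(x) log p(x)
and the mutual information I(X, Y) = KL( p(x,y) || p(x)p(y) ), that is, the KL divergence
between the actual joint distribution and the product of the marginals (x and y considered independent).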
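The formula above can be illustrated with a minimal sketch. It assumes that the joint distribution p(x,y) is already available as a matrix and that logarithms are taken to base 2. The class and method names are hypothetical and are not part of LdaAmqDistance; how the class itself derives the joint distribution from the two phi matrices is not documented here.

```java
/** Hypothetical illustration only; not the implementation of LdaAmqDistance. */
final class VariationOfInformationSketch {

    private static final double LOG2 = Math.log(2.0);

    /** Base-2 logarithm (assumption: the metric is measured in bits). */
    static double log2(double x) {
        return Math.log(x) / LOG2;
    }

    /** H(p) = - sum_i p_i log2 p_i; zero entries contribute nothing. */
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p) {
            if (pi > 0.0) {
                h -= pi * log2(pi);
            }
        }
        return h;
    }

    /** I(X, Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ). */
    static double mutualInfo(double[] px, double[] py, double[][] pxy) {
        double mi = 0.0;
        for (int x = 0; x < px.length; x++) {
            for (int y = 0; y < py.length; y++) {
                double joint = pxy[x][y];
                if (joint > 0.0) {
                    // joint > 0 implies both marginals are > 0
                    mi += joint * log2(joint / (px[x] * py[y]));
                }
            }
        }
        return mi;
    }

    /** D(X, Y) = H(X) + H(Y) - 2 I(X, Y), with marginals summed out of the joint. */
    static double variationOfInformation(double[][] pxy) {
        double[] px = new double[pxy.length];
        double[] py = new double[pxy[0].length];
        for (int x = 0; x < px.length; x++) {
            for (int y = 0; y < py.length; y++) {
                px[x] += pxy[x][y];
                py[y] += pxy[x][y];
            }
        }
        return entropy(px) + entropy(py) - 2.0 * mutualInfo(px, py, pxy);
    }
}
```

Under these assumptions, two identical clusterings give a distance of zero, while two independent clusterings give H(X) + H(Y).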
public double[] pItem(double[][] pzw)
Parameters:
pzw - p(v|z), e.g., phi.
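One way to read the averaging performed by pItem is as a marginalisation over topics with a uniform topic prior. The sketch below makes that assumption explicit: it takes rows to index topics z and columns to index items, and it normalises by 1/n_z so that the result sums to one. Both the array orientation and the normalisation are assumptions, and the class and method names are hypothetical.

```java
/** Hypothetical illustration only; not the implementation of pItem. */
final class ItemMarginalSketch {

    /**
     * Averages the conditional distributions p(item | z) over all topics z,
     * assuming each topic is equally likely: p(item) = (1 / n_z) * sum_z p(item | z).
     *
     * @param pzw matrix with one row per topic and one column per item (assumed layout)
     */
    static double[] averageOverTopics(double[][] pzw) {
        int nTopics = pzw.length;
        int nItems = pzw[0].length;
        double[] p = new double[nItems];
        for (int z = 0; z < nTopics; z++) {
            for (int w = 0; w < nItems; w++) {
                p[w] += pzw[z][w];
            }
        }
        for (int w = 0; w < nItems; w++) {
            p[w] /= nTopics;
        }
        return p;
    }
}
```

If each row of the input is a proper distribution, the averaged vector is one as well, which is what an entropy computation over it requires.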
double entropy(double[] p)
Parameters:
p -
private double mutualInfo(double[] pv, double[] pw, double[][] pzv, double[][] pzw)
Parameters:
pv - categories distribution p(v=s)
pw - topics distribution p(w=t)
pvw - joint distribution p(v=s, w=t)
private double pJoint(double[][] pzv, double[][] pzw, int v, int w)
public double mylog(double arg)
public double sum(double[] v)