java.lang.Object org.knowceans.dirichlet.lda.LdaTopicSimilarities
public class LdaTopicSimilarities
FmLdaSimilarities calculates similarities between terms and documents, both known and unknown. This is the interface for LDA queries once the topics of an unknown string have been determined. This implementation supports both the symmetrised KL-divergence (= Jensen-Shannon distance) and a predictive likelihood.
By convention, conditional likelihoods are normalised along rows, i.e., p(col|row) = double[row][col]. If distributions run along columns, some methods provide a transposed flag.
Field Summary | |
---|---|
(package private) static double |
log2
basis |
protected double[][] |
phi
the LDA topic--word associations phi[z][w] = p(w|z) |
protected double[][] |
phiPost
the LDA word-topic associations phiPost[w][z] = p(z|w) |
protected double[][] |
theta
the LDA document--topic associations theta[d][z] = p(z|d) |
protected double[][] |
thetaPost
the LDA topic-document associations thetaPost[z][d] = p(d|z) |
Constructor Summary | |
---|---|
LdaTopicSimilarities(LdaGibbsSampler lda,
boolean terms,
boolean docs,
boolean pl,
boolean js)
Initialise topic similarities using an existing LDA Gibbs sampler, whose phi and theta values are shared. |
|
LdaTopicSimilarities(java.lang.String ldabase,
boolean terms,
boolean docs,
boolean pl,
boolean js)
Construct an LdaSimilaritiesCps object with path bases and action indicators for terms and documents processing |
Method Summary | |
---|---|
protected org.knowceans.map.IndexRanking |
bestJsMatches(double[][] pzx,
double[] qz,
int max)
Find matching items in pzx for the distribution qz using Jensen-Shannon distance. |
protected org.knowceans.map.IndexRanking |
bestJsMatches(double[][] pzx,
int x,
int max)
Find matching items for the item with index x (row) using Jensen-Shannon distance. |
protected org.knowceans.map.IndexRanking |
bestMutLikMatches(double[][] pxz,
double[][] pzx,
int item,
int max)
Find matching items for the item with index i (column!) |
protected org.knowceans.map.IndexRanking |
bestMutLikMatches(double[][] pxz,
double[] qz,
int max)
Find matching items in pxz for distribution qz, p(d|q) = sum_z p(d|z) p(z|q) |
org.knowceans.map.IndexRanking |
docDocs(int doc,
boolean mutLik,
int max)
Get the most similar documents for the document doc. |
org.knowceans.map.IndexRanking |
docTerms(int doc,
boolean mutLik,
int max)
Get the most similar terms for the doc. |
double[][] |
getPhi()
|
double[][] |
getPhiPost()
|
double[][] |
getTheta()
|
double[][] |
getThetaPost()
|
static double |
jsDistance(double[][] pzx,
int xp,
int xq,
boolean transposed)
Compute the Jensen-Shannon distance between px and qx, JS(p(x1) || p(x2)), which is used analogously to klDivergence (see there) -- JS-distance is just the symmetrised KL-divergence: JS(px || qx) = 1/2 [ KL(px || qx) + KL(qx || px) ] |
static double |
jsDistance(double[] px,
double[] qx)
Compute the Jensen-Shannon distance between px and qx, JS(p(x1) || p(x2)), which is used analogously to klDivergence (see there) -- JS-distance is just the symmetrised KL-divergence: JS(px || qx) = 1/2 [ KL(px || qx) + KL(qx || px) ] |
static double |
klDivergence(double[][] pzx,
int xp,
int xq,
boolean transposed)
Compute the Kullback-Leibler divergence between distributions px and qx, KL(px || qx) = sum_x px(x) [log px(x) - log qx(x)] where arguments xp and xq are the rows of a conditional probability distribution matrix. |
static double |
klDivergence(double[] px,
double[] qx)
Compute the Kullback-Leibler divergence between distributions px and qx, KL(px || qx) = sum_x px(x) [log px(x) - log qx(x)] where arguments px and qx are the distributions with equal length. |
static double |
mutualLikelihood(double[][] pxz,
double[][] pzx,
int x1,
int x2)
Given the distributions p(x | z) and p(z | x), calculate the likelihood that the topics of item x1 can generate item x2, i.e., p(x2 | x1) = sum_z p(x2 | z) p(z | x1). |
static double |
mylog(double arg)
Specialised log function (now the base-2 logarithm) |
static double[][] |
posterior(double[][] likelihood)
Calculate posterior probability p(y|x) = p(x|y) / sum_y'(p(x|y')) with uniform prior p(y) = const. |
org.knowceans.map.IndexRanking[] |
queryDocs(double[][] topics,
boolean mutLik,
int max)
Get the most similar documents for the queries expressed as array of distributions over z. |
org.knowceans.map.IndexRanking |
queryDocs(double[] topics,
boolean mutLik,
int max)
Get the most similar documents for the query expressed as distribution over z. |
org.knowceans.map.IndexRanking[] |
queryTerms(double[][] topics,
boolean mutLik,
int max)
Get the most similar terms for the queries expressed as array of distributions over z. |
org.knowceans.map.IndexRanking |
queryTerms(double[] topics,
boolean mutLik,
int max)
Get the most similar terms for the query expressed as distribution over z. |
org.knowceans.map.IndexRanking |
termDocs(int term,
boolean mutLik,
int max)
Get the most similar docs for the term. |
org.knowceans.map.IndexRanking |
termTerms(int term,
boolean mutLik,
int max)
Get the most similar terms for the term. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
static double log2
protected double[][] phi
protected double[][] phiPost
protected double[][] theta
protected double[][] thetaPost
Constructor Detail |
---|
public LdaTopicSimilarities(java.lang.String ldabase, boolean terms, boolean docs, boolean pl, boolean js) throws java.io.IOException
ldabase - path base of lda parameter set (path + filename excluding extensions .phi.zip and .theta.zip)
terms - load term matrix (phi)
docs - load document matrix (theta)
pl - configure for use with predictive likelihoods (syn. mutual likelihood, because it appears to be symmetric)
js - configure for use with Jensen-Shannon distance
Throws: java.io.IOException
public LdaTopicSimilarities(LdaGibbsSampler lda, boolean terms, boolean docs, boolean pl, boolean js)
lda - existing LDA Gibbs sampler whose phi and theta values are shared
terms -
docs -
pl -
js -
Method Detail |
---|
public org.knowceans.map.IndexRanking queryDocs(double[] topics, boolean mutLik, int max)
topics - distribution over z. multiple elements
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking queryTerms(double[] topics, boolean mutLik, int max)
topics - distribution over z. multiple elements
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking[] queryDocs(double[][] topics, boolean mutLik, int max)
topics - distribution over z.
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking[] queryTerms(double[][] topics, boolean mutLik, int max)
topics - distribution over z.
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking docDocs(int doc, boolean mutLik, int max)
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking termTerms(int term, boolean mutLik, int max)
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking docTerms(int doc, boolean mutLik, int max)
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking termDocs(int term, boolean mutLik, int max)
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
protected org.knowceans.map.IndexRanking bestJsMatches(double[][] pzx, int x, int max)
pzx - conditional probability matrix with row normalisation
x - the distribution to be matched as row of pzx
max - maximum number of matches
protected org.knowceans.map.IndexRanking bestJsMatches(double[][] pzx, double[] qz, int max)
pzx - p(z|x) = pzx[x][z]
qz - q(z)
max - maximum number of matches
protected org.knowceans.map.IndexRanking bestMutLikMatches(double[][] pxz, double[][] pzx, int item, int max)
pzx -
item -
pxz - conditional probability matrix with row normalisation
max - maximum number of matches
protected org.knowceans.map.IndexRanking bestMutLikMatches(double[][] pxz, double[] qz, int max)
pxz - conditional probability matrix with row normalisation
qz - query distribution
max - maximum number of matches
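The ranking rule stated for this overload, p(d|q) = sum_z p(d|z) p(z|q), can be sketched with plain arrays. This is a minimal illustration under assumptions, not the library's implementation: the class name MutLikRankingSketch, the method name bestMatches and the int[] return type are hypothetical (the documented method returns an org.knowceans.map.IndexRanking, whose API is not shown on this page), and a non-empty rectangular matrix is assumed.

```java
final class MutLikRankingSketch {

    /**
     * Scores every item x by sum_z pxz[z][x] * qz[z] and returns the indices
     * of the best-scoring items (at most max of them), highest score first.
     * Hypothetical helper; assumes pxz is non-empty and rectangular.
     */
    static int[] bestMatches(double[][] pxz, double[] qz, int max) {
        int items = pxz[0].length;
        final double[] score = new double[items];
        for (int z = 0; z < pxz.length; z++) {
            for (int x = 0; x < items; x++) {
                score[x] += pxz[z][x] * qz[z];
            }
        }
        Integer[] order = new Integer[items];
        for (int x = 0; x < items; x++) {
            order[x] = x;
        }
        // Stable sort by descending score; ties keep ascending index order.
        java.util.Arrays.sort(order, (a, b) -> Double.compare(score[b], score[a]));
        int n = Math.min(max, items);
        int[] best = new int[n];
        for (int i = 0; i < n; i++) {
            best[i] = order[i];
        }
        return best;
    }

    public static void main(String[] args) {
        // Two topics (rows), three items (columns); each row sums to 1.
        double[][] pxz = {{0.7, 0.2, 0.1}, {0.1, 0.3, 0.6}};
        double[] qz = {0.1, 0.9};
        // Scores are 0.16, 0.29 and 0.55, so this prints [2, 1].
        System.out.println(java.util.Arrays.toString(bestMatches(pxz, qz, 2)));
    }
}
```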
public static double mutualLikelihood(double[][] pxz, double[][] pzx, int x1, int x2)
pxz - p(x|z) as double[z][x], with normalised rows
pzx - p(z|x) as double[x][z], with normalised rows
x1 - index of generator item
x2 - index of generated item
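As an illustration of the formula p(x2 | x1) = sum_z p(x2 | z) p(z | x1) with the index layout given above (pxz[z][x] and pzx[x][z]), the following sketch may help. The class name MutualLikelihoodSketch and the example matrices are assumptions for this example only; the library's actual code is not reproduced here.

```java
final class MutualLikelihoodSketch {

    /** p(x2 | x1) = sum_z pxz[z][x2] * pzx[x1][z]. */
    static double mutualLikelihood(double[][] pxz, double[][] pzx, int x1, int x2) {
        double sum = 0.0;
        for (int z = 0; z < pxz.length; z++) {
            sum += pxz[z][x2] * pzx[x1][z];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Two topics, three items. Rows of pxz are p(x|z), rows of pzx are p(z|x).
        double[][] pxz = {{0.5, 0.5, 0.0}, {0.0, 0.5, 0.5}};
        double[][] pzx = {{1.0, 0.0}, {0.5, 0.5}, {0.0, 1.0}};
        System.out.println(mutualLikelihood(pxz, pzx, 0, 1)); // 0.5
        System.out.println(mutualLikelihood(pxz, pzx, 1, 0)); // 0.25
    }
}
```

For a fixed generator item x1, the values over all x2 sum to 1 when both matrices are row-normalised, so the result can be read as a distribution over generated items.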
public static double jsDistance(double[][] pzx, int xp, int xq, boolean transposed)
JS(px || qx) = 1/2 [ KL(px || qx) + KL(qx || px) ]
pzx -
xp - index of px in the matrix pzx (row)
xq - index of qx in the matrix pzx (row)
transposed - rows -> columns
public static double klDivergence(double[][] pzx, int xp, int xq, boolean transposed)
KL(px || qx) = sum_x px(x) [log px(x) - log qx(x)] where arguments xp and xq are the rows of a conditional probability distribution matrix. This method does not check the sum=1 property of the distributions.
pzx - matrix
xp - first pdf (row into pzx)
xq - second pdf (row into pzx)
transposed - use columns as distributions
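One way to picture the transposed flag is as a switch between reading a distribution from a row or from a column of the matrix before the divergence is computed. The sketch below assumes exactly that behaviour, uses a base-2 logarithm, and skips terms with px(x) = 0; all three points are assumptions, as are the class name TransposedKlSketch and the helper distribution.

```java
final class TransposedKlSketch {

    /** Returns row i of m, or column i of m when transposed is true (hypothetical helper). */
    static double[] distribution(double[][] m, int i, boolean transposed) {
        if (!transposed) {
            return m[i].clone();
        }
        double[] column = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            column[r] = m[r][i];
        }
        return column;
    }

    /** KL(px || qx) in bits, with px and qx taken from rows or columns of pzx. */
    static double klDivergence(double[][] pzx, int xp, int xq, boolean transposed) {
        double[] px = distribution(pzx, xp, transposed);
        double[] qx = distribution(pzx, xq, transposed);
        double kl = 0.0;
        for (int k = 0; k < px.length; k++) {
            if (px[k] > 0.0) { // assumption: 0 * log 0 is treated as 0
                kl += px[k] * (Math.log(px[k]) - Math.log(qx[k])) / Math.log(2.0);
            }
        }
        return kl;
    }

    public static void main(String[] args) {
        double[][] byRows = {{1.0, 0.0}, {0.5, 0.5}};    // distributions along rows
        double[][] byColumns = {{1.0, 0.5}, {0.0, 0.5}}; // same distributions along columns
        System.out.println(klDivergence(byRows, 0, 1, false));
        System.out.println(klDivergence(byColumns, 0, 1, true));
    }
}
```

Both calls in main compare the same pair of distributions, so they should yield the same value.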
public static double jsDistance(double[] px, double[] qx)
JS(px || qx) = 1/2 [ KL(px || qx) + KL(qx || px) ]
px -
qx -
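The definition above, JS(px || qx) = 1/2 [ KL(px || qx) + KL(qx || px) ], can be sketched as follows. The code follows this page's definition (the symmetrised KL-divergence), which differs from the mixture-based Jensen-Shannon divergence found in many textbooks. The class name JsDistanceSketch, the base-2 logarithm and the handling of zero entries are assumptions.

```java
final class JsDistanceSketch {

    /** KL(p || q) in bits; terms with p[x] == 0 are skipped (assumption). */
    static double kl(double[] p, double[] q) {
        double sum = 0.0;
        for (int x = 0; x < p.length; x++) {
            if (p[x] > 0.0) {
                sum += p[x] * (Math.log(p[x]) - Math.log(q[x])) / Math.log(2.0);
            }
        }
        return sum;
    }

    /** Symmetrised KL-divergence as defined on this page. */
    static double jsDistance(double[] px, double[] qx) {
        if (px.length != qx.length) {
            throw new IllegalArgumentException("distributions must have equal length");
        }
        return 0.5 * (kl(px, qx) + kl(qx, px));
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.25, 0.75};
        System.out.println(jsDistance(p, q)); // same value as jsDistance(q, p)
    }
}
```

Because each KL term becomes infinite when one distribution assigns zero probability where the other does not, this quantity is only finite for distributions with matching support.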
public static double klDivergence(double[] px, double[] qx)
KL(px || qx) = sum_x px(x) [log px(x) - log qx(x)] where arguments px and qx are the distributions with equal length. This method does not check the sum=1 property of the distributions.
px - first pdf
qx - second pdf
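A minimal sketch of the formula KL(px || qx) = sum_x px(x) [log px(x) - log qx(x)] is given below. It assumes the base-2 logarithm suggested by mylog and, like the documented method, does not check that the inputs sum to 1. The class name KlDivergenceSketch and the choice to skip terms with px(x) = 0 are assumptions, not taken from the library.

```java
final class KlDivergenceSketch {

    /** Base-2 logarithm (assumed to match the library's mylog). */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** KL(px || qx) in bits; no check that px and qx each sum to 1. */
    static double klDivergence(double[] px, double[] qx) {
        if (px.length != qx.length) {
            throw new IllegalArgumentException("distributions must have equal length");
        }
        double kl = 0.0;
        for (int x = 0; x < px.length; x++) {
            if (px[x] == 0.0) {
                continue; // assumption: 0 * log 0 is treated as 0
            }
            kl += px[x] * (log2(px[x]) - log2(qx[x]));
        }
        return kl;
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.25, 0.75};
        System.out.println(klDivergence(p, q)); // differs from klDivergence(q, p)
        System.out.println(klDivergence(p, p)); // 0.0
    }
}
```

The divergence is not symmetric in its arguments, which is why the jsDistance methods average the two directions.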
public static double[][] posterior(double[][] likelihood)
likelihood
- likelihood[y][x] = p(x|y), i.e., normalised along rows
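The posterior p(y|x) = p(x|y) / sum_y'(p(x|y')) under a uniform prior amounts to normalising each column of the likelihood matrix. The sketch below assumes the result is indexed as posterior[x][y] so that it is again row-normalised, in line with the class convention; that layout, the class name PosteriorSketch and the zero-column handling are assumptions.

```java
final class PosteriorSketch {

    /**
     * likelihood[y][x] = p(x|y); returns post[x][y] = p(y|x) with uniform prior p(y).
     * Assumes a non-empty rectangular matrix.
     */
    static double[][] posterior(double[][] likelihood) {
        int ny = likelihood.length;
        int nx = likelihood[0].length;
        double[][] post = new double[nx][ny];
        for (int x = 0; x < nx; x++) {
            double norm = 0.0;
            for (int y = 0; y < ny; y++) {
                norm += likelihood[y][x];
            }
            for (int y = 0; y < ny; y++) {
                // assumption: a column of zeros yields a row of zeros
                post[x][y] = norm > 0.0 ? likelihood[y][x] / norm : 0.0;
            }
        }
        return post;
    }

    public static void main(String[] args) {
        double[][] pxz = {{0.5, 0.5, 0.0}, {0.0, 0.5, 0.5}};
        double[][] pzx = posterior(pxz);
        System.out.println(java.util.Arrays.deepToString(pzx));
    }
}
```

Applied to phi[z][w] = p(w|z), this layout would give a matrix of the same shape as phiPost[w][z] = p(z|w).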
public static double mylog(double arg)
arg -
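A base-2 logarithm is commonly computed by dividing the natural logarithm by ln 2. The sketch below assumes that mylog works this way and that the static field log2 caches ln 2; neither is confirmed by this page, and the class name Log2Sketch is hypothetical.

```java
final class Log2Sketch {

    /** Cached natural logarithm of 2 (assumed role of the log2 field). */
    static final double LOG2 = Math.log(2.0);

    /** Base-2 logarithm of arg. */
    static double mylog(double arg) {
        return Math.log(arg) / LOG2;
    }

    public static void main(String[] args) {
        System.out.println(mylog(8.0)); // approximately 3
    }
}
```

With this base, divergences computed from the logarithm are measured in bits.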
public final double[][] getPhi()
public final double[][] getPhiPost()
public final double[][] getTheta()
public final double[][] getThetaPost()