java.lang.Object
  org.knowceans.dirichlet.lda.LdaTopicSimilarities
    org.knowceans.dirichlet.atm.AtmTopicSimilarities
public class AtmTopicSimilarities extends LdaTopicSimilarities
This class calculates similarities between terms and documents, both known and unknown. It is the interface for LDA queries once the topics of an unknown string have been determined. The implementation supports both the symmetrised KL divergence (= Jensen-Shannon distance) and a predictive likelihood.
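To make the two divergence measures concrete, the following is a minimal sketch of the KL divergence and the midpoint-based Jensen-Shannon divergence between two discrete distributions. The class and method names are hypothetical, and this page does not state whether the inherited klDivergence and jsDistance methods use exactly this form.

```java
/** Hypothetical sketch: KL and Jensen-Shannon divergence between discrete distributions. */
final class DivergenceSketch {

    /** KL(p || q) in nats; entries with p[i] == 0 contribute nothing. */
    static double klDivergence(double[] p, double[] q) {
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > 0.0) {
                sum += p[i] * Math.log(p[i] / q[i]);
            }
        }
        return sum;
    }

    /** Jensen-Shannon divergence: mean KL divergence of p and q from their midpoint m. */
    static double jsDivergence(double[] p, double[] q) {
        double[] m = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            m[i] = 0.5 * (p[i] + q[i]);
        }
        return 0.5 * klDivergence(p, m) + 0.5 * klDivergence(q, m);
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.9, 0.1};
        // KL is asymmetric, the Jensen-Shannon divergence is symmetric.
        System.out.println("KL(p||q) = " + klDivergence(p, q));
        System.out.println("KL(q||p) = " + klDivergence(q, p));
        System.out.println("JS(p,q)  = " + jsDivergence(p, q));
        System.out.println("JS(q,p)  = " + jsDivergence(q, p));
    }
}
```

With natural logarithms the Jensen-Shannon divergence is bounded above by ln 2, which is reached when the two distributions have disjoint support.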
By convention, conditional likelihoods are normalised along rows, i.e., p(col|row) = double[row][col]. If distributions are stored along columns, some methods provide a transposed flag.
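The row convention can be illustrated with a short sketch. The helper names below are hypothetical and not part of the library; the sketch only shows how p(col|row) is read from a row-normalised array and how a transposed flag would swap the indices.

```java
/** Hypothetical sketch of the row-normalised storage convention p(col|row) = m[row][col]. */
final class RowConventionSketch {

    /** Normalises each row in place so that it sums to one; all-zero rows are left unchanged. */
    static void normaliseRows(double[][] m) {
        for (double[] row : m) {
            double sum = 0.0;
            for (double v : row) {
                sum += v;
            }
            if (sum > 0.0) {
                for (int c = 0; c < row.length; c++) {
                    row[c] /= sum;
                }
            }
        }
    }

    /**
     * Returns p(outcome | given). With transposed == false the matrix is read as
     * m[given][outcome]; with transposed == true the distributions run along
     * columns and the matrix is read as m[outcome][given].
     */
    static double conditional(double[][] m, int given, int outcome, boolean transposed) {
        return transposed ? m[outcome][given] : m[given][outcome];
    }

    public static void main(String[] args) {
        double[][] counts = {{2, 2}, {1, 3}};
        normaliseRows(counts);
        // Reads the entry for outcome 0 given row 1 of the normalised matrix.
        System.out.println(conditional(counts, 1, 0, false));
    }
}
```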
Field Summary |
---|
Fields inherited from class org.knowceans.dirichlet.lda.LdaTopicSimilarities |
---|
phi, phiPost, theta, thetaPost |
Constructor Summary | |
---|---|
AtmTopicSimilarities(AtmGibbsSampler lda,
boolean terms,
boolean authors,
boolean pl,
boolean js)
Initialise topic similarities using an existing LDA Gibbs sampler, whose phi and theta values are shared. |
|
AtmTopicSimilarities(java.lang.String atmbase,
boolean terms,
boolean authors,
boolean pl,
boolean js)
Construct an AtmTopicSimilarities object with a path base and action indicators for term and document processing. |
Method Summary | |
---|---|
org.knowceans.map.IndexRanking |
authorAuthors(int author,
boolean mutLik,
int max)
Get the most similar authors for the author. |
org.knowceans.map.IndexRanking |
docDocs(int doc,
boolean mutLik,
int max)
TODO: implement the search for documents. |
org.knowceans.map.IndexRanking |
docTerms(int doc,
boolean mutLik,
int max)
Get the most similar terms for the doc. |
org.knowceans.map.IndexRanking |
queryAuthors(double[] topics,
boolean mutLik,
int max)
Get the most similar authors for the query expressed as distribution over z. |
org.knowceans.map.IndexRanking |
queryDocs(double[] topics,
boolean mutLik,
int max)
TODO: implement the search for documents. |
org.knowceans.map.IndexRanking[] |
queryTerms(double[][] topics,
boolean mutLik,
int max)
Get the most similar terms for the queries expressed as array of distributions over z. |
org.knowceans.map.IndexRanking |
termDocs(int term,
boolean mutLik,
int max)
Get the most similar docs for the term. |
org.knowceans.map.IndexRanking |
termTerms(int term,
boolean mutLik,
int max)
Get the most similar terms for the term. |
Methods inherited from class org.knowceans.dirichlet.lda.LdaTopicSimilarities |
---|
bestJsMatches, bestJsMatches, bestMutLikMatches, bestMutLikMatches, getPhi, getPhiPost, getTheta, getThetaPost, jsDistance, jsDistance, klDivergence, klDivergence, mutualLikelihood, mylog, posterior, queryDocs, queryTerms |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public AtmTopicSimilarities(java.lang.String atmbase, boolean terms, boolean authors, boolean pl, boolean js) throws java.io.IOException
Parameters:
atmbase - path base of the LDA parameter set (path + filename, excluding the extensions .phi.zip and .theta.zip)
terms - load term matrix (phi)
authors - load document matrix (theta)
pl - configure for use with predictive likelihoods (syn. mutual likelihood, because it appears to be symmetric)
js - configure for use with the Jensen-Shannon distance
Throws:
java.io.IOException
public AtmTopicSimilarities(AtmGibbsSampler lda, boolean terms, boolean authors, boolean pl, boolean js)
Parameters:
lda -
terms -
authors -
pl -
js -
Method Detail |
---|
public org.knowceans.map.IndexRanking queryAuthors(double[] topics, boolean mutLik, int max)
Parameters:
topics - distribution over z (multiple elements)
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
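As an illustration of how the three parameters interact, the sketch below ranks the rows of an author-topic matrix against a query distribution over z and keeps at most max matches. Everything in it is an assumption made for illustration: the class name, the theta[author][topic] layout, the plain int[] return type in place of IndexRanking, and in particular the dot product used as a stand-in for the mutual / predictive likelihood, whose exact definition is not given on this page.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: rank rows of theta against a query distribution over topics. */
final class QueryRankingSketch {

    /** Jensen-Shannon divergence (natural log) between two distributions. */
    static double js(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            double m = 0.5 * (p[k] + q[k]);
            if (p[k] > 0.0) sum += 0.5 * p[k] * Math.log(p[k] / m);
            if (q[k] > 0.0) sum += 0.5 * q[k] * Math.log(q[k] / m);
        }
        return sum;
    }

    /** Assumed stand-in for the mutual / predictive likelihood: sum_k p[k] * q[k]. */
    static double dot(double[] p, double[] q) {
        double sum = 0.0;
        for (int k = 0; k < p.length; k++) {
            sum += p[k] * q[k];
        }
        return sum;
    }

    /** Returns the indices of the best-matching rows, best first, at most max of them. */
    static int[] rank(double[] topics, double[][] theta, boolean mutLik, int max) {
        final double[] score = new double[theta.length];
        Integer[] order = new Integer[theta.length];
        for (int a = 0; a < theta.length; a++) {
            order[a] = a;
            // Higher score is better: a likelihood is used directly, a divergence is negated.
            score[a] = mutLik ? dot(topics, theta[a]) : -js(topics, theta[a]);
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer a) -> -score[a]));
        int n = Math.max(0, Math.min(max, order.length));
        int[] best = new int[n];
        for (int i = 0; i < n; i++) {
            best[i] = order[i];
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] theta = {{0.9, 0.1}, {0.1, 0.9}, {0.5, 0.5}};
        double[] query = {0.8, 0.2};
        System.out.println(Arrays.toString(rank(query, theta, false, 2)));
    }
}
```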
public org.knowceans.map.IndexRanking queryDocs(double[] topics, boolean mutLik, int max)
Overrides:
queryDocs in class LdaTopicSimilarities
Parameters:
topics - distribution over z (multiple elements)
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking[] queryTerms(double[][] topics, boolean mutLik, int max)
Overrides:
queryTerms in class LdaTopicSimilarities
Parameters:
topics - distribution over z.
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
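For the array form of the query, one plausible reading of the predictive-likelihood case is to mix the topic-term rows by each query's topic weights, p(w|query) = sum_z p(z|query) * phi[z][w], and rank terms by that value. The sketch below follows this reading under stated assumptions: phi is laid out as phi[topic][term] per the row convention, the names are hypothetical, and int[][] stands in for IndexRanking[]. It is not taken from the library's implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: rank terms for several queries given as distributions over topics. */
final class QueryTermsSketch {

    /** p(w | query) = sum_z topics[z] * phi[z][w], assuming phi[topic][term]. */
    static double[] termLikelihoods(double[] topics, double[][] phi) {
        double[] pw = new double[phi[0].length];
        for (int z = 0; z < topics.length; z++) {
            for (int w = 0; w < pw.length; w++) {
                pw[w] += topics[z] * phi[z][w];
            }
        }
        return pw;
    }

    /** Indices of the most likely terms for one query, best first, at most max of them. */
    static int[] topTerms(double[] topics, double[][] phi, int max) {
        final double[] pw = termLikelihoods(topics, phi);
        Integer[] order = new Integer[pw.length];
        for (int w = 0; w < pw.length; w++) {
            order[w] = w;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer w) -> -pw[w]));
        int n = Math.max(0, Math.min(max, order.length));
        int[] best = new int[n];
        for (int i = 0; i < n; i++) {
            best[i] = order[i];
        }
        return best;
    }

    /** One ranking per query, mirroring the array-in, array-out shape of queryTerms. */
    static int[][] queryTerms(double[][] queries, double[][] phi, int max) {
        int[][] rankings = new int[queries.length][];
        for (int q = 0; q < queries.length; q++) {
            rankings[q] = topTerms(queries[q], phi, max);
        }
        return rankings;
    }

    public static void main(String[] args) {
        double[][] phi = {{0.7, 0.2, 0.1}, {0.1, 0.2, 0.7}};
        double[][] queries = {{1.0, 0.0}, {0.0, 1.0}};
        System.out.println(Arrays.deepToString(queryTerms(queries, phi, 2)));
    }
}
```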
public org.knowceans.map.IndexRanking authorAuthors(int author, boolean mutLik, int max)
Parameters:
author - author index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking docDocs(int doc, boolean mutLik, int max)
Overrides:
docDocs in class LdaTopicSimilarities
Parameters:
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking termTerms(int term, boolean mutLik, int max)
Overrides:
termTerms in class LdaTopicSimilarities
Parameters:
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking docTerms(int doc, boolean mutLik, int max)
Overrides:
docTerms in class LdaTopicSimilarities
Parameters:
doc - document index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches
public org.knowceans.map.IndexRanking termDocs(int term, boolean mutLik, int max)
Overrides:
termDocs in class LdaTopicSimilarities
Parameters:
term - term index
mutLik - use mutual / predictive likelihood (otherwise Jensen-Shannon)
max - maximum number of matches