java.lang.Object org.knowceans.corpus.analysis.TopicsConverter
public class TopicsConverter
TopicAnalyser extracts topics from the Phi and Theta variables and shows their Bayesian equivalents, phi[z][w] = P(z|w) = P(w|z) P(z) / sum_z'(P(w|z') P(z')) or, equivalently, theta[d][z] = P(d|z) = P(z|d) P(d) / sum_d'(P(z|d') P(d')).
Constructor Summary | |
---|---|
TopicsConverter() | |
Method Summary | |
---|---|
protected void | analyse(java.lang.String filename, boolean transposed, java.lang.String labelFilename, double threshold, double postThreshold, java.lang.String comment, java.lang.String postComment) - Analyse binary probability matrix (conditional). |
static java.util.Vector<org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer>> | extractTopics(double[][] a, double threshold, boolean transposed) - Extract topic lists for the probability matrix a (topics in columns). |
static void | main(java.lang.String[] args) |
protected static double[] | normaliseRows(double[][] matrix) - normalises the rows of the matrix in situ and returns the vector of normalisation factors. |
static double[][] | posterior(double[][] likelihood) - Calculate posterior probability p(y|x) = p(x|y) / sum_y'(p(x|y')) with uniform prior p(y) = const. |
static double[][] | posterior(double[][] likelihood, double[] prior) - Calculate posterior probability p(y|x) = p(x|y) p(y) / sum_y'(p(x|y') p(y')) with a prior given. |
static void | printMatrix(double[][] a) |
void | run(java.lang.String corpus, java.lang.String model) |
static void | saveTopics(java.lang.String filename, java.util.Vector<org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer>> topics, java.lang.String comment, java.lang.String additional) - saves a topic hashmap to a readable file; looks up the row labels (terms or document names) from additional file. |
static void | test() - test driver for posterior calculation. |
static org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer> | truncateMap(org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer> sorter, double threshold) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public TopicsConverter()
Method Detail |
---|
public static void main(java.lang.String[] args)
public void run(java.lang.String corpus, java.lang.String model)
protected void analyse(java.lang.String filename, boolean transposed, java.lang.String labelFilename, double threshold, double postThreshold, java.lang.String comment, java.lang.String postComment)
filename - binary file with original matrix
transposed - true if binary file has transposed normalisation
labelFilename - to load a list of labels for non-topic indexes
threshold - shade visualisation threshold for original matrix or NaN to disable
postThreshold - shade visualisation threshold for posterior matrix or NaN to disable
comment - for original matrix
postComment - for posterior matrix
public static java.util.Vector<org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer>> extractTopics(double[][] a, double threshold, boolean transposed)
a - matrix
threshold - down to which the probabilities are extracted. If negative, the threshold is taken as a count of how many entries of each topic to extract
transposed - topics in rows instead of columns
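The threshold semantics described for extractTopics can be illustrated with a minimal sketch. This is not the library's implementation: the class ExtractTopicsSketch and the helper truncate are hypothetical, a java.util.TreeMap of index lists stands in for org.knowceans.map.TreeMultiMap, and a negative threshold of -n is assumed to mean "keep the n most probable entries per topic".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the extraction logic; names are illustrative only. */
final class ExtractTopicsSketch {

    /** Returns, per topic, a map probability -> indices in descending order. */
    static List<TreeMap<Double, List<Integer>>> extractTopics(
            double[][] a, double threshold, boolean transposed) {
        int nTopics = transposed ? a.length : a[0].length;
        int nItems = transposed ? a[0].length : a.length;
        List<TreeMap<Double, List<Integer>>> topics = new ArrayList<>();
        for (int k = 0; k < nTopics; k++) {
            // Sort all entries of topic k by descending probability.
            TreeMap<Double, List<Integer>> sorted =
                    new TreeMap<>(Collections.<Double>reverseOrder());
            for (int i = 0; i < nItems; i++) {
                double p = transposed ? a[k][i] : a[i][k];
                sorted.computeIfAbsent(p, key -> new ArrayList<>()).add(i);
            }
            topics.add(truncate(sorted, threshold));
        }
        return topics;
    }

    /** Keeps entries >= threshold, or the top -threshold entries if negative. */
    static TreeMap<Double, List<Integer>> truncate(
            TreeMap<Double, List<Integer>> sorted, double threshold) {
        TreeMap<Double, List<Integer>> kept =
                new TreeMap<>(Collections.<Double>reverseOrder());
        // Assumption: a negative threshold -n means "keep n entries".
        int limit = threshold < 0 ? (int) -threshold : Integer.MAX_VALUE;
        int count = 0;
        for (Map.Entry<Double, List<Integer>> e : sorted.entrySet()) {
            if (threshold >= 0 && e.getKey() < threshold) {
                break; // all remaining probabilities are smaller
            }
            for (int idx : e.getValue()) {
                if (count == limit) {
                    return kept;
                }
                kept.computeIfAbsent(e.getKey(), key -> new ArrayList<>()).add(idx);
                count++;
            }
        }
        return kept;
    }
}
```

With topics in columns, calling ExtractTopicsSketch.extractTopics(a, 0.2, false) keeps every entry with probability of at least 0.2, while a threshold of -1 keeps only the single most probable entry of each topic.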
public static org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer> truncateMap(org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer> sorter, double threshold)
public static void saveTopics(java.lang.String filename, java.util.Vector<org.knowceans.map.TreeMultiMap<java.lang.Double,java.lang.Integer>> topics, java.lang.String comment, java.lang.String additional)
filename - target file
topics - vector of maps that contain the probability->index associations for each topic
comment - put on top of the target file
additional - filename of rowlabels information (.docs, .vocab, .actors)
public static void test()
public static void printMatrix(double[][] a)
a -
public static double[][] posterior(double[][] likelihood, double[] prior)
likelihood - likelihood[y][x] = p(x|y), i.e., normalised along rows
prior - prior[y] = p(y), i.e., must be normalised
public static double[][] posterior(double[][] likelihood)
likelihood - likelihood[y][x] = p(x|y), i.e., normalised along rows
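The two posterior overloads apply Bayes' rule to a row-normalised likelihood matrix. The following is a minimal sketch of that calculation, not the class's actual code: the class name PosteriorSketch is hypothetical, and the result is assumed to keep the input layout, i.e. post[y][x] = p(y|x), which the documentation does not specify.

```java
import java.util.Arrays;

/** Hypothetical sketch of the posterior calculation; not the library code. */
final class PosteriorSketch {

    /**
     * likelihood[y][x] = p(x|y), prior[y] = p(y).
     * Returns post[y][x] = p(y|x); keeping the input layout is an assumption.
     */
    static double[][] posterior(double[][] likelihood, double[] prior) {
        int ny = likelihood.length;
        int nx = likelihood[0].length;
        double[][] post = new double[ny][nx];
        for (int x = 0; x < nx; x++) {
            // Evidence p(x) = sum_y' p(x|y') p(y').
            double evidence = 0.0;
            for (int y = 0; y < ny; y++) {
                evidence += likelihood[y][x] * prior[y];
            }
            for (int y = 0; y < ny; y++) {
                // Leave zeros where x has no probability mass under any y.
                post[y][x] = evidence > 0.0
                        ? likelihood[y][x] * prior[y] / evidence
                        : 0.0;
            }
        }
        return post;
    }

    /** Uniform prior p(y) = const, which cancels out of the ratio. */
    static double[][] posterior(double[][] likelihood) {
        double[] uniform = new double[likelihood.length];
        Arrays.fill(uniform, 1.0 / likelihood.length);
        return posterior(likelihood, uniform);
    }

    public static void main(String[] args) {
        double[][] likelihood = {{0.75, 0.25}, {0.25, 0.75}};
        System.out.println(Arrays.deepToString(posterior(likelihood)));
    }
}
```

Under these assumptions each column of the result sums to 1, since the posterior is normalised over y for every fixed x.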
protected static double[] normaliseRows(double[][] matrix)
matrix -
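In-situ row normalisation as described for normaliseRows might look like the sketch below. The class name NormaliseRowsSketch is hypothetical, and the returned "normalisation factors" are assumed here to be the original row sums; the actual method may return their reciprocals or handle zero rows differently.

```java
/** Hypothetical sketch of in-place row normalisation; not the library code. */
final class NormaliseRowsSketch {

    /** Divides each row by its sum in place and returns the row sums. */
    static double[] normaliseRows(double[][] matrix) {
        double[] sums = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            for (double v : matrix[i]) {
                sums[i] += v;
            }
            if (sums[i] == 0.0) {
                continue; // assumption: all-zero rows are left untouched
            }
            for (int j = 0; j < matrix[i].length; j++) {
                matrix[i][j] /= sums[i];
            }
        }
        return sums;
    }
}
```

Because the matrix is modified in place, a caller that still needs the unnormalised values has to copy it first or rescale the rows with the returned sums.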