org.knowceans.citeseer.fetcher
Class CsxFileWriter
java.lang.Object
org.knowceans.citeseer.fetcher.CsxFileWriter
public class CsxFileWriter
- extends java.lang.Object
Converts OAI XML files into files that are easier to index with the
knowceans-lda package, collecting authors, titles etc. in separate fields.
The main issue with this is the large number of duplicates in the result
(is CSX really that noisy?).
Run this with -Xmx1024m, as the code relies on large indices.
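The conversion described above can be illustrated with a minimal sketch. It is not the code of CsxFileWriter: the class OaiFieldSketch, its Entry type, and the methods texts, norm and parse are hypothetical names introduced here. The sketch assumes the input records carry Dublin Core dc:title and dc:creator elements, and it treats two records as duplicates when their normalized title and author list match; the actual class may use a different input layout and a different duplicate criterion.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only; not part of org.knowceans.citeseer.fetcher.
class OaiFieldSketch {

    // Dublin Core element namespace (assumed to be present in the input).
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** One record, split into separate fields. */
    static final class Entry {
        final String title;
        final List<String> authors;

        Entry(String title, List<String> authors) {
            this.title = title;
            this.authors = authors;
        }
    }

    /** Collects the trimmed text of all dc:localName elements below a record. */
    static List<String> texts(Element record, String localName) {
        NodeList nodes = record.getElementsByTagNameNS(DC_NS, localName);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent().trim());
        }
        return out;
    }

    /** Lower-cases and collapses whitespace so near-identical strings compare equal. */
    static String norm(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Parses the XML and returns one Entry per distinct (title, authors) pair. */
    static List<Entry> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Match <record> elements regardless of their namespace.
        NodeList records = doc.getElementsByTagNameNS("*", "record");

        // LinkedHashMap keeps the first occurrence of each key in input order.
        Map<String, Entry> unique = new LinkedHashMap<>();
        for (int i = 0; i < records.getLength(); i++) {
            Element record = (Element) records.item(i);
            List<String> titles = texts(record, "title");
            String title = titles.isEmpty() ? "" : titles.get(0);
            List<String> authors = texts(record, "creator");
            String key = norm(title) + "\t" + norm(String.join(";", authors));
            unique.putIfAbsent(key, new Entry(title, authors));
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records xmlns:dc=\"" + DC_NS + "\">"
                + "<record><dc:title>Topic Models</dc:title>"
                + "<dc:creator>A. Author</dc:creator></record>"
                + "<record><dc:title>topic  models</dc:title>"
                + "<dc:creator>A. Author</dc:creator></record>"
                + "</records>";
        for (Entry e : parse(xml)) {
            System.out.println(e.title + " | " + String.join("; ", e.authors));
        }
    }
}
```

Keeping every distinct key in a map is one reason such a conversion may need a large heap on a big collection, which fits the -Xmx1024m advice above.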
- Version:
- draft (quickly written but functional)
- Author:
- gregor heinrich (gregor :: arbylon . net)
Constructor Summary
CsxFileWriter(java.lang.String base, java.lang.String outbase)
Method Summary
static void main(java.lang.String[] args)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
CsxFileWriter
public CsxFileWriter(java.lang.String base, java.lang.String outbase)
Method Detail
main
public static void main(java.lang.String[] args)