org.knowceans.citeseer.fetcher
Class CsxFileWriter

java.lang.Object
  extended by org.knowceans.citeseer.fetcher.CsxFileWriter

public class CsxFileWriter
extends java.lang.Object

Converts OAI XML files into files that are easier to index with the knowceans-lda package, collecting authors, titles, etc. in separate fields. The main issue with this is the large number of duplicates in the result (is CiteSeerX (CSX) really that noisy?).
Run this with -Xmx1024m, as the code relies on large indices.
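The class's own parsing code is not shown on this page. As a rough illustration of the conversion described above, the sketch below assumes the input consists of OAI-PMH records carrying `oai_dc` (Dublin Core) metadata. The class `OaiFieldSketch`, its methods `field` and `flatten`, the "creators TAB title" output layout and the title-based duplicate key are all hypothetical; they are not CsxFileWriter's actual output format or deduplication rule.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch, not CsxFileWriter's code: flattens oai_dc records into field lines. */
public class OaiFieldSketch {
    static final String DC = "http://purl.org/dc/elements/1.1/";
    static final String NS =
        " xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\" xmlns:dc=\"" + DC + "\"";
    /** Made-up input: two records differing only in title case/punctuation, one without a title. */
    static final String SAMPLE = "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\"><ListRecords>"
        + "<record><metadata><oai_dc:dc" + NS + "><dc:title>Sample Title</dc:title>"
        + "<dc:creator>A. Author</dc:creator><dc:creator>B. Author</dc:creator></oai_dc:dc></metadata></record>"
        + "<record><metadata><oai_dc:dc" + NS + "><dc:title>sample  title.</dc:title>"
        + "<dc:creator>A. Author</dc:creator></oai_dc:dc></metadata></record>"
        + "<record><metadata><oai_dc:dc" + NS + "><dc:creator>C. Author</dc:creator></oai_dc:dc></metadata></record>"
        + "</ListRecords></OAI-PMH>";

    /** Joins all values of one Dublin Core element in a record with "; ", collapsing whitespace. */
    static String field(Element record, String name) {
        NodeList nodes = record.getElementsByTagNameNS(DC, name);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim().replaceAll("\\s+", " "));
        }
        return String.join("; ", values);
    }

    /** Returns one "creators TAB title" line per record, dropping untitled records and title duplicates. */
    static List<String> flatten(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse OAI XML", e);
        }
        // "*" matches the OAI-PMH namespace (or any other) for the <record> elements.
        NodeList records = doc.getElementsByTagNameNS("*", "record");
        Set<String> seenTitles = new HashSet<>();
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < records.getLength(); i++) {
            Element record = (Element) records.item(i);
            String title = field(record, "title");
            // Duplicate key: lower-cased title with everything except letters and digits removed.
            String key = title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
            if (key.isEmpty() || !seenTitles.add(key)) {
                continue; // untitled record, or a title already emitted
            }
            lines.add(field(record, "creator") + "\t" + title);
        }
        return lines;
    }

    public static void main(String[] args) {
        flatten(SAMPLE).forEach(System.out::println);
    }
}
```

Keying duplicates on the normalized title is only a cheap heuristic: it also merges distinct papers that share a title, so a stricter filter would likely compare authors or years as well.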

Version:
draft (quickly written but functional)
Author:
gregor heinrich (gregor :: arbylon . net)

Constructor Summary
CsxFileWriter(java.lang.String base, java.lang.String outbase)
Method Summary
static void main(java.lang.String[] args)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

CsxFileWriter

public CsxFileWriter(java.lang.String base,
                     java.lang.String outbase)
Method Detail

main

public static void main(java.lang.String[] args)