
Is there a way to validate one XML record at a time, using the SAX or another parser?

 
Ranch Hand
Posts: 204
Hello!

Before running into space problems, I generated an XML file from a pipe-delimited file and then processed that XML file through a SAXParser validator. The data was correctly validated, using the RELAX NG schema patterns as the validation criteria.

But as feared, the XML file was huge (12 billion XML recs. generated from 1 billion pipe recs.), so I am now trying to find a way to process one pipe record at a time: read one record from the pipe-delimited file, convert that record to an XML record, and then send that one XML record through the SAXParser validator, avoiding the build of a huge temporary XML file.
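The per-record conversion step described above can be sketched as a small function. This is only an illustration, assuming a shortened, hypothetical column list (`COLUMNS`) and row element name (`ROW`) in place of the real `_columnNames` and `TABLE_NAME`. It also escapes `&`, `<` and `>`, since a raw `&` or `<` in the data would make the generated XML malformed.

```java
class RecordToXml {
    // Hypothetical columns and row name; the real program uses _columnNames and TABLE_NAME.
    static final String[] COLUMNS = {"ACT_DTE", "FNMA_LN", "DQIND"};
    static final String ROW = "ROW";

    /** Escapes &, < and > so the value is safe as XML text content. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Converts one pipe-delimited line into one <ROW>...</ROW> element. */
    static String toXml(String line) {
        // limit -1 keeps trailing empty fields: "a||" -> ["a", "", ""]
        String[] fields = line.split("\\|", -1);
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(ROW).append('>');
        for (int i = 0; i < COLUMNS.length; i++) {
            // Columns missing at the end of the line become empty elements.
            String value = i < fields.length ? fields[i] : "";
            sb.append('<').append(COLUMNS[i]).append('>')
              .append(escape(value))
              .append("</").append(COLUMNS[i]).append('>');
        }
        sb.append("</").append(ROW).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("20040101|123&4"));
    }
}
```

Each call produces one record element, not a complete document, so the XML declaration and the outer root element still have to be written once around the whole stream.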

After testing this approach, it looks like the SAXParser validator (sp.parse) expects only one StringBufferInputStream as input: after opening, reading and closing just one of the returned StringBufferInputStream objects, the validator wants to shut down.
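This behaviour follows from XML itself: one parse() call reads one document, and a document has exactly one root element and at most one XML declaration. A small demonstration with the JDK's built-in SAX parser (the class and method names are made up for the example):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class OneDocumentPerParse {
    /** Returns true if the text parses as a single well-formed XML document. */
    static boolean parses(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                         new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Well-formedness errors arrive here; DefaultHandler.fatalError rethrows them.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String one = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><hlp_data><ROW/></hlp_data>";
        System.out.println(parses(one));       // a single document is accepted
        System.out.println(parses(one + one)); // a second declaration and root are rejected
    }
}
```

So several complete documents glued into one stream cannot pass a single parse() call. Either everything goes under one root, or each record is parsed as its own document.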

Where I have the "<<<<<" markers you can see where I'm calling the object.method that creates the pipe-to-XML records, 'sb.createxml', which will be returning many occurrences of the StringBufferInputStream object, where one StringBufferInputStream object represents one pipe-to-XML record.

So what I'm wondering is whether there is a kind of InputStream class that can be loaded and processed at the same time, i.e. one that does not require the InputStream object to be loaded in its entirety before validation starts.

Or is there another XML validator that can validate one XML record at a time, without requiring that the entire XML file be built first?
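One way to validate "one record at a time" with a stock parser is to wrap each record as its own tiny document and call parse() once per record, so nothing large is ever built. A sketch, assuming the records have already been converted to XML fragments (the `<ROW>` and `<A>` names are hypothetical). It checks well-formedness only, with a plain DefaultHandler; schema validation would plug in at the same spot. Note that the JDK's javax.xml.validation supports only W3C XML Schema out of the box, so RELAX NG needs a separate library such as Jing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PerRecordCheck {
    /** Parses every record as its own small document; returns how many were rejected. */
    static int countBadRecords(List<String> records) {
        SAXParser parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        // Plain DefaultHandler: well-formedness only. A validating setup would go here.
        DefaultHandler handler = new DefaultHandler();
        int bad = 0;
        for (String record : records) {
            // One XML declaration and one root element per small document.
            String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><hlp_data>"
                    + record + "</hlp_data>";
            try {
                parser.reset(); // reuse the same parser for the next document
                parser.parse(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)), handler);
            } catch (SAXException e) {
                bad++; // this record is rejected; carry on with the next one
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("<ROW><A>1</A></ROW>", "<ROW><A>2</A>", "<ROW/>");
        System.out.println(countBadRecords(records) + " bad record(s)");
    }
}
```

The trade-off is one parse() call (with its setup cost) per record; for a billion records a single streamed document under one root is likely to be faster.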

1. ------------------------------------------------------------------------

import ............

public class SX2
{
    public static void main(String[] args) throws Exception
    {
        MyDefaultHandler dh = new MyDefaultHandler();

        SX1 sx = new SX1();
        SAXParser sp = sx.getParser(args[0]);

        stbuf1 sb = new stbuf1();

        sp.parse(sb.createxml(args[1]), dh);   // <<<<<< createxml( ) see <<<<<<< below
    }
}

class MyDefaultHandler extends DefaultHandler {

    public int errcnt;

    // ... (rest of SX2.java, 87 lines in total, not shown)
}

2. ----------------------------------class: stbuf1 method: createxml---------------------------------------------------------------------------------

// (class declaration and the TABLE_NAME, NUM_COLS and _columnNames fields not shown)
public stbuf1 () { }

public StringBufferInputStream createxml( String inputFile )   // <<<<<< createxml(
{
    BufferedReader textReader = null;
    if ( (inputFile == null) || (inputFile.length() <= 1) )
    {
        throw new NullPointerException("Delimiter Input File does not exist");
    }
    String ele = "";
    try {
        textReader = new BufferedReader(new FileReader(inputFile));
        String line = null; String SEPARATOR = "\\|"; String sToken = null;

        String hdr1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        String hdr2 = "<hlp_data>";
        String hdr3 = "</hlp_data>";
        String hdr4 = "<" + TABLE_NAME + ">";
        String hdr5 = "</" + TABLE_NAME + ">";

        while ( (line = textReader.readLine()) != null )
        {
            String[] sa = line.split(SEPARATOR);
            String elel;

            for (int i = 0; i < NUM_COLS; i++)
            {
                // Columns missing at the end of the line become empty elements.
                if (i > (sa.length - 1)) { sToken = ""; } else { sToken = sa[i]; }

                elel = "<" + _columnNames[i] + ">" + sToken + "</" + _columnNames[i] + ">";

                // NOTE: hdr1/hdr2 and hdr3 are added for EVERY input line, so the
                // finished string holds one complete XML document per record,
                // back to back. A parser accepts a single document (one XML
                // declaration, one root element) per parse() call.
                if (i == 0) {
                    ele = ele.concat(hdr1); ele = ele.concat(hdr2); ele = ele.concat(hdr4); ele = ele.concat(elel);
                }
                else if (i == NUM_COLS - 1) {
                    ele = ele.concat(elel); ele = ele.concat(hdr5); ele = ele.concat(hdr3);
                }
                else {
                    ele = ele.concat(elel);
                }
            }
        }
        textReader.close();
    }
    catch (IOException e) {
        System.err.println("createxml: " + e);   // report I/O errors instead of swallowing them
    }
    // The whole file is held in memory at this point. StringBufferInputStream is
    // deprecated (it keeps only the low 8 bits of each char); the JDK docs
    // recommend StringReader instead (wrapped in an InputSource for a SAX parser).
    return (new StringBufferInputStream(ele));
}
public static void main( String args[] ) {
    stbuf1 genxml_obj = new stbuf1 ();
    String ptxt = args[0];
    genxml_obj.createxml(ptxt);
}
}
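For comparison, a sketch of createxml's job done as a stream: the declaration and `<hlp_data>` are written once, each record is written as soon as it is read, and nothing accumulates in memory. It writes to any java.io.Writer (a FileWriter, or the writing end of a pipe). `ROW` and `COLUMNS` are hypothetical stand-ins for TABLE_NAME and _columnNames.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class StreamingXmlWriter {
    // Hypothetical names standing in for TABLE_NAME and _columnNames.
    static final String ROW = "ROW";
    static final String[] COLUMNS = {"ACT_DTE", "FNMA_LN"};

    /** Writes one document: declaration and root once, then one element per input line. */
    static void write(BufferedReader in, Writer out) throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.write("<hlp_data>");
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split("\\|", -1);
            out.write("<" + ROW + ">");
            for (int i = 0; i < COLUMNS.length; i++) {
                String value = i < fields.length ? fields[i] : "";
                // Minimal escaping: '&' first, then '<'.
                value = value.replace("&", "&amp;").replace("<", "&lt;");
                out.write("<" + COLUMNS[i] + ">" + value + "</" + COLUMNS[i] + ">");
            }
            out.write("</" + ROW + ">\n"); // one record per output line
        }
        out.write("</hlp_data>");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        write(new BufferedReader(new StringReader("20040101|1\n20040201|2\n")), sw);
        System.out.println(sw);
    }
}
```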
 
Ranch Hand
Posts: 1258
I suppose you could write your own implementation of an InputStream that is written and read by two separate threads, each of which blocks when the buffer is full or empty, as the case may be.
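java.io already ships a stream pair with exactly this behaviour: PipedOutputStream and PipedInputStream. The writing thread blocks when the pipe's buffer is full, and the reading thread blocks when it is empty. A minimal sketch, assuming a made-up record format, in which a producer thread generates XML while the main thread parses it with SAX:

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PipedSaxDemo {
    /** Generates n records on a producer thread and counts <ROW> elements while parsing. */
    static int parseGenerated(int n) throws Exception {
        PipedOutputStream pos = new PipedOutputStream();
        // Pipe buffer of 4096 bytes: writer blocks when full, reader blocks when empty.
        PipedInputStream pis = new PipedInputStream(pos, 4096);

        Thread producer = new Thread(() -> {
            try (Writer w = new OutputStreamWriter(pos, StandardCharsets.UTF_8)) {
                w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?><hlp_data>");
                for (int i = 0; i < n; i++) {
                    w.write("<ROW><ID>" + i + "</ID></ROW>\n");
                }
                w.write("</hlp_data>");
            } catch (IOException e) {
                // e.g. "Pipe closed" if the reader gave up early
                System.err.println("producer: " + e);
            }
        });
        producer.start();

        int[] rows = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(pis, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (qName.equals("ROW")) rows[0]++;
                }
            });
        } finally {
            pis.close(); // unblocks the producer if parsing stopped early
        }
        producer.join();
        return rows[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseGenerated(100_000));
    }
}
```

Closing the Writer closes the PipedOutputStream, which is how the parser sees end-of-stream. The one-argument PipedInputStream constructor uses a 1024-byte buffer; the two-argument form used here sets a larger one.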

 
bob connolly
Ranch Hand
Posts: 204
Thanks Nathaniel!

Sounds like a very interesting and challenging idea, and even though it's way over my experience level, I'm going to give it a serious try!

Appreciate your help on this Nathaniel!
 
bob connolly
Ranch Hand
Posts: 204
Hi Nathaniel!

I found what I hope is a solution based on your recommendation and thought I'd share it with the ranch!

Merlin Hughes tried the PipedInputStream approach, which I was already taking, but ran into some thread performance issues, so he developed his own approach:

http://www-106.ibm.com/developerworks/java/library/j-io1/
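One thread-free way to get the same effect is an InputStream that converts the next source line only when the parser asks for more bytes. This is my own sketch of that general idea, not the code from the article. The class name and the converter function are hypothetical, and the usage line needs Java 9+ for readAllBytes().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Emits header, then one converted record per source line on demand, then trailer. */
class LineConvertingInputStream extends InputStream {
    private final BufferedReader source;
    private final Function<String, String> convert; // pipe record -> XML fragment
    private byte[] trailer;   // set to null once it has been handed out
    private byte[] buf;       // bytes currently being served
    private int pos;          // next byte in buf
    private boolean sourceDone;

    LineConvertingInputStream(BufferedReader source, String header, String trailer,
                              Function<String, String> convert) {
        this.source = source;
        this.convert = convert;
        this.buf = header.getBytes(StandardCharsets.UTF_8);
        this.trailer = trailer.getBytes(StandardCharsets.UTF_8);
    }

    /** Makes sure buf has unread bytes; returns false at end of stream. */
    private boolean fill() throws IOException {
        while (pos >= buf.length) {
            if (sourceDone) {
                if (trailer == null) return false;
                buf = trailer;
                trailer = null;
            } else {
                String line = source.readLine(); // read exactly one record
                if (line == null) {
                    sourceDone = true;
                    continue;
                }
                buf = convert.apply(line).getBytes(StandardCharsets.UTF_8);
            }
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return fill() ? (buf[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = Math.min(len, buf.length - pos);
        System.arraycopy(buf, pos, b, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new LineConvertingInputStream(
                new BufferedReader(new StringReader("a\nb\n")),
                "<hlp_data>", "</hlp_data>", line -> "<ROW>" + line + "</ROW>");
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Because the parser's own read() calls drive the conversion, there is no second thread, no pipe buffer and no temporary file; such a stream can be passed straight to sp.parse(...).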

I will keep you and the ranch updated on my progress, as this should be a nice solution for processing large files of many kinds!

Thanks again Nathaniel!

bc
 
bob connolly
Ranch Hand
Posts: 204
Nathaniel, thanks for the suggestion! Here is what I used, and it works great: I took the shell program from the web site in my previous post, simplified it somewhat, and just processed 1 billion records in half the time!

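import java.io.*;

public class Pipe0
{
    public static void main(String[] args) throws Exception
    {
        try
        {
            // Pipe1.reverse() starts a Pipe2 thread that turns each pipe-delimited
            // line into XML and returns the reading end of the pipe.
            InputStream rwords = Pipe1.reverse(new FileInputStream(args[1]));
            // BufferedReader instead of the deprecated DataInputStream.readLine()
            BufferedReader br = new BufferedReader(new InputStreamReader(rwords));
            String input;
            while ( (input = br.readLine()) != null ) {
                System.out.println(input);
            }
            br.close();
        }
        catch (Exception e) {
            System.out.println("Rym err: " + e);
        }
    }
}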

import java.io.*;

class Pipe1 {
    // Stand-alone driver: converts raw.dat and prints the result.
    public static void main(String[] args) {

        try {
            InputStream words = new FileInputStream("raw.dat");

            // Chain two Pipe2 stages; each runs in its own thread, connected by a pipe.
            InputStream rhymedWords = reverse(reverse(words));

            // write new list to standard out
            BufferedReader br = new BufferedReader(new InputStreamReader(rhymedWords));
            String input;

            while ((input = br.readLine()) != null) {
                System.out.println(input);
            }
            br.close();

        } catch (Exception e) {
            System.out.println("Pipe1: " + e);
        }
    }

    // Starts a Pipe2 thread that reads 'source' and writes XML into a pipe;
    // the caller reads the converted data from the returned PipedInputStream.
    public static InputStream reverse(InputStream source) {
        PipedOutputStream pos = null;
        PipedInputStream pis = null;

        try {
            DataInputStream dis = new DataInputStream(source);

            pos = new PipedOutputStream();
            pis = new PipedInputStream(pos);
            PrintStream ps = new PrintStream(pos);

            new Pipe2(ps, dis).start();

        } catch (Exception e) {
            System.out.println("Pipe1 reverse: " + e);
        }
        return pis;
    }

}

import java.io.*;

class Pipe2 extends Thread {
    PrintStream ps;
    DataInputStream dis;

    public int rcnt = 0;

    public static final int ACT_DTE = 0;
    public static final int FNMA_LN = 1;
    public static final int DQIND = 2;
    public static final int HFATYPCD = 3;
    public static final int HSTATUS = 4;
    public static final int LPI_DTE = 5;
    public static final int ACT_UPB = 6;
    public static final int HFR_UPB = 7;
    public static final int REMLIFE = 8;
    public static final int HPOOL_NO = 9;

    public static final int NUM_COLS = 10;
    public static final String TABLE_NAME = "FN_LL_LN_ACTVY_1";
    public static String[] _columnNames = new String[NUM_COLS];

    static {
        _columnNames[0] = "ACT_DTE";
        _columnNames[1] = "FNMA_LN";
        _columnNames[2] = "DQIND";
        _columnNames[3] = "HFATYPCD";
        _columnNames[4] = "HSTATUS";
        _columnNames[5] = "LPI_DTE";
        _columnNames[6] = "ACT_UPB";
        _columnNames[7] = "HFR_UPB";
        _columnNames[8] = "REMLIFE";
        _columnNames[9] = "HPOOL_NO";
    }

    Pipe2 (PrintStream ps, DataInputStream dis) {
        this.ps = ps;
        this.dis = dis;
    }

    // Runs in its own thread: reads pipe-delimited lines from 'dis' and writes
    // one XML record per line to 'ps' (the writing end of the pipe).
    public void run() {
        if (ps != null && dis != null) {
            // BufferedReader instead of the deprecated DataInputStream.readLine()
            BufferedReader in = new BufferedReader(new InputStreamReader(dis));
            try {
                String input;

                while ((input = in.readLine()) != null)
                {
                    StringBuilder el = new StringBuilder();
                    if (rcnt == 0) {
                        el.append(setHDR());      // XML declaration + <hlp_data>, once
                    }
                    el.append(lineHDR());         // <FN_LL_LN_ACTVY_1>
                    el.append(getXML(input));     // one element per column
                    el.append(lineTLR());         // </FN_LL_LN_ACTVY_1>

                    ps.println(el); ps.flush();   // one record per output line
                    rcnt++;
                }
                if (rcnt == 0) {
                    ps.print(setHDR());           // empty input: still emit a well-formed document
                }
                ps.println(setTLR()); ps.flush(); // </hlp_data>

            } catch (IOException e) {
                System.out.println("Pipe2 run: " + e);
            } finally {
                // Closing the write end lets the reader see end-of-stream
                // (this replaces the old finalize() cleanup).
                ps.close();
                try { in.close(); } catch (IOException e) { /* nothing more to do */ }
            }
        }
    }

    // Turns one pipe-delimited line into <COLUMN>value</COLUMN> elements.
    private String getXML(String source) {
        String SEPARATOR = "\\|";
        String[] sa = source.split(SEPARATOR);
        StringBuilder ele = new StringBuilder();

        for (int i = 0; i < NUM_COLS; i++)
        {
            // Columns missing at the end of the line become empty elements.
            String sToken = (i > sa.length - 1) ? "" : sa[i];
            ele.append('<').append(_columnNames[i]).append('>')
               .append(escape(sToken))
               .append("</").append(_columnNames[i]).append('>');
        }
        return ele.toString();
    }

    // Escapes characters that would otherwise break the XML ('&' and '<' in particular).
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private String setHDR() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?><hlp_data>";
    }
    private String lineHDR() {
        return "<" + TABLE_NAME + ">";
    }
    private String lineTLR() {
        return "</" + TABLE_NAME + ">";
    }
    private String setTLR() {
        return "</hlp_data>";
    }

}
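Pipe0 above only prints the generated XML. To reach the original goal, the InputStream returned by Pipe1.reverse() can be handed to the parser instead of the println loop, e.g. sp.parse(rwords, dh). Because Pipe2 writes each record with a single println (the header shares the first record's line), the line number in a SAXParseException points at the offending record. A sketch of a handler that collects those line numbers; RecordErrorReporter is a hypothetical name, and error() is only called when the parser is actually validating:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class RecordErrorReporter extends DefaultHandler {
    // Line numbers of reported problems; with one record per line, these are record numbers.
    final List<Integer> badLines = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        // Recoverable (e.g. validity) error: note it and keep parsing.
        badLines.add(e.getLineNumber());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Well-formedness error: note it, then stop, since the parser cannot continue.
        badLines.add(e.getLineNumber());
        throw e;
    }

    static List<Integer> check(InputStream xml) throws Exception {
        RecordErrorReporter handler = new RecordErrorReporter();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        } catch (SAXParseException e) {
            // already recorded in fatalError()
        }
        return handler.badLines;
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><hlp_data><ROW>1</ROW>\n"
                   + "<ROW>2</ROW>\n"
                   + "<ROW>3 & 4</ROW>\n"   // unescaped '&' on line 3
                   + "</hlp_data>";
        System.out.println(check(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8))));
    }
}
```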
 