JavaRanch » Java Forums » Java » I/O and Streams
Writing byte array to XML file

madupathi arun
Greenhorn

Joined: Feb 25, 2008
Posts: 23
Hi, I am Arun.
I am trying to convert a PDF file to an XML file.
I read the content from the PDF and put it in a byte array,
but when I put the byte array into XML it does not work.

I used the following code to insert the byte array into the XML file.
TestingPdf is the class name;
getRequestBufferAsBytes(request) is a static method.

byte[] test = TestingPdf.getRequestBufferAsBytes(request);
// Backslashes must be escaped in a Java string literal ("\t" would be a tab).
File f = new File("C:\\testarun\\test.xml");
FileOutputStream fout = new FileOutputStream(f);
try {
    // write(byte[]) writes the whole array in one call, so no loop is needed.
    fout.write(test);
} finally {
    fout.close();
}
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42906
Firstly, you can't convert a PDF to XML. The best you can aim for is to embed the bytes that make up the PDF file in an XML file. That would be a strange thing to do, though, and I'd be curious to hear why you'd want to do that; there may be better ways to achieve what you're trying to do.

If you really do want to do that, be aware that XML is a text format, while PDF is a binary format. So you need to encode the contents of the PDF files before you can insert them into an XML file. A simple encoding like base-64 (as implemented by the Apache Commons Codec library, amongst others) will suffice.
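
As a rough illustration of the encoding step described above, here is a minimal sketch that Base64-encodes a byte array, places the text in an XML element, and reads it back again. It uses java.util.Base64 (Java 8 and later) as a stand-in for the Commons Codec library mentioned in the post; the class name PdfEmbedSketch, the element name "pdf" and the attribute "encoding" are hypothetical choices, not anything prescribed by the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PdfEmbedSketch {

    // Wraps arbitrary bytes in a small XML document.
    // The element and attribute names are hypothetical.
    static String wrapInXml(byte[] pdfBytes) throws Exception {
        // Base64 output consists only of A-Z, a-z, 0-9, '+', '/' and '=',
        // all of which are legal XML character data.
        String encoded = Base64.getEncoder().encodeToString(pdfBytes);

        // Build the document in memory rather than parsing anything.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("pdf");
        root.setAttribute("encoding", "base64");
        root.setTextContent(encoded);
        doc.appendChild(root);

        // Serialize the DOM tree to a string.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Reverses wrapInXml: parses the XML and decodes the element text.
    static byte[] unwrapFromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String encoded = doc.getDocumentElement().getTextContent().trim();
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in bytes; a real PDF would come from a file or a request.
        byte[] fake = "%PDF-1.4 stand-in".getBytes(StandardCharsets.ISO_8859_1);
        String xml = wrapInXml(fake);
        System.out.println(xml);
        System.out.println(Arrays.equals(fake, unwrapFromXml(xml)));
    }
}
```

To write to a file instead of a string, the StreamResult could be constructed from a java.io.File. Whether a single wrapper element is the right layout depends on what the offline tool needs, which the thread does not specify.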

Lastly, the code you posted doesn't create an XML file. It just writes the bytes of the PDF into a file with an ".xml" extension, which doesn't make its contents XML. I'd suggest reading up on XML before proceeding; the XML FAQ at http://faq.javaranch.com/java/XmlFaq links to a couple of introductions.
[ May 20, 2008: Message edited by: Ulf Dittmer ]
Rajat Bhatnagar
Greenhorn

Joined: Mar 11, 2008
Posts: 22
Hi

If you are keen on manipulating PDF files, you could try iText, which is a Java library that can be used to generate complex PDF documents.

I haven't come across a requirement where you generate XML files on the fly using it, but there must be some methods for that in its API. I used it some time back and think that it's the best alternative for you.


Regards,
Rajat Bhatnagar
http://guideofgreatness.googlepages.com
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42906
> If you are keen on manipulating PDF files, you could try iText, which is a Java library that can be used to generate complex PDF documents.

iText is a library for creating PDFs. While it can perform certain modifications on existing PDFs, it doesn't sound from the (limited) problem description that that's what is being asked here.

> I haven't come across a requirement where you generate XML files on the fly using it, but there must be some methods for that in its API. I used it some time back and think that it's the best alternative for you.

There are any number of ways for creating XML, some of which are part of the core J2SE library. But we don't know enough about the problem to say whether using XML is appropriate in this case.
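
For illustration only, one of the core-library options is the StAX XMLStreamWriter (part of the JDK since Java 6). The sketch below is an assumption about what a minimal use might look like; the class name StaxWriterSketch, the element name "document" and the attribute "type" are made up for the example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

class StaxWriterSketch {

    // Writes a one-element XML document containing the given text.
    static String writeDocument(String payload) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("document");   // hypothetical element name
        w.writeAttribute("type", "pdf");   // hypothetical attribute
        // writeCharacters escapes markup characters such as '<' and '&'.
        w.writeCharacters(payload);
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeDocument("text with < and & in it"));
    }
}
```

The streaming writer avoids building a whole tree in memory, which may matter if the payload is large; the DOM and Transformer classes are an alternative when the document needs to be modified before it is written.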
madupathi arun
Greenhorn

Joined: Feb 25, 2008
Posts: 23
Hi,
Actually, I have the following requirement:

I have to download a PDF from the server and save it as XML on the local machine.

I need to run an offline tool that reads the XML and takes input from the end user.

Later I upload that file to the server.


I changed my code, but I am getting the following error:

[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at parser.Writingxml.bytesToXml(Writingxml.java:27)
at parser.Writingxml.main(Writingxml.java:36)


My code is:

protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    byte[] test = TestingPdf.getRequestBufferAsBytes(request);
    byte[] test5 = Base64.encodeBase64(test);
    org.w3c.dom.Document test4 = null;
    try {
        test4 = TestingPdf.bytesToXml(test5);
    } catch (ParserConfigurationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    } catch (org.xml.sax.SAXException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    TransformerFactory transfac = TransformerFactory.newInstance();
    Transformer trans = null;
    try {
        trans = transfac.newTransformer();
    } catch (TransformerConfigurationException e1) {
        // TODO Auto-generated catch block
        e1.printStackTrace();
    }
    trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    trans.setOutputProperty(OutputKeys.INDENT, "yes");
    StringWriter sw = new StringWriter();
    StreamResult result = new StreamResult(sw);
    DOMSource source = new DOMSource(test4);
    try {
        trans.transform(source, result);
    } catch (TransformerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    String xmlString = sw.toString();

    // print xml
    System.out.println("Here's the xml:\n\n" + xmlString);
}

private static org.w3c.dom.Document bytesToXml(byte[] test5) throws ParserConfigurationException, org.xml.sax.SAXException, IOException {
    // TODO Auto-generated method stub
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(new ByteArrayInputStream(test5));
}

public static byte[] getRequestBufferAsBytes(HttpServletRequest request) throws IOException, ServletException
{
    ServletInputStream oInput = request.getInputStream();

    int nContentLength = request.getContentLength();
    if (nContentLength <= 0)
        return null;
    byte[] cContent = new byte[nContentLength];
    int nRead = 0;
    // Read straight into the target array until it is full.
    while (nRead < nContentLength) {
        int n = oInput.read(cContent, nRead, nContentLength - nRead);
        if (n < 0) {
            // End of stream before Content-Length bytes arrived. Without
            // this check the loop would never terminate.
            throw new IOException("Unexpected end of request body");
        }
        nRead += n;
    }
    return cContent;
}
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42906
PDF is a completely different format than XML. You can't expect to somehow obtain a DOM Document object from a PDF file.

Please re-read and consider my first post. You should get clarification on that requirement; it can't be fulfilled as you posted it here.
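
To make the point concrete: the "Content is not allowed in prolog" error quoted earlier is what a parser reports when it is handed bytes that do not start an XML document, and Base64 text is just as much "not XML" as the raw PDF bytes. The following sketch (class and method names are hypothetical) checks whether a byte array parses as XML at all; it again uses java.util.Base64 in place of Commons Codec.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

class ParseFailureSketch {

    // Returns true if the bytes parse as a well-formed XML document.
    static boolean isWellFormedXml(byte[] bytes) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXException e) {
            // SAXParseException is a subclass of SAXException; it is thrown
            // for input that is not well-formed XML.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for PDF content, Base64-encoded as in the posted code.
        byte[] encoded = Base64.getEncoder()
                .encode("%PDF-1.4 stand-in".getBytes(StandardCharsets.US_ASCII));
        byte[] realXml = "<pdf>data</pdf>".getBytes(StandardCharsets.UTF_8);

        System.out.println("Base64 text parses as XML: " + isWellFormedXml(encoded));
        System.out.println("Real XML parses as XML: " + isWellFormedXml(realXml));
    }
}
```

The parser may also print its own "[Fatal Error]" line to standard error for the failing input. The way around the error is not a different parse call but building the XML document around the encoded text, for example with DocumentBuilder.newDocument() and createElement().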
[ May 21, 2008: Message edited by: Ulf Dittmer ]
Prakash Subramanian
Ranch Hand

Joined: Feb 03, 2005
Posts: 32
Hello Arun,

Have a look at http://pdfbox.org/, which is an open source Java library for working with PDF documents. Also have a look at the article http://discerning.com/hacks/docutils/pdf2xml/readme.html, which might address your issue.

The point to bear in mind is that there is no one-to-one conversion from PDF to XML. You have to decide what you want to extract from your PDF, how you want that information to be represented in XML, and so on. So please dig into any related open source code and select the solution that suits you.

Thanks.
 