
Writing a byte array to an XML file

 
Greenhorn
Posts: 23
Hi, I am Arun. I am trying to convert a PDF file to an XML file. I read the content of the PDF into a byte array, but when I put the byte array into the XML file, it does not work.

I used the following code to write the byte array into the XML file. TestingPdf is the class name, and getRequestBufferAsBytes(request) is a static method:
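byte[] test = TestingPdf.getRequestBufferAsBytes(request);
// backslashes must be doubled in a Java string literal ("\t" would be a tab character)
File f = new File("C:\\testarun\\test.xml");
// write the whole array in one call; try-with-resources closes the stream afterwards
try (FileOutputStream fout = new FileOutputStream(f)) {
    fout.write(test);
}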
 
Rancher
Posts: 43081
Firstly, you can't convert a PDF to XML. The best you can aim for is to embed the bytes that make up the PDF file in an XML file. That would be a strange thing to do, though, and I'd be curious to hear why you'd want to do that; there may be better ways to achieve what you're trying to do.

If you really do want to do that, be aware that XML is a text format, while PDF is a binary format. So you need to encode the contents of the PDF file before you can insert it into an XML file. A simple encoding like Base64 (as implemented by the Apache Commons Codec library, amongst others) will suffice.
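A minimal sketch of the Base64 step, assuming Java 8 or later: it uses the JDK's own java.util.Base64 instead of Apache Commons Codec, and the class name Base64RoundTrip and the sample bytes are made up for illustration. It shows that the encoded form is plain ASCII text that can sit inside an XML element, and that decoding gives back exactly the original bytes.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {

    // Encode arbitrary binary data (e.g. the bytes of a PDF) as ASCII-only Base64 text
    static String encode(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    // Decode Base64 text back into the original bytes
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // A PDF-style header followed by bytes that are not printable text
        byte[] pdfLike = {'%', 'P', 'D', 'F', '-', 0x00, (byte) 0xFF, 0x1B};
        String encoded = encode(pdfLike);
        System.out.println(encoded);                                  // prints "JVBERi0A/xs="
        System.out.println(Arrays.equals(pdfLike, decode(encoded)));  // prints "true"
    }
}
```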

Lastly, the code you posted doesn't create an XML file. It just writes the bytes of the PDF into a file with an ".xml" extension, which doesn't make its contents XML. I'd suggest reading up on XML before proceeding; the XmlFaq at http://faq.javaranch.com/java/XmlFaq links to a couple of introductions.
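One way to see the difference: content only counts as XML if an XML parser accepts it. The sketch below (a hypothetical XmlCheck class, JDK only) tries to parse a byte array with DocumentBuilder and reports whether it is well-formed. PDF bytes fail at the very first character, with the same kind of "Content is not allowed in prolog" error that shows up later in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlCheck {

    // Returns true only if an XML parser accepts the bytes as a well-formed document
    static boolean isWellFormedXml(byte[] content) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing, so the parser's
            // default "[Fatal Error] ..." line on stderr is suppressed
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(content));
            return true;
        } catch (SAXException | IOException e) {
            return false;   // not well-formed (or unreadable)
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pdfStart = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        byte[] xml = "<doc/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isWellFormedXml(pdfStart)); // prints "false"
        System.out.println(isWellFormedXml(xml));      // prints "true"
    }
}
```

Renaming a file to ".xml" changes nothing here; only the bytes themselves matter.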
[ May 20, 2008: Message edited by: Ulf Dittmer ]
 
Greenhorn
Posts: 22
Hi

If you are keen on manipulating PDF files, you can probably try using iText, which is a Java library that can be used to generate complex PDF documents.

I haven't come across a requirement where you generate XML files on the fly using it, but there must be some methods in its APIs. I used it some time back and think that it's the best alternative for you.
 
Ulf Dittmer
Rancher
Posts: 43081

quote: If you are keen on manipulating PDF files, you can probably try using iText, which is a Java library that can be used to generate complex PDF documents.


iText is a library for creating PDFs. While it can perform certain modifications on existing PDFs, it doesn't sound from the (limited) problem description that that's what is being asked here.

quote: I haven't come across a requirement where you generate XML files on the fly using it, but there must be some methods in its APIs. I used it some time back and think that it's the best alternative for you.


There are any number of ways to create XML, some of which are part of the core J2SE library. But we don't know enough about the problem to say whether using XML is appropriate in this case.
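As one example of the core options, here is a short sketch using the StAX XMLStreamWriter that ships with the JDK (since Java 6). The class StaxExample and the "note"/"title" names are invented for illustration. A practical advantage over concatenating strings is that the writer escapes characters such as < and & for you.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxExample {

    // Writes <note title="...">body</note> after an XML declaration
    static String buildNote(String title, String body) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();            // XML declaration
        w.writeStartElement("note");
        w.writeAttribute("title", title);  // attribute value is escaped by the writer
        w.writeCharacters(body);           // characters such as < and & are escaped
        w.writeEndElement();
        w.writeEndDocument();
        w.close();                         // per the StAX spec, the StringWriter itself stays open
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildNote("R&D", "a < b & c"));
    }
}
```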
 
madupathi arun
Greenhorn
Posts: 23
Hi,
actually, my requirement is as follows:

I have to download a PDF from the server and save it as XML on the local machine.

I need to run an offline tool that reads the XML and takes input from the end user.

Later I upload that file to the server.


I changed my code, but I am getting the following error:

[Fatal Error] :1:1: Content is not allowed in prolog.
Exception in thread "main" org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at parser.Writingxml.bytesToXml(Writingxml.java:27)
at parser.Writingxml.main(Writingxml.java:36)


My code is:

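import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import javax.servlet.ServletException;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.commons.codec.binary.Base64;

protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    // read the raw request body (the PDF bytes)
    byte[] test = TestingPdf.getRequestBufferAsBytes(request);
    if (test == null) {
        response.sendError(HttpServletResponse.SC_BAD_REQUEST, "empty request body");
        return;
    }
    // Base64-encode the bytes with Apache Commons Codec
    byte[] test5 = Base64.encodeBase64(test);
    try {
        // parse the encoded bytes as an XML document
        org.w3c.dom.Document test4 = TestingPdf.bytesToXml(test5);

        // serialize the DOM to an indented string without an XML declaration
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        trans.transform(new DOMSource(test4), new StreamResult(sw));
        String xmlString = sw.toString();

        // print the XML
        System.out.println("Here's the xml:\n\n" + xmlString);
    } catch (ParserConfigurationException | org.xml.sax.SAXException | TransformerException e) {
        // report the failure instead of carrying on with a null document or transformer
        throw new ServletException(e);
    }
}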

private static org.w3c.dom.Document bytesToXml(byte[] test5) throws ParserConfigurationException, org.xml.sax.SAXException, IOException {
    // parse the given bytes as an XML document
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(new ByteArrayInputStream(test5));
}

public static byte[] getRequestBufferAsBytes(HttpServletRequest request) throws IOException, ServletException
{
    ServletInputStream oInput = request.getInputStream();
    int nContentLength = request.getContentLength();
    // no body, or length unknown (getContentLength() returns -1)
    if (nContentLength <= 0)
        return null;
    byte[] cContent = new byte[nContentLength];
    int nRead = 0;
    // read directly into the result until the declared length has arrived
    while (nRead < nContentLength) {
        int n = oInput.read(cContent, nRead, nContentLength - nRead);
        if (n == -1)  // stream ended early; without this check the loop would never finish
            throw new IOException("request body ended after " + nRead + " of " + nContentLength + " bytes");
        nRead += n;
    }
    return cContent;
}
 
Ulf Dittmer
Rancher
Posts: 43081
PDF is a completely different format from XML. You can't expect to somehow obtain a DOM Document object from a PDF file.

Please re-read and consider my first post. You should get clarification on that requirement; it can't be fulfilled as you posted it here.
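If, once the requirement is clarified, embedding the PDF is really what's wanted, the change is to build a new XML document around the Base64 text instead of handing the bytes to DocumentBuilder.parse(). Whether parse() receives raw PDF bytes or their Base64 text, it finds ordinary characters at line 1, column 1 where it expects markup such as <?xml or an element tag, which is what "Content is not allowed in prolog" means. A minimal sketch, assuming Java 8+ for java.util.Base64; the class PdfXmlWrapper, the pdfFile element and its name attribute are hypothetical choices, not any standard format:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PdfXmlWrapper {

    // Build a NEW document and store the PDF bytes in it as Base64 text,
    // instead of parsing bytes that are not XML
    static Document wrapPdf(byte[] pdfBytes, String fileName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("pdfFile");   // hypothetical element name
        root.setAttribute("name", fileName);            // hypothetical attribute
        root.setTextContent(Base64.getEncoder().encodeToString(pdfBytes));
        doc.appendChild(root);
        return doc;
    }

    // Serialize the document with a Transformer, as the servlet code does
    static String toXmlString(Document doc) throws Exception {
        Transformer trans = TransformerFactory.newInstance().newTransformer();
        StringWriter sw = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    // Reverse step for the offline tool: parse the XML, then decode the Base64 text
    static byte[] unwrapPdf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return Base64.getDecoder().decode(doc.getDocumentElement().getTextContent().trim());
    }

    public static void main(String[] args) throws Exception {
        byte[] pdf = {'%', 'P', 'D', 'F', '-', '1', '.', '4', 0x00, (byte) 0xFF};
        String xml = toXmlString(wrapPdf(pdf, "test.pdf"));
        System.out.println(xml);
        System.out.println(Arrays.equals(pdf, unwrapPdf(xml)));  // prints "true"
    }
}
```

Note that the XML only carries the PDF: a tool reading it still sees an opaque Base64 string, not the PDF's text or form fields.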
[ May 21, 2008: Message edited by: Ulf Dittmer ]
 
Ranch Hand
Posts: 32
Hello Arun,

Have a look at http://pdfbox.org/, which is an open source Java library for working with PDF documents. Also have a look at the article http://discerning.com/hacks/docutils/pdf2xml/readme.html, which might address your issue.

The point to bear in mind is that there is no one-to-one conversion from PDF to XML. You have to decide what you want to extract from your PDF, how you want the information to be represented in XML, and so on. So please dig into related open source code and select the solution that suits you.
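To illustrate the "decide how to represent it" point: once some library has extracted the text (PDFBox can do that, but the extraction step is left out here and simply assumed), one possible representation is one numbered page element per page. The class PagesToXml, its document/page element names and the sample text are hypothetical; this is just one mapping among many, written with the JDK's StAX API.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PagesToXml {

    // One possible mapping: a <document> root with one numbered <page> per page of text.
    // The page texts are assumed to come from a separate extraction step (not shown).
    static String pagesToXml(String title, List<String> pageTexts) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        w.writeStartElement("document");
        w.writeAttribute("title", title);
        for (int i = 0; i < pageTexts.size(); i++) {
            w.writeStartElement("page");
            w.writeAttribute("number", Integer.toString(i + 1));  // pages numbered from 1
            w.writeCharacters(pageTexts.get(i));
            w.writeEndElement();
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        List<String> pages = Arrays.asList("Page one text", "Totals & notes");
        System.out.println(pagesToXml("Sample", pages));
    }
}
```

A different requirement might call for elements per paragraph, per form field, or per table row instead; that choice is exactly what has to be settled before writing any conversion code.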

Thanks.
 