PDF to HTML conversion in JSP using Java

Nazeer Ahammad
Ranch Hand

Joined: Feb 26, 2012
Posts: 43
Hi All,
I'm using the code below to convert a PDF file to HTML, but it prints the table content as a plain string.
Example: suppose the PDF has table content like this:
| Header |
| TD1 | TD2 | TD3 | TD4 |

With the JSP code below, I get output like this:

Header TD1 TD2 TD2 TD3 TD4

<%@page import="com.itextpdf.text.pdf.parser.PdfTextExtractor"%>
<%@page import="com.itextpdf.text.pdf.PdfReader"%>
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>View page</title>
</head>
<body>
<%! String page1;%>
<%! String[] pagescon; %>
<%! String pages="Nazeer\nAhammad\nDudekula"; %>
<%
// Read the PDF and extract the plain text of page 1.
PdfReader reader = new PdfReader("D:/tablecontent.pdf");
System.out.println("This PDF has "+reader.getNumberOfPages()+" pages.");
// Every whitespace character, line breaks included, is replaced by a space here.
page1=PdfTextExtractor.getTextFromPage(reader, 1).replaceAll("\\s"," ");
reader.close();
// Assumption: the post does not show how pagescon is filled; a split on line breaks is a guess.
pagescon=page1.split("\n");
for(int i=0;i<pagescon.length;i++)
{
%>
<br> <%= pagescon[i]%>
<%} %>
</body>
</html>

Could anyone please suggest a solution?

Thank you,
William Brogden
Author and all-around good cowpoke

Joined: Mar 22, 2000
Posts: 13036
Seems to me that if you want extracted strings to be presented in an HTML table, you will have to write the HTML formatting yourself.

I would never try to do this with embedded code in a JSP. Instead, I would create a class that can be tested outside the JSP/servlet environment. Once you get it producing well-formatted HTML, then see about using it in the JSP.
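
As a rough illustration of that approach, here is a minimal sketch of a standalone class that builds the table markup from extracted text. The class name HtmlTableFormatter and its methods are hypothetical and not part of iText. It assumes each non-blank line of the extracted text is one table row and that cells are separated by whitespace, which will not hold for every PDF (a cell containing a space would be split in two, for example).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of iText): turns extracted text into an HTML table. */
class HtmlTableFormatter {

    /** Escapes the characters that are significant in HTML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Assumption: each non-blank line is one row and cells are separated by whitespace.
     * A row with fewer cells than the widest row gets a colspan on its last cell.
     */
    static String toHtmlTable(String extractedText) {
        List<String[]> rows = new ArrayList<>();
        int maxCols = 0;
        for (String line : extractedText.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines between rows
            }
            String[] cells = trimmed.split("\\s+");
            rows.add(cells);
            maxCols = Math.max(maxCols, cells.length);
        }

        StringBuilder html = new StringBuilder("<table border=\"1\">\n");
        for (String[] cells : rows) {
            html.append("  <tr>");
            for (int i = 0; i < cells.length; i++) {
                // Only the last cell of a short row is stretched to fill the table width.
                int span = (i == cells.length - 1) ? maxCols - i : 1;
                html.append(span > 1 ? "<td colspan=\"" + span + "\">" : "<td>");
                html.append(escape(cells[i])).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>\n").toString();
    }
}
```

Because the class has no servlet or iText dependency, it can be exercised from a plain unit test or a main method with a hard-coded string such as "Header\nTD1 TD2 TD3 TD4". In the JSP it would need the raw text from PdfTextExtractor.getTextFromPage(reader, 1), without the replaceAll("\\s"," ") step, since that step removes the line breaks the row detection relies on. Whether the extracted text actually has one line per table row depends on the PDF.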

Paul Clapham

Joined: Oct 14, 2005
Posts: 19973

And it seems to me that if you use a class named PdfTextExtractor, it's only going to extract the text from the PDF. If the PDF contains formatting such as tables, it isn't going to tell you anything about that.