PDF to HTML conversion in JSP using Java

Nazeer Ahammad
Ranch Hand

Joined: Feb 26, 2012
Posts: 43
Hi All,
I'm using the code below to convert a PDF file to HTML, but it prints the table content as one plain string.
Example: suppose the PDF has a table like this:
+---------------------------+
|          Header           |
+------+------+------+------+
| TD1  | TD2  | TD3  | TD4  |
+------+------+------+------+

If I use the JSP code below, I get this output:

Header TD1 TD2 TD3 TD4


<%@ page import="com.itextpdf.text.pdf.PdfReader"%>
<%@ page import="com.itextpdf.text.pdf.parser.PdfTextExtractor"%>
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
    pageEncoding="ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>View page</title>
</head>
<body>
<%
    // Local variables instead of JSP declaration blocks: declarations become
    // servlet instance fields that every concurrent request would share.
    String[] lines;
    PdfReader reader = new PdfReader("D:/tablecontent.pdf");
    try {
        System.out.println("This PDF has " + reader.getNumberOfPages() + " pages.");
        // Extract the plain text of page 1. Note that replaceAll("\\s", " ")
        // turns every whitespace character, including '\n', into a space, so
        // the split below yields a single element and all text lands on one line.
        String page1 = PdfTextExtractor.getTextFromPage(reader, 1).replaceAll("\\s", " ");
        lines = page1.split("\n");
    } finally {
        reader.close(); // release the file handle
    }
    for (String line : lines) {
%>
<br> <%= line %>
<br>
<% } %>
</body>
</html>

Could anyone suggest a solution?

Thank you,
Nazeer.
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12806
Seems to me that if you want extracted strings to be presented in an HTML table, you will have to write the HTML formatting yourself.

I would never try to do this with code embedded in a JSP. Instead, I would create a class that can be tested outside the JSP/servlet environment. Once it produces well-formatted HTML, then see about using it in a JSP.

Bill
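A minimal sketch of the kind of standalone class described above, assuming the table rows have already been recovered as lists of cell strings. `HtmlTableWriter` and `toHtml` are hypothetical names, not part of iText or the servlet API. The class has no JSP or iText dependency, so its `main` method can be run on its own (Java 9+ for `List.of`). Cell text is HTML-escaped, and a row with fewer cells than the widest row stretches its last cell with `colspan`, so a one-cell header spans the whole table.

```java
import java.util.List;

/** Hypothetical helper: renders rows of cell strings as an HTML table. */
public class HtmlTableWriter {

    /** Escapes characters that are special in HTML text. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the table; a short row's last cell gets a colspan to fill the width. */
    public static String toHtml(List<List<String>> rows) {
        int width = 0;
        for (List<String> row : rows) {
            width = Math.max(width, row.size());
        }
        StringBuilder html = new StringBuilder("<table border=\"1\">\n");
        for (List<String> row : rows) {
            html.append("  <tr>");
            for (int i = 0; i < row.size(); i++) {
                int span = (i == row.size() - 1) ? width - row.size() + 1 : 1;
                html.append(span > 1 ? "<td colspan=\"" + span + "\">" : "<td>")
                    .append(escape(row.get(i)))
                    .append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        // The example table from the question, already split into cells.
        List<List<String>> rows = List.of(
                List.of("Header"),
                List.of("TD1", "TD2", "TD3", "TD4"));
        System.out.print(toHtml(rows));
    }
}
```

Once this prints the expected markup, a JSP or servlet only needs to write the returned string to the response.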
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18658

And it seems to me that if you use a class named PdfTextExtractor, it's only going to extract the text from the PDF. If the PDF contains formatting such as tables, it isn't going to tell you anything about that.
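Because only text comes back, any table structure has to be guessed afterwards. The sketch below is one such heuristic, not an iText feature: each non-blank line becomes a row, and cells are split on a separator regex that the caller must choose. `TextTableGuesser` and `guessRows` are hypothetical names. It assumes the raw extractor output is passed in (without the `replaceAll("\\s", " ")` step, which erases the line breaks), and a whitespace separator breaks any cell that itself contains spaces, so it will not work for every PDF.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical heuristic: guesses table rows and cells from plain extracted text. */
public class TextTableGuesser {

    public static List<List<String>> guessRows(String extractedText, String cellSeparatorRegex) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : extractedText.split("\\R")) {       // any line break
            List<String> cells = new ArrayList<>();
            for (String cell : line.split(cellSeparatorRegex)) {
                String trimmed = cell.trim();
                if (!trimmed.isEmpty()) {
                    cells.add(trimmed);                        // drop empty fragments
                }
            }
            if (!cells.isEmpty()) {
                rows.add(cells);                               // skip blank lines
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample text shaped like the question's example; real extractor output may differ.
        String text = "Header\nTD1 TD2 TD3 TD4\n";
        System.out.println(guessRows(text, "\\s+"));
    }
}
```

Running `main` prints `[[Header], [TD1, TD2, TD3, TD4]]`; rows in that shape could then be rendered as an HTML table.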
 
I agree. Here's the link: http://aspose.com/file-tools
 