
PDF to HTML conversion in JSP using Java

 
Nazeer Ahammad
Ranch Hand
Posts: 43
Hi All,
I'm using the code below to convert a PDF file to HTML. It prints the table content as a plain string.
Example: suppose the PDF has table content like this:
...................................
| Header |
...................................
| TD1 | TD2 | TD3 | TD4 |
...................................

If I use the JSP code below, I get output like this:

Header TD1 TD2 TD2 TD3 TD4


<%@page import="com.itextpdf.text.pdf.parser.PdfTextExtractor"%>
<%@page import="com.itextpdf.text.pdf.PdfReader"%>
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>View page</title>
</head>
<body>
<%! String page1;%>
<%! String[] pagescon; %>
<%! String pages="Nazeer\nAhammad\nDudekula"; %>

<%
PdfReader reader = new PdfReader("D:/tablecontent.pdf");
System.out.println("This PDF has "+reader.getNumberOfPages()+" pages.");
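// Extract the text of page 1 once and replace every whitespace character
// (including line breaks) with a single space.
page1=PdfTextExtractor.getTextFromPage(reader, 1).replaceAll("\\s"," ");
reader.close(); // release the file once the text has been read
// Split the text on newlines for display.
pagescon=page1.split("\n");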

for(int i=0;i<pagescon.length;i++)
{
%>
<br> <%= pagescon[i]%>

<br>
<%} %>
</body>
</html>

Could anyone please suggest a solution?

Thank you,
Nazeer.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
Seems to me that if you want extracted strings to be presented in an HTML table, you will have to write the HTML formatting yourself.

I would never try to do this with code embedded in a JSP. Instead, I would create a class that can be tested outside the JSP/servlet environment. Once it is producing well-formatted HTML, then see about using it from the JSP.
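To make that concrete, here is a minimal sketch of such a class, not a definitive implementation. The class name TextTableHtmlFormatter and its methods are hypothetical (nothing here comes from iText). It assumes the extracted text has one table row per line and that cells are separated by whitespace, so a cell containing spaces would be split incorrectly. A row with fewer cells than the widest row, such as the single header cell in the example, is stretched with colspan.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of iText): turns extracted plain text
 * into an HTML table, one row per line and one cell per token.
 */
final class TextTableHtmlFormatter {

    private TextTableHtmlFormatter() {}

    /** Escapes the characters that are special in HTML text content. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Splits text into rows (by line) and cells (by runs of whitespace). */
    static List<String[]> parseRows(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                rows.add(trimmed.split("\\s+"));
            }
        }
        return rows;
    }

    /** Builds the table; the last cell of a short row spans the remaining columns. */
    static String toHtmlTable(String text) {
        List<String[]> rows = parseRows(text);
        int width = 0;
        for (String[] row : rows) {
            width = Math.max(width, row.length);
        }

        StringBuilder html = new StringBuilder("<table border=\"1\">\n");
        for (String[] row : rows) {
            html.append("  <tr>");
            for (int i = 0; i < row.length; i++) {
                boolean last = (i == row.length - 1);
                int span = last ? width - row.length + 1 : 1;
                html.append(span > 1 ? "<td colspan=\"" + span + "\">" : "<td>")
                    .append(escapeHtml(row[i]))
                    .append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>").toString();
    }
}
```

Because the class has no servlet or iText dependency, it can be exercised from a plain unit test or a main method. The JSP would then only need to pass the extracted page text to toHtmlTable and print the result.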

Bill
 
Paul Clapham
Sheriff
Posts: 20208
And it seems to me that if you use a class named PdfTextExtractor, it's only going to extract the text from the PDF. If the PDF contains formatting such as tables, it isn't going to tell you anything about that.
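About the only layout that plain text extraction can hand back is the line breaks, and the posted code discards those: replaceAll("\\s"," ") turns every newline into a space, so the later split("\n") finds nothing to split on. The following is a small sketch, assuming the extracted string separates lines with line-break characters (worth checking against your own PDF). The class name ExtractedTextLines is hypothetical. It splits on line breaks first and only then tidies the whitespace inside each line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: normalizes whitespace without losing line breaks. */
final class ExtractedTextLines {

    private ExtractedTextLines() {}

    static List<String> toLines(String extracted) {
        List<String> lines = new ArrayList<>();
        // Split on line breaks first, so the row boundaries survive.
        for (String line : extracted.split("\\R")) {
            // Then collapse runs of spaces and tabs inside each line.
            String cleaned = line.replaceAll("[ \\t]+", " ").trim();
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }
}
```

This only recovers one string per line. Column boundaries, borders, and merged cells are still not in the text, so any table markup has to be inferred and written by your own code.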
 
I agree. Here's the link: http://aspose.com/file-tools