Win a copy of Re-engineering Legacy Software this week in the Refactoring forum
or Docker in Action in the Cloud/Virtualization forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

PDF to Html convertion in jsp using java

 
Nazeer Ahammad
Ranch Hand
Posts: 43
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi All,
I'm using below code to convert pdf file to Html. It was printing table content as string.
Example: suppose pdf has table content like below
...................................
| Header |
...................................
TD1 | TD2 | TD3 | TD4 |
...................................

If i use below jsp code.
I'm getting Output as like below

Header TD1 TD2 TD2 TD3 TD4


<%@page import="com.itextpdf.text.pdf.parser.PdfTextExtractor"%>
<%@page import="com.itextpdf.text.pdf.PdfReader"%>
<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
pageEncoding="ISO-8859-1"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>View page</title>
</head>
<body>
<%! String page1;%>
<%! String[] pagescon; %>
<%! String pages="Nazeer\nAhammad\nDudekula"; %>

<%
PdfReader reader = new PdfReader("D:/tablecontent.pdf");
System.out.println("This PDF has "+reader.getNumberOfPages()+" pages.");
PdfTextExtractor.getTextFromPage(reader, 1);
page1=PdfTextExtractor.getTextFromPage(reader, 1).replaceAll("\\s"," ");
pagescon=page1.split("\n");

for(int i=0;i<pagescon.length;i++)
{
%>
<br> <%= pagescon[i]%>

<br>
<%} %>
</body>
</html>

please anyone give solution.

Thank you,
Nazeer.
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13056
6
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Seems to me that if you want extracted strings to be presented in an HTML table, you will have to write the HTML formatting yourself.

I would never try to do this with embedded code in a JSP. Instead I would create a class that could be tested outside the JSP/servlet environment. Once you get it producing well formatted HTML then see about using it in JSP.

Bill
 
Paul Clapham
Sheriff
Pie
Posts: 20764
30
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
And it seems to me that if you use a class named PdfTextExtractor, it's only going to extract the text from the PDF. If the PDF contains formatting such as tables, it isn't going to tell you anything about that.
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic