
Java program for PDF file to Excel file conversion

 
Ranch Hand
Posts: 56
Hello friends,
Is there a way to write a Java program that converts a PDF file to an Excel file? Currently I am extracting the data from a Word file, which I find very difficult because the extraction pattern is not fixed.
I feel that once the data is in an Excel file, it will be easier to extract from there.


Currently I extract the text from the PDF into a Word document, which has a certain format.

Can I write a Java program that converts a Word document into an Excel file, using any utility available in Java?


Could anybody please help me out ASAP?
 
Rancher
Posts: 43081
77
Am I understanding it correctly that the ultimate goal is to create a Word file? If so, why do you think going through an Excel file in between would make things easier? Is the data in the PDF largely composed of tables?

But regardless, PDFs are not meant for structured text extraction. While there are several libraries that can extract text from PDFs, all notion of layout or table formatting is lost. PDFs are meant for viewing and printing, for the most part. Just about everything else is complicated or impossible.
 
pavithra murthy
Ranch Hand
Posts: 56
Hello friend,

1. PDF-to-text extraction is done, using the PjText.jar file.
2. The text file should now be converted to an Excel file.

The only reason behind this is that the data I want to extract will vary based on the requirement.

So I want to convert the Word file into an Excel file so that extraction becomes easier.

Could anybody please help me out ASAP?
 
Ulf Dittmer
Rancher
Posts: 43081
77
You can use the Apache POI library to read Word files and to write Excel files. If you tell us in more detail what you're trying to do, we may be able to give more specific advice.
 
Ulf Dittmer
Rancher
Posts: 43081
77
Since we don't know how this conversion should work (keep in mind that we know neither the format of the text file nor the format of the Excel file), there isn't much else we can tell you, only that you can use the classes in the java.io package to read text files and the Apache POI library to create Excel files.

What, specifically, do you need help with?
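
For the reading half, a minimal sketch using only java.io might look like the following. The class name TextLines and the decision to trim and skip blank lines are assumptions made for illustration; nothing in this thread specifies them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the non-blank lines of a text source.
class TextLines {

    static List<String> read(Reader source) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
        } catch (IOException ex) {
            // Rethrow instead of hiding the failure
            throw new UncheckedIOException(ex);
        }
        return lines;
    }
}
```

To read from disk, pass a java.io.FileReader for the text file (its constructor throws FileNotFoundException if the path is wrong). Each returned line could then become a row or cell in the Excel file created with POI.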
 
pavithra murthy
Ranch Hand
Posts: 56
Hello, I have written some code, but I have commented out this line:
//<%@ page contentType="application/vnd.ms-excel"
I do not know what the equivalent statement in Java is. The program compiles and runs fine, but no output Excel file is created, and I don't know whether that is because of the commented-out line.
Could anyone please help me out ASAP?

 
Ulf Dittmer
Rancher
Posts: 43081
77
Maybe there's an exception that is ignored by the code:


} catch ( Exception ex ) {
}
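
A hedged sketch of the alternative is to report the exception rather than swallow it. SafeWriter and tryWrite are made-up names for illustration, since the original program was not posted in full.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper showing a catch block that reports the failure.
class SafeWriter {

    // Returns null on success, otherwise a short description of the failure.
    static String tryWrite(String path, byte[] data) {
        try (OutputStream out = new FileOutputStream(path)) {
            out.write(data);
            return null;
        } catch (IOException ex) {
            ex.printStackTrace(); // at minimum, make the problem visible
            return ex.getClass().getSimpleName() + ": " + ex.getMessage();
        }
    }
}
```

With a catch block like this, a missing directory or a locked file shows up as a stack trace instead of a silently missing output file.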

 
pavithra murthy
Ranch Hand
Posts: 56

Ulf Dittmer wrote:Maybe there's an exception that is ignored by the code:


} catch ( Exception ex ) {
}



You are right, Mr. Ulf.

It is giving a file-not-found exception, but the file is present in the c:\\excel folder.
 
pavithra murthy
Ranch Hand
Posts: 56
hello friend ,
I tried the program with a hard-coded value, excelData[0][0]="2001338138", and it gets written. At the same time, I am using a BufferedReader to read the data from the Word file and storing it in a String variable named line. That data is now displayed at the command prompt.
How do I now get each piece of data from the Word file (or from the command prompt output) written into the Excel file, so that later I can extract the contents from Excel easily?

Sample data to be put in the doc file:
UST-IDNR: DE811128135
RECHNUNG
Beleg-Nr.:
2001338138
DEMO:
27.03.2009
Auftraggeber:

How can I get each piece of data from the Word document, or from the command prompt, written to the Excel file so that extraction will be easy? Currently, just for testing purposes, when I hard-code the value, the Excel file is created and shows that value in the first cell, [0][0].



Could anyone please help ASAP?
 
Ranch Hand
Posts: 55
My 2 cents: there is another way, using ICEpdf (for text extraction) and OpenOffice to create the Excel file. With this solution you don't need to create a temporary Word/txt file (except perhaps for the sake of persistence or clustering).

Oleg.
 
Ulf Dittmer
Rancher
Posts: 43081
77
If you have a string that you want to chop into smaller pieces (like the line you're reading from the file), you can use the various methods of the String class, especially indexOf and substring. We could give more precise advice if we knew what the lines in the text file looked like, and how they should be stored in the Excel file.
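
As a rough illustration of indexOf and substring, here is a sketch that assumes lines shaped like the sample posted earlier in this thread: a label ending in a colon, with the value either on the same line or on the next one. That layout is a guess from the sample, and LabelSplitter, splitLabel and toRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns "label: value" style lines into [label, value] rows.
class LabelSplitter {

    // Splits at the first colon; a line without a colon gets an empty label.
    static String[] splitLabel(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return new String[] {"", line.trim()};
        }
        String label = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        return new String[] {label, value};
    }

    // Pairs a label that has no value with the line that follows it.
    static List<String[]> toRows(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        String pendingLabel = null;
        for (String line : lines) {
            String[] parts = splitLabel(line);
            if (pendingLabel != null && parts[0].isEmpty()) {
                // This line is the value for the label seen on the previous line
                rows.add(new String[] {pendingLabel, parts[1]});
                pendingLabel = null;
                continue;
            }
            if (pendingLabel != null) {
                // Previous label never received a value
                rows.add(new String[] {pendingLabel, ""});
                pendingLabel = null;
            }
            if (!parts[0].isEmpty() && parts[1].isEmpty()) {
                pendingLabel = parts[0];
            } else {
                rows.add(parts);
            }
        }
        if (pendingLabel != null) {
            rows.add(new String[] {pendingLabel, ""});
        }
        return rows;
    }
}
```

Each row could then map to one Excel row with the label in the first column and the value in the second. Values that themselves contain a colon (a time such as 12:30, for instance) would need extra handling.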
 
Greenhorn
Posts: 7
After reading your post, I understand that your final goal is to produce a PDF document by extracting data from a Word document. You are transforming the document text to XML using the POI API and then to PDF.

My suggestion here is:
1) Extract the Word document with PJText. I am assuming you have already done that.
2) Create the XML elements from the available data. You need a String in your Java program to hold the XML data. It would be the same as creating columns in an Excel sheet.
3) Once the XML is ready, you can use the Apache FOP engine to create the PDF from the XML data.
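
Step 2 could be sketched roughly as below. The element names rows and row, the label attribute, and the class XmlRows are all invented for illustration; a real FOP stylesheet would dictate its own structure.

```java
// Hypothetical helper: builds a small XML string from [label, value] pairs.
class XmlRows {

    // Escapes the characters that are not allowed raw in XML text or attributes.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    static String toXml(String[][] rows) {
        StringBuilder xml = new StringBuilder("<rows>");
        for (String[] row : rows) {
            xml.append("<row label=\"").append(escape(row[0])).append("\">")
               .append(escape(row[1]))
               .append("</row>");
        }
        return xml.append("</rows>").toString();
    }
}
```

Escaping the ampersand first matters, otherwise the entities produced by the later replacements would themselves be escaped again.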

 