java program for pdf file to excel file conversion

pavithra murthy
Ranch Hand

Joined: Feb 06, 2009
Posts: 56
Hello friends,
Is there a way to write a Java program that converts a PDF file to an Excel file? At the moment I am extracting the data from a Word file, which I find very difficult because the extraction pattern is not fixed.
I feel that once the data is in an Excel file, it will be easy to extract it from there.


Currently I am extracting the text from the PDF into a Word document, which is in some format.

Can I write a Java program that converts a Word document into an Excel file, using any utility available in Java?


Could anybody please help me out ASAP?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42370
Am I understanding it correctly that the ultimate goal is to create a Word file? If so, why do you think going through an Excel file in between would make things easier? Is the data in the PDF largely composed of tables?

But regardless, PDFs are not meant for structured text extraction. While there are several libraries that can extract text from PDFs, all notion of layout or table formatting is lost. PDFs are meant for viewing and printing, for the most part. Just about everything else is complicated or impossible.


pavithra murthy
Ranch Hand

Joined: Feb 06, 2009
Posts: 56
Hello friend,

1. The PDF-to-text extraction is done, using the PjText.jar file.
2. The text file should now be converted to an Excel file.

The only reason for this is that the data I want to extract will vary depending on the requirement.

So I want to convert the Word file into an Excel file so that the extraction becomes easier.

Could anybody please help me out ASAP?
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42370
You can use the Apache POI library to read Word files and to write Excel files. If you tell us in more detail what you're trying to do, we may be able to give more specific advice.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42370
Since we don't know how this conversion should work (keep in mind that we know neither the format of the text file nor the format of the Excel file), there isn't much else we can tell you, only that you can use the classes in the java.io package to read text files, and the Apache POI library to create Excel files.

What, specifically, do you need help with?
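
As an illustration of the java.io part only, here is a minimal sketch that reads the extracted text line by line. The class name TextLines and the choice to skip blank lines are assumptions made for this example, not something taken from the poster's program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for this example only.
final class TextLines {
    // Reads every line from the source, trims surrounding whitespace,
    // and keeps only the lines that are not blank.
    static List<String> readNonBlankLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
        } catch (IOException ex) {
            // Re-throw rather than hide the failure from the caller.
            throw new UncheckedIOException(ex);
        }
        return lines;
    }
}
```

To read from disk, pass something like new java.io.FileReader("extracted.txt"), where the file name is a placeholder. Each returned line could then become one row or cell in the Excel file, depending on the layout that is wanted.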
pavithra murthy
Ranch Hand

Joined: Feb 06, 2009
Posts: 56
Hello, I have written some code now, but I have commented out this line:
//<%@ page contentType="application/vnd.ms-excel"
I do not know what the equivalent statement is in Java. The program compiles and runs fine, but no output Excel file is created, and I don't know whether that is because of the commented-out line.
Could anyone please help me out ASAP?

Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42370
Maybe there's an exception that is ignored by the code:

} catch ( Exception ex ) {
}
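
A minimal sketch of the alternative to an empty catch block, assuming the program writes its output through a FileOutputStream. The class and method names are made up for this example, since the poster's actual code is not shown in the thread.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, named for this example only.
final class WriteReport {
    // Writes the bytes to the given path. Returns null on success, or a short
    // description of the failure instead of silently ignoring it.
    static String writeOrDescribeFailure(String path, byte[] content) {
        try (OutputStream out = new FileOutputStream(path)) {
            out.write(content);
            return null;
        } catch (IOException ex) {
            // At the very least, print the stack trace so the problem is visible.
            ex.printStackTrace();
            return ex.getClass().getSimpleName() + ": " + ex.getMessage();
        }
    }
}
```

Even a bare ex.printStackTrace() in the catch block would have shown right away why no file appeared.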
pavithra murthy
Ranch Hand

Joined: Feb 06, 2009
Posts: 56
Ulf Dittmer wrote:Maybe there's an exception that is ignored by the code:

} catch ( Exception ex ) {
}


You are right, Mr. Ulf.

It is throwing a file-not-found exception, but the file is present in the c:\\excel folder.
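
For what it's worth, Java's FileNotFoundException does not only mean "no such file". The file streams also throw it when the path names a directory, when the parent directory is missing, or when the file cannot be opened for some other reason (on Windows, for instance, while it is still open in Excel). Note also that c:\\excel with doubled backslashes is how the directory c:\excel is written inside a Java string literal. The thread does not say which case applies here, so the following is only a diagnostic sketch, and PathCheck is a name made up for this example.

```java
import java.io.File;

// Hypothetical diagnostic helper, named for this example only.
final class PathCheck {
    // Reports the most obvious reason why opening the target for writing might fail.
    static String describe(File target) {
        File absolute = target.getAbsoluteFile();
        File parent = absolute.getParentFile();
        if (parent != null && !parent.isDirectory()) {
            return "parent directory does not exist: " + parent;
        }
        if (absolute.isDirectory()) {
            return "path is a directory, not a file: " + absolute;
        }
        if (absolute.exists() && !absolute.canWrite()) {
            return "file exists but is not writable: " + absolute;
        }
        return "no obvious problem with: " + absolute;
    }
}
```

Printing PathCheck.describe(new File(path)) next to the stack trace shows the absolute path that the program really tried to use, which is often different from the one that was intended.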
pavithra murthy
Ranch Hand

Joined: Feb 06, 2009
Posts: 56
Hello friend,
I tried the program with a hard-coded value, excelData[0][0]="2001338138", and it gets written. At the same time I am using a BufferedReader to read the data from the Word file and store it in a String variable named line. That data is now displayed at the command prompt.
Now how do I write each piece of data from the Word file (or the command prompt) into the Excel file, so that I can later extract the contents from Excel easily?

Sample data to be put in the doc file:
UST-IDNR: DE811128135
RECHNUNG
Beleg-Nr.:
2001338138
DEMO:
27.03.2009
Auftraggeber:

How can I write each piece of data from the Word document (or from the command prompt) to the Excel file so that extraction will be easy? For now, just for testing, when I hard-code the value, the Excel file is created and shows the value in the first cell [0][0].



Can anyone please help ASAP?
Oleg Tikhonov
Ranch Hand

Joined: Aug 02, 2008
Posts: 55
My 2 cents: there is another way, using ICEpdf (for text extraction) and OpenOffice for creating the Excel file. With this solution you don't need to create a temporary Word/txt file (except perhaps for the sake of persistence or clustering).

Oleg.
Ulf Dittmer
Marshal

Joined: Mar 22, 2005
Posts: 42370
If you have a string that you want to chop into smaller pieces (like the line you're reading from the file), you can use the various methods of the String class, especially indexOf and substring. We could give more precise advice if we knew what the lines in the text file looked like, and how they should be stored in the Excel file.
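
A rough sketch of the indexOf/substring approach, applied to the sample data posted above. It rests on two assumptions that the thread does not confirm: a label always ends at the first colon on its line, and when nothing follows the colon, the value sits on the next line. LabelValueParser is a name invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser, named for this example only.
final class LabelValueParser {
    // Splits each line at its first colon into a label and a value.
    // If the value is empty, the following line is used, provided that
    // line does not itself contain a colon.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // no label on this line, e.g. "RECHNUNG"
            }
            String label = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            if (value.isEmpty() && i + 1 < lines.size()) {
                String next = lines.get(i + 1).trim();
                if (next.indexOf(':') < 0) {
                    value = next;
                    i++; // the next line has been consumed as the value
                }
            }
            fields.put(label, value);
        }
        return fields;
    }
}
```

With the sample lines, this gives UST-IDNR mapped to DE811128135, Beleg-Nr. to 2001338138, DEMO to 27.03.2009, and Auftraggeber to an empty string. Each entry could then be written as one row, with the label in the first column and the value in the second. Values that contain a colon themselves, such as a time of day, would need extra handling.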
Sanjeev K Jain
Greenhorn

Joined: May 11, 2009
Posts: 7
After reading your post, I understand that your final goal is to produce a PDF document by extracting data from a Word document. You are transforming the document text to XML using the POI API and then to PDF.

My suggestion here is:
1) Extract the Word document with PjText. I am assuming you have already done that.
2) Create the XML elements from the available data. You need a String in your Java program that holds the XML data. It would be the same as creating columns in an Excel sheet.
3) Once the XML is ready, you can use the Apache FOP engine to create a PDF from the XML data.

 