I am developing a web application that searches for text in a set of PDF files (about 150 PDFs, each containing at least 800 pages). I am using the iText jars to access the contents of the PDFs. I loop over each file and every page in the file, extracting the content and searching for the text in the extracted content. This takes a very long time to return output and degrades the performance of the application.
    // Returns the names of the PDF files in the category that contain the search key.
    public Vector<String> searchForText(String searchKey, String Category) {
        Pattern expression = Pattern.compile(searchKey, Pattern.CASE_INSENSITIVE);
        Vector<String> files = searchForFiles(Category); // generating a list of all pdf files in that category
        Vector<String> filesFound = new Vector<String>();
        for (String file : files) {
            try {
                com.itextpdf.text.pdf.PdfReader reader = new com.itextpdf.text.pdf.PdfReader(baseDir + Category + "\\" + file);
                try {
                    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                        // Extract each page once and reuse the string for logging and matching
                        String fileContents = PdfTextExtractor.getTextFromPage(reader, i);
                        System.out.println(fileContents);
                        Matcher matcher = expression.matcher(fileContents);
                        if (matcher.find()) {
                            filesFound.add(file);
                            break; // record each file once and skip its remaining pages
                        }
                    }
                } finally {
                    reader.close(); // release the file handle
                }
            } catch (Exception e) {
                e.printStackTrace(); // do not swallow extraction errors silently
            }
        }
        return filesFound;
    }
Can anyone advise on a better way of doing this?
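To make the question more concrete, here is a rough sketch of one direction that might help: extract the text of every page once (for example at startup or when a file is uploaded) and run each search against the stored strings instead of re-parsing the PDFs. This is only an illustration under assumptions. `PageTextCache`, `addPage` and `search` are hypothetical names, not part of iText, and the page strings in `main` stand in for what `PdfTextExtractor.getTextFromPage(reader, i)` would return. Whether the text of roughly 150 files of 800 pages fits in memory has not been measured.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not an iText class): keeps page text that was extracted once.
class PageTextCache {
    // file name -> text of each page, in page order
    private final Map<String, List<String>> pagesByFile = new LinkedHashMap<>();

    // Store the text of one page. Intended to be called once per page at indexing
    // time, e.g. with the result of PdfTextExtractor.getTextFromPage(reader, i).
    void addPage(String file, String pageText) {
        pagesByFile.computeIfAbsent(file, k -> new ArrayList<>()).add(pageText);
    }

    // Return the files that have at least one page matching the key (case-insensitive).
    List<String> search(String searchKey) {
        Pattern expression = Pattern.compile(searchKey, Pattern.CASE_INSENSITIVE);
        List<String> filesFound = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : pagesByFile.entrySet()) {
            for (String pageText : entry.getValue()) {
                if (expression.matcher(pageText).find()) {
                    filesFound.add(entry.getKey());
                    break; // one hit per file is enough
                }
            }
        }
        return filesFound;
    }

    public static void main(String[] args) {
        PageTextCache cache = new PageTextCache();
        // Placeholder strings standing in for extracted PDF page text
        cache.addPage("a.pdf", "Invoice for March");
        cache.addPage("a.pdf", "Terms and conditions");
        cache.addPage("b.pdf", "Meeting notes");
        System.out.println(cache.search("terms")); // prints [a.pdf]
    }
}
```

With something like this, the expensive PDF parsing would happen once per file rather than once per search, and each search becomes a regex scan over cached strings. A full-text index might scale better than a plain in-memory map, but that is beyond this sketch.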