Issue with Web Harvest removing spaces after closing tags

Ajay Dhar
Ranch Hand

Joined: Jan 26, 2011
Posts: 30
How do I prevent Web Harvest from removing the space after closing tags when I convert HTML to XML? My configuration file is shown below:



I'm using Web Harvest to extract the paragraphs (<p></p>) from an HTML page, but it removes the space that follows closing tags such as </b> and </a>. When I then strip the HTML tags from the Web Harvest output with JSoup, there is no space between the text of a link and the word that follows it. The same happens with text that was in bold.
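To make the symptom concrete, here is a minimal, standard-library-only sketch. It does not use Web Harvest or JSoup: the class `SpaceAfterTagDemo`, its methods, and the sample markup are all hypothetical, and the regex-based tag stripping only stands in for the real tag-removal step. It shows how the words get glued together once the space after a closing tag is gone, plus one possible workaround (re-inserting a space after a closing inline tag that is directly followed by a letter or digit), which is an assumption on my part and not a Web Harvest setting.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; illustrates the symptom, not Web Harvest's or JSoup's behavior.
class SpaceAfterTagDemo {
    // Naive tag matcher for illustration only; this is not a real HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // A closing inline tag that is immediately followed by a letter or digit.
    private static final Pattern CLOSE_THEN_WORD =
            Pattern.compile("(</(?:a|b|i|em|strong|span)>)(?=[\\p{L}\\p{N}])");

    // Removes all tags, then collapses runs of whitespace into single spaces.
    static String stripTags(String markup) {
        return TAG.matcher(markup).replaceAll("").replaceAll("\\s+", " ").trim();
    }

    // Assumed workaround: put a space back after closing inline tags
    // when the next character would otherwise be glued to the tag's text.
    static String padClosingTags(String markup) {
        return CLOSE_THEN_WORD.matcher(markup).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        // Markup as it appears in the source page.
        String original = "<p>See <a href=\"#\">the docs</a> for <b>more</b> details.</p>";
        // The same markup after the space following </a> and </b> has been lost.
        String collapsed = "<p>See <a href=\"#\">the docs</a>for <b>more</b>details.</p>";

        System.out.println(stripTags(original));                  // spaces intact
        System.out.println(stripTags(collapsed));                 // words glued together
        System.out.println(stripTags(padClosingTags(collapsed))); // spaces restored
    }
}
```

The padding step is a blunt instrument: it would also add a space where none was intended, for example in <b>H</b>ello, so I would much rather stop the space from being dropped in the first place.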


Help is greatly appreciated.


OCPJP 6, OCEEJBD 6, GIAC Secure Software Programmer-Java (GSSP-Java)
 