
Issue with Web Harvest removing spaces after closing tags

Ajay Dhar
Ranch Hand
Posts: 30
How do I prevent Web Harvest from removing the space after closing tags when I convert HTML to XML? My configuration file is shown below:

I'm using Web Harvest to extract the paragraphs (<p></p>) from an HTML page, but it removes the space that follows closing tags such as </b> and </a>. When I then strip the HTML tags from the Web Harvest output with JSoup, there is no space between the text of a link and the following word. The same happens with text that was in bold.
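To make the symptom concrete, here is a minimal sketch of what I believe is happening. It is not my actual setup: a plain regex stands in for the JSoup tag-stripping step, and the class name, method name and sample paragraph are made up for illustration. It only shows that once the space after a closing tag is gone from the markup, stripping the tags fuses the words together.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; a simple regex stands in for the JSoup tag-stripping step.
class TagStripDemo {
    // Matches anything that looks like an opening or closing tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Removes the tags and leaves every other character, including spaces, untouched.
    static String stripTags(String markup) {
        return TAG.matcher(markup).replaceAll("");
    }

    public static void main(String[] args) {
        // Paragraph with a space after </a> and </b>, as in the original HTML (made-up sample).
        String original =
            "<p>Read the <a href=\"#\">manual</a> first, it is <b>very</b> useful.</p>";
        // Same paragraph with those spaces already dropped, as I see in the converted output.
        String converted =
            "<p>Read the <a href=\"#\">manual</a>first, it is <b>very</b>useful.</p>";

        System.out.println(stripTags(original));   // Read the manual first, it is very useful.
        System.out.println(stripTags(converted));  // Read the manualfirst, it is veryuseful.
    }
}
```

If this sketch matches the real behavior, the tag-stripping step is not at fault: the space is already missing by the time the text reaches it, so the fix would have to be in the conversion step.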

Help is greatly appreciated.