
Issue with Web Harvest removing spaces after closing tags

 
Ajay Dhar
Ranch Hand
Posts: 30
How do I prevent Web Harvest from removing the space after closing tags when I convert html to xml? My configuration file is shown below:



I'm using Web Harvest to extract the paragraphs (<p></p>) from an HTML page, but Web Harvest removes the space after closing tags such as </b> and </a>. When I then strip the HTML tags from the Web Harvest output using JSoup, there is no space between the text of a link and the following word. The same happens for text that was in bold.
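To make the symptom concrete, here is a minimal sketch of what I'm seeing. It is only an illustration: the two sample strings are made up, the class and method names are hypothetical, and a plain regex stands in for the JSoup tag-stripping step so that the snippet has no dependencies. It is not my actual Web Harvest configuration.

```java
import java.util.regex.Pattern;

public class SpaceLossDemo {
    // Assumption: a simple regex stands in for the tag-stripping step
    // (the real pipeline uses JSoup); it just deletes anything shaped like a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String stripTags(String markup) {
        return TAG.matcher(markup).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical paragraph as it appears in the source HTML.
        String original = "<p>See <a href=\"#\">the docs</a> for <b>more</b> details.</p>";
        // Hypothetical converted output, with the space after </a> and </b> dropped.
        String collapsed = "<p>See <a href=\"#\">the docs</a>for <b>more</b>details.</p>";

        System.out.println(stripTags(original));
        System.out.println(stripTags(collapsed));
    }
}
```

The first line printed keeps the words apart, while the second runs "docs" into "for" and "more" into "details", which is the behaviour I want to avoid.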


Help is greatly appreciated.
 