Issue with Web Harvest removing spaces after closing tags

How do I prevent Web Harvest from removing the space after closing tags when it converts HTML to XML? My configuration file is shown below:

I'm using Web Harvest to extract the paragraphs (<p></p>) from an HTML page, but it removes the space that follows closing tags such as </b> and </a>. When I then strip the HTML tags from the Web Harvest output with JSoup, the link text runs straight into the following word, with no space between them. The same happens with text that was in bold.
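
To make the symptom concrete, here is a minimal sketch in plain Java, with no Web Harvest or JSoup involved. The class name `TagSpacing`, both helper methods, and the sample strings are hypothetical and only illustrate the behaviour described above; the regex tag stripper is a crude stand-in for JSoup's text extraction. It also shows one possible post-processing workaround: inserting a space after `</a>` or `</b>` when a letter or digit follows directly. This does not address the cause inside Web Harvest, and it would also add a space where the original markup genuinely had none.

```java
import java.util.regex.Pattern;

class TagSpacing {
    // Matches any tag; a crude stand-in for a real HTML parser (assumption).
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    // Matches a closing </a> or </b> that is immediately followed by a letter or digit.
    private static final Pattern CLOSE_THEN_WORD =
            Pattern.compile("(</(?:a|b)>)(?=\\p{Alnum})");

    // Removes every tag and keeps only the text between them.
    static String stripTags(String markup) {
        return ANY_TAG.matcher(markup).replaceAll("");
    }

    // Workaround sketch: re-insert a space after the closing inline tag.
    static String restoreSpaces(String markup) {
        return CLOSE_THEN_WORD.matcher(markup).replaceAll("$1 ");
    }

    public static void main(String[] args) {
        // Hypothetical input page fragment and the output as described in the post.
        String original = "<p>See <a href=\"#\">the docs</a> for <b>more</b> details.</p>";
        String harvested = "<p>See <a href=\"#\">the docs</a>for <b>more</b>details.</p>";

        System.out.println(stripTags(original));                 // text with spaces intact
        System.out.println(stripTags(harvested));                // words run together
        System.out.println(stripTags(restoreSpaces(harvested))); // spaces re-inserted
    }
}
```

Running `main` prints the intact sentence, then the version where "docsfor" and "moredetails" are fused, then the repaired text.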

Help is greatly appreciated.