Recently I had to move a bunch of tables from an old system to a Drupal site. The table data was heavily infested with the crappy HTML that Microsoft Word inserts.

The MS HTML was 1) redundant, making the HTML almost 5 times its actual size, and 2) at times breaking the page HTML on the new system.

I used the htmLawed library from…

Once included, it is as simple as a call to the htmLawed() function, passing the HTML and a settings array.
$htmlawedsettings can carry a multitude of settings as explained in…

However, at a minimum you can set it to
'clean_ms_char' => 2
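
For reference, here is a minimal sketch of how that setting and the call might fit together. The htmLawed() function and the clean_ms_char setting come from the library; the helper name clean_word_html() and the variables $dirty_html and $clean_html are made up for illustration, and the snippet assumes htmLawed.php has already been included.

```php
<?php
// Assumes htmLawed.php has already been included (see the include step in this post).

// Minimal settings: just the Microsoft character cleanup.
$htmlawedsettings = array(
  'clean_ms_char' => 2,
);

// Hypothetical helper (the name is made up for illustration).
// htmLawed() takes the input HTML and a settings array and returns the filtered HTML.
function clean_word_html($dirty_html, array $settings) {
  return htmLawed($dirty_html, $settings);
}

// Usage, once the library is loaded:
// $clean_html = clean_word_html($dirty_html, $htmlawedsettings);
```

More keys can be added to the same array as the migration needs them.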

Last but not least, since the migration script was a Drupal module, I included htmLawed.php, placed in the same folder as the .module file.
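
One way that include might look, as a sketch only: it uses plain PHP rather than any Drupal helper, and the function name mymigration_load_htmlawed() (along with the "mymigration" module name) is hypothetical.

```php
<?php
// Hypothetical helper inside the .module file; "mymigration" is a made-up module name.
function mymigration_load_htmlawed() {
  // dirname(__FILE__) is the folder holding this file, where htmLawed.php was placed.
  require_once dirname(__FILE__) . '/htmLawed.php';
}
```

Calling it once before the cleanup makes htmLawed() available. Drupal's module_load_include() would be another option.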

There you go. Sparkling clean HTML that is close to being W3C compliant!

Submitted by tanay on Wed, 04/18/2012 - 05:11