Friday, August 6, 2010

html2xml

While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff. All of the following are accounted for:
  • Unclosed Tags:
    HTMLtoXML("<p><b>Hello") == '<p><b>Hello</b></p>'
  • Empty Elements:
    HTMLtoXML("<img src=test.jpg>") == '<img src="test.jpg"/>'
  • Block vs. Inline Elements:
    HTMLtoXML("<b>Hello <p>John") == '<b>Hello </b><p>John</p>'
  • Self-closing Elements:
    HTMLtoXML("<p>Hello<p>World") == '<p>Hello</p><p>World</p>'
  • Attributes Without Values:
    HTMLtoXML("<input disabled>") == '<input disabled="disabled"/>'
Note: It does not take into account where in the document an element should exist. Right now you can put block elements in a head or th inside a p and it'll happily accept them. It's not entirely clear how the logic should work for those, but it's something that I'm open to exploring.
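To make the cases above more concrete, here is a rough sketch of how this kind of clean-up can be done with a stack of open elements. It is not the library's actual code: the function name sketchHTMLtoXML, the four element sets, and the simplified regular expressions are all assumptions made for illustration, and it ignores comments, doctypes, script contents, and attribute values that contain ">".

```javascript
// Illustrative sketch only. The element sets are small, hypothetical
// samples and not the lists the real library uses.
const EMPTY = new Set(["img", "br", "hr", "input", "meta", "link"]);
const BLOCK = new Set(["p", "div", "ul", "ol", "table", "h1", "h2", "h3"]);
const INLINE = new Set(["b", "i", "em", "strong", "span", "a"]);
const CLOSE_SELF = new Set(["p", "li", "td", "th", "tr", "option"]);

function sketchHTMLtoXML(html) {
  const stack = []; // names of the elements that are currently open
  let out = "";

  // Emit closing tags until only `pos` elements remain open.
  function closeTo(pos) {
    while (stack.length > pos) out += "</" + stack.pop() + ">";
  }

  // A token is either a tag (groups 1-3) or a run of text (group 4).
  const token = /<(\/?)([a-zA-Z][\w-]*)([^>]*)>|([^<]+)/g;
  // An attribute is a name with an optional quoted or unquoted value.
  const attr = /([^\s"'<>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;

  for (const m of html.matchAll(token)) {
    if (m[4] !== undefined) { // plain text passes through unchanged
      out += m[4];
      continue;
    }
    const name = m[2].toLowerCase();

    if (m[1]) { // closing tag: close everything opened since its match
      const pos = stack.lastIndexOf(name);
      if (pos >= 0) closeTo(pos);
      continue;
    }

    // A block element ends any inline elements left open before it.
    if (BLOCK.has(name)) {
      while (stack.length && INLINE.has(stack[stack.length - 1])) {
        closeTo(stack.length - 1);
      }
    }
    // Elements like <p> implicitly close a previous sibling of the same name.
    if (CLOSE_SELF.has(name) && stack[stack.length - 1] === name) {
      closeTo(stack.length - 1);
    }

    out += "<" + name;
    for (const a of m[3].matchAll(attr)) {
      // An attribute without a value gets its own name as the value.
      const value = a[2] ?? a[3] ?? a[4] ?? a[1];
      out += " " + a[1] + '="' + value.replace(/"/g, "&quot;") + '"';
    }
    if (EMPTY.has(name)) {
      out += "/>"; // empty elements are written as self-closed
    } else {
      out += ">";
      stack.push(name);
    }
  }

  closeTo(0); // close whatever is still open at the end of the input
  return out;
}

console.log(sketchHTMLtoXML("<b>Hello <p>John"));
// → <b>Hello </b><p>John</p>
console.log(sketchHTMLtoXML("<input disabled>"));
// → <input disabled="disabled"/>
```

Like the library, this sketch knows nothing about where an element is allowed to appear; the stack only tracks what is open, not what is valid inside it.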