Stripping HTML code for AltBody and PHPMailer-FE
As the developers, we obviously use PHPMailer extensively in our own applications. In some of them we were using HTML2Text, a very good utility available at chuggnut.com. For basic form processing, however, HTML2Text is overkill, and it does not render form data properly, particularly tables built with the table, tr, th, and td tags.
We modified several functions that we use in our content management systems and in our own PHPMailer scripts, and we would like to discuss them here and make them available for your use. They are released under the same license as PHPMailer (LGPL); please attribute them properly.
The two functions that we modified are:
- A function that strips out everything from an opening tag through its closing tag, inclusive (for example, <body> through </body>). In a commercial email marketing application we use the original version of this function to do the reverse: it strips out all the HTML above and including <body ...> and all the HTML below and including </body>, keeping only what lies between the body tags. Our modification inverts that result, removing the tags and everything between them instead.
- A function that:
  - strips out new line characters that follow a tag, and whitespace between a tag and a following closing tag
  - converts </td></tr> and </th></tr> (and any remaining </tr>) to new line characters
  - converts the remaining </td> and </th> tags to a colon and space
  - strips all remaining tags
  - then converts HTML entities to character representations
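The original body-extracting version used in the email marketing application is not shown in this post. A minimal sketch of that behavior, assuming a hypothetical `_keepBetweenTags()` helper (not part of PHPMailer and not the author's code), keeps only the content between the tags. It matches the start tag as a prefix so that attributes such as `bgcolor` do not prevent a match:

```php
<?php
// Hypothetical helper (illustration only): keep what lies between $startTag
// and $endTag, dropping both tags and everything outside them.
// $startTag is matched as a prefix, so '<body' also matches '<body bgcolor="#fff">'.
function _keepBetweenTags($str, $startTag = '<body', $endTag = '</body>') {
    $posStart = stripos($str, $startTag);
    if ($posStart === false) {
        return $str; // no start tag: return the input unchanged
    }
    // the opening tag ends at the first '>' after its '<'
    $posOpenEnd = strpos($str, '>', $posStart);
    if ($posOpenEnd === false) {
        return $str; // malformed opening tag: leave the input alone
    }
    $posEnd = stripos($str, $endTag, $posOpenEnd);
    if ($posEnd === false) {
        $posEnd = strlen($str); // no end tag: keep everything to the end
    }
    return substr($str, $posOpenEnd + 1, $posEnd - $posOpenEnd - 1);
}

echo _keepBetweenTags('<html><head><title>x</title></head><body bgcolor="#fff"><p>Hi</p></body></html>');
// prints <p>Hi</p>
```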
The first function strips out code that other HTML-to-text conversion utilities do not process. One example is a <style></style> block: the two tags and everything contained between them.
function _stripStartEndStr($str, $startTag = '<style>', $endTag = '</style>') {
  /* Copyright Andy Prevost */
  // Tags are matched case-insensitively. Pass '<style' as $startTag to also
  // match opening tags that carry attributes, e.g. <style type="text/css">.
  // Remove every $startTag ... $endTag block, the tags included.
  while (($posStart = stripos($str, $startTag)) !== false) {
    // look for the end tag only after this start tag
    $posEndStart = stripos($str, $endTag, $posStart + strlen($startTag));
    if ($posEndStart === false) {
      break; // unmatched start tag: leave the rest of the string alone
    }
    $posEnd = $posEndStart + strlen($endTag);
    // cut out the tags and everything between them
    $str = substr_replace($str, '', $posStart, $posEnd - $posStart);
  }
  return $str;
}
To use this function, build your HTML the normal way, then strip out the unwanted block before converting the result to text:
$html = '...'; // build your HTML however you normally do
$text = _stripStartEndStr($html);
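With its default arguments, _stripStartEndStr() only matches a bare <style> tag, not one with attributes such as <style type="text/css">. As an alternative technique (a regular expression, not the approach used above), a sketch with a hypothetical `_stripTagBlocks()` helper removes every such block regardless of attributes or letter case. Like most regex-based HTML handling, it assumes simple, well-formed markup:

```php
<?php
// Alternative sketch (hypothetical helper, regex technique): remove every
// <tag ...>...</tag> block. \b lets the opening tag carry attributes, the 'i'
// modifier ignores case, 's' lets '.' match new lines, and the lazy '.*?'
// makes each block end at its own closing tag.
function _stripTagBlocks($html, $tag = 'style') {
    $t = preg_quote($tag, '#');
    return preg_replace('#<' . $t . '\b[^>]*>.*?</' . $t . '\s*>#is', '', $html);
}

$html = "<STYLE type=\"text/css\">\np { color: red; }\n</STYLE><p>Hello</p><style>b{}</style>";
echo _stripTagBlocks($html); // prints <p>Hello</p>
```

The same helper can be pointed at other blocks, for example `_stripTagBlocks($html, 'script')`.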
The next function does the actual HTML-to-text conversion. It renders your tables reasonably and converts HTML entities to characters (for example, &copy; to ©).
function _html2txt($html, $charset = 'UTF-8') {
  /* Copyright Andy Prevost */
  // $charset should match the message character set ($mail->CharSet)
  if (trim($html) == '') { return $html; }
  // remove whitespace between a tag and a following closing tag,
  // e.g. "</td>\n  </tr>" becomes "</td></tr>"
  $text = preg_replace('/>\s+<\//', '></', $html);
  // remove new line characters that directly follow any tag
  $text = preg_replace('/>(\r?\n)+/', '>', $text);
  // a cell that ends a row becomes a line break (tags matched in any case)
  $text = preg_replace('/<\/t[dh]><\/tr>/i', "\n", $text);
  // any other cell end becomes ": ", and any leftover row end a line break
  $text = preg_replace('/<\/t[dh]>/i', ': ', $text);
  $text = preg_replace('/<\/tr>/i', "\n", $text);
  // strip the remaining tags, then decode all HTML entities (e.g. &copy;);
  // decoding last keeps encoded text such as "&lt;b&gt;" as visible text
  $text = strip_tags($text);
  return html_entity_decode($text, ENT_QUOTES, $charset);
}
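Two PHP details matter for the entity step. htmlspecialchars_decode() only decodes &amp;, &quot;, &lt;, and &gt; (plus &#039; with ENT_QUOTES), while html_entity_decode() also handles named entities such as &copy;. The order relative to strip_tags() matters too: decoding first turns encoded text such as &lt;b&gt; into a real tag, which strip_tags() then removes. A small comparison, assuming UTF-8 output:

```php
<?php
$sample = '&copy; 2008 &amp; later';

// htmlspecialchars_decode(): named entities such as &copy; pass through unchanged
$special = htmlspecialchars_decode($sample, ENT_QUOTES);

// html_entity_decode(): every named entity PHP knows is converted
$all = html_entity_decode($sample, ENT_QUOTES, 'UTF-8');

// decoding before strip_tags() loses the encoded "<b>"; decoding after keeps it
$early = strip_tags(html_entity_decode('Use &lt;b&gt; for bold', ENT_QUOTES, 'UTF-8'));
$late  = html_entity_decode(strip_tags('Use &lt;b&gt; for bold'), ENT_QUOTES, 'UTF-8');

echo $special, "\n", $all, "\n", $early, "\n", $late, "\n";
```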
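Finally, add the HTML content and the text content to your PHPMailer object ($mail). MsgHTML() can set AltBody on its own, so assign your own AltBody after calling it: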
$mail->MsgHTML($html);
$mail->AltBody = _html2txt($text);
Enjoy!
Andy