I’ve been working with some web pages that were written in 2005. I made some changes, but the page wasn’t displaying the way I wanted. The code has lots of nested tables with DIVs inside tables, so I probably just messed up opening or closing something. The easiest way to find these kinds of mistakes is to validate the code and fix the errors. Because the code is so old, it doesn’t validate as XHTML Transitional, and there were hundreds of errors. Most of the issues are related to capitalization (XHTML requires lowercase tag and attribute names), but a few are because tags are not closed. I fixed one file by hand, but since I have lots of files to work through, I created this sed script to automate the process.
#### ConvertHTML.sed created 2016-01-09
#### Updated 2016-01-15
#### The global flag g is required to catch multiple occurrences on the same line
#### Sometimes the code is in JavaScript functions, so use single quotes instead of double quotes when replacing
#### TH is part of WIDTH, so need to use < and >
#### SELECT and TABLE are MySQL commands so make sure to use the < and >
s/HTML>/html>/g
s/HEAD>/head>/g
s/TITLE>/title>/g
s/BODY>/body>/g
s/META NAME=/meta name=/g
s/<LINK REL=/<link rel=/g
# Change the case and add the type
s/<SCRIPT LANGUAGE="JavaScript">/<script language="Javascript" type="text\/javascript">/g
s/<SCRIPT/<script/g
s/SCRIPT>/script>/g
#### Tables: be careful with TD, TR, and TH, since those letters also appear inside other tags and attributes
s/<TABLE/<table/g
s/<TD/<td/g
s/<TR/<tr/g
s/<TH/<th/g
s/TABLE>/table>/g
s/TD>/td>/g
s/TR>/tr>/g
s/TH>/th>/g
s/COLSPAN/colspan/g
s/ROWSPAN/rowspan/g
s/VALIGN=/valign=/g
s/=TOP/='top'/g
s/=BOTTOM/='bottom'/g
s/=CENTER/='center'/g
s/=top/='top'/g
s/=bottom/='bottom'/g
s/=center/='center'/g
s/ALIGN=/align=/g
s/=RIGHT/='right'/g
s/=LEFT/='left'/g
s/=right/='right'/g
s/=left/='left'/g
s/CELLPADDING/cellpadding/g
s/CELLSPACING/cellspacing/g
s/BORDER/border/g
# Make the tag conform
s/NOWRAP>/nowrap='nowrap'>/g
s/NOWRAP /nowrap='nowrap' /g
s/<HR>/<hr \/>/g
s/<BR>/<br \/>/g
s/<BR\/>/<br \/>/g
s/CENTER/center/g
s/<DIV/<div/g
s/DIV>/div>/g
s/H1/h1/g
s/H2/h2/g
s/H3/h3/g
s/H4/h4/g
s/H5/h5/g
s/H6/h6/g
# Match only the P tag itself so that <PRE, <PARAM, and MAP> are left alone
s/<P>/<p>/g
s/<P /<p /g
s/<\/P>/<\/p>/g
s/CLASS=/class=/g
s/ID=/id=/g
s/STYLE=/style=/g
s/<SELECT/<select/g
s/SELECT>/select>/g
s/<IMG/<img/g
s/ALT=/alt=/g
s/SRC=/src=/g
s/A HREF/a href/g
s/<\/A>/<\/a>/g
s/_NEW/_blank/g
s/<B>/<b>/g
s/<\/B>/<\/b>/g
s/STRONG/strong/g
s/SPAN/span/g
s/<UL/<ul/g
s/UL>/ul>/g
s/<LI/<li/g
s/LI>/li>/g
s/HEIGHT=/height=/g
s/WIDTH=/width=/g
s/SIZE=/size=/g
s/FONT/font/g
s/COLOR=/color=/g
s/TYPE=/type=/g
s/Type=/type=/g
s/VALUE=/value=/g
s/NAME=/name=/g
s/<INPUT/<input/g
s/<FORM/<form/g
s/FORM>/form>/g
s/<OPTION/<option/g
s/OPTION>/option>/g
s/<INPUT/<input/g
s/<TEXTAREA/<textarea/g
s/TEXTAREA>/textarea>/g
s/ROWS/rows/g
s/COLS/cols/g
s/VALUE=/value=/g
s/METHOD=POST/method="post"/g
s/ACTION=/action=/g
s/TARGET=/target=/g
# JavaScript Calls
s/onLoad/onload/g
s/onMouse/onmouse/g
s/onmouseOut/onmouseout/g
s/onmouseOver/onmouseover/g
s/onChange/onchange/g
s/onSubmit/onsubmit/g
s/onClick/onclick/g
s/onError/onerror/g
s/ONERROR/onerror/g
s/cellspacing=\([0-9]*\)/cellspacing=\'\1\'/g
s/cellpadding=\([0-9]*\)/cellpadding=\'\1\'/g
# These can be percent
s/width=\([0-9]*\)%/width=\'\1%\'/g
s/height=\([0-9]*\)%/height=\'\1%\'/g
s/border=\([0-9]*\)%/border=\'\1%\'/g
s/width=\([0-9]*\)/width=\'\1\'/g
s/height=\([0-9]*\)/height=\'\1\'/g
s/border=\([0-9]*\)/border=\'\1\'/g
s/colspan=\([0-9]*\)/colspan=\'\1\'/g
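# [0-9]* also matches zero digits, so values that were already quoted pick up extra quotes.
# Matching one or more digits with \+ should avoid that, but it isn’t working, so clean up instead.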
s/\'\'\'/\'/g
s/\'\'\"/\"/g
# Make the selected attribute conform. Mine are inside Perl statements and conditionals
s/selected\"/selected='selected'\"/g
# Lots of image tags aren’t closed
#s/<img \([0-9a-zA-Z\=\/\.\'\"]*\)>/<img \1 a\/>/g
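About the \+ note in the script: \+ is a GNU extension to basic regular expressions, so one possible reason it fails is a sed that only implements the POSIX syntax. A portable way to write "one or more digits" is [0-9][0-9]*. The sketch below is not part of ConvertHTML.sed. The function name quote_width is made up for this example, and it only illustrates the pattern on the width attribute.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): quote bare numeric or
# percent widths, leaving values that are already quoted untouched.
quote_width() {
    # [0-9][0-9]*  = one or more digits (POSIX basic regular expression)
    # %\{0,1\}     = an optional trailing percent sign
    sed "s/width=\([0-9][0-9]*%\{0,1\}\)/width='\1'/g"
}

printf '%s\n' '<td width=100><td width=50%><td width="75">' | quote_width
# prints: <td width='100'><td width='50%'><td width="75">
```

Because the pattern needs at least one digit right after the equals sign, values that are already quoted never match, so rules written this way should not need the cleanup substitutions for stray quotes.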
To run the script, save it in a file (mine is called ConvertHTML.sed), then redirect the output to a temporary file for review.
sed -f ./ConvertHTML.sed original.html > converted.html
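With lots of files, the same command can be wrapped in a loop. This is only a minimal sketch, assuming all the pages sit in one directory and end in .html. The function name convert_all and the .new suffix are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a sed script over every .html file in a directory.
# Each result is written next to its original as <name>.html.new, so the
# originals stay untouched until the output has been reviewed.
convert_all() {
    script=$1   # path to the sed script, for example ./ConvertHTML.sed
    dir=$2      # directory holding the pages (assumed layout)
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue   # the glob matched nothing
        sed -f "$script" "$f" > "$f.new"
    done
}
```

Calling convert_all ./ConvertHTML.sed . would process the current directory. The .new suffix keeps the output files from matching *.html, so running the loop a second time does not convert the converted copies.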
Fix the img tags by hand: add the closing slash and an alt="" attribute where they are missing. Then validate the page again. Once you are happy with it, copy it over your original file. I just started using this script, so I’ll probably make updates for tags that I missed. I put the date at the top so you can tell if it is the latest version.
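For the closing slash, something like the following might work as a starting point. It is a sketch, not part of the script above. The function name close_img is made up, and it assumes each img tag sits on a single line, has already been lowercased, and has no > inside an attribute value. It does not add alt="".

```shell
#!/bin/sh
# Hypothetical helper: add " />" to img tags that are not already self-closed.
close_img() {
    # | is the s delimiter here so that / can appear unescaped.
    # [^>]*[^/>] = the tag's attributes, whose last character is not a slash,
    # which is what keeps tags already ending in /> from matching.
    sed 's|<img\([^>]*[^/>]\)>|<img\1 />|g'
}

printf '%s\n' '<img src="a.gif"><img src="b.gif" />' | close_img
# prints: <img src="a.gif" /><img src="b.gif" />
```

I would still review the output before trusting it, since a tag that wraps onto the next line is skipped.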