Convert HTML to XHTML compliant code.

I’ve been working with some web pages that were written in 2005. I made some changes, but the page wasn’t displaying the way I wanted. The code has lots of nested tables with DIVs inside tables, so I probably just messed up on opening or closing something. The easiest way to find these kinds of mistakes is to validate the code and fix the errors. Because the code is so old, it doesn’t validate as XHTML Transitional, so there were hundreds of errors. Most of the issues are related to capitalization, but a few are because tags are not closed. I fixed one file by hand, but since I have lots of files to work with, I created this sed script to automate the process.


#### ConvertHTML.sed created 2016-01-09
#### Updated 2016-01-15
#### The global flag g is required to replace multiple occurrences on the same line
#### Sometimes the code is in JavaScript functions, so use single quotes instead of double quotes when replacing
#### TH is part of WIDTH, so need to use < and >
#### SELECT and TABLE are MySQL commands so make sure to use the < and >

s/HTML>/html>/g
s/HEAD>/head>/g
s/TITLE>/title>/g
s/BODY>/body>/g
s/META NAME=/meta name=/g
s/<LINK REL=/<link rel=/g

# Change the case and add the type
s/<SCRIPT LANGUAGE="JavaScript">/<script language="Javascript" type="text\/javascript">/g
s/<SCRIPT/<script/g
s/SCRIPT>/script>/g

#### Tables Be careful with TD, TR, TH, parts are in other tags
s/<TABLE/<table/g
s/<TD/<td/g
s/<TR/<tr/g
s/<TH/<th/g

s/TABLE>/table>/g
s/TD>/td>/g
s/TR>/tr>/g
s/TH>/th>/g

s/COLSPAN/colspan/g
s/ROWSPAN/rowspan/g

s/VALIGN=/valign=/g
s/=TOP/='top'/g
s/=BOTTOM/='bottom'/g
s/=CENTER/='center'/g

s/=top/='top'/g
s/=bottom/='bottom'/g
s/=center/='center'/g

s/ALIGN=/align=/g
s/=RIGHT/='right'/g
s/=LEFT/='left'/g
s/=right/='right'/g
s/=left/='left'/g

s/CELLPADDING/cellpadding/g
s/CELLSPACING/cellspacing/g
s/BORDER/border/g

# Make the tag conform
s/NOWRAP>/nowrap='nowrap'>/g
s/NOWRAP /nowrap='nowrap' /g

s/<HR>/<hr \/>/g
s/<BR>/<br \/>/g
s/<BR\/>/<br \/>/g
s/CENTER/center/g
s/<DIV/<div/g
s/DIV>/div>/g

s/H1/h1/g
s/H2/h2/g
s/H3/h3/g
s/H4/h4/g
s/H5/h5/g
s/H6/h6/g

s/<P/<p/g
s/P>/p>/g

s/CLASS=/class=/g
s/ID=/id=/g
s/STYLE=/style=/g

s/<SELECT/<select/g
s/SELECT>/select>/g
s/<IMG/<img/g
s/ALT=/alt=/g
s/SRC=/src=/g

s/A HREF/a href/g
s/<\/A>/<\/a>/g
s/_NEW/_blank/g

s/<B>/<b>/g
s/<\/B>/<\/b>/g
s/STRONG/strong/g
s/SPAN/span/g

s/<UL/<ul/g
s/UL>/ul>/g
s/<LI/<li/g
s/LI>/li>/g

s/HEIGHT=/height=/g
s/WIDTH=/width=/g
s/SIZE=/size=/g
s/FONT/font/g
s/COLOR=/color=/g
s/TYPE=/type=/g
s/Type=/type=/g
s/VALUE=/value=/g
s/NAME=/name=/g
s/<INPUT/<input/g
s/<FORM/<form/g
s/FORM>/form>/g
s/<OPTION/<option/g
s/OPTION>/option>/g
s/<INPUT/<input/g

s/<TEXTAREA/<textarea/g
s/TEXTAREA>/textarea>/g
s/ROWS/rows/g
s/COLS/cols/g

s/VALUE=/value=/g
s/METHOD=POST/method="post"/g
s/ACTION=/action=/g
s/TARGET=/target=/g

# JavaScript Calls
s/onLoad/onload/g
s/onMouse/onmouse/g
s/onmouseOut/onmouseout/g
s/onmouseOver/onmouseover/g

s/onChange/onchange/g
s/onSubmit/onsubmit/g
s/onClick/onclick/g
s/onError/onerror/g
s/ONERROR/onerror/g

s/cellspacing=\([0-9]*\)/cellspacing=\'\1\'/g
s/cellpadding=\([0-9]*\)/cellpadding=\'\1\'/g
# These can be percent
s/width=\([0-9]*\)%/width=\'\1%\'/g
s/height=\([0-9]*\)%/height=\'\1%\'/g
s/border=\([0-9]*\)%/border=\'\1%\'/g

s/width=\([0-9]*\)/width=\'\1\'/g
s/height=\([0-9]*\)/height=\'\1\'/g
s/border=\([0-9]*\)/border=\'\1\'/g
s/colspan=\([0-9]*\)/colspan=\'\1\'/g

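# Matching one or more digits with \+ didn’t work for me (\+ is a GNU extension, so other sed implementations may not support it).
# Because [0-9]* can also match zero digits, values that were already quoted pick up extra quotes; the next two rules remove them.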
s/\'\'\'/\'/g
s/\'\'\"/\"/g

# make the selected tag conform. Mine are in perl statements and conditionals
s/selected\"/selected='selected'\"/g

# Lots of image tags aren’t closed
#s/<img \([0-9a-zA-Z\=\/\.\'\"]*\)>/<img \1 a\/>/g

To run the code, save it in a file (mine is called ConvertHTML.sed), then redirect the output to a temporary file for review.


sed -f ./ConvertHTML.sed original.html > converted.html
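Before converting a real page, it can help to try a handful of rules on a single line and look at the result. The sketch below is only an illustration: the sample line is made up, the rules file is a scratch copy of a few rules from the script above, and the width rule uses the portable one-or-more form [0-9][0-9]* instead of [0-9]*, which is my substitution and not what the script currently does.

```shell
# Scratch copy of a few rules from the script above (not the full file).
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/<TD/<td/g
s/=CENTER/='center'/g
s/ALIGN=/align=/g
s/WIDTH=/width=/g
s/width=\([0-9][0-9]*\)/width='\1'/g
EOF

# One made-up line of old-style markup to feed through the rules.
sample='<TD WIDTH=100 ALIGN=CENTER>'
result=$(printf '%s\n' "$sample" | sed -f "$rules")
echo "$result"
rm -f "$rules"
```

With these five rules the line comes out as <td width='100' align='center'>.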

Fix the img tags by hand: add the closing slash and an alt="" attribute where they are missing. Then validate the page again. Once you are happy with it, copy it over your original file. I just started using this script, so I’ll probably make updates for tags that I missed. I put the date at the top so you can tell whether you have the latest version.
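Since the point of the script is to handle lots of files, here is a minimal sketch of a batch run that keeps the originals untouched and writes the converted copies to a separate directory for review. The function name convert_all and the directory names in the example call are placeholders I made up, not part of the original setup.

```shell
# Run a sed script over every .html file in one directory and write the
# converted copies into another, leaving the originals untouched.
convert_all() {
  script=$1   # path to the sed script, e.g. ./ConvertHTML.sed
  src=$2      # directory holding the original .html files
  dest=$3     # directory that receives the converted copies
  mkdir -p "$dest" || return 1
  for f in "$src"/*.html; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    sed -f "$script" "$f" > "$dest/$(basename "$f")" || return 1
  done
}

# Example call (paths are placeholders):
# convert_all ./ConvertHTML.sed ./site ./site_converted
```

Writing to a second directory makes it easy to diff each converted file against its original before copying anything back.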

Changing permissions on files

I recently inherited a system and wanted to be able to edit all of the website files. I changed the group to admin and added myself to the group. Then I wanted to change all of the directories so I could access them, and change all of the file permissions so the group had read-write access and others (including Apache) had read-only access. The command to do that is:

sudo chmod -R u+rwX,go+rwX,o-w .

-R recursively change
u+rwX give the owning user read and write, plus execute on directories (and on files that are already executable by someone)
go+rwX same as above for group and others
o-w then remove the write permission for others
. start in the current directory
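To see what the mode string does before pointing it at a real site, it can be tried on a throwaway tree. This is a sketch only: the scratch directory and file names are made up, and it runs without sudo because the files belong to whoever runs it.

```shell
# Build a throwaway tree: one private directory containing one private file.
scratch=$(mktemp -d)
mkdir "$scratch/site"
echo "hello" > "$scratch/site/index.html"
chmod 700 "$scratch/site"
chmod 600 "$scratch/site/index.html"

# Same mode string as above, limited to the scratch tree.
chmod -R u+rwX,go+rwX,o-w "$scratch/site"

# Show only the permission column for the directory and the file.
dir_mode=$(ls -ld "$scratch/site" | cut -c1-10)
file_mode=$(ls -l "$scratch/site/index.html" | cut -c1-10)
echo "directory: $dir_mode"
echo "file:      $file_mode"
rm -rf "$scratch"
```

On a typical Linux filesystem the directory ends up as drwxrwxr-x and the file as -rw-rw-r--: the capital X added execute to the directory but not to the plain file.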

Log results of perl scripts

While you are debugging Perl scripts, print statements go to the console by default. Often you want more output than fits nicely on the console, or you want to be able to search through the results. In that case you can send the output to a file with a simple shell redirect, e.g.


./perl_test.pl > test_results.txt

Once the script is debugged and running as a cron job, you might still want to see that it has successfully completed or has generated errors. In that case, redirecting the STDOUT and STDERR to log files is what you want to do.

I created a directory /var/log/My_logs, chmodded it to 755, and chowned it to admin. Then I added two open lines near the top of the script.


#! /usr/bin/perl

use strict;
use warnings;

open(STDOUT, '>>',  $0 . ".log") or die "Can't open log";
open(STDERR, '>>',  $0 . ".error.log") or die "Can't open error log";

A couple of notes: the >> appends the output to the existing file, and $0 is the name of the Perl script that is running. So if I run perl_test.pl, the log file is perl_test.pl.log. Also note that $0 holds the script name exactly as it was invoked, including any directory path given on the command line, so the log files end up next to the script. If you are running the script manually, that may be what you want. But if the script is part of a cron job, you might be better off writing the log files to /var/log/. And you probably don’t want the entire script to die if the log file can’t be opened.
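The same split between normal output and errors can also be made from the calling side, without touching the script, using shell redirection. The sketch below uses a made-up shell function (noisy_job) as a stand-in for a real script and a temporary directory as a stand-in for the log directory.

```shell
# Stand-in for a real script: writes one line to stdout and one to stderr.
noisy_job() {
  echo "processed 3 records"
  echo "could not read record 4" >&2
}

logdir=$(mktemp -d)   # placeholder for a real log directory

# >> appends stdout to one file; 2>> appends stderr to another.
noisy_job >> "$logdir/job.log" 2>> "$logdir/job.error.log"
noisy_job >> "$logdir/job.log" 2>> "$logdir/job.error.log"

# Each file now holds one line per run.
cat "$logdir/job.log"
cat "$logdir/job.error.log"
```

The same redirection can be written at the end of a crontab entry, which keeps the logging decision out of the script itself.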

An alternative way to do the same thing is to split the file name into parts with the fileparse function in the File::Basename module.


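#! /usr/bin/perl

use strict;
use warnings;
use File::Basename;   # provides fileparse

# Split the script path ($0, assumed to be the path meant here) into its parts
my ($filename, $directories, $suffix) = fileparse($0);

# Append STDOUT and STDERR to per-script log files
open(STDOUT, '>>', "/var/log/My_Project_logs/" . $filename . ".log");
open(STDERR, '>>', "/var/log/My_Project_logs/" . $filename . ".error.log");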

If you run the script as yourself but later want it to run as a cron job, make sure the log files are owned by (or at least writable by) the user the cron job runs as.

If you write out a lot of output, make sure you clean out the log files from time to time. Alternatively, set up log rotation in /etc/logrotate.d/.
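As a rough sketch of what such a rotation rule could look like, the snippet below writes a candidate logrotate file to a temporary location for review. The directives (weekly, rotate 4, compress, missingok, notifempty) are illustrative choices of mine, not settings from my system, and the output file name is a placeholder.

```shell
# Write a candidate logrotate rule to a scratch file for review.
# The directives are illustrative choices, not settings from the post.
conf="${TMPDIR:-/tmp}/my_project_logs.logrotate"
cat > "$conf" <<'EOF'
/var/log/My_Project_logs/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
cat "$conf"
```

After review, a file like this would be copied into /etc/logrotate.d/ with sudo; how often to rotate and how many old logs to keep depends on how much the scripts write.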

Fix hyphenation in WordPress themes

A new blog that I installed using the TwentyFifteen theme has annoying hyphenation on the posts. I found an easy fix online. Open the style.css file and insert this at the end.


.entry-content,
.entry-summary,
.page-content,
.nav-links,
.comment-content,
.widget {
   -webkit-hyphens: none;
   -moz-hyphens:    none;
   -ms-hyphens:     none;
   hyphens:         none;
}

Perl gotchas: equality

If you try to compare strings using ==, and you have included the use warnings pragma, you will get a warning for a comparison like the one below. However, Perl will still attempt to convert each string into a number. If the string starts with digits, Perl will use those; otherwise the string evaluates to 0.

So my conditional,


if ($device_id == '5G0E3663') { ... }

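compares only the leading digits of each string. '5G0E3663' converts to 5, not 0, so the test is true for every device ID that also converts to 5 (any ID starting with 5G, for example), which in my case made bad things happen.
The correct way to compare strings is: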


if ($device_id eq '5G0E3663') { ... }

h/t perlmeme.