using the sed utility to add
attributes to an HTML table



NOTES: For these scripts to work, the first column in the table must be sorted correctly, or at least any duplicates must be grouped together on adjacent rows. The scripts are complex, so READ all the text and the COMMENTS. Don't retype the scripts by hand; copy and paste them instead. If you are adapting these scripts for your own use, be forewarned: the longer sed scripts are especially fragile and will break easily (and disastrously) if not carefully modified, and perhaps even if they ARE carefully modified!

Since I am familiar with regular expressions and the UNIX "sed" command-line utility, I thought it would be easy to modify my acronym page so that acronyms are cross-referenced wherever they are used in the definitions. I knew it would be easy to change the first cell in each row to also be a link target within the same page. Below is my first attempt to solve this problem. It works, but it also generated some links I didn't want, as you will discover farther down the page. Here is the sed script, which I saved in a file and ran via "sed -f <filename>":
	
/<TR>$/ {
	N
	s/<TD>\(.\{2,10\}\)<\/TD>/<TD><A name="\1">\1<\/A><\/TD>/
}
	
	
The 'N' command above might be replaced with 'n' without changing the result.

Briefly, the script recognized and changed the first cell in each row in a previous version of my acronym table. The above script changes HTML sequences which are like this:
	
	<TR>
		<TD>OEM</TD>
    
	
To HTML sequences like this:
	
	<TR>
 		<TD><A name="OEM">OEM</A></TD>
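
To try this on a single row, a shell session along the following lines should do. It is only a sketch: the scratch directory, the file names and the definition cell are made up for the demonstration. It also runs the variant with 'n' in place of 'N' mentioned earlier and compares the two results.

```shell
# Work in a scratch directory; all file names here are made up for the demo.
workdir=$(mktemp -d)

# The first script, as shown above.
cat > "$workdir/anchor_N.sed" <<'EOF'
/<TR>$/ {
    N
    s/<TD>\(.\{2,10\}\)<\/TD>/<TD><A name="\1">\1<\/A><\/TD>/
}
EOF

# The same script with 'N' swapped for 'n'.
sed 's/N$/n/' "$workdir/anchor_N.sed" > "$workdir/anchor_n.sed"

# One sample row: the acronym cell plus an invented definition cell.
printf '<TR>\n\t<TD>OEM</TD>\n\t<TD>Original Equipment Manufacturer</TD>\n</TR>\n' \
    > "$workdir/sample.html"

out_N=$(sed -f "$workdir/anchor_N.sed" "$workdir/sample.html")
out_n=$(sed -f "$workdir/anchor_n.sed" "$workdir/sample.html")

printf '%s\n' "$out_N"
[ "$out_N" = "$out_n" ] && echo "N and n give the same output for this sample"
```

Only the cell on the line right after <TR> is touched; the definition cell is left alone because the script never looks past that one line.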
    
	
However, I needed to improve this script to cover two additional criteria. One is that the regular expression for the acronym was too simple and might recognize some cells in other tables. The other is that the first pass created 125 HTML errors, because many of the acronyms have more than one definition and each of their rows received the same anchor name. That was more errors than I wanted to correct by hand. Here is the next successful iteration of the script. It's more complicated!
	
/<TR>$/ {
 	n
 	/^[ ]*<TD>[-0-9A-Za-z/.]\{2,10\}<\/TD>$/ {
		x
		G
#		## next sed lines change the first match which does not have anything 
#		## in the hold space
		/^\n.*$/ {
			s/^\n\(.*\)$/\1/
			h
			s/<TD>\([-0-9A-Za-z/.]\{2,10\}\)<\/TD>/<TD><A name="\1">\1<\/A><\/TD>/
			b
		}
#		## next sed lines change the matches which duplicate the preceding 
#		## match - since both lines are identical, does not matter from which 
# 		## the patterns are saved.
		/^\(.*\)\n\1$/ {
			s/^\(.*\)\n\1$/\1/
			h
			s/<TD>\([-0-9A-Za-z/.]\{2,10\}\)<\/TD>/<TD><A name="\12">\1<\/A><\/TD>/
			b
		}
#		## next lines change  new matches which are not preceded by a duplicate
		/^.*\n.*$/ {
			s/^.*\n\(.*\)$/\1/
			h
			s/<TD>\([-0-9A-Za-z/.]\{2,10\}\)<\/TD>/<TD><A name="\1">\1<\/A><\/TD>/
			b
		}
	}
}
    
	
While it's not quite clear from reading the comments, the above script handles duplicates only: the second row for an acronym gets a "2" suffix on its anchor name, and so does every further repeat. This script reduced the number of HTML errors from 125 to 36. The remaining 36 errors were caused by some triplicates and quadruplicates and by some rows not being in alphabetical order. I also found that some rows were not modified because the acronym contained a space.
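
One quick way to list anchor names that are still repeated after a pass is sketched below with grep, sort and uniq. The three sample lines are invented to mimic a triplicate after the script above has run, and "grep -o" is a widely available extension rather than strict POSIX, so treat this as an assumption about your system.

```shell
# Invented sample: three rows for the same acronym, as the script above
# would leave them (the third row repeats the "2" suffix).
out_html='<TD><A name="ATM">ATM</A></TD>
<TD><A name="ATM2">ATM</A></TD>
<TD><A name="ATM2">ATM</A></TD>'

# Pull out every name="..." attribute, then keep only the repeated ones.
dupes=$(printf '%s\n' "$out_html" | grep -o 'name="[^"]*"' | sort | uniq -d)
printf '%s\n' "$dupes"
```

Each line printed is an anchor name that occurs more than once and therefore still needs fixing.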

Here is the script which was successful in creating embedded references in all the acronym rows:
	
 
/<TR>$/ {
	n
	/^[ ]*<TD>[-0-9A-Za-z/. ]\{2,10\}<\/TD>$/ {
		G
#		## The following sed lines change only the first match because it does not 
#		## have anything in the hold space
		/^.*\n$/ {
			s/^\(.*\)\n$/\1/
			h
			s/<TD>\([-0-9A-Za-z/ .]\{2,10\}\)<\/TD>/<TD><A name="\1">\1<\/A><\/TD>/
			b
		}
#		## The next sed lines change the match which duplicates the preceding match.
#		## Since both lines are identical, does not matter from which patterns are saved.
#		## Duplicate/triplicate/quadruplicate acronyms must be on successive rows.
		/^\(.*\)\n\1$/ {
			s/^\(.*\)\n\1$/\1_2/
			h
			s/<TD>\([-0-9A-Za-z/ .]\{2,10\}\)<\/TD>_2/<TD><A name="\12">\1<\/A><\/TD>/
			b
		}
#		## Changes the triplicates
		/^\(.*\)\n\1_2$/ {
			s/^\(.*\)\n\1_2$/\1_3/
			h
			s/<TD>\([-0-9A-Za-z/ .]\{2,10\}\)<\/TD>_3/<TD><A name="\13">\1<\/A><\/TD>/
			b
		}
#		## Changes the quadruplicates
		/^\(.*\)\n\1_3$/ {
			s/^\(.*\)\n\1_3$/\1_4/
			h
			s/<TD>\([-0-9A-Za-z/ .]\{2,10\}\)<\/TD>_4/<TD><A name="\14">\1<\/A><\/TD>/
			b
		}
#		## Changes the lines NOT preceded by a duplicate.
		/^.*\n.*$/ {
			s/^\(.*\)\n.*$/\1/
			h
			s/<TD>\([-0-9A-Za-z/ .]\{2,10\}\)<\/TD>/<TD><A name="\1">\1<\/A><\/TD>/
			b
		}
	}
}
	
	
In theory, any finite number of repeated acronyms could be handled by adding more blocks to the script, although at some point simplicity would demand a different, shorter solution.
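
As one possible shape for such a shorter solution, here is a sketch that swaps sed for awk and keeps a running count instead of one block per repeat. It is not the approach used on this page: the function name number_anchors and the sample rows are invented, and it compares only the acronym text rather than whole lines as the sed scripts do.

```shell
# number_anchors is a made-up helper name; it reads HTML on stdin.
number_anchors() {
    awk '
    # A row opener: print it and note that the next line is a first cell.
    /<TR>$/ { print; first_cell = 1; next }

    # First cell holding only acronym characters.
    first_cell && /^[ \t]*<TD>[-0-9A-Za-z\/. ]+<\/TD>$/ {
        indent = $0;  sub(/<TD>.*$/, "", indent)
        acronym = $0; sub(/^[ \t]*<TD>/, "", acronym); sub(/<\/TD>$/, "", acronym)
        if (length(acronym) >= 2 && length(acronym) <= 10) {
            # Count consecutive repeats; the first occurrence gets no suffix.
            count = (acronym == previous) ? count + 1 : 1
            previous = acronym
            suffix = (count > 1) ? count : ""
            $0 = indent "<TD><A name=\"" acronym suffix "\">" acronym "</A></TD>"
        }
    }

    { first_cell = 0; print }
    '
}

# Invented sample: three rows for one acronym, then a different one.
result=$(printf '<TR>\n\t<TD>ATM</TD>\n<TR>\n\t<TD>ATM</TD>\n<TR>\n\t<TD>ATM</TD>\n<TR>\n\t<TD>BIOS</TD>\n' | number_anchors)
printf '%s\n' "$result"
```

As with the sed scripts, repeated acronyms must sit on successive rows, because the count restarts whenever the acronym changes.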

NEXT: creating the cross-reference links.

Sed could use a more modern counterpart. I noted some features I would like to see in my Sed Wish List.

kx Software Aesthetics

