Using sed to add cross references to an HTML table


NOTES: If you are adapting these scripts for your own use, be forewarned: the longer sed scripts are especially fragile and will break easily (and disastrously) if not modified carefully, and perhaps even if they ARE modified carefully! Furthermore, these are working scripts (on Mac OS X Tiger), but they are highly specific to the format of my HTML code, so they will probably not work for you as-is.

Two files work together: a sed script, and a shell script with inline sed commands. This is the sed script, named "doit.sed":
	
/<TD><A name\="\(.*\)">\1<\/A><\/TD>/ {
	s/[ ]*<TD><A name\="\(.*\)">\1<\/A><\/TD>/\1/
	p		
}
d
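
To see what doit.sed produces, here is a minimal sketch that runs it against a few made-up table rows. The sample rows, the scratch directory and the "extracted" variable are assumptions for illustration and are not part of my real page; the copy of the script also drops the optional backslash before "=". A row whose first cell is a self-named anchor yields the bare acronym, and every other line is deleted.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Copy of doit.sed (the backslash before "=" is omitted here).
cat > doit.sed <<'EOF'
/<TD><A name="\(.*\)">\1<\/A><\/TD>/ {
    s/[ ]*<TD><A name="\(.*\)">\1<\/A><\/TD>/\1/
    p
}
d
EOF

# Hypothetical sample rows in the layout the script expects.
cat > sample.html <<'EOF'
<TR>
    <TD><A name="API">API</A></TD>
    <TD> computing </TD>
    <TD> Application Programming Interface </TD>
</TR>
<TR>
    <TD><A name="DNS server">DNS server</A></TD>
    <TD> networking </TD>
    <TD> Domain Name System server </TD>
</TR>
EOF

# Only the anchor rows survive, reduced to the acronym itself.
extracted=$(sed -f doit.sed sample.html)
printf '%s\n' "$extracted"
```

With the sample above this should print "API" and "DNS server", one per line, which is the form the shell script below expects to find in "list".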
	
	
This is the bash shell script:
	
#!/bin/bash
# Get a copy of the acronyms page
# and put spaces between the HTML tokens and the text for the
# context and expansion/definition lines. The spaces help to simplify
# the regular expressions in the substitute commands below.
cat "../   KX /acronyms.html" | sed -e \
"/<TR>$/ {
		n 
:here
		n
		/<\\/TR>/ b
		s/<TD>\([^ ]\)/<TD> \\1/
		s?\([^ ]\)<\\/TD>?\\1 <\\/TD>?
		b here
}" >test.html
# extract a complete list of the acronyms from the web page source
cat test.html | sed -f doit.sed >list
# initialize a second list
>list2
# create a list of acronyms which are used in the definitions
# by counting the number of uses of each in the web page.
# Note that "while read A" reads the whole line, embedded
# whitespace included (only leading and trailing whitespace is
# trimmed). We're counting on that because some acronyms contain
# embedded spaces.
export A
cat list | \
while read A
do
	# count the number of instances
	# to make the compare below work, we must remove the
	# spaces returned by the "wc" command
	B=`egrep -- "[][ ,.)(/-]$A[][ ,.:)('/-]" test.html | \
	   egrep -v "DOCTYPE" | wc -l | sed -e "s/ //g"`
	# if the number is not zero and the acronym is not 'area'
	if [ "$B" != "0" -a "$A" != "area" ]
	then
		# append it to the list.
		echo "$A" >>list2
	fi
done
# a redundant save of test.html to avoid corruption while
# developing the scripts
cp test.html temp	
# For all acronyms used in definitions, substitute the 
# cross references.
cat list2 | while read A
do
	# escape embedded slashes so the substitute command works
	A=`echo "$A" | sed -e 's?/?\\\/?g'`
	# The sed substitute commands above and below have extra
	# escapes ("\") because the shell will do some string resolution
	# (and substitution) before passing the commands to sed.
	# We could not use a file for this sed script because
	# there was no way to pass the "$A" argument to it.
	cat temp | sed -e "
/<TD><A name=\\\"/ b
2,$ s/\\([][ ,.)(/-]\\)\\($A\\)\\([][ ,.:\')(/-]\\)/\\1<A href=\"\#\\2\">\\2<\\/A>\\3/g
2,$ s/\\([][ ,.)(/-]\\)\\($A\\)s\\([][ ,.:\')(/-]\\)/\\1<A href=\"\#\\2\">\\2s<\\/A>\\3/g" >temp2
	mv temp2 temp
done
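
The substitution step is easier to follow in isolation, so here is a minimal sketch of it under some stated assumptions. The "link_acronym" helper and the sample rows are hypothetical and are not part of my scripts. The sketch uses "|" instead of "/" as the delimiter of the substitute command, so acronyms with embedded slashes need no escaping and there is no question of how a given sed treats a "/" inside a bracket expression. It assumes no acronym contains "|" or a regular expression metacharacter, and it leaves out the "2,$" address and the plural rule.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on stdin and wrap each delimited use of
# the acronym in $1 with a link to the anchor of the same name.
link_acronym() {
    local acronym=$1
    local before='[][ ,.)(/-]'     # characters allowed just before the acronym
    local after="[][ ,.:')(/-]"    # characters allowed just after it
    # The first command skips the row that defines an anchor;
    # the second links every other delimited occurrence.
    sed -e '/<TD><A name="/ b' \
        -e "s|\\(${before}\\)\\(${acronym}\\)\\(${after}\\)|\\1<A href=\"#\\2\">\\2</A>\\3|g"
}

# Made-up rows, already spaced out the way test.html is.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > sample.html <<'EOF'
<TR>
    <TD><A name="API">API</A></TD>
    <TD> Application Programming Interface </TD>
</TR>
<TR>
    <TD><A name="I/O">I/O</A></TD>
    <TD> Input/Output </TD>
</TR>
<TR>
    <TD><A name="HAL">HAL</A></TD>
    <TD> Hardware Abstraction Layer, an API for low-level I/O. </TD>
</TR>
EOF

# One pass per acronym, chained the same way the loop rewrites "temp".
linked=$(link_acronym "API" < sample.html | link_acronym "I/O")
printf '%s\n' "$linked"
```

In the output, only the HAL definition should change: "API" and "I/O" each become a link to the matching anchor, while the three defining rows pass through untouched.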
    
	

kx Software Aesthetics

