January 17, 2003

Followup on LaTeX to HTML

developerWorks: Linux: Tip: Get to know your textutils.

In my previous post on how to convert your LaTeX document to HTML, I talked about how to remove the underlining of links in your HTML document by editing its style section. I also talked about how to make tth-specific macros that help tag superscript and subscript text. In this post, I'll comment on how to get that HTML document just right after tth is finished with it, and how to automate the process.

My HTML document still wasn't cutting it after tth converted my LaTeX document to HTML. For example, the citations in my paper were enclosed in square brackets [] rather than parentheses. And tth annoyingly inserted &nbsp; wherever it thought the HTML needed a non-breaking space, even though a regular space would have been sufficient. My options were:
1. live with it
2. use emacs to search-and-replace the offending characters, strings, words
3. or automate the process with a sed script file
Well, needless to say, I chose the third option: I'm a sucker for automation (and less work on my part in the long run). Making a sed script file is not all that difficult. Honestly, though, the resulting commands ended up looking like an alien language and could easily scare casual users away from even trying sed. I duked it out and hacked something like this together in a sed script file:

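#!/bin/sed -f
# Use a fuller font list wherever tth's style section names "serif".
s/serif/Times New Roman, Times, serif/g
# Citations: [<a href="#...">n</a>] becomes (<a href="#...">n</a>).
s/\[\(<a href="#\)/(\1/g
s/\([0-9]<\/a>\)\]/\1)/g
# Plain bracketed numbers: [n] becomes n.
s/\[\([0-9]*\)\]/\1./g
# Rewrite figures/eps_files/NAME" to figures/NAME.jpg".
s/figures\/eps_files\/\([a-z0-9_-]*\)"/figures\/\1.jpg"/g
# Turn the compact definition list into a two-column table.
s/<dl compact="compact">/<table border=0 cellpadding=0>/g
s/<dt>/<tr><td valign=top>/g
s/<\/dt>/<\/td>/g
s/<dd>/<td>/g
s/<\/dd>/<p><\/td><\/tr>/g
s/<\/dl>/<\/table>\
\
<p>/g
# Strip tth's tagline (credit, link, version and date) from the page footer.
s/<hr \/><small>[a-zA-Z0-9 ]*//g
s/T<sub><font size="-1">[a-zA-Z0-9 ,]*//g
s/<\/font><\/sub>X//g
s/by <a href="http:\/\/hutchinson\.belmont\.ma\.us\/tth\/">//g
s/<\/font><\/sub>H<\/a>,//g
s/version[0-9 .]*<br \/>[a-zA-Z0-9 :.,]*<\/small>//g
# Replace non-breaking spaces with ordinary ones.
s/&nbsp;/ /g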

Notice the first line: it tells the system where to find sed on my installation, so the file can be run as a program. Briefly, the commands change the square brackets around citations to parentheses and remove the tagline that tth appends to the bottom of the HTML page. I also added lines handling the changes that I mentioned in my previous post on converting LaTeX to HTML using tth. BTW, I run tth with the options "tth -e2 -n1 -w2 -V"; -e2 tells tth to embed images rather than link to them.
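To see what the citation rules actually do, here is a small sketch that runs a slightly cleaned-up copy of just those lines, plus the &nbsp; rule, over a made-up fragment of tth-style HTML. The helper name fix_citations and the sample input are hypothetical; real tth output may differ in detail, so this illustrates the regexes rather than tth itself.

```shell
#!/bin/sh
# Sketch: apply the citation rules and the &nbsp; rule from the sed script.
# fix_citations is a hypothetical helper name, not part of tth or sed.
fix_citations() {
  sed -e 's/\[\(<a href="#\)/(\1/g' \
      -e 's/\([0-9]<\/a>\)\]/\1)/g' \
      -e 's/\[\([0-9]*\)\]/\1./g' \
      -e 's/&nbsp;/ /g'
}

# Made-up input: an in-text citation and a bracketed bibliography label.
printf '%s\n' 'see [<a href="#smith98">1</a>]&nbsp;for details' '<dt>[1]</dt>' |
  fix_citations
# prints:
# see (<a href="#smith98">1</a>) for details
# <dt>1.</dt>
```

The first two rules split the job in half: one swaps the opening bracket sitting right before a link, the other swaps the closing bracket sitting right after a digit and </a>, so text between them is never touched.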

Running the sed script is not difficult. I changed the file's permissions to make it executable (chmod 755 filename); after that it runs like any other program, e.g. ./tthpostproc.sed paper.html > index.html.
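Here's a minimal sketch of the same chmod-then-run routine in a scratch directory. It assumes a POSIX shell; the file names are made up, and instead of hard-coding /bin/sed it writes whatever path "command -v sed" reports into the #! line, since some systems keep sed in /usr/bin instead.

```shell
#!/bin/sh
# Sketch: make a sed script executable and run it like a program.
# demo.sed, paper.html and index.html are hypothetical stand-in files.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Interpreter line: point at wherever sed lives on this system.
printf '#!%s -f\n' "$(command -v sed)" > demo.sed
printf '%s\n' 's/&nbsp;/ /g' >> demo.sed

chmod 755 demo.sed        # rwxr-xr-x: owner may write, everyone may read and execute

printf '%s\n' 'a&nbsp;b' > paper.html
./demo.sed paper.html > index.html
cat index.html            # prints: a b
```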

Running the LaTeX compilers by hand each time was too tedious and begged for automation, so I turned to a trusty old Makefile. I placed the following lines in my Makefile:

----------cut------------
# My makefile to automate compiling all necessary docs
# (recipe lines must start with a tab character, not spaces)

TARGET = paper

# None of these targets name real files, so always run their recipes.
.PHONY: all ps pdf html php tidy clean

all: ps pdf html php

ps: $(TARGET).tex
	latex $(TARGET).tex
	bibtex $(TARGET)
	latex $(TARGET).tex
	bibtex $(TARGET)
	latex $(TARGET).tex
	dvips $(TARGET).dvi -o $(TARGET).ps
	ghostview $(TARGET).ps &

pdf: $(TARGET).tex
	pdflatex $(TARGET).tex
	bibtex $(TARGET)
	pdflatex $(TARGET).tex
	bibtex $(TARGET)
	pdflatex $(TARGET).tex
	acroread $(TARGET).pdf

html: $(TARGET).tex
	tth -e2 -n1 -w2 -V $(TARGET).tex
	bibtex $(TARGET)
	tth -e2 -n1 -w2 -V $(TARGET).tex
	bibtex $(TARGET)
	tth -e2 -n1 -w2 -V $(TARGET).tex
	./tthpostproc.sed $(TARGET).html > index.html

php: header.php footer.php $(TARGET).tex
	tth -r -e2 -n1 -w2 -V $(TARGET).tex
	bibtex $(TARGET)
	tth -r -e2 -n1 -w2 -V $(TARGET).tex
	bibtex $(TARGET)
	tth -r -e2 -n1 -w2 -V $(TARGET).tex
	cat header.php > index.php
	./tthpostprocraw.sed $(TARGET).html >> index.php
	cat footer.php >> index.php

tidy:
	rm -f *~

clean:
	rm -f *~
	rm -f *.aux *.log *.toc *.lof *.lot *.bbl *.blg *.out *.brf

----------end-----------

When I want to compile, I type "make ps" at the command line and it just works. The same goes for HTML ("make html"), and so on. To reuse the Makefile for another document, just change "paper" in the TARGET line to the name of the LaTeX file (without the .tex extension). If I want to compile everything, I run "make all" (plain "make" does the same, since all is the first target). Linux is nice, eh? In Winblows, I'd have to write a batch file to automate compiling, and good luck finding a text stream editor like sed installed by default on Windows.
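Instead of editing the TARGET line, make also lets you override a variable on the command line (e.g. make html TARGET=thesis), because command-line assignments take precedence over the ones in the Makefile. The sketch below checks that with a throwaway Makefile; its "show" target is hypothetical and just echoes the .tex name the real rules would use.

```shell
#!/bin/sh
# Sketch: a command-line TARGET=... beats the TARGET set in the Makefile.
# The throwaway Makefile and its "show" target are hypothetical.
set -e
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Recipe lines must start with a tab, hence the \t.
printf 'TARGET = paper\n\nshow:\n\t@echo $(TARGET).tex\n' > "$dir/Makefile"

make -s -f "$dir/Makefile" show                 # prints: paper.tex
make -s -f "$dir/Makefile" show TARGET=thesis   # prints: thesis.tex
```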

More on my LaTeX publishing system later.

Posted by johnvu at January 17, 2003 02:04 AM