


.TH HTML2FOUR 1 "August 1999"
.\" RCSID $Id: html2four.1,v 1.1 2004/03/15 20:35:24 as Exp $
.SH NAME
html2four - extract headers from HTML files into four-field lines
.SH SYNOPSIS
.B html2four
[\-digit] [file ...]
.SH DESCRIPTION
.I html2four
extracts information from HTML files and writes it out with four
tab-separated fields: filename, last label (<a name=> tag) seen,
header tag type (H[0-9]), and header text. This is an intermediate
format convenient for generating a permuted index with four2perm(1)
or a table of contents with a simple awk script.

The only option is a single digit that limits the header levels extracted.
For example, with \-3 only h1, h2, and h3 headers are extracted. By default,
all of h[0-9] are extracted, though HTML only defines levels 1 to 6.
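.SH EXAMPLES
The four-field format described above can be consumed by a short script.
The following is a minimal sketch in Python, not part of html2four itself,
of a table-of-contents generator that also mimics the effect of the digit
option on already-extracted lines. It assumes that the third field looks
like H1 to H9 in either case and that an empty label field means no label
has been seen. The function name toc_entries and its max_level parameter
are hypothetical names chosen for illustration.
.nf

```python
import re

TAB = chr(9)  # the tab character that separates the four fields


def toc_entries(four_text, max_level=6):
    """Turn four-field lines into indented table-of-contents entries.

    Assumed input, one header per line:
    filename TAB label TAB tag TAB header text
    """
    entries = []
    for line in four_text.splitlines():
        fields = line.split(TAB, 3)
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        filename, label, tag, text = fields
        match = re.fullmatch("[Hh]([0-9])", tag)
        if match is None:
            continue  # tag field is not in the assumed H<digit> form
        level = int(match.group(1))
        if level > max_level:
            continue  # same idea as the digit option: drop deeper levels
        # Link to the last label seen, or to the file if there was none.
        target = filename + "#" + label if label else filename
        indent = "  " * max(level - 1, 0)
        entries.append(indent + text + " (" + target + ")")
    return entries


if __name__ == "__main__":
    sample = [
        TAB.join(["intro.html", "top", "H1", "Introduction"]),
        TAB.join(["intro.html", "setup", "H2", "Setup"]),
        TAB.join(["intro.html", "detail", "H3", "Fine print"]),
    ]
    for entry in toc_entries(chr(10).join(sample), max_level=2):
        print(entry)
```

.fi
The sample file names and labels are invented. Lines that do not split
into exactly four fields are skipped rather than reported, which keeps the
sketch short; a real script might prefer to warn about them.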
.SH SEE ALSO
.hy 0
four2perm(1)
.SH HISTORY
Written for the Linux FreeS/WAN project
<http://www.xs4all.nl/~freeswan/>
by Sandy Harris.