A brief introduction to htdig
This text describes how to install htdig from source and covers its
basic use. The details refer to FreeBSD, but the procedure should be
very similar on other Unix systems.
Introduction
htdig, or more precisely ht://Dig, is a program that indexes web sites
and searches them. It is suited to a small domain or an intranet and is
distributed under the GPL.
htdig has three main executables:
- htdig: builds the database needed for searching (digging)
- htmerge: builds the search indexes and, during incremental indexing,
  merges the documents that have changed into the search database
  (merging)
- htsearch: the CGI that performs the search (searching)
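The three stages can be pictured as a small pipeline. What follows is a minimal sketch, not the rundig script shipped with htdig: the function name dig_pipeline is hypothetical, and -i (initial dig) and -c (configuration file) are the options htdig 3.1.x is assumed to accept. The function only prints the commands, so it can be tried without touching any database.

```sh
#!/bin/sh
# Sketch only: print the digging and merging commands for one configuration.
# BINDIR and the default config path follow the --prefix used in this text.
BINDIR=${BINDIR:-/usr/local/htdig/bin}

dig_pipeline() {
    conf=${1:-/usr/local/htdig/conf/htdig.conf}
    # Stage 1 (digging): build the document database from scratch (-i).
    echo "$BINDIR/htdig -i -c $conf"
    # Stage 2 (merging): build the search indexes from the dig output.
    echo "$BINDIR/htmerge -c $conf"
    # Stage 3 (searching) is not run here: htsearch is started by the web
    # server as a CGI each time a user submits a query.
}

dig_pipeline /usr/local/htdig/conf/htdig.conf
```

Piping the output into sh would actually run the two stages, once htdig is installed.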
Download
Download the latest stable release from http://www.htdig.org,
e.g. htdig-3.1.6.tar.gz.
Unpacking the sources
# tar zxvf htdig-3.1.6.tar.gz -C /usr/local/src
# cd /usr/local/src/htdig-3.1.6
Reading the documentation
# less README
# lynx htdoc/install.html
Configuration
# ./configure --help | less
# ./configure --prefix=/usr/local/htdig \
--with-cgi-bin-dir=/usr/local/apache/cgi-bin \
--with-image-dir=/usr/local/apache/htdocs/htdig \
--with-search-dir=/usr/local/apache/htdocs/htdig
Compilation
# make
Installation
# make install
Installing ht://Dig
Creating directories (if needed)...
mkdir /usr/local/htdig
mkdir /usr/local/htdig/bin
mkdir /usr/local/htdig/conf
mkdir /usr/local/htdig/common
mkdir /usr/local/htdig/db
mkdir /usr/local/apache/htdocs/htdig
Installing individual programs...
transform=s,x,x,
/usr/bin/install -c htfuzzy /usr/local/htdig/bin/`echo htfuzzy | sed ''`
transform=s,x,x,
/usr/bin/install -c htdig /usr/local/htdig/bin/`echo htdig | sed ''`
/usr/bin/install -c htdig /usr/local/htdig/bin/`echo htdump | sed ''`
/usr/bin/install -c htdig /usr/local/htdig/bin/`echo htload | sed ''`
transform=s,x,x,
/usr/bin/install -c htsearch /usr/local/apache/cgi-bin/`echo htsearch | sed ''`
transform=s,x,x,
/usr/bin/install -c htmerge /usr/local/htdig/bin/`echo htmerge | sed ''`
transform=s,x,x,
/usr/bin/install -c htnotify /usr/local/htdig/bin/`echo htnotify | sed ''`
Installing default configuration files...
/usr/local/htdig/conf/htdig.conf
/usr/local/apache/htdocs/htdig/search.html
/usr/local/htdig/common/header.html
/usr/local/htdig/common/footer.html
/usr/local/htdig/common/wrapper.html
/usr/local/htdig/common/nomatch.html
/usr/local/htdig/common/syntax.html
/usr/local/htdig/common/long.html
/usr/local/htdig/common/short.html
/usr/local/htdig/common/bad_words
/usr/local/htdig/common/english.0
/usr/local/htdig/common/english.aff
/usr/local/htdig/common/synonyms
Installing images...
/usr/local/apache/htdocs/htdig/button1.gif
/usr/local/apache/htdocs/htdig/button2.gif
/usr/local/apache/htdocs/htdig/button3.gif
/usr/local/apache/htdocs/htdig/button4.gif
/usr/local/apache/htdocs/htdig/button5.gif
/usr/local/apache/htdocs/htdig/button6.gif
/usr/local/apache/htdocs/htdig/button7.gif
/usr/local/apache/htdocs/htdig/button8.gif
/usr/local/apache/htdocs/htdig/button9.gif
/usr/local/apache/htdocs/htdig/buttonl.gif
/usr/local/apache/htdocs/htdig/buttonr.gif
/usr/local/apache/htdocs/htdig/button10.gif
/usr/local/apache/htdocs/htdig/htdig.gif
/usr/local/apache/htdocs/htdig/star.gif
/usr/local/apache/htdocs/htdig/star_blank.gif
/usr/local/apache/htdocs/htdig/button1.png
/usr/local/apache/htdocs/htdig/button2.png
/usr/local/apache/htdocs/htdig/button3.png
/usr/local/apache/htdocs/htdig/button4.png
/usr/local/apache/htdocs/htdig/button5.png
/usr/local/apache/htdocs/htdig/button6.png
/usr/local/apache/htdocs/htdig/button7.png
/usr/local/apache/htdocs/htdig/button8.png
/usr/local/apache/htdocs/htdig/button9.png
/usr/local/apache/htdocs/htdig/buttonl.png
/usr/local/apache/htdocs/htdig/buttonr.png
/usr/local/apache/htdocs/htdig/button10.png
/usr/local/apache/htdocs/htdig/htdig.png
/usr/local/apache/htdocs/htdig/star.png
/usr/local/apache/htdocs/htdig/star_blank.png
Creating rundig script...
Installation done.
Before you can start searching, you will need to create a
search database. A sample script to do this has been
installed as /usr/local/htdig/bin/rundig
Checking the installed CGI
# ls -l /usr/local/apache/cgi-bin/htsearch
-rwxr-xr-x 1 root wheel 1066412 May 21 18:38 /usr/local/apache/cgi-bin/htsearch
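The same check can be scripted. This is a minimal sketch; the function name check_cgi is hypothetical and the path is the one passed to --with-cgi-bin-dir in this text.

```sh
#!/bin/sh
# Sketch only: report whether a CGI program is in place and executable.
check_cgi() {
    cgi=$1
    if [ ! -f "$cgi" ]; then
        echo "missing: $cgi"
        return 1
    fi
    if [ ! -x "$cgi" ]; then
        echo "not executable: $cgi"
        return 1
    fi
    echo "ok: $cgi"
}

# "|| true" keeps the script going when htsearch is not installed yet.
check_cgi /usr/local/apache/cgi-bin/htsearch || true
```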
Indexing test
Let us try indexing htdig's own documentation, using the sample script
/usr/local/htdig/bin/rundig.
We install the htdig manual under the document root so that it is
visible via the web:
# cp -R /usr/local/src/htdig-3.1.6/htdoc /usr/local/apache/htdocs/htdig/
# links http://localhost/htdig/htdoc
We start from the default configuration file
/usr/local/htdig/conf/htdig.conf; rather than modifying it, we
make a copy, called e.g. htdoc.conf:
# cd /usr/local/htdig/conf
# cp htdig.conf htdoc.conf
and then modify the copy according to the following patch:
htdig.diff
----------
18c18
< database_dir: /usr/local/htdig/db
---
> database_dir: /usr/local/htdig/db/htdoc
28c28,30
< start_url: http://www.htdig.org/
---
> local_urls: http://localhost/htdig/htdoc/=/usr/local/apache/htdocs/htdig/htdoc/
> local_urls_only: true
> start_url: http://localhost/htdig/htdoc/
# patch htdoc.conf htdig.diff
Hmm... Looks like a normal diff to me...
Patching file htdoc.conf using Plan A...
Hunk #1 succeeded at 18.
Hunk #2 succeeded at 28.
done
This way indexing goes through the local filesystem only, and no
web server needs to be running for it.
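A line-numbered patch only applies if htdig.conf has exactly the expected layout. As an alternative, the same three changes can be made by matching attribute names. This is a minimal sketch: the function name make_local_conf is hypothetical, and it assumes each attribute sits on a single line with no trailing whitespace. The demonstration runs on a two-line stand-in for htdig.conf, not on the real file.

```sh
#!/bin/sh
# Sketch only: derive a filesystem-only configuration from a template.
# Usage: make_local_conf TEMPLATE NAME URL DIR > NEW_CONF
make_local_conf() {
    awk -v name="$2" -v url="$3" -v dir="$4" '
        # Give the new configuration its own database subdirectory.
        /^database_dir:/ { print $0 "/" name; next }
        # Replace the start URL and map it onto the local filesystem.
        /^start_url:/ {
            print "local_urls:\t" url "=" dir
            print "local_urls_only:\ttrue"
            print "start_url:\t" url
            next
        }
        # Every other line is copied unchanged.
        { print }
    ' "$1"
}

# Demonstration on a stand-in template.
tmpl=$(mktemp /tmp/htdig.XXXXXX)
printf 'database_dir:\t/usr/local/htdig/db\nstart_url:\thttp://www.htdig.org/\n' > "$tmpl"
make_local_conf "$tmpl" htdoc http://localhost/htdig/htdoc/ \
    /usr/local/apache/htdocs/htdig/htdoc/
rm -f "$tmpl"
```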
# mkdir /usr/local/htdig/db/htdoc/
# /usr/local/htdig/bin/rundig -c /usr/local/htdig/conf/htdoc.conf
If you want more detail, use the command:
# /usr/local/htdig/bin/rundig -v -c /usr/local/htdig/conf/htdoc.conf | less
The -vv option would increase the verbosity further, and -vvv
further still.
Next we patch the page
http://localhost/htdig/htdoc/contents.html so that it uses the local
search CGI instead of the one at http://www.htdig.org/:
contents.diff
-----------------
50c50
< <form action="http://www.htdig.org/cgi-bin/htsearch" target=body>
---
> <form action="http://localhost/cgi-bin/htsearch" target=body>
55c55
< <input type=hidden name=config value=htdig>
---
> <input type=hidden name=config value=htdoc>
# cd /usr/local/apache/htdocs/htdig/htdoc/
# patch contents.html contents.diff
Hmm... Looks like a normal diff to me...
Patching file contents.html using Plan A...
Hunk #1 succeeded at 50.
Hunk #2 succeeded at 55.
done
We do the same for the other search form, in the page:
http://ninux.rett.polimi.it/htdig/htdoc/main.html
main.diff
---------
73c73
< <form action="http://cgi.htdig.org/cgi-bin/htsearch" method="post">
---
> <form action="http://localhost/cgi-bin/htsearch" method="post">
76c76
< <input type="hidden" name="config" value="htdig">
---
> <input type="hidden" name="config" value="htdoc">
# cd /usr/local/apache/htdocs/htdig/htdoc/
# patch main.html main.diff
Hmm... Looks like a normal diff to me...
Patching file main.html using Plan A...
Hunk #1 succeeded at 73.
Hunk #2 succeeded at 76.
done
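Both form patches make the same two changes: the form action is pointed at the local CGI and the hidden config field is renamed. A minimal sketch of doing this by pattern rather than by line number follows; the function name localize_form is hypothetical, and the patterns only cover the quoting styles seen in the two diffs above.

```sh
#!/bin/sh
# Sketch only: point a search form at the local htsearch CGI.
# Usage: localize_form FILE CONFIG_NAME > NEW_FILE
localize_form() {
    # 1st expression: any *.htdig.org htsearch URL becomes the local one.
    # 2nd expression: the config field value "htdig" becomes CONFIG_NAME,
    # whether or not the attributes are quoted.
    sed -e 's|http://[a-z.]*htdig\.org/cgi-bin/htsearch|http://localhost/cgi-bin/htsearch|' \
        -e "s|\(name=\"\{0,1\}config\"\{0,1\} value=\"\{0,1\}\)htdig|\1$2|" \
        "$1"
}

# Demonstration on a two-line stand-in for contents.html.
page=$(mktemp /tmp/htdig.XXXXXX)
printf '%s\n' \
    '<form action="http://www.htdig.org/cgi-bin/htsearch" target=body>' \
    '<input type=hidden name=config value=htdig>' > "$page"
localize_form "$page" htdoc
rm -f "$page"
```

The output goes to standard output, so the original page stays untouched until the result has been inspected.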
Finally, try some searches by opening one of these addresses in a
local web browser (or substitute your machine's host name if you use
a remote browser):
http://localhost/htdig/htdoc/
http://localhost/htdig/search.html
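The CGI can also be queried directly, without going through a form. This is a minimal sketch: the function name search_url is hypothetical, config is the field used by the forms in this text, and words is assumed to be the name of the query field that htsearch reads.

```sh
#!/bin/sh
# Sketch only: build a GET URL for the local htsearch CGI.
# Usage: search_url CONFIG WORD...
search_url() {
    config=$1
    shift
    # Join the remaining arguments with "+", the form encoding of a space.
    # Words containing other reserved characters would need real URL encoding.
    words=$(echo "$*" | tr ' ' '+')
    echo "http://localhost/cgi-bin/htsearch?config=$config&words=$words"
}

search_url htdoc local urls
# prints http://localhost/cgi-bin/htsearch?config=htdoc&words=local+urls
```

The result can be handed to a text browser, e.g. lynx "$(search_url htdoc local urls)".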
The database files:
# ls -l /usr/local/htdig/db/htdoc
total 1254
-rw-r--r-- 1 root wheel 207872 Jun 5 16:48 db.docdb
-rw-r--r-- 1 root wheel 6144 Jun 5 16:48 db.docs.index
-rw-r--r-- 1 root wheel 421690 Jun 5 16:48 db.wordlist
-rw-r--r-- 1 root wheel 588800 Jun 5 16:48 db.words.db
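A quick way to confirm that an indexing run completed is to check for these four files. This is a minimal sketch; the function name check_db is hypothetical, and the file names are taken from the listing above, so other htdig versions may differ.

```sh
#!/bin/sh
# Sketch only: confirm that a database directory holds the four files
# shown in the listing and that none of them is empty.
check_db() {
    dbdir=$1
    status=0
    for f in db.docdb db.docs.index db.wordlist db.words.db; do
        if [ ! -s "$dbdir/$f" ]; then
            echo "missing or empty: $dbdir/$f"
            status=1
        fi
    done
    return $status
}

check_db /usr/local/htdig/db/htdoc || echo "index incomplete"
```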
Working with multiple databases
Now we will index other local documentation, creating a separate
database and a separate configuration file for each set of documentation.
For example, we can index the PHP documentation, which on my
system is visible via the web in
/usr/local/apache/htdocs/doc/php_manual_en. We copy the
configuration file htdoc.conf to php.conf:
# cd /usr/local/htdig/conf
# cp htdoc.conf php.conf
then modify it according to the following patch:
htdoc.diff
----------
18c18
< database_dir: /usr/local/htdig/db/htdoc
---
> database_dir: /usr/local/htdig/db/php
28c28
< local_urls: http://localhost/htdig/htdoc/=/usr/local/apache/htdocs/htdig/htdoc/
---
> local_urls: http://localhost/doc/php_manual_en/=/usr/local/apache/htdocs/doc/php_manual_en/
30c30
< start_url: http://localhost/htdig/htdoc/
---
> start_url: http://localhost/doc/php_manual_en/
# patch php.conf htdoc.diff
Hmm... Looks like a normal diff to me...
Patching file php.conf using Plan A...
Hunk #1 succeeded at 18.
Hunk #2 succeeded at 28.
Hunk #3 succeeded at 30.
done
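After patching, it can be worth confirming that only the intended attributes differ between the two configurations. This is a minimal sketch; the function name changed_attrs is hypothetical, and the demonstration uses two-line stand-in files rather than the real htdoc.conf and php.conf.

```sh
#!/bin/sh
# Sketch only: list the attribute names whose lines differ between
# two htdig configuration files.
changed_attrs() {
    # diff exits non-zero when the files differ, which is expected here.
    { diff "$1" "$2" || true; } |
        awk '/^[<>] [a-z_]+:/ { sub(/:.*/, "", $2); print $2 }' |
        sort -u
}

# Demonstration on stand-in files.
old=$(mktemp /tmp/htdig.XXXXXX)
new=$(mktemp /tmp/htdig.XXXXXX)
printf 'database_dir: /usr/local/htdig/db/htdoc\nmax_head_length: 10000\n' > "$old"
printf 'database_dir: /usr/local/htdig/db/php\nmax_head_length: 10000\n' > "$new"
changed_attrs "$old" "$new"
rm -f "$old" "$new"
```

For the pair in this text, changed_attrs htdoc.conf php.conf should report database_dir, local_urls and start_url, matching the three hunks of the patch.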
Then we create the directory that will hold the databases and start
the indexing:
# mkdir /usr/local/htdig/db/php
# /usr/local/htdig/bin/rundig -v -c /usr/local/htdig/conf/php.conf | less
We prepare an HTML search form with a single-selection control
for choosing the documentation to search, e.g. by modifying
the default search.html file shipped with htdig:
search.diff
-----------
33a34,37
> Manual to search: <select name="config">
> <option>htdoc</option>
> <option>php</option>
> </select>
35d38
< <input type="hidden" name="config" value="htdig">
# cd /usr/local/apache/htdocs/htdig
# cp search.html search.html.orig
# patch search.html search.diff
Hmm... Looks like a normal diff to me...
Patching file search.html using Plan A...
Hunk #1 succeeded at 34.
Hunk #2 succeeded at 39.
done
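As more manuals are added, the option list can be generated from the configuration directory instead of being edited by hand. This is a minimal sketch; the function name config_options is hypothetical, and it assumes every *.conf file other than the default htdig.conf corresponds to one searchable manual.

```sh
#!/bin/sh
# Sketch only: emit one <option> per configuration file in a directory.
config_options() {
    for conf in "$1"/*.conf; do
        # Skip the unexpanded pattern when the directory has no .conf files.
        [ -f "$conf" ] || continue
        name=$(basename "$conf" .conf)
        # Leave out the untouched default configuration.
        if [ "$name" != htdig ]; then
            echo "<option>$name</option>"
        fi
    done
}

config_options /usr/local/htdig/conf
```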
Then go to the URL: http://localhost/htdig/search.html
The procedure can be extended to every manual you want to index.
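With several manuals, the mkdir and rundig steps repeat for each configuration. A minimal sketch of a loop over the configuration directory follows; the function name index_all is hypothetical, and it only prints the commands so that they can be reviewed first. The default htdig.conf is skipped because its start_url points at http://www.htdig.org/.

```sh
#!/bin/sh
# Sketch only: print the commands needed to (re)index every configuration.
RUNDIG=${RUNDIG:-/usr/local/htdig/bin/rundig}

index_all() {
    for conf in "$1"/*.conf; do
        [ -f "$conf" ] || continue
        # Do not index the remote site named in the default configuration.
        if [ "$(basename "$conf")" = htdig.conf ]; then
            continue
        fi
        # Read the database directory this configuration expects.
        dbdir=$(awk '/^database_dir:/ { print $2; exit }' "$conf")
        if [ -n "$dbdir" ]; then
            echo "mkdir -p $dbdir"
        fi
        echo "$RUNDIG -c $conf"
    done
}

index_all /usr/local/htdig/conf
```

Once the printed commands look right, index_all /usr/local/htdig/conf | sh runs them.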