\documentclass[11pt]{article}
\usepackage{times}
\usepackage{pl}
\usepackage{plpage}
\usepackage{html}
\makeindex
\onefile
\htmloutput{.} % Output directory
\htmlmainfile{http} % Main document file
\bodycolor{white} % Page colour
\sloppy
\renewcommand{\runningtitle}{SWI-Prolog HTTP support}
\begin{document}
\title{SWI-Prolog HTTP support}
\author{Jan Wielemaker \\
VU University Amsterdam \\
University of Amsterdam \\
The Netherlands \\
E-mail: \email{[email protected]}}
\maketitle
\begin{abstract}
This article documents the package HTTP, a series of libraries for
accessing data on HTTP servers as well as providing HTTP server
capabilities from SWI-Prolog. Both server and client are modular
libraries. Further reading material is available from the locations
below.
\begin{shortlist}
\item \href{http://www.swi-prolog.org/howto/http/}{HOWTO collection}
\item \href{http://www.pathwayslms.com/swipltuts/html}{Tutorial by Anne Ogborn}
\end{shortlist}
\end{abstract}
\vfill
\pagebreak
\tableofcontents
\vfill
\vfill
\newpage
\section{Introduction}
\label{sec:http-intro}
The HTTP (HyperText Transfer Protocol) is the W3C standard protocol for
transferring information between a web-client (e.g., a browser) and a
web-server. The protocol is a simple \emph{envelope} protocol where
standard name/value pairs in the header are used to split the stream
into messages and communicate about the connection-status. Many
languages have client and/or server libraries to deal with the HTTP
protocol, making it a suitable candidate for general purpose
client-server applications.

In this document we describe a modular infrastructure to access
web-servers from SWI-Prolog and turn Prolog into a web-server.
\subsection*{Acknowledgements}
\label{sec:http-acknowledgements}
This work has been carried out under the following projects:
\href{http://hcs.science.uva.nl/projects/GARP/}{GARP},
\href{http://www.ins.cwi.nl/projects/MIA/}{MIA},
\href{http://hcs.science.uva.nl/projects/ibrow/home.html}{IBROW},
\href{http://kits.edte.utwente.nl/}{KITS} and
\href{http://e-culture.multimedian.nl/}{MultiMediaN}.
The following people have pioneered parts of this library and
contributed bug reports and suggestions for improvements: Anjo
Anjewierden, Bert Bredeweg, Wouter Jansweijer, Bob Wielinga, Jacco
van Ossenbruggen, Michiel Hildebrandt, Matt Lilley and Keri Harris.
Path wildcards (see http_handler/3) have been modelled after the
`arouter` package by Raivo Laanemets. Request rewriting has been
added after discussion with Raivo Laanemets and Anne Ogborn on the
SWI-Prolog mailing list.
\section{The HTTP client libraries}
\label{sec:http-clients}
This package provides two client libraries for accessing HTTP servers.
The first, \pllib{http/http_open}, is a library for opening an HTTP URL
address as a Prolog stream. The general skeleton for using this library
is given below, where \nopredref{process}{1} processes the data from the
HTTP server.\footnote{One may opt to use call_cleanup/2 instead of
setup_call_cleanup/3 to allow for aborting while http_open/3 is waiting
for the connection.}
\begin{code}
setup_call_cleanup(
    http_open(URL, In, []),
    process(In),
    close(In)).
\end{code}
The second, \pllib{http/http_client} provides http_get/3 and
http_post/4, both of which process the reply using plugins to convert
the data based on the \texttt{Content-Type} of the reply. This library
supports a plugin infrastructure that can register hooks for converting
additional document types.
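
As a minimal sketch of how the two client libraries differ, the code
below reads a reply body as a string through http_open/3 and,
alternatively, lets http_get/3 convert the reply. The helper names
\nopredref{fetch_as_string}{2} and \nopredref{fetch_converted}{2} are
hypothetical and not part of the package; error handling is omitted.

\begin{code}
:- use_module(library(http/http_open)).
:- use_module(library(http/http_client)).

% fetch_as_string(+URL, -String)
% Hypothetical helper: read the complete reply body into a string
% using the stream returned by http_open/3.
fetch_as_string(URL, String) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        read_string(In, _Length, String),
        close(In)).

% fetch_converted(+URL, -Data)
% Hypothetical helper: let http_get/3 convert the reply based on
% its Content-Type.
fetch_converted(URL, Data) :-
    http_get(URL, Data, []).
\end{code}

The term returned by http_get/3 depends on which conversion plugins
(for example \pllib{http/http_json}) have been loaded.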
\input{httpopen.tex}
\input{httpclient.tex}
\section{The HTTP server libraries} \label{sec:httpserver}
The HTTP server infrastructure consists of a number of small modular
libraries that are combined into \exam{library(http/http_server)}. These
modules are:
\begin{description}
\definition{\pllib{http/thread_httpd}}
This library is responsible for accepting and managing
connections.\footnote{In older versions there were two alternative
libraries for managing connections based on XPCE and Unix inetd.}
\definition{\pllib{http/http_dyn_workers}}
This library dynamically adds and removes workers based on the workload
of the server.
\definition{\pllib{http/http_wrapper}}
This library takes a connection, parses the HTTP request header and runs
a goal that produces a CGI document based on the parsed request. It
watches for exceptions and turns these into (error) status pages. The
status page generation may be hooked to provide custom pages.
\definition{\pllib{http/http_dispatch}}
This library associates the \jargon{path} of the HTTP request with a
\jargon{handler} that services this particular path. It also manages
timeouts and may pass the execution of a request to a dedicated thread
with specified resource limits using http_spawn/2. The module supports
pluggable request rewrite handlers that may be used to implement
identification, authorization, input argument processing, etc.
\definition{\pllib{http/http_parameters}}
This library parses HTTP request parameters, both dealing with GET
and POST style parameter passing.
\definition{\pllib{http/html_write}}
This library translates a Prolog term into an HTML document using Prolog
\jargon{grammar rules} (DCG). It provides a modular infrastructure to
build pages that are guaranteed to be valid HTML. The HTTP server
libraries provide several alternatives for generating HTML ranging
from simple printing to \const{current_output} to XML-based templates
(PWP).
\definition{\pllib{http/http_json}}
This library parses a POSTed HTTP document into a Prolog dict and
formulates an HTTP JSON reply from a Prolog dict and is typically
used to implement REST services.
\end{description}
Most server implementations simply load the \pllib{http/http_server}
library, which loads the above modules and reexports all predicates
except for those used for internal communication and older deprecated
predicates. Specific use cases may load a subset of the individual
libraries and may decide to replace one or more of them.
A typical skeleton for building a server is given below. If this file is
loaded as main file (using e.g., \exam{swipl server.pl}) it creates a
simple server that listens on port 8080. If the root is accessed it
redirects to the home page and shows \textbf{Hello world!}.
\begin{code}
:- use_module(library(http/http_server)).

:- initialization
    http_server([port(8080)]).

:- http_handler(root(.),
                http_redirect(moved, location_by_id(home_page)),
                []).
:- http_handler(root(home), home_page, []).

home_page(_Request) :-
    reply_html_page(
        title('Demo server'),
        [ h1('Hello world!')
        ]).
\end{code}
\subsection{Creating an HTTP reply} \label{sec:html-body}
The \jargon{handler} (e.g., \nopredref{home_page}{1} above) is called
with the parsed request (see \secref{request}) as argument and
\const{current_output} set to a temporary buffer. Its task is closely
related to the task of a CGI script; it must write a header declaring at
least the \const{Content-type} field and a body. Below is a simple body
writing the request as an HTML table.\footnote{Note that writing an HTML
reply this way is deprecated. In fact, the code is subject to
\jargon{injection attacks} as the HTTP request field values are
literally injected in the output while HTML reserved characters should
be properly escaped.}
\begin{code}
reply(Request) :-
    format('Content-type: text/html~n~n', []),
    format('<html>~n', []),
    format('<table border=1>~n'),
    print_request(Request),
    format('~n</table>~n'),
    format('</html>~n', []).

print_request([]).
print_request([H|T]) :-
    H =.. [Name, Value],
    format('<tr><td>~w<td>~w~n', [Name, Value]),
    print_request(T).
\end{code}
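
The same table can be produced without the injection risk by generating
the page through \pllib{http/html_write}, which escapes HTML reserved
characters. The sketch below is one possible rewrite; the names
\nopredref{reply_safe}{1} and \nopredref{request_rows}{3} are
hypothetical, and like the example above it assumes every request
element is a term with one argument.

\begin{code}
:- use_module(library(http/html_write)).

% reply_safe(+Request)
% Hypothetical handler: emit the request as an HTML table.
reply_safe(Request) :-
    reply_html_page(
        title('Request'),
        table([border(1)], \request_rows(Request))).

% request_rows(+Request)//
% One table row per Name(Value) element. The Format-Args form
% inserts the formatted text as quoted (escaped) HTML.
request_rows([]) --> [].
request_rows([H|T]) -->
    { H =.. [Name, Value] },
    html(tr([td(Name), td('~w'-[Value])])),
    request_rows(T).
\end{code}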
The infrastructure recognises the header fields described below. Other
header lines are passed verbatim to the client. Typical examples are
\texttt{Set-Cookie} and authentication headers (see \secref{httpauthenticate}).
\begin{description}
\item[Content-type:~\arg{Type}]
This field is passed to the client and used by the infrastructure to
determine the \jargon{encoding} to use for the stream. If \arg{Type}
matches \const{text/*} or the type matches with \const{UTF-8} (case
insensitive), the server uses UTF-8 encoding. The user may force UTF-8
encoding for arbitrary content types by adding \texttt{; charset=UTF-8}
to the end of the \const{Content-type} header.
\item[Transfer-encoding:~chunked]
Causes the server to use \jargon{chunked} encoding
if the client allows for it. See also \secref{transfer} and the
\const{chunked} option in http_handler/3.
\item[Connection:~close]
Causes the connection to be closed after the transfer. The default is
to keep it open (`Keep-Alive') if possible.
\item[Location:~\arg{URL}]
This header may be combined with the \const{Status} header to force a
\jargon{redirect} response to the given \arg{URL}. The message body
must be empty. Handling this header is primarily intended for
compatibility with the CGI conventions. Prolog code should use
http_redirect/3.
\item[Status:~\arg{Status}]
This header can be combined with \const{Location}, where \arg{Status}
must be one of 301 (moved), 302 (moved temporary, default) or 303 (see
other). Using the status field also allows for formulating replies such
as 201 (created).
\end{description}
Note that the handler may send any type of document instead of HTML.
After the header has been written, the \jargon{encoding} of the
\const{current_output} stream is established as follows:
\begin{enumerate}
\item If the content type is \const{text/*}, the stream is switched
to UTF-8 encoding. If the content type does not provide
attributes, \verb$; charset=UTF-8$ is added.
\item If the content type contains \verb$UTF-8$, the stream is switched
to UTF-8 encoding.
\item If http:mime_type_encoding/2 succeeds, the returned encoding is
used. The returned encoding must be valid for set_stream/2.
\item If the content type matches a list of known encodings, this
is used. See mime_type_encoding/2 in \file{http_header}.
The current list deals with JSON, Turtle and SPARQL.
\item Otherwise the stream uses octet (binary) encoding.
\end{enumerate}
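
As an illustration of the second rule, the sketch below forces UTF-8
output for a content type that is not \const{text/*} by adding an
explicit \const{charset} attribute. The handler name
\nopredref{reply_data}{1} and the media type
\const{application/x-demo} are made up for this example.

\begin{code}
% reply_data(+Request)
% Hypothetical handler. The charset attribute causes the server to
% switch the output stream to UTF-8 after the header.
reply_data(_Request) :-
    format('Content-type: application/x-demo; charset=UTF-8~n~n'),
    format('caf\u00E9~n').          % non-ASCII text in the body
\end{code}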
\subsubsection{Returning special status codes} \label{sec:httpspecials}
Besides returning a page by writing it to the current output stream,
the server goal can raise an exception using throw/1 to generate special
pages such as \const{not_found}, \const{moved}, etc. The defined
exceptions are:
\begin{description}
\termitem{http_reply}{+Reply, +HdrExtra, +Context}
Return a result page using http_reply/3. See http_reply/3 for
supported values for Reply and \secref{http-custom-error-page}
for providing a custom error page.
\termitem{http_reply}{+Reply, +HdrExtra}
Return a result page using http_reply/3. Equivalent to
\term{http_reply}{Reply, HdrExtra, []}.
\termitem{http_reply}{+Reply}
Equivalent to \term{http_reply}{Reply, [], []}.
\termitem{http}{not_modified}
Equivalent to \term{http_reply}{not_modified, []}. This exception is
for backward compatibility and can be used by the server to indicate
the referenced resource has not been modified since it was requested
last time.
\end{description}
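
The sketch below shows how a handler might use such an exception to
produce a \const{404 Not Found} page. It assumes that
\term{not_found}{Path} is among the reply terms accepted by
http_reply/3; the handler \nopredref{show_item}{1} and the data
predicate \nopredref{item}{2} are hypothetical.

\begin{code}
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_parameters)).
:- use_module(library(http/html_write)).

:- http_handler(root(item), show_item, []).

% item(?Id, ?Title): hypothetical application data.
item(1, 'First item').

% show_item(+Request)
% Reply with the item page, or raise http_reply(not_found(Path))
% if the requested item does not exist.
show_item(Request) :-
    http_parameters(Request, [ id(Id, [integer]) ]),
    (   item(Id, Title)
    ->  reply_html_page(title(Title), h1(Title))
    ;   memberchk(path(Path), Request),
        throw(http_reply(not_found(Path)))
    ).
\end{code}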
In addition, the normal \verb$"200 OK"$ reply status may be overruled by
writing a CGI \const{Status} header prior to the remainder of the
message. This is particularly useful for defining REST APIs. The
following handler replies with a \verb$"201 Created"$ header:
\begin{code}
handle_request(Request) :-
    process_data(Request, Id),      % application predicate
    format('Status: 201~n'),
    format('Content-type: text/plain~n~n'),
    format('Created object as ~q~n', [Id]).
\end{code}
\input{httpdispatch.tex}
\input{httpdirindex.tex}
\input{httpfiles.tex}
\input{httpsession.tex}
\input{httpcors.tex}
\input{httpauthenticate.tex}
\input{httpdigest.tex}
\input{httpdynworkers.tex}
\subsection{Custom Error Pages}
\label{sec:http-custom-error-page}
It is possible to create arbitrary error pages for responses generated
when an http_reply term is thrown. Currently this is only supported for
status 401 (\textit{authentication required}). To do this, instead of
throwing \term{http_reply}{authorise(Term)} throw
\term{http_reply}{authorise(Term), [], Key}, where \arg{Key} is an
arbitrary term relating to the page you want to generate. You must then
also define a clause of the multifile predicate http:status_page_hook/3:
\begin{description}
\predicate{http:status_page_hook}{3}{+TermOrCode, +Context, -CustomHTML}
TermOrCode is either the first argument of the \const{http_reply}
exception or the HTTP status code, i.e., the hook is called twice. New
code should use the \arg{Term}. Context is the third argument of the
http_reply exception which was thrown, and CustomHTML is a list of HTML
tokens. A page equivalent to the default page for 401 is generated
by the example below.
\begin{code}
:- multifile http:status_page_hook/3.

http:status_page_hook(authorise(_Term), _Context, HTML) :-
    phrase(page([ title('401 Authorization Required')
                ],
                [ h1('Authorization Required'),
                  p(['This server could not verify that you ',
                     'are authorized to access the document ',
                     'requested. Either you supplied the wrong ',
                     'credentials (e.g., bad password), or your ',
                     'browser doesn\'t understand how to supply ',
                     'the credentials required.'
                    ]),
                  \address
                ]),
           HTML).
\end{code}
\end{description}
\input{httpopenid.tex}
%================================================================
\subsection{Get parameters from HTML forms}
\label{sec:httpparam}
The library \pllib{http/http_parameters} provides two predicates to
fetch HTTP request parameters as a type-checked list easily. The
library transparently handles both GET and POST requests. It builds
on top of the low-level request representation described in
\secref{request}.
\begin{description}
\predicate{http_parameters}{2}{+Request, ?Parameters}
The predicate is passed the \arg{Request} as provided to the handler
goal by http_wrapper/5 as well as a partially instantiated list
describing the requested parameters and their types. Each parameter
specification in \arg{Parameters} is a term of the format
\mbox{\arg{Name}(\arg{-Value}, \arg{+Options})}. \arg{Options} is a list
of option terms describing the type, default, etc. If no options are
specified the parameter must be present and its value is returned in
\arg{Value} as an atom.
If a parameter is missing the exception
\term{error}{\term{existence_error}{http_parameter, Name}, _} is
thrown. If the argument cannot be converted to the requested type,
\term{error}{\term{type_error}{Type, Value}, _} is raised, where
the error context indicates the HTTP parameter. If not caught, the
server translates both errors into a \texttt{400 Bad request} HTTP
message.
Options fall into three categories: those that handle presence of
the parameter, those that guide conversion and restrict types and
those that support automatic generation of documentation. First,
the presence-options:
\begin{description}
\termitem{default}{Default}
If the named parameter is missing, \arg{Value} is unified to
\arg{Default}.
\termitem{optional}{true}
If the named parameter is missing, \arg{Value} is left unbound and
no error is generated.
\termitem{list}{Type}
The parameter may be absent or appear multiple times. If this
option is present, \const{default} and \const{optional} are ignored and
the value is returned as a list. Type checking options are processed on
each value.
\termitem{zero_or_more}{}
Deprecated. Use \term{list}{Type}.
\end{description}
The type and conversion options are given below. The type-language can
be extended by providing clauses for the multifile hook
http:convert_parameter/3.
\begin{description}
\termitem{;}{Type1, Type2}
Succeed if either \arg{Type1} or \arg{Type2} applies. It allows
for checks such as \exam{(nonneg;oneof([infinite]))} to specify
an integer or a symbolic value.
\termitem{oneof}{List}
Succeeds if the value is a member of the given list.
\definition{length $> N$}
Succeeds if value is an atom of more than $N$ characters.
\definition{length $>= N$}
Succeeds if value is an atom of at least $N$ characters.
\definition{length $< N$}
Succeeds if value is an atom of fewer than $N$ characters.
\definition{length $=< N$}
Succeeds if value is an atom of at most $N$ characters.
\termitem{atom}{}
No-op. Allowed for consistency.
\termitem{string}{}
Convert value to a string.
\termitem{between}{+Low, +High}
Convert value to a number and if either \arg{Low} or \arg{High} is a
float, force value to be a float. Then check that the value is in the
given range, which includes the boundaries.
\termitem{boolean}{}
Translate \const{true}, \const{yes}, \const{on} and \const{1} into
\const{true}; \const{false}, \const{no}, \const{off} and \const{0} into
\const{false}; raise an error otherwise.
\termitem{float}{}
Convert value to a float. Integers are transformed into float. Throws a
type-error otherwise.
\termitem{integer}{}
Convert value to an integer. Throws a type-error otherwise.
\termitem{nonneg}{}
Convert value to a non-negative integer. Throws a type-error
if the value cannot be converted to an integer and a domain-error
otherwise.
\termitem{number}{}
Convert value to a number. Throws a type-error otherwise.
\end{description}
The last set of options is to support automatic generation of HTTP
API documentation from the sources.\footnote{This facility is under
development in ClioPatria; see \file{http_help.pl}.}
\begin{description}
\termitem{description}{+Atom}
Description of the parameter in plain text.
\termitem{group}{+Parameters, +Options}
Define a logical group of parameters. \arg{Parameters} are processed
as normal. \arg{Options} may include a description of the group. Groups
can be nested.
\end{description}
Below is an example:
\begin{code}
reply(Request) :-
    http_parameters(Request,
                    [ title(Title, [ optional(true) ]),
                      name(Name,   [ length >= 2 ]),
                      age(Age,     [ between(0, 150) ])
                    ]),
    ...
\end{code}
http_parameters/2 is the same as
\term{http_parameters}{Request, Parameters, []}.
\predicate{http_parameters}{3}{+Request, ?Parameters, +Options}
In addition to http_parameters/2, the following options are defined.
\begin{description}
\termitem{form_data}{-Data}
Return the entire set of provided \arg{Name}=\arg{Value} pairs from
the GET or POST request. All values are returned as atoms.
\termitem{attribute_declarations}{:Goal}
If a parameter specification lacks the parameter options, call
\term{call}{Goal, +ParamName, -Options} to find the options. Intended
to share declarations over many calls to http_parameters/3. Using
this construct the above can be written as below.
\begin{code}
reply(Request) :-
    http_parameters(Request,
                    [ title(Title),
                      name(Name),
                      age(Age)
                    ],
                    [ attribute_declarations(param)
                    ]),
    ...

param(title, [optional(true)]).
param(name,  [length >= 2 ]).
param(age,   [between(0, 150)]).
\end{code}
\end{description}
\end{description}
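
The presence and type options described above can be combined in one
declaration. The sketch below is a hypothetical search handler; the
handler name \nopredref{search_page}{1} and the parameter names
\const{q}, \const{page}, \const{order} and \const{tag} are assumptions
made for the example.

\begin{code}
:- use_module(library(http/http_parameters)).

% search_page(+Request)
% Hypothetical handler combining several parameter options.
search_page(Request) :-
    http_parameters(Request,
                    [ q(Query,     [ length >= 1 ]),       % required
                      page(Page,   [ default(1), nonneg ]),
                      order(Order, [ default(asc),
                                     oneof([asc,desc]) ]),
                      tag(Tags,    [ list(atom) ])         % 0..n values
                    ]),
    format('Content-type: text/plain~n~n'),
    format('~q~n', [search(Query, Page, Order, Tags)]).
\end{code}

A request without \const{q} is expected to raise the existence error
described above, which the server turns into a
\texttt{400 Bad request} reply if it is not caught.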
\subsection{Request format} \label{sec:request}
The body-code (see \secref{html-body}) is driven by a \arg{Request}.
This request is generated from http_read_request/2 defined in
\pllib{http/http_header}.
\begin{description}
\predicate{http_read_request}{2}{+Stream, -Request}
Read an HTTP request from \arg{Stream} and unify \arg{Request} with
the parsed request. \arg{Request} is a list of \term{\arg{Name}}{Value}
elements. It provides a number of predefined elements for the result
of parsing the first line of the request, followed by the additional
request parameters. The predefined fields are:
\begin{description}
\termitem{host}{Host}
If the request contains \verb$Host: $\arg{Host}, Host is unified
with the host-name. If \arg{Host} is of the format <host>:<port>
\arg{Host} only describes <host> and a field \term{port}{Port} where
\arg{Port} is an integer is added.
\termitem{input}{Stream}
The \arg{Stream} is passed along, allowing the handler to read more data or
requests from the same stream. This field is always present.
\termitem{method}{Method}
\arg{Method} is the HTTP \jargon{method} represented as a lower-case
atom (i.e., \const{delete}, \const{get}, \const{head},
\const{options}, \const{patch}, \const{post}, \const{put},
\const{trace}). This field is present if the header has been parsed
successfully.
\termitem{path}{Path}
Path associated to the request. This field is always present.
\termitem{peer}{Peer}
\arg{Peer} is a term \term{ip}{A,B,C,D} containing the IP address of
the contacting host.
\termitem{port}{Port}
Port requested. See \const{host} for details.
\termitem{request_uri}{RequestURI}
This is the untranslated string that follows the method in the
request header. It is used to construct the path and search fields
of the \arg{Request}. It is provided because reconstructing this
string from the path and search fields may yield a different value
due to different usage of percent encoding.
\termitem{search}{ListOfNameValue}
Search-specification of URI. This is the part after the \chr{?},
normally used to transfer data from HTML forms that use the
HTTP GET method. In the URL it consists of a www-form-encoded
list of \arg{Name}=\arg{Value} pairs. This is mapped to a list of
Prolog \arg{Name}=\arg{Value} terms with decoded names and values.
This field is only present if the location contains a
search-specification.
The URL specification does not \emph{demand} the query part to be
of the form \textit{name=value}. If the field is syntactically
incorrect, ListOfNameValue is bound to the empty list ([]).
\termitem{http_version}{Major-Minor}
If the first line contains the \const{HTTP/}\arg{Major}.\arg{Minor}
version indicator this element indicates the HTTP version of the
peer. Otherwise this field is not present.
\termitem{cookie}{ListOfNameValue}
If the header contains a \const{Cookie} line, the value of the
cookie is broken down in \arg{Name}=\arg{Value} pairs, where the
\arg{Name} is the lowercase version of the cookie name as used
for the HTTP fields.
\termitem{set_cookie}{set_cookie(Name, Value, Options)}
If the header contains a \const{Set-Cookie} line, the cookie field
is broken down into the \arg{Name} of the cookie, the \arg{Value}
and a list of \arg{Name}=\arg{Value} pairs for additional options
such as \const{expires}, \const{path}, \const{domain} or \const{secure}.
\end{description}
If the first line of the request is tagged with
\const{HTTP/}\arg{Major}.\arg{Minor}, http_read_request/2 reads all
input up to the first blank line. This header consists of
\arg{Name}:\arg{Value} fields. Each such field appears as a term
\term{\arg{Name}}{Value} in the \arg{Request}, where \arg{Name} is
canonicalised for use with Prolog. Canonicalisation implies that the
\arg{Name} is converted to lower case and all occurrences of the
\chr{-} are replaced by \chr{_}. The value for the
\const{Content-length} field is translated into an integer.
\end{description}
Here is an example:
\begin{code}
?- http_read_request(user_input, X).
|: GET /mydb?class=person HTTP/1.0
|: Host: gollem
|:
X = [ input(user),
      method(get),
      search([ class = person
             ]),
      path('/mydb'),
      http_version(1-0),
      host(gollem)
    ].
\end{code}
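
Because the request is an ordinary list of \term{\arg{Name}}{Value}
terms, handlers can inspect it with standard list predicates. The
sketch below collects a few fields and supplies defaults for optional
ones; \nopredref{request_summary}{2} is a hypothetical helper. Note
that the \const{User-Agent} header appears as \const{user_agent} due to
the canonicalisation described above.

\begin{code}
% request_summary(+Request, -Summary)
% Hypothetical helper: Summary is summary(Method, Path, Search, Agent).
% The search and user_agent fields are optional in a request.
request_summary(Request, summary(Method, Path, Search, Agent)) :-
    memberchk(method(Method), Request),
    memberchk(path(Path), Request),
    (   memberchk(search(Search), Request)
    ->  true
    ;   Search = []
    ),
    (   memberchk(user_agent(Agent), Request)
    ->  true
    ;   Agent = unknown
    ).
\end{code}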
\subsubsection{Handling POST requests}
\label{sec:http-read-post}
Where the HTTP \const{GET} operation is intended to get a document,
using a \arg{path} and possibly some additional search information,
the \const{POST} operation is intended to hand potentially large
amounts of data to the server for processing.
The \arg{Request} parameter above contains the term \term{method}{post}.
The data posted is left on the input stream that is available through
the term \term{input}{Stream} from the \arg{Request} header. This data
can be read using http_read_data/3 from the HTTP client library. Here is
a demo implementation simply returning the parsed posted data as plain
text (assuming pp/1 pretty-prints the data).
\begin{code}
reply(Request) :-
    member(method(post), Request), !,
    http_read_data(Request, Data, []),
    format('Content-type: text/plain~n~n', []),
    pp(Data).
\end{code}
If the POST is initiated from a browser, content-type is generally
either \const{application/x-www-form-urlencoded} or
\const{multipart/form-data}.
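
For REST-style clients that POST a JSON document, the
\pllib{http/http_json} library can be used instead of http_read_data/3.
The sketch below is a minimal echo service under that assumption; the
location \const{echo} and the handler name \nopredref{echo_json}{1} are
chosen for the example only.

\begin{code}
:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_json)).

% Only accept POST requests on /echo.
:- http_handler(root(echo), echo_json, [method(post)]).

% echo_json(+Request)
% Read the posted JSON document as a dict and send it back
% wrapped in a new JSON object.
echo_json(Request) :-
    http_read_json_dict(Request, Dict),
    reply_json_dict(_{received: Dict}).
\end{code}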
\subsection{Running the server}
\label{sec:http-running-server}
The functionality of the server should be defined in one Prolog file (of
course this file is allowed to load other files). Depending on the
wanted server setup this `body' is wrapped into a small Prolog file
combining the body with the appropriate server interface. There are
two supported server setups. For most applications we advise the
multi-threaded server. Examples of this server architecture are the
\href{http://www.swi-prolog.org/packages/pldoc.html}{PlDoc} documentation
system and the \href{http://www.swi-prolog.org/packages/SeRQL/}{SeRQL}
Semantic Web server infrastructure.
All the server setups may be wrapped in a \jargon{reverse proxy} to
make them available from the public web-server as described in
\secref{proxy}.
\begin{itemlist}
\item [Using \pllib{thread_httpd} for a multi-threaded server]
This server exploits the multi-threaded version of SWI-Prolog, running
the user's body code in parallel from a pool of worker threads. As it avoids
the state engine and copying required in the event-driven server it is
generally faster and capable of handling multiple requests concurrently.
This server is harder to debug due to the involved threading, although
the GUI tracer provides reasonable support for multi-threaded
applications using the tspy/1 command. It can provide fast communication
to multiple clients and can be used for more demanding servers.
\item [Using \pllib{inetd_httpd} for server-per-client]
In this setup the Unix \program{inetd} user-daemon is used to initialise
a server for each connection. This approach is especially suitable for
servers that have a limited startup-time. In this setup a crashing
client does not influence other requests.
This server is very hard to debug as the server is not connected to the
user environment. It provides a robust implementation for servers that
can be started quickly.
\end{itemlist}
\subsubsection{Common server interface options}
\label{sec:http-server-options}
All the server interfaces provide \term{http_server}{:Goal, +Options}
to create the server. The list of options differs, but the servers share
common options:
\begin{description}
\termitem{port}{?Port}
Specify the port to listen to for stand-alone servers. \arg{Port} is
either an integer or unbound. If unbound, it is unified to the selected
free port.
\end{description}
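
Leaving \arg{Port} unbound is convenient for tests that need a
temporary server. The sketch below assumes the multi-threaded server
and uses http_dispatch/1 as the goal; the wrapper names
\nopredref{start_server}{1} and \nopredref{stop_server}{1} are
hypothetical.

\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% start_server(?Port)
% Start a server. If Port is unbound it is unified with the
% free port selected by the system.
start_server(Port) :-
    http_server(http_dispatch, [port(Port)]).

% stop_server(+Port)
% Stop the server listening on Port.
stop_server(Port) :-
    http_stop_server(Port, []).
\end{code}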
\subsubsection{Multi-threaded Prolog} \label{sec:mthttpd}
The \pllib{http/thread_httpd} library provides the infrastructure to manage
multiple clients using a pool of \jargon{worker-threads}. This realises
a popular server design, also seen in Java Tomcat and Microsoft .NET.
As a single persistent server process maintains communication to all
clients, startup time is not an important issue and the server can
easily maintain state-information for all clients.
In addition to the functionality provided by the inetd server, the
threaded server can also be used to realise an HTTPS server exploiting
the \pllib{ssl} library. See option \term{ssl}{+SSLOptions} below.
\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Create the server. \arg{Options} must provide the \term{port}{?Port}
option to specify the port the server should listen to. If \arg{Port} is
unbound an arbitrary free port is selected and \arg{Port} is unified to
this port-number. The server consists of a small Prolog thread
accepting new connections on \arg{Port} and dispatching these to a pool
of workers. Defined \arg{Options} are:
\begin{description}
\termitem{port}{?Address}
Address to bind to. \arg{Address} is either a port (integer) or a term
\arg{Host}:\arg{Port}. The port may be a variable, causing the system to
select a free port and unify the variable with the selected port. See
also tcp_bind/2.
\termitem{workers}{+N}
Defines the number of worker threads in the pool. Default is to use
five workers. Choosing the optimal value for best performance is a
difficult task depending on the number of CPUs in your system and how
many resources are required for processing a request. Too high a number
makes your system switch too often between threads or even swap if there
is not enough memory to keep all threads in memory, while too low a
number causes clients to wait unnecessarily for other clients to complete.
See also http_workers/2.
\termitem{timeout}{+SecondsOrInfinite}
Determines the maximum period of inactivity while handling a request. If no
data arrives within the specified time since the last data arrived, the
connection raises an exception, and the worker discards the client and
returns to the pool-queue for a new client. If it is \const{infinite},
a worker may wait forever on a client that doesn't complete its
request. Default is 60~seconds.
\termitem{keep_alive_timeout}{+SecondsOrInfinite}
Maximum time to wait for new activity on \emph{Keep-Alive} connections.
Choosing the correct value for this parameter is hard. Disabling
Keep-Alive is bad for performance if the clients request multiple
documents for a single page. This may, for example, be caused by HTML
frames, HTML pages with images, associated CSS files, etc. Keeping
a connection open in the threaded model, however, prevents the thread
servicing the client from servicing other clients. The default is 2 seconds.
\termitem{local}{+KBytes}
Size of the local-stack for the workers. Default is taken from the
commandline option.
\termitem{global}{+KBytes}
Size of the global-stack for the workers. Default is taken from the
commandline option.
\termitem{trail}{+KBytes}
Size of the trail-stack for the workers. Default is taken from the
commandline option.
\termitem{ssl}{+SSLOptions}
Use SSL (Secure Socket Layer) rather than plain TCP/IP. A server created
this way is accessed using the \const{https://} protocol. SSL allows for
encrypted communication to prevent others from tapping the wire as well as
improved authentication of client and server. The \arg{SSLOptions}
option list is passed to ssl_context/3. The port option of the main option
list is forwarded to the SSL~layer. See the \pllib{ssl} library for
details.
\end{description}
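The options above can be combined as in the minimal sketch below. The
handler \const{say_hello}, the helper \const{start_server} and the
option values are illustrative assumptions, not recommended settings.
Calling \const{start_server} with an unbound argument lets the system
select a free port and unifies the argument with it.

\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

% Hypothetical handler, registered at /hello
:- http_handler(root(hello), say_hello, []).

say_hello(_Request) :-
    format('Content-type: text/plain~n~n'),
    format('Hello World!~n').

% start_server(?Port): start a threaded server.  If Port is
% unbound it is unified with the selected free port.
start_server(Port) :-
    http_server(http_dispatch,
                [ port(Port),
                  workers(4),             % assumed value
                  timeout(30),            % assumed value
                  keep_alive_timeout(5)   % assumed value
                ]).
\end{code}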
\predicate{http_server_property}{2}{?Port, ?Property}
True if \arg{Property} is a property of the HTTP server running at
\arg{Port}. Defined properties are:
\begin{description}
\termitem{goal}{:Goal}
Goal used to start the server. This is often http_dispatch/1.
\termitem{scheme}{-Scheme}
Scheme is one of \const{http} or \const{https}.
\termitem{start_time}{-Time}
Time-stamp when the server was created. See format_time/3 for
creating a human-readable representation.
\end{description}
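As a sketch of how these properties may be combined, the hypothetical
helper \const{list_servers} below prints the scheme, port and start time
of every running server. The time format is an arbitrary choice.

\begin{code}
:- use_module(library(http/thread_httpd)).

% list_servers: print one line for each running HTTP server.
list_servers :-
    forall(( http_server_property(Port, scheme(Scheme)),
             http_server_property(Port, start_time(Time))
           ),
           ( format_time(atom(Started), '%FT%T', Time),
             format('~w on port ~w, started ~w~n',
                    [Scheme, Port, Started])
           )).
\end{code}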
\predicate{http_workers}{2}{+Port, ?Workers}
Query or manipulate the number of workers of the server identified by
\arg{Port}. If \arg{Workers} is unbound it is unified with the number
of running workers. If it is an integer greater than the current size
of the worker pool new workers are created with the same specification
as the running workers. If the number is less than the current size
of the worker pool, this predicate inserts a number of `quit' requests
in the queue, discarding the excess workers as they finish their jobs
(i.e.\ no worker is abandoned while serving a client).
This can be used to tune the number of workers for performance. Another
possible application is to reduce the pool to one worker to facilitate
easier debugging.
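The debugging scenario may be sketched as follows. The helper names
\const{debug_mode} and \const{restore_workers} are hypothetical. The
first remembers the current pool size before reducing it to one worker.

\begin{code}
:- use_module(library(http/thread_httpd)).

% debug_mode(+Port, -Old): reduce the pool to a single worker and
% return the previous size in Old.
debug_mode(Port, Old) :-
    http_workers(Port, Old),
    http_workers(Port, 1).

% restore_workers(+Port, +N): resize the pool back to N workers.
restore_workers(Port, N) :-
    http_workers(Port, N).
\end{code}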
\predicate{http_add_worker}{2}{+Port, +Options}
Add a new worker to the HTTP server for port \arg{Port}. \arg{Options}
overrule the default queue options. The following additional options are
processed:
\begin{description}
\termitem{max_idle_time}{+Seconds}
The created worker will automatically terminate if there is no new work
within \arg{Seconds}.
\end{description}
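A minimal sketch of adding a short-lived extra worker, for instance to
absorb a burst of requests. The helper name \const{add_temporary_worker}
and the idle time of 10~seconds are illustrative assumptions.

\begin{code}
:- use_module(library(http/thread_httpd)).

% add_temporary_worker(+Port): add one worker that terminates
% after 10 seconds without new work.
add_temporary_worker(Port) :-
    http_add_worker(Port, [max_idle_time(10)]).
\end{code}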
\predicate{http_stop_server}{2}{+Port, +Options}
Stop the HTTP server at \arg{Port}. Halting a server is done
\textit{gracefully}, which means that requests being processed are not
abandoned. The \arg{Options} list is for future refinements of this
predicate such as a forced immediate abort of the server, but is
currently ignored.
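For example, all running servers can be stopped by first collecting
their ports using http_server_property/2. The helper name
\const{stop_all_servers} is hypothetical.

\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(lists)).

% stop_all_servers: gracefully stop every running HTTP server.
% The ports are collected first, so stopping a server does not
% interfere with the enumeration.
stop_all_servers :-
    findall(Port, http_server_property(Port, goal(_)), Ports),
    forall(member(Port, Ports),
           http_stop_server(Port, [])).
\end{code}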
\predicate{http_current_worker}{2}{?Port, ?ThreadID}
True if \arg{ThreadID} is the identifier of a Prolog thread serving
\arg{Port}. This predicate is intended to allow arbitrary interaction
with the worker threads for development and statistics.
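A small sketch of such use, printing the status of each worker thread
as reported by thread_property/2. The helper name \const{worker_status}
is hypothetical.

\begin{code}
:- use_module(library(http/thread_httpd)).

% worker_status(+Port): print the thread id and status of each
% worker serving Port.
worker_status(Port) :-
    forall(http_current_worker(Port, Id),
           ( thread_property(Id, status(Status)),
             format('~w: ~w~n', [Id, Status])
           )).
\end{code}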
\predicate{http_spawn}{2}{:Goal, +Spec}
Continue handling this request in a new thread running \arg{Goal}. After
http_spawn/2, the worker returns to the pool to process new requests. In
its simplest form, \arg{Spec} is the name of a thread pool as defined by
thread_pool_create/3. Alternatively it is an option list, whose options
are passed to thread_create_in_pool/4 if \arg{Spec} contains
\term{pool}{Pool} or to thread_create/3 if the pool option is not
present. If the dispatch module is used (see \secref{httpdispatch}),
spawning is normally specified as an option to the http_handler/3
registration.
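Calling http_spawn/2 directly from a handler might look like the sketch
below. The handler names \const{slow_handler} and \const{slow_reply} are
hypothetical, and the empty option list is assumed to result in a plain
thread_create/3 call without a pool.

\begin{code}
:- use_module(library(http/thread_httpd)).
:- use_module(library(http/http_dispatch)).

:- http_handler(root(slow), slow_handler, []).

% slow_handler(+Request): continue in a new thread, so the worker
% can return to the pool immediately.
slow_handler(Request) :-
    http_spawn(slow_reply(Request), []).

% slow_reply(+Request): simulate a long-running task and reply.
slow_reply(_Request) :-
    sleep(5),
    format('Content-type: text/plain~n~n'),
    format('Done~n').
\end{code}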
We recommend the use of thread pools. They allow registration of a set
of threads using common characteristics, specify how many can be active
and what to do if all threads are active. A typical application may
define a small pool of threads with large stacks for computation
intensive tasks, and a large pool of threads with small stacks to serve
media. The declaration could be the one below, allowing for at most 3
concurrent solvers with a maximum backlog of 5, and at most 30
concurrent tasks creating image thumbnails.
\begin{code}
:- use_module(library(thread_pool)).
:- use_module(library(http/http_dispatch)).

% Small pool with large stacks for computation intensive tasks.
:- thread_pool_create(compute, 3,
                      [ local(20000), global(100000), trail(50000),
                        backlog(5)
                      ]).

% Large pool with small stacks to serve media.
:- thread_pool_create(media, 30,
                      [ local(100), global(100), trail(100),
                        backlog(100)
                      ]).

:- http_handler('/solve',     solve,     [spawn(compute)]).
:- http_handler('/thumbnail', thumbnail, [spawn(media)]).
\end{code}
\end{description}
\InputIfFileExists{httpunixdaemon.tex}{}{}
\subsubsection{From (Unix) inetd}
\label{sec:http-inetd}
All modern Unix systems handle a large number of the services they run
through the super-server \emph{inetd}. This program reads
\file{/etc/inetd.conf} and opens server-sockets on all ports defined in
this file. As a request comes in it accepts it and starts the associated
server such that standard I/O refers to the socket. This approach has
several advantages:
\begin{itemlist}
\item [Simplification of servers]
Servers don't have to know about sockets and socket operations.
\item [Centralised authorisation]
Using \emph{tcpwrappers} simple and effective firewalling of all
services is realised.
\item [Automatic start and monitor]
The inetd automatically starts the server `just-in-time' and starts
additional servers or restarts a crashed server according to the
specifications.
\end{itemlist}
The very small generic script for handling inetd based connections
is in \file{inetd_httpd}, defining http_server/2:
\begin{description}
\predicate{http_server}{2}{:Goal, +Options}
Initialises and runs http_wrapper/5 in a loop until failure or
end-of-file. This server does not support the \arg{Port} option
as the port is specified with the \program{inetd} configuration.
The only supported option is \arg{After}.
\end{description}
Here is the example from \file{demo_inetd}:
\begin{code}
#!/usr/bin/pl -t main -q -f
:- use_module(demo_body).
:- use_module(inetd_httpd).
main :-
    http_server(reply, []).
\end{code}
With the above file installed in \file{/home/jan/plhttp/demo_inetd},
the following line in \file{/etc/inetd.conf} enables the server at port
4001 guarded by \emph{tcpwrappers}. After modifying this file, send the
inetd daemon the \const{HUP} signal to make it reload its configuration.
For more information, please check \manref{inetd.conf}{5}.
\begin{code}
4001 stream tcp nowait nobody /usr/sbin/tcpd /home/jan/plhttp/demo_inetd
\end{code}
\subsubsection{MS-Windows}
\label{sec:http-inetd-mswin}
There are rumours that \emph{inetd} has been ported to Windows.
\subsubsection{As CGI script}
\label{sec:http-server-as-cgi}
To be done.
\subsubsection{Using a reverse proxy}
\label{sec:proxy}
There are several options for public deployment of a web service. The
main decision is whether to run it on a standard port (port 80 for HTTP,
port 443 for HTTPS) or a non-standard port such as 8000 or
8080. Using a standard port below 1024 requires root access to the
machine, and prevents other web services from using the same port. On
the other hand, using a non-standard port may cause problems with
intermediate proxy- and/or firewall policies that may block the port
when you try to access the service from some networks. In both cases,
you can either use a physical or a virtual machine running ---for
example--- under \href{http://www.vmware.com}{VMWARE} or
\href{http://www.cl.cam.ac.uk/research/srg/netos/xen/}{XEN} to host the
service. Using a dedicated (physical or virtual) machine to host a
service isolates security threats. Isolation can also be achieved using
a Unix \jargon{chroot} environment, which is however not a security
feature.
To make several different web services reachable on the same (either
standard or non-standard) port, you can use a so-called \jargon{reverse
proxy}. A reverse proxy uses rules to relay requests to other web
services that use their own dedicated ports. This approach has several
advantages:
\begin{itemize}
\item We can run the service on a non-standard port, but still access
it (via the proxy) on a standard port, just as for a dedicated
machine. We do not need a separate machine though:
we only need to configure the reverse proxy to relay requests
to the intended target servers.