Documentation about job priorities added to man page.

Also includes docs about IKE_SA_INIT dropping.
author: Tobias Brunner <tobias@strongswan.org> 2011-07-21 16:17:08 +0200
committer: Tobias Brunner <tobias@strongswan.org> 2011-07-21 16:17:08 +0200
commit: 4f3ca916c50b0e0cddc170cc80012c71497f368c (patch)
tree: c813c05f76dbb15ef4749afda4b8e0df6bcb11f7 /man
parent: d33f6f7dba24a0cf9d34f93d0d79543d41abb72a (diff)
download: strongswan-4f3ca916c50b0e0cddc170cc80012c71497f368c.tar.bz2
strongswan-4f3ca916c50b0e0cddc170cc80012c71497f368c.tar.xz
1 files changed, 159 insertions, 1 deletions
diff --git a/man/strongswan.conf.5.in b/man/strongswan.conf.5.in
index da05eb1af..9fd4f6dbc 100644
--- a/man/strongswan.conf.5.in
+++ b/man/strongswan.conf.5.in
@@ -151,6 +151,9 @@ Section to define file loggers, see LOGGER CONFIGURATION
 .BR charon.flush_auth_cfg " [no]"
 
 .TP
+.BR charon.half_open_timeout " [30]"
+Timeout in seconds for connecting IKE_SAs (also see IKE_SA_INIT DROPPING).
+.TP
 .BR charon.hash_and_url " [no]"
 Enable hash and URL support
 .TP
@@ -166,6 +169,14 @@ Size of the IKE_SA hash table
 .BR charon.inactivity_close_ike " [no]"
 Whether to close IKE_SA if the only CHILD_SA closed due to inactivity
 .TP
+.BR charon.init_limit_half_open " [0]"
+Limit new connections based on the current number of half open IKE_SAs (see
+IKE_SA_INIT DROPPING).
+.TP
+.BR charon.init_limit_job_load " [0]"
+Limit new connections based on the number of jobs currently queued for
+processing (see IKE_SA_INIT DROPPING).
+.TP
 .BR charon.install_routes " [yes]"
 Install routes into a separate routing table for established IPsec tunnels
 .TP
@@ -502,6 +513,10 @@ Check daemon, libstrongswan and plugin integrity at startup
 .BR libstrongswan.leak_detective.detailed " [yes]"
 Includes source file names and line numbers in leak detective output
 .TP
+.BR libstrongswan.processor.priority_threads
+Subsection to configure the number of reserved threads per priority class
+(see JOB PRIORITY MANAGEMENT).
+.TP
 .BR libstrongswan.x509.enforce_critical " [yes]"
 Discard certificates with unsupported or unknown critical extensions
 .SS libstrongswan.plugins subsection
@@ -538,7 +553,7 @@ Command to be sent to the Test IMV
 .BR libimcv.plugins.imc_test.retry " [no]"
 Do a handshake retry
 .TP
-.BR libimcv.plugins.imc_test.retry_command 
+.BR libimcv.plugins.imc_test.retry_command
 Command to be sent to the Test IMV in the handshake retry
 .TP
 .BR libimcv.plugins.imv_test.rounds " [0]"
@@ -814,6 +829,149 @@ Also include sensitive material in dumps, e.g. keys
 	}
 .EE
 
+.SH JOB PRIORITY MANAGEMENT
+Some operations in the IKEv2 daemon charon are currently implemented
+synchronously and block their thread. Two examples of such operations are
+communication with a RADIUS server via EAP-RADIUS and fetching CRL/OCSP
+information during certificate chain verification. Under high load, the thread
+pool may run out of available threads, and more important jobs, such as
+liveness checking, may not get executed in time.
+.PP
+To prevent thread starvation in such situations, job priorities were introduced.
+The job processor reserves some threads for higher priority jobs; these
+threads are not available for lower priority, locking jobs.
+.SS Implementation
+Currently, four priorities are defined, and charon uses them as
+follows:
+.TP
+.B CRITICAL
+Priority for long-running dispatcher jobs.
+.TP
+.B HIGH
+INFORMATIONAL exchanges, as used by liveness checking (DPD).
+.TP
+.B MEDIUM
+Everything not HIGH/LOW, including IKE_SA_INIT processing.
+.TP
+.B LOW
+IKE_AUTH message processing. RADIUS and CRL fetching block here.
+.PP
+Although IKE_SA_INIT processing is computationally expensive, it is explicitly
+assigned to the MEDIUM class. This allows charon to do the DH exchange while
+other threads are blocked in IKE_AUTH. To prevent the daemon from accepting more
+IKE_SA_INIT requests than it can handle, use IKE_SA_INIT DROPPING.
+.PP
+The thread pool processes jobs strictly by priority, meaning it will consume all
+higher priority jobs before looking for ones with lower priority. Further, it
+reserves threads for certain priorities. A priority class having reserved
+.I n
+threads will always have
+.I n
+threads available for this class (either currently processing a job, or waiting
+for one).
+.SS Configuration
+To ensure that there are always enough threads available for higher priority
+tasks, threads must be reserved for each priority class.
+.TP
+.BR libstrongswan.processor.priority_threads.critical " [0]"
+Threads reserved for CRITICAL priority class jobs
+.TP
+.BR libstrongswan.processor.priority_threads.high " [0]"
+Threads reserved for HIGH priority class jobs
+.TP
+.BR libstrongswan.processor.priority_threads.medium " [0]"
+Threads reserved for MEDIUM priority class jobs
+.TP
+.BR libstrongswan.processor.priority_threads.low " [0]"
+Threads reserved for LOW priority class jobs
+.PP
+Let's consider the following configuration:
+.PP
+.EX
+	libstrongswan {
+		processor {
+			priority_threads {
+				high = 1
+				medium = 4
+			}
+		}
+	}
+.EE
+.PP
+With this configuration, one thread is reserved for HIGH priority tasks. As
+currently only liveness checking and stroke message processing are done with
+high priority, one or two threads should be sufficient.
+.PP
+The MEDIUM class mostly processes non-blocking jobs. Unless your setup
+frequently blocks on locks while accessing shared resources, reserving one to
+two times as many threads as there are CPU cores is fine.
+.PP
+It is usually not required to reserve threads for CRITICAL jobs. Jobs in this
+class rarely return and do not release their thread to the pool.
+.PP
+The remaining threads are available for LOW priority jobs. Reserving threads
+for this class does not make sense (until there is an even lower priority).
+.SS Monitoring
+To see what the threads are actually doing, invoke
+.IR "ipsec statusall" .
+Under high load, something like this will show up:
+.PP
+.EX
+	worker threads: 2 of 32 idle, 5/1/2/22 working,
+		job queue: 0/0/1/149, scheduled: 198
+.EE
+.PP
+Of the 32 worker threads,
+.IP 2
+are currently idle.
+.IP 5
+are running CRITICAL priority jobs (dispatching from sockets, etc.).
+.IP 1
+is currently handling a HIGH priority job. This is actually the thread currently
+providing this information via stroke.
+.IP 2
+are handling MEDIUM priority jobs, likely IKE_SA_INIT or CREATE_CHILD_SA
+messages.
+.IP 22
+are handling LOW priority jobs, probably waiting for an EAP-RADIUS response
+while processing IKE_AUTH messages.
+.PP
+The job queue load shows how many jobs are queued for each priority, ready for
+execution. The single MEDIUM priority job will get executed immediately, as
+we have two spare threads reserved for MEDIUM class jobs.
+
+.SH IKE_SA_INIT DROPPING
+If a responder receives more connection requests per second than it can handle,
+it does not make sense to accept more IKE_SA_INIT messages. If they are
+queued but cannot be processed in time, an answer might be sent after the
+client has already given up and restarted its connection setup. This
+additionally increases the load on the responder.
+.PP
+To limit the responder load resulting from new connection attempts, the daemon
+can drop IKE_SA_INIT messages just after reception. There are two mechanisms to
+decide if this should happen, configured with the following options:
+.TP
+.BR charon.init_limit_half_open " [0]"
+Limit based on the number of half open IKE_SAs. Half open IKE_SAs are SAs in
+connecting state, but not yet established.
+.TP
+.BR charon.init_limit_job_load " [0]"
+Limit based on the number of jobs currently queued for processing (sum over all
+job priorities).
+.PP
+The second limit includes load from other jobs, such as rekeying. Choosing a
+good value is difficult and depends on the hardware and expected load.
+.PP
+The first limit is simpler to calculate, but includes the load from new
+connections only. If your responder is capable of negotiating 100 tunnels/s, you
+might set this limit to 1000. The daemon will then drop new connection attempts
+if generating a response would require more than 10 seconds. If you are
+allowing for a maximum response time of more than 30 seconds, consider adjusting
+the timeout for connecting IKE_SAs
+.RB ( charon.half_open_timeout ).
+A responder, by default, deletes an IKE_SA if the initiator does not establish
+it within 30 seconds. Under high load, a higher value might be required.
+
 .SH LOAD TESTS
 To do stability testing and performance optimizations, the IKEv2 daemon charon
 provides the load-tester plugin. This plugin allows to setup thousands of
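
As an illustration of the IKE_SA_INIT DROPPING text added by this patch, the fragment below sketches how the three new charon options might be combined in strongswan.conf. It assumes the responder described in the man page text, one that negotiates about 100 tunnels/s, so the value 1000 corresponds to roughly 10 seconds of queued work. The values are taken from that example and from the documented defaults; they are not tuned recommendations.

```
# Sketch only: values follow the worked example in the man page text above.
charon {
	# Drop new IKE_SA_INIT requests once 1000 IKE_SAs are half open
	# (about 10 s of work at an assumed rate of 100 tunnels/s).
	init_limit_half_open = 1000

	# 0 is the documented default for the job-load based limit.
	init_limit_job_load = 0

	# Documented default; the text suggests raising it only if response
	# times above 30 seconds are to be tolerated.
	half_open_timeout = 30
}
```

Because 10 seconds is below the 30 second default of half_open_timeout, this sketch leaves the timeout unchanged. A higher half-open limit relative to the negotiation rate would be the case where the text suggests adjusting it.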