author | Timo Teräs <timo.teras@iki.fi> | 2016-03-30 13:44:03 +0300 |
---|---|---|
committer | Timo Teräs <timo.teras@iki.fi> | 2016-03-30 14:29:40 +0300 |
commit | 8cb40c91cdfb00ddf04e88d3ecd40403890d90f7 (patch) | |
tree | 6d6a3de27525820abb740f1fa8347b6f03986bad /doc | |
parent | 86c5d2ee68f7b9c00ae4aeb5c8b3c5d82c5ebffc (diff) | |
download | quagga-8cb40c91cdfb00ddf04e88d3ecd40403890d90f7.tar.bz2 quagga-8cb40c91cdfb00ddf04e88d3ecd40403890d90f7.tar.xz |
cumulus take-3 (cumulus-take-3)
Diffstat (limited to 'doc')
-rw-r--r-- | doc/bgpd.texi | 41 | ||||
-rw-r--r-- | doc/next-hop-tracking.txt | 326 | ||||
-rw-r--r-- | doc/routemap.texi | 4 |
3 files changed, 371 insertions, 0 deletions
diff --git a/doc/bgpd.texi b/doc/bgpd.texi
index 5735111d..3ef7c8f7 100644
--- a/doc/bgpd.texi
+++ b/doc/bgpd.texi
@@ -581,6 +581,41 @@ Redistribute RIP route to BGP process.
 Redistribute OSPF route to BGP process.
 @end deffn
 
+@deffn {BGP} {update-delay @var{max-delay}} {}
+@deffnx {BGP} {update-delay @var{max-delay} @var{establish-wait}} {}
+This feature is used to enable read-only mode on BGP process restart or when
+BGP process is cleared using 'clear ip bgp *'. When applicable, read-only mode
+would begin as soon as the first peer reaches Established status and a timer
+for max-delay seconds is started.
+
+During this mode BGP doesn't run any best-path or generate any updates to its
+peers. This mode continues until:
+1. All the configured peers, except the shutdown peers, have sent explicit EOR
+(End-Of-RIB) or an implicit-EOR. The first keep-alive after BGP has reached
+Established is considered an implicit-EOR.
+   If the establish-wait optional value is given, then BGP will wait for
+   peers to reach established from the beginning of the update-delay till the
+   establish-wait period is over, i.e. the minimum set of established peers for
+   which EOR is expected would be peers established during the establish-wait
+   window, not necessarily all the configured neighbors.
+2. max-delay period is over.
+On hitting any of the above two conditions, BGP resumes the decision process
+and generates updates to its peers.
+
+Default max-delay is 0, i.e. the feature is off by default.
+@end deffn
+
+@deffn {BGP} {table-map @var{route-map-name}} {}
+This feature is used to apply a route-map on route updates from BGP to Zebra.
+All the applicable match operations are allowed, such as match on prefix,
+next-hop, communities, etc. Set operations for this attach-point are limited
+to metric and next-hop only. Any operation of this feature does not affect
+BGP's internal RIB.
+
+Supported for ipv4 and ipv6 address families.
+It works on multi-paths as well; however, metric setting is based on the best-path only.
+@end deffn
+
 @node BGP Peer
 @section BGP Peer
 
@@ -753,6 +788,12 @@ Apply a route-map on the neighbor. @var{direct} must be @code{in} or
 @code{out}.
 @end deffn
 
+@deffn {BGP} {bgp route-reflector allow-outbound-policy} {}
+By default, attribute modification via route-map policy out is not reflected
+on reflected routes. This option allows the modifications to be reflected as
+well. Once enabled, it affects all reflected routes.
+@end deffn
+
 @c -----------------------------------------------------------------------
 @node BGP Peer Group
 @section BGP Peer Group
diff --git a/doc/next-hop-tracking.txt b/doc/next-hop-tracking.txt
new file mode 100644
index 00000000..d157866e
--- /dev/null
+++ b/doc/next-hop-tracking.txt
@@ -0,0 +1,326 @@
+0. Introduction
+
+This is the design specification for next hop tracking feature in
+Quagga.
+
+1. Background
+
+Recursive routes are of the form:
+
+    p/m --> n
+    [Ex: 1.1.0.0/16 --> 2.2.2.2]
+
+where 'n' itself is resolved through another route as follows:
+
+    p2/m --> h, interface
+    [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0]
+
+Usually, BGP routes are recursive in nature and BGP nexthops get
+resolved through an IGP route. IGP usually adds its routes pointing to
+an interface (these are called non-recursive routes).
+
+When BGP receives a recursive route from a peer, it needs to validate
+the nexthop. The path is marked valid or invalid based on the
+reachability status of the nexthop. Nexthop validation is also
+important for BGP decision process as the metric to reach the nexthop
+is a parameter to best path selection process.
+
+As it goes with routing, this is a dynamic process. Route to the
+nexthop can change. The nexthop can become unreachable or
+reachable. In the current BGP implementation, the nexthop validation
+is done periodically in the scanner run. The default scanner run
+interval is one minute.
+Every minute, the scanner task walks the entire BGP table. It checks the
+validity of each nexthop with Zebra (the routing table manager) through a
+request and response message exchange between BGP and Zebra process. BGP
+process is blocked for that duration. The mechanism has two major drawbacks:
+
+(1) The scanner task runs to completion. That can potentially starve
+    the other tasks for long periods of time, based on the BGP table
+    size and number of nexthops.
+
+(2) Convergence around routing changes that affect the nexthops can be
+    long (around a minute with the default intervals). The interval
+    can be shortened to achieve faster reaction time, but it makes the
+    first problem worse, with the scanner task consuming most of the
+    CPU resources.
+
+"Next hop tracking" feature makes this process event-driven. It
+eliminates periodic nexthop validation and introduces an asynchronous
+communication path between BGP and Zebra for route change notifications
+that can then be acted upon.
+
+2. Goal
+
+Stating the obvious, the main goal is to remove the two limitations we
+discussed in the previous section. The goals, in a constructive tone,
+are the following:
+
+- fairness: the scanner run should not consume an unjustly high amount
+  of CPU time. This should give an overall good performance and
+  response time to other events (route changes, session events,
+  IO/user interface).
+
+- convergence: BGP must react to nexthop changes instantly and provide
+  sub-second convergence. This may involve diverting the routes from
+  one nexthop to another.
+
+3. Overview of the changes
+
+The changes are in both BGP and Zebra modules. The short summary is
+the following:
+
+- Zebra implements a registration mechanism by which clients can
+  register for next hop notification. Consequently, it maintains a
+  separate table, per (VRF, AF) pair, of next hops and interested
+  client-list per next hop.
+
+- When the main routing table changes in Zebra, it evaluates the next
+  hop table: for each next hop, it checks if the route table
+  modifications have changed its state. If so, it notifies the
+  interested clients.
+
+- BGP is one such client. It registers the next hops corresponding to
+  all of its received routes/paths. It also threads the paths against
+  each nexthop structure.
+
+- When BGP receives a next hop notification from Zebra, it walks the
+  corresponding path list. It makes them valid or invalid depending
+  on the next hop notification. It then re-computes best path for the
+  corresponding destination. This may result in re-announcing those
+  destinations to peers.
+
+4. Design
+
+4.1. Modules
+
+The core design introduces an "nht" (next hop tracking) module in BGP
+and "rnh" (recursive nexthop) module in Zebra. The "nht" module
+provides the following APIs:
+
+bgp_find_or_add_nexthop()  : find or add a nexthop in BGP nexthop table
+bgp_find_nexthop()         : find a nexthop in BGP nexthop table
+bgp_parse_nexthop_update() : parse a nexthop update message coming
+                             from zebra
+
+The "rnh" module provides the following APIs:
+
+zebra_add_rnh()           : add a recursive nexthop
+zebra_delete_rnh()        : delete a recursive nexthop
+zebra_lookup_rnh()        : lookup a recursive nexthop
+
+zebra_add_rnh_client()    : register a client for nexthop notifications
+                            against a recursive nexthop
+
+zebra_remove_rnh_client(): remove the client registration for a
+                           recursive nexthop
+
+zebra_evaluate_rnh_table(): (re)evaluate the recursive nexthop table
+                            (most probably because the main routing
+                            table has changed).
+
+zebra_cleanup_rnh_client(): Cleanup a client from the "rnh" module
+                            data structures (most probably because the
+                            client is going away).
+
+4.2. Control flow
+
+The next hop registration control flow is the following:
+
+<====      BGP Process       ====>|<====      Zebra Process      ====>
+                                  |
+receive module     nht module     |  zserv module        rnh module
+----------------------------------------------------------------------
+              |                   |                  |
+bgp_update_   |                   |                  |
+      main()  | bgp_find_or_add_  |                  |
+              |        nexthop()  |                  |
+              |                   |                  |
+              |                   | zserv_nexthop_   |
+              |                   |       register() |
+              |                   |                  | zebra_add_rnh()
+              |                   |                  |
+
+
+The next hop notification control flow is the following:
+
+<====     Zebra Process      ====>|<====      BGP Process        ====>
+                                  |
+rib module         rnh module     |     zebra module    nht module
+----------------------------------------------------------------------
+              |                   |                   |
+meta_queue_   |                   |                   |
+    process() | zebra_evaluate_   |                   |
+              |       rnh_table() |                   |
+              |                   |                   |
+              |                   | bgp_read_nexthop_ |
+              |                   |          update() |
+              |                   |                   | bgp_parse_
+              |                   |                   | nexthop_update()
+              |                   |                   |
+
+
+4.3. zclient message format
+
+ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are
+encoded in the following way:
+
+/*
+ * 0                   1                   2                   3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |             AF                |  prefix len   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .      Nexthop prefix                                           .
+ * .                                                               .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .                                                               .
+ * .                                                               .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |             AF                |  prefix len   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .      Nexthop prefix                                           .
+ * .                                                               .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+ZEBRA_NEXTHOP_UPDATE message is encoded as follows:
+
+/*
+ * 0                   1                   2                   3
+ * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |             AF                |  prefix len   |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .      Nexthop prefix getting resolved                          .
+ * .                                                               .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |        metric                                                 |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |  #nexthops    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | nexthop type  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .     resolving Nexthop details                                 .
+ * .                                                               .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .                                                               .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * | nexthop type  |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * .     resolving Nexthop details                                 .
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+
+4.4. BGP data structure
+
+Legend:
+
+/\   struct bgp_node: a BGP destination/route/prefix
+\/
+
+[ ]  struct bgp_info: a BGP path (e.g. route received from a peer)
+
+ _
+(_)  struct bgp_nexthop_cache: a BGP nexthop
+
+
+
+ /\         NULL
+ \/--+        ^
+     |        :
+     +--[ ]--[ ]--[ ]--> NULL
+ /\           :
+ \/--+        :
+     |        :
+     +--[ ]--[ ]--> NULL
+              :
+  _           :
+ (_).............
+
+
+4.5. Zebra data structure
+
+rnh table:
+
+        O
+       / \
+      O   O
+         / \
+        O   O
+
+    struct rnh
+    {
+      u_char flags;
+      struct rib *state;
+      struct list *client_list;
+      struct route_node *node;
+    };
+
+5. User interface changes
+
+quagga# show ip nht
+3.3.3.3
+ resolved via kernel
+ via 11.0.0.6, swp1
+ Client list: bgp(fd 12)
+11.0.0.10
+ resolved via connected
+ is directly connected, swp2
+ Client list: bgp(fd 12)
+11.0.0.18
+ resolved via connected
+ is directly connected, swp4
+ Client list: bgp(fd 12)
+11.11.11.11
+ resolved via kernel
+ via 10.0.1.2, eth0
+ Client list: bgp(fd 12)
+
+quagga# show ip bgp nexthop
+Current BGP nexthop cache:
+ 3.3.3.3 valid [IGP metric 0], #paths 3
+  Last update: Wed Oct 16 04:43:49 2013
+
+ 11.0.0.10 valid [IGP metric 1], #paths 1
+  Last update: Wed Oct 16 04:43:51 2013
+
+ 11.0.0.18 valid [IGP metric 1], #paths 2
+  Last update: Wed Oct 16 04:43:47 2013
+
+ 11.11.11.11 valid [IGP metric 0], #paths 1
+  Last update: Wed Oct 16 04:43:47 2013
+
+quagga# show ipv6 nht
+quagga# show ip bgp nexthop detail
+
+quagga# debug bgp nht
+quagga# debug zebra nht
+
+6. Sample test cases
+
+     r2----r3
+    /  \  /
+  r1----r4
+
+- Verify that a change in IGP cost triggers NHT
+  + shutdown the r1-r4 and r2-r4 links
+  + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back
+    up
+  + We should be back to the original nexthop via r4 now
+- Verify that a NH becoming unreachable triggers NHT
+  + Shutdown all links to r4
+- Verify that a NH becoming reachable triggers NHT
+  + no shut all links to r4
+
+7. Future work
+
+- route-policy for next hop validation (e.g. ignore default route)
+- damping for rapid next hop changes
+- prioritized handling of nexthop changes ((un)reachability vs. metric
+  changes)
+- handling recursion loop, e.g.
+  11.11.11.11/32 -> 12.12.12.12
+  12.12.12.12/32 -> 11.11.11.11
+  11.0.0.0/8 -> <interface>
+- better statistics
diff --git a/doc/routemap.texi b/doc/routemap.texi
index 7938c965..b3ef7ca7 100644
--- a/doc/routemap.texi
+++ b/doc/routemap.texi
@@ -151,6 +151,10 @@ Matches the specified @var{as_path}.
 Matches the specified @var{metric}.
 @end deffn
 
+@deffn {Route-map Command} {match local-preference @var{local-preference}} {}
+Matches the specified @var{local-preference}.
+@end deffn
+
 @deffn {Route-map Command} {match community @var{community_list}} {}
 Matches the specified @var{community_list}
 @end deffn
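
Note on section 4.3 of doc/next-hop-tracking.txt: the body of a ZEBRA_NEXTHOP_REGISTER message is described there as a repeated (AF, prefix len, nexthop prefix) record. The following is a minimal Python sketch of that layout, not Quagga code. It assumes a 16-bit AF and an 8-bit prefix length in network byte order, followed by the full-length address (4 bytes for IPv4, 16 for IPv6); these widths are read off the diagram and are assumptions. The zclient message header is not modelled, and the function names are hypothetical.

```python
import ipaddress
import socket
import struct

# Assumed record layout (read off the 4.3 diagram, not taken from the code):
#   AF (16 bits) | prefix len (8 bits) | nexthop prefix (4 or 16 bytes)
RECORD_HEADER = "!HB"  # network byte order: unsigned short, unsigned char


def encode_nexthop_register(nexthops):
    """Pack (address, prefix_len) pairs into a message body."""
    body = b""
    for addr, prefix_len in nexthops:
        ip = ipaddress.ip_address(addr)
        family = socket.AF_INET if ip.version == 4 else socket.AF_INET6
        body += struct.pack(RECORD_HEADER, family, prefix_len) + ip.packed
    return body


def decode_nexthop_register(body):
    """Unpack a message body back into (address, prefix_len) pairs."""
    records = []
    offset = 0
    while offset < len(body):
        family, prefix_len = struct.unpack_from(RECORD_HEADER, body, offset)
        offset += struct.calcsize(RECORD_HEADER)
        size = 4 if family == socket.AF_INET else 16
        raw = body[offset:offset + size]
        offset += size
        records.append((str(ipaddress.ip_address(raw)), prefix_len))
    return records
```

Decoding walks the body record by record because the address length depends on the AF field, which is why the AF has to precede the prefix in each record.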