summaryrefslogtreecommitdiffstats
path: root/bgpd
diff options
context:
space:
mode:
authorDonald Sharp <sharpd@cumulusnetworks.com>2015-03-30 06:32:52 -0700
committerPaul Jakma <paul@quagga.net>2015-09-24 15:26:42 +0100
commit7ef4221c3f85121edb68a6a54ebd6bb167408e47 (patch)
treefc347b5fa1935a8ef801c3af4b75167af0716934 /bgpd
parent234e5c8d5a35339fb319affb952581bf5abb48a7 (diff)
downloadquagga-7ef4221c3f85121edb68a6a54ebd6bb167408e47.tar.bz2
quagga-7ef4221c3f85121edb68a6a54ebd6bb167408e47.tar.xz
bgpd: Fix race in clearing completion
When a peer that is Established goes down, it is moved into the Clearing state to facilitate clearing of the routes received from the peer - remove from the RIB, reselect best path, update/delete from Zebra and to other peers etc. At the end of this, a Clearing_Completed event is generated to the FSM which will allow the peer to move out of Clearing to Idle. The issue in the code is that there is a possibility of multiple Clearing Completed events being generated for a peer, one per AFI/SAFI. Upon the first such event, the peer would move to Idle. If other events happened (e.g., new connection got established) before the last Clearing_Completed event is received, bad things can happen. Fix to ensure only one Clearing_Completed event is generated. Signed-off-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Diffstat (limited to 'bgpd')
-rw-r--r--bgpd/bgp_fsm.c18
-rw-r--r--bgpd/bgp_route.c32
2 files changed, 27 insertions, 23 deletions
diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c
index 6a2e41e8..80ced479 100644
--- a/bgpd/bgp_fsm.c
+++ b/bgpd/bgp_fsm.c
@@ -406,7 +406,23 @@ bgp_fsm_change_status (struct peer *peer, int status)
* (and must do so before actually changing into Deleted..
*/
if (status >= Clearing)
- bgp_clear_route_all (peer);
+ {
+ bgp_clear_route_all (peer);
+
+ /* If no route was queued for the clear-node processing, generate the
+ * completion event here. This is needed because if there are no routes
+ * to trigger the background clear-node thread, the event won't get
+ * generated and the peer would be stuck in Clearing. Note that this
+ * event is for the peer and helps the peer transition out of Clearing
+ * state; it should not be generated per (AFI,SAFI). The event is
+ * directly posted here without calling clear_node_complete() as we
+ * shouldn't do an extra unlock. This event will get processed after
+ * the state change that happens below, so peer will be in Clearing
+ * (or Deleted).
+ */
+ if (!peer->clear_node_queue->thread)
+ BGP_EVENT_ADD (peer, Clearing_Completed);
+ }
/* Preserve old status and change into new status. */
peer->ostatus = peer->status;
diff --git a/bgpd/bgp_route.c b/bgpd/bgp_route.c
index 02c926fe..f4012719 100644
--- a/bgpd/bgp_route.c
+++ b/bgpd/bgp_route.c
@@ -2979,8 +2979,13 @@ bgp_clear_route (struct peer *peer, afi_t afi, safi_t safi,
* on the process_main queue. Fast-flapping could cause that queue
* to grow and grow.
*/
+
+ /* lock peer in assumption that clear-node-queue will get nodes; if so,
+ * the unlock will happen upon work-queue completion; other wise, the
+ * unlock happens at the end of this function.
+ */
if (!peer->clear_node_queue->thread)
- peer_lock (peer); /* bgp_clear_node_complete */
+ peer_lock (peer);
switch (purpose)
{
@@ -3007,28 +3012,11 @@ bgp_clear_route (struct peer *peer, afi_t afi, safi_t safi,
assert (0);
break;
}
-
- /* If no routes were cleared, nothing was added to workqueue, the
- * completion function won't be run by workqueue code - call it here.
- * XXX: Actually, this assumption doesn't hold, see
- * bgp_clear_route_table(), we queue all non-empty nodes.
- *
- * Additionally, there is a presumption in FSM that clearing is only
- * really needed if peer state is Established - peers in
- * pre-Established states shouldn't have any route-update state
- * associated with them (in or out).
- *
- * We still can get here in pre-Established though, through
- * peer_delete -> bgp_fsm_change_status, so this is a useful sanity
- * check to ensure the assumption above holds.
- *
- * At some future point, this check could be move to the top of the
- * function, and do a quick early-return when state is
- * pre-Established, avoiding above list and table scans. Once we're
- * sure it is safe..
- */
+
+ /* unlock if no nodes got added to the clear-node-queue. */
if (!peer->clear_node_queue->thread)
- bgp_clear_node_complete (peer->clear_node_queue);
+ peer_unlock (peer);
+
}
void