Diff: rfc9125.original

	rfc9125.original	rfc9125.txt


	BESS Working Group A. Farrel	Internet Engineering Task Force (IETF) A. Farrel
	Internet-Draft Old Dog Consulting	Request for Comments: 9125 Old Dog Consulting
	Intended status: Standards Track J. Drake	Category: Standards Track J. Drake
	Expires: January 23, 2022 E. Rosen	ISSN: 2070-1721 E. Rosen
	Juniper Networks	Juniper Networks
	K. Patel	K. Patel
	Arrcus, Inc.	Arrcus, Inc.
	L. Jalil	L. Jalil
	Verizon	Verizon

	July 22, 2021	August 2021


	Gateway Auto-Discovery and Route Advertisement for Segment Routing	Gateway Auto-Discovery and Route Advertisement for Site Interconnection
	Enabled Site Interconnection	Using Segment Routing
	draft-ietf-bess-datacenter-gateway-13

	Abstract	Abstract

	Data centers are attached to the Internet or a backbone network by	Data centers are attached to the Internet or a backbone network by
	gateway routers. One data center typically has more than one gateway	gateway routers. One data center typically has more than one gateway

	for commercial, load balancing, and resiliency reasons. Other sites,	for commercial, load-balancing, and resiliency reasons. Other sites,
	such as access networks, also need to be connected across backbone	such as access networks, also need to be connected across backbone
	networks through gateways.	networks through gateways.

	This document defines a mechanism using the BGP Tunnel Encapsulation	This document defines a mechanism using the BGP Tunnel Encapsulation
	attribute to allow data center gateway routers to advertise routes to	attribute to allow data center gateway routers to advertise routes to
	the prefixes reachable in the site, including advertising them on	the prefixes reachable in the site, including advertising them on
	behalf of other gateways at the same site. This allows segment	behalf of other gateways at the same site. This allows segment
	routing to be used to identify multiple paths across the Internet or	routing to be used to identify multiple paths across the Internet or
	backbone network between different gateways. The paths can be	backbone network between different gateways. The paths can be
	selected for load-balancing, resilience, and quality purposes.	selected for load-balancing, resilience, and quality purposes.

	Status of This Memo	Status of This Memo


	This Internet-Draft is submitted in full conformance with the	This is an Internet Standards Track document.
	provisions of BCP 78 and BCP 79.

	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-
	Drafts is at https://datatracker.ietf.org/drafts/current/.


	Internet-Drafts are draft documents valid for a maximum of six months	This document is a product of the Internet Engineering Task Force
	and may be updated, replaced, or obsoleted by other documents at any	(IETF). It represents the consensus of the IETF community. It has
	time. It is inappropriate to use Internet-Drafts as reference	received public review and has been approved for publication by the
	material or to cite them other than as "work in progress."	Internet Engineering Steering Group (IESG). Further information on
		Internet Standards is available in Section 2 of RFC 7841.


	This Internet-Draft will expire on January 23, 2022.	Information about the current status of this document, any errata,
		and how to provide feedback on it may be obtained at
		https://www.rfc-editor.org/info/rfc9125.

	Copyright Notice	Copyright Notice

	Copyright (c) 2021 IETF Trust and the persons identified as the	Copyright (c) 2021 IETF Trust and the persons identified as the
	document authors. All rights reserved.	document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal	This document is subject to BCP 78 and the IETF Trust's Legal
	Provisions Relating to IETF Documents	Provisions Relating to IETF Documents
	(https://trustee.ietf.org/license-info) in effect on the date of	(https://trustee.ietf.org/license-info) in effect on the date of
	publication of this document. Please review these documents	publication of this document. Please review these documents
	carefully, as they describe your rights and restrictions with respect	carefully, as they describe your rights and restrictions with respect
	to this document. Code Components extracted from this document must	to this document. Code Components extracted from this document must
	include Simplified BSD License text as described in Section 4.e of	include Simplified BSD License text as described in Section 4.e of
	the Trust Legal Provisions and are provided without warranty as	the Trust Legal Provisions and are provided without warranty as
	described in the Simplified BSD License.	described in the Simplified BSD License.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2	1. Introduction
	2. Requirements Language . . . . . . . . . . . . . . . . . . . . 5	2. Requirements Language
	3. Site Gateway Auto-Discovery . . . . . . . . . . . . . . . . . 5	3. Site Gateway Auto-Discovery
	4. Relationship to BGP Link State and Egress Peer Engineering . 7	4. Relationship to BGP - Link State and Egress Peer Engineering
	5. Advertising a Site Route Externally . . . . . . . . . . . . . 7	5. Advertising a Site Route Externally
	6. Encapsulation . . . . . . . . . . . . . . . . . . . . . . . . 8	6. Encapsulation
	7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8	7. IANA Considerations
	8. Security Considerations . . . . . . . . . . . . . . . . . . . 8	8. Security Considerations
	9. Manageability Considerations . . . . . . . . . . . . . . . . 10	9. Manageability Considerations
	9.1. Relationship to Route Target Constraint . . . . . . . . . 10	9.1. Relationship to Route Target Constraint
	10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 10	10. References
	11. References . . . . . . . . . . . . . . . . . . . . . . . . . 11	10.1. Normative References
	11.1. Normative References . . . . . . . . . . . . . . . . . . 11	10.2. Informative References
	11.2. Informative References . . . . . . . . . . . . . . . . . 11	Acknowledgements
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13	Authors' Addresses

	1. Introduction	1. Introduction

	Data centers (DCs) are critical components of the infrastructure used	Data centers (DCs) are critical components of the infrastructure used
	by network operators to provide services to their customers. DCs	by network operators to provide services to their customers. DCs
	(sites) are interconnected by a backbone network, which consists of	(sites) are interconnected by a backbone network, which consists of
	any number of private networks and/or the Internet. DCs are attached	any number of private networks and/or the Internet. DCs are attached

	to the backbone network by gateway routers (GWs). One DC typically	to the backbone network by routers that are gateways (GWs). One DC
	has more than one GW for various reasons including commercial	typically has more than one GW for various reasons including
	preferences, load balancing, or resiliency against connection or	commercial preferences, load balancing, or resiliency against
	device failure.	connection or device failure.


	Segment Routing (SR) [RFC8402] is a protocol mechanism that can be	Segment Routing (SR) ([RFC8402]) is a protocol mechanism that can be
	used within a DC, and also for steering traffic that flows between	used within a DC as well as for steering traffic that flows between
	two DC sites. In order for a source site (also known as an ingress	two DC sites. In order for a source site (also known as an ingress

	site) that uses SR to load balance the flows it sends to a	site) that uses SR to load-balance the flows it sends to a
	destination site (also known as an egress site), it needs to know the	destination site (also known as an egress site), it needs to know the
	complete set of entry nodes (i.e., GWs) for that egress DC from the	complete set of entry nodes (i.e., GWs) for that egress DC from the
	backbone network connecting the two DCs. Note that it is assumed	backbone network connecting the two DCs. Note that it is assumed
	that the connected set of DC sites and the border nodes in the	that the connected set of DC sites and the border nodes in the
	backbone network on the paths that connect the DC sites are part of	backbone network on the paths that connect the DC sites are part of

	the same SR BGP Link State (LS) instance ([RFC7752] and	the same SR BGP - Link State (LS) instance (see [RFC7752] and
	[I-D.ietf-idr-bgpls-segment-routing-epe]) so that traffic engineering	[RFC9086]) so that traffic engineering using SR may be used for these
	using SR may be used for these flows.	flows.

	Other sites, such as access networks, also need to be connected	Other sites, such as access networks, also need to be connected
	across backbone networks through gateways. For illustrative	across backbone networks through gateways. For illustrative
	purposes, consider the ingress and egress sites shown in Figure 1 as	purposes, consider the ingress and egress sites shown in Figure 1 as

	separate ASes (noting that the sites could be implemented as part of	separate Autonomous Systems (ASes) (noting that the sites could be
	the ASes to which they are attached, or as separate ASes). The	implemented as part of the ASes to which they are attached, or as
	various ASes that provide connectivity between the ingress and egress	separate ASes). The various ASes that provide connectivity between
	sites could each be constructed differently and use different	the ingress and egress sites could each be constructed differently
	technologies such as IP, MPLS using global table routing information	and use different technologies such as IP; MPLS using global table
	from native BGP, MPLS IP VPN, SR-MPLS IP VPN, or SRv6 IP VPN. That	routing information from BGP; MPLS IP VPN; SR-MPLS IP VPN; or SRv6 IP
	is, the ingress and egress sites can be connected by tunnels across a	VPN. That is, the ingress and egress sites can be connected by
	variety of technologies. This document describes how SR identifiers	tunnels across a variety of technologies. This document describes
	(SIDs) are used to identify the paths between the ingress and egress	how SR Segment Identifiers (SIDs) are used to identify the paths
	sites.	between the ingress and egress sites.

	The solution described in this document is agnostic as to whether the	The solution described in this document is agnostic as to whether the
	transit ASes do or do not have SR capabilities. The solution uses SR	transit ASes do or do not have SR capabilities. The solution uses SR

	to stitch together path segments between GWs and through the ASBRs.	to stitch together path segments between GWs and through the
	Thus, there is a requirement that the GWs and ASBRs are SR-capable.	Autonomous System Border Routers (ASBRs). Thus, there is a
	The solution supports the SR path being extended into the ingress and	requirement that the GWs and ASBRs are SR capable. The solution
	egress sites if they are SR-capable.	supports the SR path being extended into the ingress and egress sites
		if they are SR capable.

	The solution defined in this document can be seen in the broader	The solution defined in this document can be seen in the broader

	context of site interconnection in	context of site interconnection in [SR-INTERCONNECT]. That document
	[I-D.farrel-spring-sr-domain-interconnect]. That document shows how	shows how other existing protocol elements may be combined with the
	other existing protocol elements may be combined with the solution	solution defined in this document to provide a full system, but it is
	defined in this document to provide a full system, but is not a	not a necessary reference for understanding this document.
	necessary reference for understanding this document.

	Suppose that there are two gateways, GW1 and GW2 as shown in	Suppose that there are two gateways, GW1 and GW2 as shown in
	Figure 1, for a given egress site and that they each advertise a	Figure 1, for a given egress site and that they each advertise a

	route to prefix X which is located within the egress site with each	route to prefix X, which is located within the egress site with each
	setting itself as next hop. One might think that the GWs for X could	setting itself as next hop. One might think that the GWs for X could

	be inferred from the routes' next hop fields, but typically it is not	be inferred from the routes' next-hop fields, but typically it is not
	the case that both routes get distributed across the backbone: rather	the case that both routes get distributed across the backbone: rather
	only the best route, as selected by BGP, is distributed. This	only the best route, as selected by BGP, is distributed. This

	precludes load balancing flows across both GWs.	precludes load-balancing flows across both GWs.

	----------------- ---------------------	----------------- ---------------------
	| Ingress | | Egress ------ |	| Ingress | | Egress ------ |
	| Site | | Site |Prefix| |	| Site | | Site |Prefix| |
	| | | | X | |	| | | | X | |
	| | | ------ |	| | | ------ |
	| -- | | --- --- |	| -- | | --- --- |
	| |GW| | | |GW1| |GW2| |	| |GW| | | |GW1| |GW2| |
	-------++-------- ----+-----------+-+--	-------++-------- ----+-----------+-+--
	| \ | / |	| \ | / |

	skipping to change at page 4, line 30	skipping to change at line 165
	| | ----| |---- | |	| | ----| |---- | |
	| | AS1 |ASBR+------+ASBR| AS2 | |	| | AS1 |ASBR+------+ASBR| AS2 | |
	| | ----| |---- | |	| | ----| |---- | |
	| --------------- -------------------- |	| --------------- -------------------- |
	--+-----------------------------------------------+--	--+-----------------------------------------------+--
	| |ASBR| |ASBR| |	| |ASBR| |ASBR| |
	| ---- AS3 ---- |	| ---- AS3 ---- |
	| |	| |
	-----------------------------------------------------	-----------------------------------------------------


	Figure 1: Example Site Interconnection	Figure 1: Example Site Interconnection

	The obvious solution to this problem is to use the BGP feature that	The obvious solution to this problem is to use the BGP feature that
	allows the advertisement of multiple paths in BGP (known as Add-	allows the advertisement of multiple paths in BGP (known as Add-

	Paths) [RFC7911] to ensure that all routes to X get advertised by	Paths) ([RFC7911]) to ensure that all routes to X get advertised by
	BGP. However, even if this is done, the identity of the GWs will be	BGP. However, even if this is done, the identity of the GWs will be

	lost as soon as the routes get distributed through an Autonomous	lost as soon as the routes get distributed through an ASBR that will
	System Border Router (ASBR) that will set itself to be the next hop.	set itself to be the next hop. And if there are multiple ASes in the
	And if there are multiple Autonomous Systems (ASes) in the backbone,	backbone, not only will the next hop change several times, but the
	not only will the next hop change several times, but the Add-Paths	Add-Paths technique will experience scaling issues. This all means
	technique will experience scaling issues. This all means that the	that the Add-Paths approach is effectively limited to sites connected
	Add-Paths approach is effectively limited to sites connected over a	over a single AS.
	single AS.

	This document defines a solution that overcomes this limitation and	This document defines a solution that overcomes this limitation and
	works equally well with a backbone constructed from one or more ASes	works equally well with a backbone constructed from one or more ASes

	using the Tunnel Encapsulation attribute [RFC9012] as follows:	using the Tunnel Encapsulation attribute ([RFC9012]) as follows:

	When a GW to a given site advertises a route to a prefix X within	When a GW to a given site advertises a route to a prefix X within
	that site, it will include a Tunnel Encapsulation attribute that	that site, it will include a Tunnel Encapsulation attribute that
	contains the union of the Tunnel Encapsulation attributes	contains the union of the Tunnel Encapsulation attributes
	advertised by each of the GWs to that site, including itself.	advertised by each of the GWs to that site, including itself.

	In other words, each route advertised by a GW identifies all of the	In other words, each route advertised by a GW identifies all of the
	GWs to the same site (see Section 3 for a discussion of how GWs	GWs to the same site (see Section 3 for a discussion of how GWs

	discover each other). I.e., the Tunnel Encapsulation attribute	discover each other), i.e., the Tunnel Encapsulation attribute
	advertised by each GW contains multiple Tunnel TLVs, one or more from	advertised by each GW contains multiple Tunnel TLVs, one or more from
	each active GW, and each Tunnel TLV will contain a Tunnel Egress	each active GW, and each Tunnel TLV will contain a Tunnel Egress

	Endpoint Sub-TLV that identifies the GW for that Tunnel TLV.	Endpoint sub-TLV that identifies the GW for that Tunnel TLV.
	Therefore, even if only one of the routes is distributed to other	Therefore, even if only one of the routes is distributed to other
	ASes, it will not matter how many times the next hop changes, as the	ASes, it will not matter how many times the next hop changes, as the
	Tunnel Encapsulation attribute will remain unchanged.	Tunnel Encapsulation attribute will remain unchanged.

	To put this in the context of Figure 1, GW1 and GW2 discover each	To put this in the context of Figure 1, GW1 and GW2 discover each
	other as gateways for the egress site. Both GW1 and GW2 advertise	other as gateways for the egress site. Both GW1 and GW2 advertise
	themselves as having routes to prefix X. Furthermore, GW1 includes a	themselves as having routes to prefix X. Furthermore, GW1 includes a

	Tunnel Encapsulation attribute which is the union of its Tunnel	Tunnel Encapsulation attribute, which is the union of its Tunnel
	Encapsulation attribute and GW2's Tunnel Encapsulation attribute.	Encapsulation attribute and GW2's Tunnel Encapsulation attribute.

	Similarly, GW2 includes a Tunnel Encapsulation attribute which is the	Similarly, GW2 includes a Tunnel Encapsulation attribute, which is
	union of its Tunnel Encapsulation attribute and GW1's Tunnel	the union of its Tunnel Encapsulation attribute and GW1's Tunnel
	Encapsulation attribute. The gateway in the ingress site can now see	Encapsulation attribute. The gateway in the ingress site can now see
	all possible paths to X in the egress site regardless of which route	all possible paths to X in the egress site regardless of which route

	is propagated to it, and it can choose one, or balance traffic flows	is propagated to it, and it can choose one or balance traffic flows
	as it sees fit.	as it sees fit.
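
The union of Tunnel Encapsulation attributes described in Section 1 can be illustrated with a short Python sketch. It is a simplified model rather than a wire encoding: the `TunnelTLV` class, the `union_attribute` function, the "example" tunnel type label, and the addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelTLV:
    """Simplified stand-in for a Tunnel TLV (hypothetical model, not a wire format)."""
    tunnel_type: str       # placeholder label, not an IANA-assigned value
    egress_endpoint: str   # models the Tunnel Egress Endpoint sub-TLV (the GW address)


def union_attribute(tlvs_by_gw):
    """Return the de-duplicated union of the Tunnel TLVs of all active GWs."""
    merged = []
    seen = set()
    for gw in sorted(tlvs_by_gw):      # sorted so that every GW builds the same list
        for tlv in tlvs_by_gw[gw]:
            if tlv not in seen:
                seen.add(tlv)
                merged.append(tlv)
    return merged


# GW1 and GW2 have discovered each other, so each can compute the same union
# and attach it to its own route to prefix X.
tlvs_by_gw = {
    "GW1": [TunnelTLV("example", "192.0.2.1")],
    "GW2": [TunnelTLV("example", "192.0.2.2")],
}
attribute = union_attribute(tlvs_by_gw)
print([tlv.egress_endpoint for tlv in attribute])
```

Because both gateways attach the same union, whichever route survives BGP best-path selection still names every gateway for the site.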

	2. Requirements Language	2. Requirements Language

	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and	"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and

	"OPTIONAL" in this document are to be interpreted as described in BCP	"OPTIONAL" in this document are to be interpreted as described in
	14 [RFC2119] [RFC8174] when, and only when, they appear in all	BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
	capitals, as shown here.	capitals, as shown here.

	3. Site Gateway Auto-Discovery	3. Site Gateway Auto-Discovery

	To allow a given site's GWs to auto-discover each other and to	To allow a given site's GWs to auto-discover each other and to
	coordinate their operations, the following procedures are	coordinate their operations, the following procedures are
	implemented:	implemented:


	o A route target ([RFC4360]) MUST be attached to each GW's auto-	* A route target ([RFC4360]) MUST be attached to each GW's auto-
	discovery route (defined below) and its value MUST be set to a	discovery route (defined below), and its value MUST be set to a
	value that indicates the site identifier. The rules for	value that indicates the site identifier. The rules for
	constructing a route target are detailed in [RFC4360]. It is	constructing a route target are detailed in [RFC4360]. It is
	RECOMMENDED that a Type x00 or x02 route target be used.	RECOMMENDED that a Type x00 or x02 route target be used.


	o Site identifiers are set through configuration. The site	* Site identifiers are set through configuration. The site
	identifiers MUST be the same across all GWs to the site (i.e., the	identifiers MUST be the same across all GWs to the site (i.e., the

	same identifier is used by all GWs to the same site), and MUST be	same identifier is used by all GWs to the same site) and MUST be
	unique across all sites that are connected (i.e., across all GWs	unique across all sites that are connected (i.e., across all GWs
	to all sites that are interconnected).	to all sites that are interconnected).


	o Each GW MUST construct an import filtering rule to import any	* Each GW MUST construct an import filtering rule to import any
	route that carries a route target with the same site identifier	route that carries a route target with the same site identifier
	that the GW itself uses. This means that only these GWs will	that the GW itself uses. This means that only these GWs will
	import those routes, and that all GWs to the same site will import	import those routes, and that all GWs to the same site will import
	each other's routes and will learn (auto-discover) the current set	each other's routes and will learn (auto-discover) the current set
	of active GWs for the site.	of active GWs for the site.
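
The import filtering rule in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the `AutoDiscoveryRoute` class, the two helper functions, the textual route target form "64496:1", and the example addresses are hypothetical and do not reflect any particular BGP implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoDiscoveryRoute:
    """Hypothetical model of a received auto-discovery route."""
    loopback: str              # loopback address carried in the NLRI
    route_targets: frozenset   # route targets attached to the route


def import_site_routes(local_site_rt, received):
    """Keep only routes that carry the route target for the local site identifier."""
    return [r for r in received if local_site_rt in r.route_targets]


def active_gateways(local_site_rt, received):
    """Return the peer GWs discovered for the local site, as a sorted list."""
    return sorted({r.loopback for r in import_site_routes(local_site_rt, received)})


received = [
    AutoDiscoveryRoute("192.0.2.1", frozenset({"64496:1"})),     # same site
    AutoDiscoveryRoute("192.0.2.2", frozenset({"64496:1"})),     # same site
    AutoDiscoveryRoute("198.51.100.1", frozenset({"64496:2"})),  # different site
]
print(active_gateways("64496:1", received))
```

A GW configured with the wrong site identifier would simply never match the filter, which is why such a GW is not auto-discovered by its peers.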

	The auto-discovery route that each GW advertises consists of the	The auto-discovery route that each GW advertises consists of the
	following:	following:


	o An IPv4 or IPv6 Network Layer Reachability Information (NLRI)	* IPv4 or IPv6 Network Layer Reachability Information (NLRI)
	[RFC4760] containing one of the GW's loopback addresses (that is,	([RFC4760]) containing one of the GW's loopback addresses (that
	with an AFI/SAFI pair that is one of IPv4/NLRI used for unicast	is, with an AFI/SAFI pair that is one of the following: IPv4/NLRI
	forwarding (1/1), IPv6/NLRI used for unicast forwarding (2/1),	used for unicast forwarding (1/1); IPv6/NLRI used for unicast
	IPv4/NLRI with MPLS Labels (1/4), or IPv6/NLRI with MPLS Labels	forwarding (2/1); IPv4/NLRI with MPLS Labels (1/4); or IPv6/NLRI
	(2/4)).	with MPLS Labels (2/4)).


	o A Tunnel Encapsulation attribute [RFC9012] containing the GW's	* A Tunnel Encapsulation attribute ([RFC9012]) containing the GW's
	encapsulation information encoded in one or more Tunnel TLVs.	encapsulation information encoded in one or more Tunnel TLVs.

	To avoid the side effect of applying the Tunnel Encapsulation	To avoid the side effect of applying the Tunnel Encapsulation
	attribute to any packet that is addressed to the GW itself, the	attribute to any packet that is addressed to the GW itself, the
	address advertised for auto-discovery MUST be a different loopback	address advertised for auto-discovery MUST be a different loopback
	address than is advertised for packets directed to the gateway	address than is advertised for packets directed to the gateway
	itself.	itself.
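
The constraints on the auto-discovery route (one of four AFI/SAFI pairs, and a loopback address distinct from the one used for traffic addressed to the gateway itself) can be expressed as a small check. The function name, parameter names, and message strings below are assumptions made for illustration only.

```python
import ipaddress

# AFI/SAFI pairs listed for the auto-discovery route.
ALLOWED_AFI_SAFI = {
    (1, 1): "IPv4 unicast",
    (2, 1): "IPv6 unicast",
    (1, 4): "IPv4 with MPLS labels",
    (2, 4): "IPv6 with MPLS labels",
}


def check_auto_discovery_route(afi, safi, discovery_loopback, gateway_loopback):
    """Return a list of problems; an empty list means the route passes both checks."""
    problems = []
    if (afi, safi) not in ALLOWED_AFI_SAFI:
        problems.append("AFI/SAFI pair is not one of the four listed pairs")
    # Parse the addresses so that equivalent textual forms compare equal.
    if ipaddress.ip_address(discovery_loopback) == ipaddress.ip_address(gateway_loopback):
        problems.append("auto-discovery loopback must differ from the gateway's own address")
    return problems


print(check_auto_discovery_route(1, 4, "192.0.2.1", "192.0.2.101"))
print(check_auto_discovery_route(1, 128, "192.0.2.1", "192.0.2.1"))
```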

	As described in Section 1, each GW will include a Tunnel	As described in Section 1, each GW will include a Tunnel
	Encapsulation attribute with the GW encapsulation information for	Encapsulation attribute with the GW encapsulation information for
	each of the site's active GWs (including itself) in every route	each of the site's active GWs (including itself) in every route
	advertised externally to that site. As the current set of active GWs	advertised externally to that site. As the current set of active GWs
	changes (due to the addition of a new GW or the failure/removal of an	changes (due to the addition of a new GW or the failure/removal of an

	existing GW) each externally advertised route will be re-advertised	existing GW), each externally advertised route will be re-advertised
	with a new Tunnel Encapsulation attribute which reflects the current	with a new Tunnel Encapsulation attribute, which reflects the current
	set of active GWs.	set of active GWs.

	If a gateway becomes disconnected from the backbone network, or if	If a gateway becomes disconnected from the backbone network, or if
	the site operator decides to terminate the gateway's activity, it	the site operator decides to terminate the gateway's activity, it
	MUST withdraw the advertisements described above. This means that	MUST withdraw the advertisements described above. This means that
	remote gateways at other sites will stop seeing advertisements from	remote gateways at other sites will stop seeing advertisements from
	or about this gateway. Note that if the routing within a site is	or about this gateway. Note that if the routing within a site is
	broken (for example, such that there is a route from one GW to	broken (for example, such that there is a route from one GW to

	another, but not in the reverse direction), then it is possible that	another but not in the reverse direction), then it is possible that
	incoming traffic will be routed to the wrong GW to reach the	incoming traffic will be routed to the wrong GW to reach the

	destination prefix - in this degraded network situation, traffic may	destination prefix; in this degraded network situation, traffic may
	be dropped.	be dropped.

	Note that if a GW is (mis)configured with a different site identifier	Note that if a GW is (mis)configured with a different site identifier

	from the other GWs to the same site then it will not be auto-	from the other GWs to the same site, then it will not be auto-
	discovered by the other GWs (and will not auto-discover the other	discovered by the other GWs (and will not auto-discover the other
	GWs). This would result in a GW for another site receiving only the	GWs). This would result in a GW for another site receiving only the

	Tunnel Encapsulation attribute included in the BGP best route; i.e.,	Tunnel Encapsulation attribute included in the BGP best route, i.e.,
	the Tunnel Encapsulation attribute of the (mis)configured GW or that	the Tunnel Encapsulation attribute of the (mis)configured GW or that
	of the other GWs.	of the other GWs.


	4. Relationship to BGP Link State and Egress Peer Engineering	4. Relationship to BGP - Link State and Egress Peer Engineering

	When a remote GW receives a route to a prefix X, it uses the Tunnel	When a remote GW receives a route to a prefix X, it uses the Tunnel

	Egress Endpoint Sub-TLVs in the containing Tunnel Encapsulation	Egress Endpoint sub-TLVs in the containing Tunnel Encapsulation
	attribute to identify the GWs through which X can be reached. It	attribute to identify the GWs through which X can be reached. It
	uses this information to compute SR Traffic Engineering (SR TE) paths	uses this information to compute SR Traffic Engineering (SR TE) paths
	across the backbone network looking at the information advertised to	across the backbone network looking at the information advertised to

	it in SR BGP Link State (BGP-LS)	it in SR BGP - Link State (BGP-LS) ([RFC9085]) and correlated using
	[I-D.ietf-idr-bgp-ls-segment-routing-ext] and correlated using the	the site identity. SR Egress Peer Engineering (EPE) ([RFC9086]) can
	site identity. SR Egress Peer Engineering (EPE)	be used to supplement the information advertised in BGP-LS.
	[I-D.ietf-idr-bgpls-segment-routing-epe] can be used to supplement
	the information advertised in BGP-LS.
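
As a rough illustration of the first step described in Section 4, the sketch below collects the gateway addresses named by the Tunnel Egress Endpoint sub-TLVs of a received route and then assigns a flow to one of them. The path computation itself is not modeled. The CRC-based per-flow choice is an assumption of this sketch, not something the document specifies, and the function names and flow key format are hypothetical.

```python
import zlib


def candidate_gateways(egress_endpoints):
    """Return the unique GW addresses in first-seen order."""
    return list(dict.fromkeys(egress_endpoints))


def pick_gateway(gateways, flow_key):
    """Map a flow to one GW deterministically (illustrative policy only)."""
    if not gateways:
        raise ValueError("no gateways advertised for this prefix")
    index = zlib.crc32(flow_key.encode("utf-8")) % len(gateways)
    return gateways[index]


# Egress endpoint addresses taken from the Tunnel TLVs of one received route.
endpoints = ["192.0.2.1", "192.0.2.2", "192.0.2.1"]
gateways = candidate_gateways(endpoints)
print(gateways)
print(pick_gateway(gateways, "10.0.0.1->203.0.113.7:443"))
```

A stable hash keeps all packets of a flow on the same gateway; an operator could equally apply weights or path-quality metrics at this point.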

	5. Advertising a Site Route Externally	5. Advertising a Site Route Externally

	When a packet destined for prefix X is sent on an SR TE path to a GW	When a packet destined for prefix X is sent on an SR TE path to a GW
	for the site containing X (that is, the packet is sent in the ingress	for the site containing X (that is, the packet is sent in the ingress
	site on an SR TE path that describes the whole path including those	site on an SR TE path that describes the whole path including those
	parts that are within the egress site), it needs to carry the	parts that are within the egress site), it needs to carry the
	receiving GW's SID for X such that this SID becomes the next SID that	receiving GW's SID for X such that this SID becomes the next SID that
	is due to be processed before the GW completes its processing of the	is due to be processed before the GW completes its processing of the
	packet. To achieve this, each Tunnel TLV in the Tunnel Encapsulation	packet. To achieve this, each Tunnel TLV in the Tunnel Encapsulation

	attribute contains a Prefix-SID sub-TLV [RFC9012] for X.	attribute contains a Prefix-SID sub-TLV ([RFC9012]) for X.

	As defined in [RFC9012], the Prefix-SID sub-TLV is only for IPv4/IPv6	As defined in [RFC9012], the Prefix-SID sub-TLV is only for IPv4/IPv6

	labelled unicast routes, so the solution described in this document	Labeled Unicast routes, so the solution described in this document
	only applies to routes of those types. If the use of the Prefix-SID	only applies to routes of those types. If the use of the Prefix-SID
	sub-TLV for routes of other types is defined in the future, further	sub-TLV for routes of other types is defined in the future, further
	documents will be needed to describe their use for site	documents will be needed to describe their use for site
	interconnection consistent with this document.	interconnection consistent with this document.

	Alternatively, if MPLS SR is in use and if the GWs for a given egress	Alternatively, if MPLS SR is in use and if the GWs for a given egress
	site are configured to allow GWs at remote ingress sites to perform	site are configured to allow GWs at remote ingress sites to perform
	SR TE through that egress site for a prefix X, then each GW to the	SR TE through that egress site for a prefix X, then each GW to the

	egress site computes an SR TE path through the egress site to X, and	egress site computes an SR TE path through the egress site to X and
	places each in an MPLS label stack sub-TLV [RFC9012] in the SR Tunnel	places each in an MPLS Label Stack sub-TLV ([RFC9012]) in the SR
	TLV for that GW.	Tunnel TLV for that GW.


	Please refer to Section 7 of	Please refer to Section 7 of [SR-INTERCONNECT] for worked examples of
	[I-D.farrel-spring-sr-domain-interconnect] for worked examples of how	how the SID stack is constructed in this case and how the
	the SID stack is constructed in this case, and how the advertisements	advertisements would work.
	would work.

	6. Encapsulation	6. Encapsulation


	If a site is configured to allow remote GWs send packets to the site	If a site is configured to allow remote GWs to send packets to the
	in the site's native encapsulation, then each GW to the site will	site in the site's native encapsulation, then each GW to the site
	also include multiple instances of a Tunnel TLV for that native	will also include multiple instances of a Tunnel TLV for that native
	encapsulation in externally advertised routes: one for each GW and	encapsulation in externally advertised routes: one for each GW. Each
	each containing a Tunnel Egress Endpoint sub-TLV with that GW's	Tunnel TLV contains a Tunnel Egress Endpoint sub-TLV with the address
	address. A remote GW may then encapsulate a packet according to the	of the GW that the Tunnel TLV identifies. A remote GW may then
	rules defined via the sub-TLVs included in each of the Tunnel TLVs.	encapsulate a packet according to the rules defined via the sub-TLVs
		included in each of the Tunnel TLVs.

	7. IANA Considerations	7. IANA Considerations


	IANA maintains a registry called "Border Gateway Protocol (BGP)	IANA maintains the "BGP Tunnel Encapsulation Attribute Tunnel Types"
	Parameters" with a sub-registry called "BGP Tunnel Encapsulation	registry in the "Border Gateway Protocol (BGP) Tunnel Encapsulation"
	Attribute Tunnel Types." The registration policy for this registry	registry.
	is First-Come First-Served [RFC8126].


	IANA previously assigned the value 17 from this sub-registry for "SR	IANA had previously assigned the value 17 from this subregistry for
	Tunnel", referencing this document. IANA is now requested to mark	"SR Tunnel", referencing this document as an Internet-Draft. At that
	that assignment as deprecated. IANA may reclaim that codepoint at	time, the assignment policy for this range of the registry was "First
	such a time that the registry is depleted.	Come First Served" [RFC8126].

		IANA has marked that assignment as deprecated. IANA may reclaim that
		codepoint at such a time that the registry is depleted.

	8. Security Considerations	8. Security Considerations

	From a protocol point of view, the mechanisms described in this	From a protocol point of view, the mechanisms described in this
	document can leverage the security mechanisms already defined for	document can leverage the security mechanisms already defined for
	BGP. Further discussion of security considerations for BGP may be	BGP. Further discussion of security considerations for BGP may be

	found in the BGP specification itself [RFC4271] and in the security	found in the BGP specification itself ([RFC4271]) and in the security
	analysis for BGP [RFC4272]. The original discussion of the use of	analysis for BGP ([RFC4272]). The original discussion of the use of
	the TCP MD5 signature option to protect BGP sessions is found in	the TCP MD5 signature option to protect BGP sessions is found in
	[RFC5925], while [RFC6952] includes an analysis of BGP keying and	[RFC5925], while [RFC6952] includes an analysis of BGP keying and
	authentication issues.	authentication issues.

	The mechanisms described in this document involve sharing routing or	The mechanisms described in this document involve sharing routing or

	reachability information between sites: that may mean disclosing	reachability information between sites, which may mean disclosing
	information that is normally contained within a site. So it needs to	information that is normally contained within a site. So it needs to
	be understood that normal security paradigms based on the boundaries	be understood that normal security paradigms based on the boundaries
	of sites are weakened and interception of BGP messages may result in	of sites are weakened and interception of BGP messages may result in
	information being disclosed to third parties. Discussion of these	information being disclosed to third parties. Discussion of these
	issues with respect to VPNs can be found in [RFC4364], while	issues with respect to VPNs can be found in [RFC4364], while
	[RFC7926] describes many of the issues associated with the exchange	[RFC7926] describes many of the issues associated with the exchange
	of topology or TE information between sites.	of topology or TE information between sites.

	Particular exposures resulting from this work include:	Particular exposures resulting from this work include:


	o Gateways to a site will know about all other gateways to the same	* Gateways to a site will know about all other gateways to the same
	site. This feature applies within a site and so is not a	site. This feature applies within a site, so it is not a
	substantial exposure, but it does mean that if the BGP exchanges	substantial exposure, but it does mean that if the BGP exchanges

	within a site can be snooped or if a gateway can be subverted then	within a site can be snooped or if a gateway can be subverted,
	an attacker may learn the full set of gateways to a site. This	then an attacker may learn the full set of gateways to a site.
	would facilitate more effective attacks on that site.	This would facilitate more effective attacks on that site.


	o The existence of multiple gateways to a site becomes more visible	* The existence of multiple gateways to a site becomes more visible
	across the backbone and even into remote sites. This means that	across the backbone and even into remote sites. This means that
	an attacker is able to prepare a more comprehensive attack than	an attacker is able to prepare a more comprehensive attack than
	exists when only the locally attached backbone network (e.g., the	exists when only the locally attached backbone network (e.g., the
	AS that hosts the site) can see all of the gateways to a site.	AS that hosts the site) can see all of the gateways to a site.

	For example, a Denial of Service attack on a single GW is	For example, a Denial-of-Service attack on a single GW is
	mitigated by the existence of other GWs, but if the attacker knows	mitigated by the existence of other GWs, but if the attacker knows

	about all the gateways then the whole set can be attacked at once.	about all the gateways, then the whole set can be attacked at
		once.


	o A node in a site that does not have external BGP peering (i.e., is	* A node in a site that does not have external BGP peering (i.e., is
	not really a site gateway and cannot speak BGP into the backbone	not really a site gateway and cannot speak BGP into the backbone
	network) may be able to get itself advertised as a gateway by	network) may be able to get itself advertised as a gateway by
	letting other genuine gateways discover it (by speaking BGP to	letting other genuine gateways discover it (by speaking BGP to

	them within the site) and so may get those genuine gateways to	them within the site), so it may get those genuine gateways to
	advertise it as a gateway into the backbone network. This would	advertise it as a gateway into the backbone network. This would
	allow the malicious node to attract traffic without having to have	allow the malicious node to attract traffic without having to have
	secure BGP peerings with out-of-site nodes.	secure BGP peerings with out-of-site nodes.


	o An external party intercepting BGP messages anywhere between sites	* An external party intercepting BGP messages anywhere between sites
	may learn information about the functioning of the sites and the	may learn information about the functioning of the sites and the

	locations of end points. While this is not necessarily a	locations of endpoints. While this is not necessarily a
	significant security or privacy risk, it is possible that the	significant security or privacy risk, it is possible that the
	disclosure of this information could be used by an attacker.	disclosure of this information could be used by an attacker.


	o If it is possible to modify a BGP message within the backbone, it	* If it is possible to modify a BGP message within the backbone, it
	may be possible to spoof the existence of a gateway. This could	may be possible to spoof the existence of a gateway. This could
	cause traffic to be attracted to a specific node and might result	cause traffic to be attracted to a specific node and might result

	in black-holing of traffic.	in traffic not being delivered.

	All of the issues in the list above could cause disruption to site	All of the issues in the list above could cause disruption to site

	interconnection, but are not new protocol vulnerabilities so much as	interconnection, but they are not new protocol vulnerabilities so
	new exposures of information that SHOULD be protected against using	much as new exposures of information that SHOULD be protected against
	existing protocol mechanisms such as securing the TCP sessions over	using existing protocol mechanisms such as securing the TCP sessions
	which the BGP messages flow. Furthermore, it is a general	over which the BGP messages flow. Furthermore, it is a general
	observation that if these attacks are possible then it is highly	observation that if these attacks are possible, then it is highly
	likely that far more significant attacks can be made on the routing	likely that far more significant attacks can be made on the routing

	system. It should be noted that BGP peerings are not discovered, but	system. It should be noted that BGP peerings are not discovered but
	always arise from explicit configuration.	always arise from explicit configuration.

	Given that the gateways and ASBRs are connected by tunnels that may	Given that the gateways and ASBRs are connected by tunnels that may
	run across parts of the network that are not trusted, data center	run across parts of the network that are not trusted, data center
	operators using the approach set out in this network MUST consider	operators using the approach set out in this network MUST consider
	using gateway-to-gateway encryption to protect the data center	using gateway-to-gateway encryption to protect the data center
	traffic. Additionally, due consideration MUST be given to encrypting	traffic. Additionally, due consideration MUST be given to encrypting
	end-to-end traffic as it would be for any traffic that uses a public	end-to-end traffic as it would be for any traffic that uses a public
	or untrusted network for transport.	or untrusted network for transport.

	9. Manageability Considerations	9. Manageability Considerations

	The principal configuration item added by this solution is the	The principal configuration item added by this solution is the
	allocation of a site identifier. The same identifier MUST be	allocation of a site identifier. The same identifier MUST be
	assigned to every GW to the same site, and each site MUST have a	assigned to every GW to the same site, and each site MUST have a
	different identifier. This requires coordination, probably through a	different identifier. This requires coordination, probably through a
	central management agent.	central management agent.


	It should be noted that BGP peerings are not discovered, but always	It should be noted that BGP peerings are not discovered but always
	arise from explicit configuration. This is no different from any	arise from explicit configuration. This is no different from any
	other BGP operation.	other BGP operation.

	The site identifiers that are configured and carried in route targets	The site identifiers that are configured and carried in route targets
	(see Section 3) are an important feature to ensure that all of the	(see Section 3) are an important feature to ensure that all of the

	gateways to a site discover each other. It is, therefore, important	gateways to a site discover each other. Therefore, it is important
	that this value is not misconfigured since that would result in the	that this value is not misconfigured since that would result in the
	gateways not discovering each other and not advertising each other.	gateways not discovering each other and not advertising each other.
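
Illustrative note (not part of either compared version): the two site identifier requirements in Section 9 above (the same identifier on every GW to a site, and a different identifier for each site) could be checked by a central management agent roughly as follows. This is a sketch under stated assumptions: the function name `check_site_identifiers` and the shape of the input mapping are hypothetical and are not defined by the document.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def check_site_identifiers(gw_config: Dict[str, Tuple[str, int]]) -> List[str]:
    """Check a hypothetical mapping of GW name -> (site name, site identifier).

    Returns a list of human-readable problems; an empty list means both
    requirements hold for the supplied configuration.
    """
    ids_by_site = defaultdict(set)   # site name -> identifiers configured on its GWs
    sites_by_id = defaultdict(set)   # identifier -> sites that use it
    for _gw, (site, ident) in gw_config.items():
        ids_by_site[site].add(ident)
        sites_by_id[ident].add(site)

    problems = []
    # Requirement 1: every GW to the same site carries the same identifier.
    for site, ids in sorted(ids_by_site.items()):
        if len(ids) > 1:
            problems.append(f"site {site}: GWs disagree on identifier {sorted(ids)}")
    # Requirement 2: each site has a different identifier.
    for ident, sites in sorted(sites_by_id.items()):
        if len(sites) > 1:
            problems.append(f"identifier {ident}: shared by sites {sorted(sites)}")
    return problems


if __name__ == "__main__":
    config = {"gw1": ("A", 100), "gw2": ("A", 101), "gw3": ("B", 100)}
    for problem in check_site_identifiers(config):
        print(problem)
```

Running a check of this kind before deployment would catch the misconfiguration case described above, in which gateways fail to discover and advertise each other.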

	9.1. Relationship to Route Target Constraint	9.1. Relationship to Route Target Constraint

	In order to limit the VPN routing information that is maintained at a	In order to limit the VPN routing information that is maintained at a

	given route reflector, [RFC4364] suggests the use of "Cooperative	given route reflector, [RFC4364] suggests that route reflectors use
	Route Filtering" [RFC5291] between route reflectors. [RFC4684]	"Cooperative Route Filtering", which was renamed "Outbound Route
	defines an extension to that mechanism to include support for	Filtering" and defined in [RFC5291]. [RFC4684] defines an extension
	multiple autonomous systems and asymmetric VPN topologies such as	to that mechanism to include support for multiple autonomous systems
	hub-and-spoke. The mechanism in RFC 4684 is known as Route Target	and asymmetric VPN topologies such as hub-and-spoke. The mechanism
	Constraint (RTC).	in RFC 4684 is known as Route Target Constraint (RTC).

	An operator would not normally configure RTC by default for any AFI/	An operator would not normally configure RTC by default for any AFI/

	SAFI combination, and would only enable it after careful	SAFI combination and would only enable it after careful
	consideration. When using the mechanisms defined in this document,	consideration. When using the mechanisms defined in this document,

	the operator should consider carefully the effects of filtering	the operator should carefully consider the effects of filtering
	routes. In some cases this may be desirable, and in others it could	routes. In some cases, this may be desirable, and in others, it
	limit the effectiveness of the procedures.	could limit the effectiveness of the procedures.

	10. Acknowledgements

	Thanks to Bruno Rijsman, Stephane Litkowski, Boris Hassanov, Linda
	Dunbar, Ravi Singh, and Daniel Migault for review comments, and to
	Robert Raszuk for useful discussions. Gyan Mishra provided a helpful
	GenArt review, and John Scudder and Benjamin Kaduk made helpful
	comments during IESG review.


	11. References	10. References


	11.1. Normative References	10.1. Normative References

	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
	Requirement Levels", BCP 14, RFC 2119,	Requirement Levels", BCP 14, RFC 2119,
	DOI 10.17487/RFC2119, March 1997,	DOI 10.17487/RFC2119, March 1997,
	<https://www.rfc-editor.org/info/rfc2119>.	<https://www.rfc-editor.org/info/rfc2119>.

	[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A	[RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
	Border Gateway Protocol 4 (BGP-4)", RFC 4271,	Border Gateway Protocol 4 (BGP-4)", RFC 4271,
	DOI 10.17487/RFC4271, January 2006,	DOI 10.17487/RFC4271, January 2006,
	<https://www.rfc-editor.org/info/rfc4271>.	<https://www.rfc-editor.org/info/rfc4271>.

	skipping to change at page 11, line 47 ¶	skipping to change at line 509 ¶

	[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC	[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
	2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,	2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
	May 2017, <https://www.rfc-editor.org/info/rfc8174>.	May 2017, <https://www.rfc-editor.org/info/rfc8174>.

	[RFC9012] Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,	[RFC9012] Patel, K., Van de Velde, G., Sangli, S., and J. Scudder,
	"The BGP Tunnel Encapsulation Attribute", RFC 9012,	"The BGP Tunnel Encapsulation Attribute", RFC 9012,
	DOI 10.17487/RFC9012, April 2021,	DOI 10.17487/RFC9012, April 2021,
	<https://www.rfc-editor.org/info/rfc9012>.	<https://www.rfc-editor.org/info/rfc9012>.


	11.2. Informative References	10.2. Informative References

	[I-D.farrel-spring-sr-domain-interconnect]
	Farrel, A. and J. Drake, "Interconnection of Segment
	Routing Sites - Problem Statement and Solution Landscape",
	draft-farrel-spring-sr-domain-interconnect-06 (work in
	progress), May 2021.

	[I-D.ietf-idr-bgp-ls-segment-routing-ext]
	Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
	and M. Chen, "BGP Link-State extensions for Segment
	Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-18
	(work in progress), April 2021.

	[I-D.ietf-idr-bgpls-segment-routing-epe]
	Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
	S., and J. Dong, "BGP-LS extensions for Segment Routing
	BGP Egress Peer Engineering", draft-ietf-idr-bgpls-
	segment-routing-epe-19 (work in progress), May 2019.

	[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis",	[RFC4272] Murphy, S., "BGP Security Vulnerabilities Analysis",
	RFC 4272, DOI 10.17487/RFC4272, January 2006,	RFC 4272, DOI 10.17487/RFC4272, January 2006,
	<https://www.rfc-editor.org/info/rfc4272>.	<https://www.rfc-editor.org/info/rfc4272>.

	[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private	[RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
	Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February	Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
	2006, <https://www.rfc-editor.org/info/rfc4364>.	2006, <https://www.rfc-editor.org/info/rfc4364>.

	[RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,	[RFC4684] Marques, P., Bonica, R., Fang, L., Martini, L., Raszuk,

	skipping to change at page 13, line 22 ¶	skipping to change at line 558 ¶
	[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for	[RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for
	Writing an IANA Considerations Section in RFCs", BCP 26,	Writing an IANA Considerations Section in RFCs", BCP 26,
	RFC 8126, DOI 10.17487/RFC8126, June 2017,	RFC 8126, DOI 10.17487/RFC8126, June 2017,
	<https://www.rfc-editor.org/info/rfc8126>.	<https://www.rfc-editor.org/info/rfc8126>.

	[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,	[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L.,
	Decraene, B., Litkowski, S., and R. Shakir, "Segment	Decraene, B., Litkowski, S., and R. Shakir, "Segment
	Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,	Routing Architecture", RFC 8402, DOI 10.17487/RFC8402,
	July 2018, <https://www.rfc-editor.org/info/rfc8402>.	July 2018, <https://www.rfc-editor.org/info/rfc8402>.


		[RFC9085] Previdi, S., Talaulikar, K., Ed., Filsfils, C., Gredler,
		H., and M. Chen, "Border Gateway Protocol - Link State
		(BGP-LS) Extensions for Segment Routing", RFC 9085,
		DOI 10.17487/RFC9085, August 2021,
		<https://www.rfc-editor.org/info/rfc9085>.

		[RFC9086] Previdi, S., Talaulikar, K., Ed., Filsfils, C., Patel, K.,
		Ray, S., and J. Dong, "Border Gateway Protocol - Link
		State (BGP-LS) Extensions for Segment Routing BGP Egress
		Peer Engineering", RFC 9086, DOI 10.17487/RFC9086, August
		2021, <https://www.rfc-editor.org/info/rfc9086>.

		[SR-INTERCONNECT]
		Farrel, A. and J. Drake, "Interconnection of Segment
		Routing Sites - Problem Statement and Solution Landscape",
		Work in Progress, Internet-Draft, draft-farrel-spring-sr-
		domain-interconnect-06, 19 May 2021,
		<https://datatracker.ietf.org/doc/html/draft-farrel-
		spring-sr-domain-interconnect-06>.

		Acknowledgements

		Thanks to Bruno Rijsman, Stephane Litkowski, Boris Hassanov, Linda
		Dunbar, Ravi Singh, and Daniel Migault for review comments, and to
		Robert Raszuk for useful discussions. Gyan Mishra provided a helpful
		GenArt review, and John Scudder and Benjamin Kaduk made helpful
		comments during IESG review.

	Authors' Addresses	Authors' Addresses

	Adrian Farrel	Adrian Farrel
	Old Dog Consulting	Old Dog Consulting

	Email: adrian@olddog.co.uk	Email: adrian@olddog.co.uk

	John Drake	John Drake
	Juniper Networks	Juniper Networks


End of changes. 73 change blocks.
	199 lines changed or deleted	197 lines changed or added