rfc9544.original   rfc9544.txt 
Network Working Group G. Mirsky Internet Engineering Task Force (IETF) G. Mirsky
Internet-Draft J. Halpern Request for Comments: 9544 J. Halpern
Intended status: Informational Ericsson Category: Informational Ericsson
Expires: 3 June 2024 X. Min ISSN: 2070-1721 X. Min
ZTE Corp. ZTE Corp.
A. Clemm A. Clemm
J. Strassner J. Strassner
Futurewei Futurewei
J. Francois J. Francois
Inria and University of Luxembourg Inria and University of Luxembourg
1 December 2023 February 2024
Precision Availability Metrics for Services Governed by Service Level Precision Availability Metrics (PAMs) for Services Governed by Service
Objectives (SLOs) Level Objectives (SLOs)
draft-ietf-ippm-pam-09
Abstract Abstract
This document defines a set of metrics for networking services with This document defines a set of metrics for networking services with
performance requirements expressed as Service Level Objectives (SLO). performance requirements expressed as Service Level Objectives
These metrics, referred to as Precision Availability Metrics (PAM), (SLOs). These metrics, referred to as "Precision Availability
are useful for defining and monitoring SLOs. For example, PAM can be Metrics (PAMs)", are useful for defining and monitoring SLOs. For
used by providers and/or customers of an RFC XXXX Network Slice example, PAMs can be used by providers and/or customers of an RFC
Service to assess whether the service is provided in compliance with 9543 Network Slice Service to assess whether the service is provided
its defined SLOs. in compliance with its defined SLOs.
Note to the RFC Editor: Please update "RFC XXXX Network Slice" with
the RFC number assigned to draft-ietf-teas-ietf-network-slices.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This document is not an Internet Standards Track specification; it is
provisions of BCP 78 and BCP 79. published for informational purposes.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Not all documents
approved by the IESG are candidates for any level of Internet
Standard; see Section 2 of RFC 7841.
This Internet-Draft will expire on 3 June 2024. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
https://www.rfc-editor.org/info/rfc9544.
Copyright Notice Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the Copyright (c) 2024 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents
license-info) in effect on the date of publication of this document. (https://trustee.ietf.org/license-info) in effect on the date of
Please review these documents carefully, as they describe your rights publication of this document. Please review these documents
and restrictions with respect to this document. Code Components carefully, as they describe your rights and restrictions with respect
extracted from this document must include Revised BSD License text as to this document. Code Components extracted from this document must
described in Section 4.e of the Trust Legal Provisions and are include Revised BSD License text as described in Section 4.e of the
provided without warranty as described in the Revised BSD License. Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction
2. Conventions and Terminology . . . . . . . . . . . . . . . . . 4 2. Conventions
2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Terminology
2.2. Acronyms . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2. Acronyms
3. Precision Availability Metrics . . . . . . . . . . . . . . . 5 3. Precision Availability Metrics
3.1. Introducing Violated Intervals . . . . . . . . . . . . . 5 3.1. Introducing Violated Intervals
3.2. Derived Precision Availability Metrics . . . . . . . . . 6 3.2. Derived Precision Availability Metrics
3.3. PAM Configuration Settings and Service Availability . . . 8 3.3. PAM Configuration Settings and Service Availability
4. Statistical SLO . . . . . . . . . . . . . . . . . . . . . . . 9 4. Statistical SLO
5. Other Expected PAM Benefits . . . . . . . . . . . . . . . . . 10 5. Other Expected PAM Benefits
6. Extensions and Future Work . . . . . . . . . . . . . . . . . 10 6. Extensions and Future Work
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 7. IANA Considerations
8. Security Considerations . . . . . . . . . . . . . . . . . . . 11 8. Security Considerations
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 11 9. Informative References
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 Acknowledgments
10.1. Informative References . . . . . . . . . . . . . . . . . 11 Contributors
Contributors' Addresses . . . . . . . . . . . . . . . . . . . . . 13 Authors' Addresses
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
Service providers and users often need to assess the quality with Service providers and users often need to assess the quality with
which network services are being delivered. In particular, in cases which network services are being delivered. In particular, in cases
where service level guarantees are documented (including their where service-level guarantees are documented (including their
companion metrology) as part of a contract established between the companion metrology) as part of a contract established between the
customer and the service provider, and Service Level Objectives customer and the service provider, and Service Level Objectives
(SLOs) are defined, it is essential to provide means to verify that (SLOs) are defined, it is essential to provide means to verify that
what has been delivered complies with what has been possibly what has been delivered complies with what has been possibly
negotiated and (contractually) defined between the customer and the negotiated and (contractually) defined between the customer and the
service provider. Examples of SLOs would be target values for the service provider. Examples of SLOs would be target values for the
maximum packet delay (one-way and/or round-trip) or maximum packet maximum packet delay (one-way and/or round-trip) or maximum packet
loss ratio that would be deemed acceptable. loss ratio that would be deemed acceptable.
More generally, SLOs can be used to characterize the ability of a More generally, SLOs can be used to characterize the ability of a
particular set of nodes to communicate according to certain particular set of nodes to communicate according to certain
measurable expectations. Those expectations can include but are not measurable expectations. Those expectations can include but are not
limited to aspects such as latency, delay variation, loss, capacity/ limited to aspects such as latency, delay variation, loss, capacity/
throughput, ordering, and fragmentation. Whatever SLO parameters are throughput, ordering, and fragmentation. Whatever SLO parameters are
chosen and whichever way service level parameters are being measured, chosen and whichever way service-level parameters are being measured,
precision availability metrics indicate whether or not a given Precision Availability Metrics indicate whether or not a given
service has been available according to expectations at all times. service has been available according to expectations at all times.
Several metrics (often documented in the IANA Registry of Performance Several metrics (often documented in the IANA "Performance Metrics"
Metrics [IANA-PM-Registry] according to [RFC8911] and [RFC8912]), can registry [IANA-PM-Registry] according to [RFC8911] and [RFC8912]) can
be used to characterize the service quality, expressing the perceived be used to characterize the service quality, expressing the perceived
quality of delivered networking services versus their SLOs. Of quality of delivered networking services versus their SLOs. Of
concern is not so much the absolute service level (for example, concern is not so much the absolute service level (for example,
actual latency experienced) but whether the service is provided in actual latency experienced) but whether the service is provided in
compliance with the negotiated and eventually contracted service compliance with the negotiated and eventually contracted service
levels. For instance, this may include whether the experienced levels. For instance, this may include whether the experienced
packet delay falls within an acceptable range that has been packet delay falls within an acceptable range that has been
contracted for the service. The specific quality of service depends contracted for the service. The specific quality of service depends
on the SLO or a set thereof for a given service that is in effect. A on the SLO or a set thereof for a given service that is in effect.
non-compliance with an SLO might result in the degradation of the Non-compliance with an SLO might result in the degradation of the
quality of experience for gamers or even jeopardize the safety of a quality of experience for gamers or even jeopardize the safety of a
large geographical area. large geographical area.
The same service level may be deemed acceptable for one application, The same service level may be deemed acceptable for one application,
while unacceptable for another, depending on the needs of the while unacceptable for another, depending on the needs of the
application. Hence it is not sufficient to measure service levels application. Hence, it is not sufficient to measure service levels
per se over time, but to assess the quality of the service being per se over time; the quality of the service being contextually
contextually provided (e.g., with the applicable SLO in mind). provided (e.g., with the applicable SLO in mind) must also be
However, at this point, there are no standard metrics that can be assessed. However, at this point, there are no standard metrics that
used to account for the quality with which services are delivered can be used to account for the quality with which services are
relative to their SLOs, and whether their SLOs are being met at all delivered relative to their SLOs or to determine whether their SLOs
times. Such metrics and the instrumentation to support them are are being met at all times. Such metrics and the instrumentation to
essential for various purposes, including monitoring (to ensure that support them are essential for various purposes, including monitoring
networking services are performing according to their objectives) as (to ensure that networking services are performing according to their
well as accounting (to maintain a record of service levels delivered, objectives) as well as accounting (to maintain a record of service
which is important for the monetization of such services as well as levels delivered, which is important for the monetization of such
for the triaging of problems). services as well as for the triaging of problems).
The current state-of-the-art of metrics includes, for example, The current state-of-the-art of metrics includes, for example,
interface metrics, useful to obtain statistical data on traffic interface metrics that can be used to obtain statistical data on
volume and behavior that can be observed at an interface [RFC2863] traffic volume and behavior that can be observed at an interface
and [RFC8343]. However, they are agnostic of actual service levels [RFC2863] [RFC8343]. However, they are agnostic of actual service
and not specific to distinct flows. Flow records [RFC7011] and levels and not specific to distinct flows. Flow records [RFC7011]
[RFC7012] maintain statistics about flows, including flow volume and [RFC7012] maintain statistics about flows, including flow volume and
flow duration, but again, contain very little information about flow duration, but again, they contain very little information about
service levels, let alone whether the service levels delivered meet service levels, let alone whether the service levels delivered meet
their respective targets, i.e., their associated SLOs. their respective targets, i.e., their associated SLOs.
This specification introduces a new set of metrics, Precision This specification introduces a new set of metrics, Precision
Availability Metrics (PAM), aimed at capturing service levels for a Availability Metrics (PAMs), aimed at capturing service levels for a
flow, specifically the degree to which the flow complies with the flow, specifically the degree to which the flow complies with the
SLOs that are in effect. PAM can be used to assess whether a service SLOs that are in effect. PAMs can be used to assess whether a
is provided in compliance with its defined SLOs. This information service is provided in compliance with its defined SLOs. This
can be used in multiple ways, for example, to optimize service information can be used in multiple ways, for example, to optimize
delivery, take timely counteractions in the event of service service delivery, take timely counteractions in the event of service
degradation, or account for the quality of services being delivered. degradation, or account for the quality of services being delivered.
Availability is discussed in Section 3.4 of [RFC7297]. In this Availability is discussed in Section 3.4 of [RFC7297]. In this
document, the term "availability" reflects that a service that is document, the term "availability" reflects that a service that is
characterized by its SLOs is considered unavailable whenever those characterized by its SLOs is considered unavailable whenever those
SLOs are violated, even if basic connectivity is still working. SLOs are violated, even if basic connectivity is still working.
"Precision" refers to services whose service levels are governed by "Precision" refers to services whose service levels are governed by
SLOs and must be delivered precisely according to the associated SLOs and must be delivered precisely according to the associated
quality and performance requirements. It should be noted that quality and performance requirements. It should be noted that
precision refers to what is being assessed, not the mechanism used to precision refers to what is being assessed, not the mechanism used to
measure it. In other words, it does not refer to the precision of measure it. In other words, it does not refer to the precision of
the mechanism with which actual service levels are measured. the mechanism with which actual service levels are measured.
Furthermore, the precision, with respect to the delivery of an SLO, Furthermore, the precision, with respect to the delivery of an SLO,
particularly applies when a metric value approaches the specified particularly applies when a metric value approaches the specified
threshold levels in the SLO. threshold levels in the SLO.
The specification and implementation of methods that provide for The specification and implementation of methods that provide for
accurate measurements are separate topics independent of the accurate measurements are separate topics independent of the
definition of the metrics in which the results of such measurements definition of the metrics in which the results of such measurements
would be expressed. Likewise, Service Level Expectations (SLEs), as would be expressed. Likewise, Service Level Expectations (SLEs), as
defined in Section 5.1 of [I-D.ietf-teas-ietf-network-slices], are defined in Section 5.1 of [RFC9543], are outside the scope of this
outside the scope of this document. document.
2. Conventions and Terminology 2. Conventions
2.1. Terminology 2.1. Terminology
In this document, SLA and SLO are used as defined in [RFC3198]. The In this document, SLA and SLO are used as defined in [RFC3198]. The
reader may refer to Section 5.1 of reader may refer to Section 5.1 of [RFC9543] for an applicability
[I-D.ietf-teas-ietf-network-slices] for an applicability example of example of these concepts in the context of RFC 9543 Network Slice
these concepts in the context of RFC XXXX Network Slice Services. Services.
Note to the RFC Editor: Please update "RFC XXXX Network Slice" with
the RFC number assigned to [I-D.ietf-teas-ietf-network-slices].
2.2. Acronyms 2.2. Acronyms
PAM Precision Availability Metric IPFIX IP Flow Information Export
OAM Operations, Administration, and Maintenance PAM Precision Availability Metric
SLA Service Level Agreement
SLE Service Level Expectations SLA Service Level Agreement
SLO Service Level Objective SLE Service Level Expectation
VI Violated Interval SLO Service Level Objective
VIR Violated Interval Ratio SVI Severely Violated Interval
VPC Violated Packets Count SVIR Severely Violated Interval Ratio
SVI Severely Violated Interval SVPC Severely Violated Packets Count
SVIR Severely Violated Interval Ratio VFI Violation-Free Interval
SVPC Severely Violated Packets Count VI Violated Interval
VFI Violation-Free Interval VIR Violated Interval Ratio
VPC Violated Packets Count
3. Precision Availability Metrics 3. Precision Availability Metrics
3.1. Introducing Violated Intervals 3.1. Introducing Violated Intervals
When analyzing the availability metrics of a service between two When analyzing the availability metrics of a service between two
measurement points, a time interval as the unit of PAM needs to be measurement points, a time interval as the unit of PAMs needs to be
selected. In [ITU.G.826], a time interval of one second is used. selected. In [ITU.G.826], a time interval of one second is used.
That is reasonable, but some services may require different That is reasonable, but some services may require different
granularity (e.g., decamillisecond). For that reason, the time granularity (e.g., decamillisecond). For that reason, the time
interval in PAM is viewed as a variable parameter though constant for interval in PAMs is viewed as a variable parameter, though constant
a particular measurement session. Furthermore, for the purpose of for a particular measurement session. Furthermore, for the purpose
PAM, each time interval is classified either as Violated Interval of PAMs, each time interval is classified as either Violated Interval
(VI), Severely Violated Interval (SVI), or Violation-Free Interval (VI), Severely Violated Interval (SVI), or Violation-Free Interval
(VFI). These are defined as follows: (VFI). These are defined as follows:
* VI is a time interval during which at least one of the performance * VI is a time interval during which at least one of the performance
parameters degraded below its configurable optimal level parameters degraded below its configurable optimal threshold.
threshold.
* SVI is a time interval during which at least one of the * SVI is a time interval during which at least one of the
performance parameters degraded below its configurable critical performance parameters degraded below its configurable critical
threshold. threshold.
* Consequently, VFI is a time interval during which all performance * Consequently, VFI is a time interval during which all performance
parameters are at or better than their respective pre-defined parameters are at or better than their respective pre-defined
optimal levels. optimal levels.
The monitoring of performance parameters to determine the quality of The monitoring of performance parameters to determine the quality of
an interval is performed between the elements of the network that are an interval is performed between the elements of the network that are
referred to for the SLO corresponding to the performance parameter. identified in the SLO corresponding to the performance parameter.
Mechanisms of setting levels of a threshold of an SLO are outside the Mechanisms for setting levels of a threshold of an SLO are outside
scope of this document. the scope of this document.
From these definitions, a set of basic metrics can be defined that From the definitions above, a set of basic metrics can be defined
count the numbers of time intervals that fall into each category: that count the number of time intervals that fall into each category:
* VI count. * VI count
* SVI count. * SVI count
* VFI count. * VFI count
These count metrics are essential in calculating respective ratios These count metrics are essential in calculating respective ratios
(see Section 3.2) that can be used to assess the instability of a (see Section 3.2) that can be used to assess the instability of a
service. service.
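The classification and the three count metrics can be sketched as follows. This is a minimal illustration, not part of the specification: it assumes a single performance parameter (packet delay in milliseconds), treats a delay above a threshold as "degraded below" it, classifies an interval that crosses the critical threshold as SVI only (so the three categories are disjoint), and counts an interval with no packets as violation-free. The names `IntervalClass`, `classify_interval`, and `count_intervals` are hypothetical.

```python
from collections import Counter
from enum import Enum


class IntervalClass(Enum):
    VFI = "violation-free"
    VI = "violated"
    SVI = "severely violated"


def classify_interval(delays_ms, optimal_ms, critical_ms):
    """Classify one time interval from the packet delays observed in it.

    Assumption: the worst delay in the interval decides the class, and an
    empty interval (no packets observed) is treated as violation-free.
    """
    worst = max(delays_ms, default=0.0)
    if worst > critical_ms:
        return IntervalClass.SVI
    if worst > optimal_ms:
        return IntervalClass.VI
    return IntervalClass.VFI


def count_intervals(intervals, optimal_ms, critical_ms):
    """Return the VI, SVI, and VFI counts for a measurement session."""
    counts = Counter(
        classify_interval(delays, optimal_ms, critical_ms) for delays in intervals
    )
    return {cls.name: counts.get(cls, 0) for cls in IntervalClass}


# Four one-second intervals, optimal threshold 10 ms, critical threshold 30 ms.
session = [[5.0, 8.0], [12.0, 9.0], [40.0], []]
print(count_intervals(session, optimal_ms=10.0, critical_ms=30.0))
```

A real deployment would evaluate every performance parameter named in the SLO, not only delay; the worst class over all parameters would then decide the interval.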
Beyond accounting for violated intervals, it is sometimes beneficial Beyond accounting for violated intervals, it is sometimes beneficial
to maintain counts of packets for which a performance threshold is to maintain counts of packets for which a performance threshold is
violated. For example, this allows distinguishing between cases in violated. For example, this allows for distinguishing between cases
which violated intervals are caused by isolated violation occurrences in which violated intervals are caused by isolated violation
(such as, a sporadic issue that may be caused by a temporary spike in occurrences (such as a sporadic issue that may be caused by a
a queue depth along the packet's path) or by broad violations across temporary spike in a queue depth along the packet's path) or by broad
multiple packets (such as a problem with slow route convergence violations across multiple packets (such as a problem with slow route
across the network or more foundational issues such as insufficient convergence across the network or more foundational issues such as
network resources). Maintaining such counts and comparing them with insufficient network resources). Maintaining such counts and
the overall amount of traffic also facilitates assessing compliance comparing them with the overall amount of traffic also facilitate
with statistical SLOs (see Section 4). For these reasons, the assessing compliance with statistical SLOs (see Section 4). For
following additional metrics are defined: these reasons, the following additional metrics are defined:
* VPC: Violated packets count * VPC (Violated Packets Count)
* SVPC: Severely violated packets count * SVPC (Severely Violated Packets Count)
3.2. Derived Precision Availability Metrics 3.2. Derived Precision Availability Metrics
A set of metrics can be created based on PAM introduced in Section 3. A set of metrics can be created based on PAMs as introduced in this
In this document, these metrics are referred to as "derived PAM". document. In this document, these metrics are referred to as
Some of these metrics are modeled after Mean Time Between Failure "derived PAMs". Some of these metrics are modeled after Mean Time
(MTBF) metrics - a "failure" in this context referring to a failure Between Failure (MTBF) metrics; a "failure" in this context refers to
to deliver a service according to its SLO. a failure to deliver a service according to its SLO.
* Time since the last violated interval (e.g., since last violated * Time since the last violated interval (e.g., since last violated
ms, since last violated second). (This parameter is suitable for ms or since last violated second). This parameter is suitable for
monitoring the current compliance status of the service, e.g., for monitoring the current compliance status of the service, e.g., for
trending analysis.) trending analysis.
* Number of packets since the last violated packet. (This parameter * Number of packets since the last violated packet. This parameter
is suitable for the monitoring of the current compliance status of is suitable for the monitoring of the current compliance status of
the service.) the service.
* Mean time between VIs (e.g., between violated milliseconds, * Mean time between VIs (e.g., between violated milliseconds or
violated seconds) is the arithmetic mean of time between between violated seconds). This parameter is the arithmetic mean
consecutive VIs. of time between consecutive VIs.
* Mean packets between VIs is the arithmetic mean of the number of * Mean packets between VIs. This parameter is the arithmetic mean
SLO-compliant packets between consecutive VIs. (Another variation of the number of SLO-compliant packets between consecutive VIs.
of "MTBF" in a service setting.) It is another variation of MTBF in a service setting.
An analogous set of metrics can be produced for SVI: An analogous set of metrics can be produced for SVI:
* Time since the last SVI (e.g., since last violated ms, since last * Time since the last SVI (e.g., since last violated ms or since
violated second). (This parameter is suitable for the monitoring last violated second). This parameter is suitable for the
of the current compliance status of the service.) monitoring of the current compliance status of the service.
* Number of packets since the last severely violated packet. (This * Number of packets since the last severely violated packet. This
parameter is suitable for the monitoring of the current compliance parameter is suitable for the monitoring of the current compliance
status of the service.) status of the service.
* Mean time between SVIs (e.g., between severely violated * Mean time between SVIs (e.g., between severely violated
milliseconds, severely violated seconds) is the arithmetic mean of milliseconds or between severely violated seconds). This
time between consecutive SVIs. parameter is the arithmetic mean of time between consecutive SVIs.
* Mean packets between SVIs is the arithmetic mean of the number of * Mean packets between SVIs. This parameter is the arithmetic mean
SLO-compliant packets between consecutive SVIs. (Another of the number of SLO-compliant packets between consecutive SVIs.
variation of "MTBF" in a service setting.) It is another variation of "MTBF" in a service setting.
To indicate a historic degree of precision availability, additional To indicate a historic degree of precision availability, additional
derived PAMs can be defined as follows: derived PAMs can be defined as follows:
* Violated Interval Ratio (VIR) is the ratio of the summed numbers * Violated Interval Ratio (VIR) is the ratio of the summed numbers
of VIs and SVIs to the total number of time unit intervals in a of VIs and SVIs to the total number of time unit intervals in a
time of the availability periods during a fixed measurement time of the availability periods during a fixed measurement
session. session.
* Severely Violated Interval Ratio (SVIR) is the ratio of SVIs to * Severely Violated Interval Ratio (SVIR) is the ratio of SVIs to
the total number of time unit intervals in a time of the the total number of time unit intervals in a time of the
availability periods during a fixed measurement session. availability periods during a fixed measurement session.
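The ratio and mean-time metrics above can be illustrated with a short sketch over a sequence of already classified intervals. It rests on several assumptions that the text leaves open: every interval of the session is counted in the denominator (no separate bookkeeping of availability periods), and the "time between consecutive VIs" is taken as the distance between their positions multiplied by the interval length. The function names are hypothetical.

```python
from statistics import mean


def violated_interval_ratios(classes):
    """Return (VIR, SVIR) for a list of "VFI" / "VI" / "SVI" labels."""
    total = len(classes)
    if total == 0:
        raise ValueError("no intervals in the measurement session")
    vi = classes.count("VI")
    svi = classes.count("SVI")
    # VIR sums VIs and SVIs; SVIR counts SVIs only.
    return (vi + svi) / total, svi / total


def mean_time_between(classes, kind, interval_s):
    """Arithmetic mean of the time between consecutive intervals of `kind`.

    Returns None when fewer than two such intervals were observed.
    """
    positions = [i for i, c in enumerate(classes) if c == kind]
    if len(positions) < 2:
        return None
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return mean(gaps) * interval_s


session = ["VFI", "VI", "VFI", "VFI", "SVI", "VI", "VFI", "VFI", "VFI", "VI"]
print(violated_interval_ratios(session))
print(mean_time_between(session, "VI", interval_s=1.0))
```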
3.3. PAM Configuration Settings and Service Availability 3.3. PAM Configuration Settings and Service Availability
It might be useful for a service provider to determine the current It might be useful for a service provider to determine the current
condition of the service for which PAMs are maintained. To condition of the service for which PAMs are maintained. To
facilitate this, it is conceivable to complement PAM with a state facilitate this, it is conceivable to complement PAMs with a state
model. Such a state model can be used to indicate whether a service model. Such a state model can be used to indicate whether a service
is currently considered as available or unavailable depending on the is currently considered as available or unavailable depending on the
network's recent ability to provide service without incurring network's recent ability to provide service without incurring
intervals during which violations occur. It is conceivable to define intervals during which violations occur. It is conceivable to define
such a state model in which transitions occur per some predefined PAM such a state model in which transitions occur per some predefined PAM
settings. settings.
While the definition of a service state model is outside the scope of While the definition of a service state model is outside the scope of
this document, the following section provides some considerations for this document, this section provides some considerations for how such
how such a state model and accompanying configuration settings could a state model and accompanying configuration settings could be
be defined. defined.
For example, a state model could be defined by a Finite State Machine For example, a state model could be defined by a Finite State Machine
featuring two states, "available" and "unavailable". The initial featuring two states: "available" and "unavailable". The initial
state could be "available". A service could subsequently be deemed state could be "available". A service could subsequently be deemed
as "unavailable" based on the number of successive interval as "unavailable" based on the number of successive interval
violations that have been experienced up to the particular violations that have been experienced up to the particular
observation time moment. To return to a state of "available", a observation time moment. To return to a state of "available", a
number of intervals without violations would need to be observed. number of intervals without violations would need to be observed.
The number of successive intervals with violations, as well as the The number of successive intervals with violations, as well as the
number of successive intervals that are free of violations, required number of successive intervals that are free of violations, required
for a state to transition to another state is defined by a for a state to transition to another state is defined by a
configuration setting. Specifically, the following configuration configuration setting. Specifically, the following configuration
parameters are defined: parameters are defined:
* Unavailability threshold: The number of successive intervals Unavailability threshold: The number of successive intervals during
during which a violation occurs to transition to an unavailable which a violation occurs to transition to an unavailable state.
state.
* Availability threshold: The number of successive intervals during Availability threshold: The number of successive intervals during
which no violations must occur to allow transition to an available which no violations must occur to allow transition to an available
state from a previously unavailable state. state from a previously unavailable state.
Additional configuration parameters could be defined to account for Additional configuration parameters could be defined to account for
the severity of violations. Likewise, it is conceivable to define the severity of violations. Likewise, it is conceivable to define
configuration settings that also take VIR and SVIR into account. configuration settings that also take VIR and SVIR into account.
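One possible shape of the two-state model described above is sketched below. Because the definition of a service state model is outside the scope of the document, everything here is an assumption made for illustration: the class name, the `observe` method, and the choice to treat VIs and SVIs alike as "violated".

```python
class AvailabilityStateMachine:
    """Two-state model driven by successive interval classifications."""

    def __init__(self, unavailability_threshold, availability_threshold):
        if unavailability_threshold < 1 or availability_threshold < 1:
            raise ValueError("thresholds must be positive integers")
        self.unavailability_threshold = unavailability_threshold
        self.availability_threshold = availability_threshold
        self.state = "available"  # initial state, as suggested in the text
        self._run = 0  # successive intervals counting toward a transition

    def observe(self, violated):
        """Feed one interval (True for a VI or SVI) and return the new state."""
        if self.state == "available":
            # Only an unbroken run of violated intervals triggers the change.
            self._run = self._run + 1 if violated else 0
            if self._run >= self.unavailability_threshold:
                self.state, self._run = "unavailable", 0
        else:
            # Only an unbroken run of violation-free intervals restores it.
            self._run = 0 if violated else self._run + 1
            if self._run >= self.availability_threshold:
                self.state, self._run = "available", 0
        return self.state


fsm = AvailabilityStateMachine(unavailability_threshold=2, availability_threshold=3)
observations = [True, False, True, True, False, False, True, False, False, False]
print([fsm.observe(v) for v in observations])
```

Using a larger availability threshold than unavailability threshold, as in the usage lines, adds hysteresis so that the state does not flap on a single clean interval.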
4. Statistical SLO 4. Statistical SLO
It should be noted that certain SLAs may be statistical, requiring It should be noted that certain SLAs may be statistical, requiring
skipping to change at page 9, line 23 skipping to change at line 384
not necessarily constitute an SLO violation. However, it is still not necessarily constitute an SLO violation. However, it is still
useful to maintain those statistics, as the number of out-of-SLO useful to maintain those statistics, as the number of out-of-SLO
packets still matters when looked at in proportion to the total packets still matters when looked at in proportion to the total
number of packets. number of packets.
In that vein, an SLA might establish a multi-tiered SLO of, say, In that vein, an SLA might establish a multi-tiered SLO of, say,
end-to-end latency (from the lowest to highest tier) as follows: end-to-end latency (from the lowest to highest tier) as follows:
* not to exceed 30 ms for any packet; * not to exceed 30 ms for any packet;
* to not exceed 25 ms for 99.999% of packets; * not to exceed 25 ms for 99.999% of packets; and
* to not exceed 20 ms for 99% of packets. * not to exceed 20 ms for 99% of packets.
In that case, any individual packet with a latency greater than 20 ms In that case, any individual packet with a latency greater than 20 ms
and lower than 30 ms cannot be considered an SLO violation in and lower than 30 ms cannot be considered an SLO violation in
itself, but compliance with the SLO may need to be assessed after the itself, but compliance with the SLO may need to be assessed after the
fact. fact.
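As a rough illustration of assessing such a multi-tiered SLO after the fact, the Python sketch below checks a set of per-packet latency samples against the three example tiers. The function name check_tiered_latency_slo and the tier representation are assumptions made for this example; the document does not define them. Fractions are used so that the percentage comparison is exact, and an empty sample set is treated as non-compliant, which is an arbitrary choice.

```python
from fractions import Fraction

# (latency bound in ms, fraction of packets that must not exceed it)
EXAMPLE_TIERS = [
    (30, Fraction(1)),              # not to exceed 30 ms for any packet
    (25, Fraction(99999, 100000)),  # not to exceed 25 ms for 99.999% of packets
    (20, Fraction(99, 100)),        # not to exceed 20 ms for 99% of packets
]


def check_tiered_latency_slo(latencies_ms, tiers=EXAMPLE_TIERS):
    """Return {bound_ms: True/False} telling whether each tier was met."""
    total = len(latencies_ms)
    results = {}
    for bound_ms, required_fraction in tiers:
        within = sum(1 for x in latencies_ms if x <= bound_ms)
        # Equivalent to within / total >= required_fraction, without division.
        results[bound_ms] = total > 0 and within >= required_fraction * total
    return results


# 99 packets at 10 ms and one packet at 27 ms.
samples = [10] * 99 + [27]
verdict = check_tiered_latency_slo(samples)
```

Here the single 27 ms packet does not break the 30 ms or the 99% tier, but it does break the 99.999% tier, because 99 of 100 packets within 25 ms falls short of the required proportion.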
To support statistical SLOs more directly requires additional To support statistical SLOs more directly requires additional
metrics, for example, metrics that represent histograms for service metrics, for example, metrics that represent histograms for service-
level parameters with buckets corresponding to individual service level parameters with buckets corresponding to individual SLOs.
level objectives. Although the definition of histogram metrics is Although the definition of histogram metrics is outside the scope of
outside the scope of this document and could be considered for future this document and could be considered for future work (see
work Section 6, for the example just given, a histogram for a Section 6), for the example just given, a histogram for a particular
particular flow could be maintained with four buckets: one containing flow could be maintained with four buckets: one containing the count
the count of packets within 20 ms, a second with a count of packets of packets within 20 ms, a second with a count of packets between 20
between 20 and 25 ms (or simply all within 25 ms), a third with a and 25 ms (or simply all within 25 ms), a third with a count of
count of packets between 25 and 30 ms (or merely all packets within packets between 25 and 30 ms (or merely all packets within 30 ms),
30 ms, and a fourth with a count of anything beyond (or simply a and a fourth with a count of anything beyond (or simply a total
total count). Of course, the number of buckets and the boundaries count). Of course, the number of buckets and the boundaries between
between those buckets should correspond to the needs of the SLA those buckets should correspond to the needs of the SLA associated
associated with the application, i.e., to the specific guarantees and with the application, i.e., to the specific guarantees and SLOs that
SLOs that were provided. were provided.
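The four-bucket histogram described above could be maintained along the following lines. This is a minimal Python sketch, not a definition of a histogram metric (which remains outside the scope of the document); the function names and the default bounds of 20, 25, and 30 ms are taken from the example or assumed for illustration. A latency exactly equal to a bound is counted in the lower bucket, matching "not to exceed".

```python
from bisect import bisect_left
from itertools import accumulate


def latency_histogram(latencies_ms, bounds_ms=(20, 25, 30)):
    """Count packets per bucket: <=20, (20, 25], (25, 30], and >30 ms."""
    counts = [0] * (len(bounds_ms) + 1)
    for x in latencies_ms:
        # bisect_left returns the index of the first bound >= x,
        # or len(bounds_ms) when x exceeds every bound.
        counts[bisect_left(bounds_ms, x)] += 1
    return counts


def cumulative_counts(counts):
    """Alternative form: all within 20, all within 25, all within 30, total."""
    return list(accumulate(counts))


counts = latency_histogram([5, 20, 21, 25, 26, 30, 31, 100])
totals = cumulative_counts(counts)
```

For the eight samples shown, each bucket holds two packets, and the cumulative form ends with the total packet count, which is the denominator needed when comparing against percentage-based SLOs.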
5. Other Expected PAM Benefits 5. Other Expected PAM Benefits
PAM provides several benefits over other, more conventional PAMs provide several benefits over other, more conventional
performance metrics. Without PAM, it would be possible to conduct performance metrics. Without PAMs, it would be possible to conduct
ongoing measurements of service levels and maintain a time-series of ongoing measurements of service levels, maintain a time series of
service level records, then assess compliance with specific SLOs service-level records, and then assess compliance with specific SLOs
after the fact. However, doing so would require the collection of after the fact. However, doing so would require the collection of
vast amounts of data that would need to be generated, exported, vast amounts of data that would need to be generated, exported,
transmitted, collected, and stored. In addition, extensive transmitted, collected, and stored. In addition, extensive post-
postprocessing would be required to compare that data against SLOs processing would be required to compare that data against SLOs and
and analyze its compliance. Being able to perform these tasks at analyze its compliance. Being able to perform these tasks at scale
scale and in real-time would present significant additional and in real time would present significant additional challenges.
challenges.
Adding PAM allows for a more compact expression of service level Adding PAMs allows for a more compact expression of service-level
compliance. In that sense, PAM does not simply represent raw data compliance. In that sense, PAMs do not simply represent raw data but
but expresses actionable information. In conjunction with proper express actionable information. In conjunction with proper
instrumentation, PAM can thus help avoid expensive postprocessing. instrumentation, PAMs can thus help avoid expensive post-processing.
6. Extensions and Future Work 6. Extensions and Future Work
The following is a list of items that are outside the scope of this The following is a list of items that are outside the scope of this
specification, but which will be useful extensions and opportunities specification but will be useful extensions and opportunities for
for future work: future work:
* A YANG data model will allow PAM to be incorporated into * A YANG data model will allow PAMs to be incorporated into
monitoring applications based on the YANG/NETCONF/RESTCONF monitoring applications based on the YANG, NETCONF, and RESTCONF
framework. In addition, a YANG data model will enable the frameworks. In addition, a YANG data model will enable the
configuration and retrieval of PAM-related settings. configuration and retrieval of PAM-related settings.
* A set of IPFIX Information Elements will allow PAM to be * A set of IPFIX Information Elements will allow PAMs to be
associated with flow records and exported as part of flow data, associated with flow records and exported as part of flow data,
for example, for processing by accounting applications that assess for example, for processing by accounting applications that assess
compliance of delivered services with quality guarantees. compliance of delivered services with quality guarantees.
* Additional second-order metrics, such as "longest disruption of * Additional second-order metrics, such as "longest disruption of
service time" (measuring consecutive time units with SVIs), can be service time" (measuring consecutive time units with SVIs), can be
defined and would be deemed useful by some users. At the same defined and would be deemed useful by some users. At the same
time, such metrics can be computed in a straightforward manner and time, such metrics can be computed in a straightforward manner and
will in many cases be application-specific. For this reason, will be application specific in many cases. For this reason, such
further such metrics are omitted here in order to not overburden metrics are omitted here in order to not overburden this
this specification. specification.
* The definition of the metrics that represent histograms for * Metrics can be defined to represent histograms for service-level
service level parameters with buckets corresponding to individual parameters with buckets corresponding to individual SLOs.
service level objectives,
7. IANA Considerations 7. IANA Considerations
This document has no IANA actions. This document has no IANA actions.
8. Security Considerations 8. Security Considerations
Instrumentation for metrics that are used to assess compliance with Instrumentation for metrics that are used to assess compliance with
SLOs constitute an attractive target for an attacker. By interfering SLOs constitutes an attractive target for an attacker. By
with the maintenance of such metrics, services could be falsely interfering with the maintenance of such metrics, services could be
identified as complying (when they are not) or vice-versa (i.e., falsely identified as complying (when they are not) or vice versa
flagged as being non-compliant when indeed they are). While this (i.e., flagged as being non-compliant when indeed they are). While
document does not specify how networks should be instrumented to this document does not specify how networks should be instrumented to
maintain the identified metrics, such instrumentation needs to be maintain the identified metrics, such instrumentation needs to be
adequately secured to ensure accurate measurements and prohibit adequately secured to ensure accurate measurements and prohibit
tampering with metrics being kept. tampering with metrics being kept.
Where metrics are being defined relative to an SLO, the configuration Where metrics are being defined relative to an SLO, the configuration
of those SLOs needs to be adequately secured. Likewise, where SLOs of those SLOs needs to be adequately secured. Likewise, where SLOs
can be adjusted, the correlation between any metric instance and a can be adjusted, the correlation between any metric instance and a
particular SLO must be unambiguous. The same service levels that particular SLO must be unambiguous. The same service levels that
constitute SLO violations for one flow that should be maintained as constitute SLO violations for one flow and should be maintained as
part of the "violated time units" and related metrics, may be part of the "violated time units" and related metrics may be
compliant for another flow. In cases when it is impossible to tie compliant for another flow. In cases when it is impossible to tie
together SLOs and PAM, it will be preferable to merely maintain together SLOs and PAMs, it is preferable to merely maintain
statistics about service levels delivered (for example, overall statistics about service levels delivered (for example, overall
histograms of end-to-end latency) without assessing which constitutes histograms of end-to-end latency) without assessing which constitute
violations. violations.
By the same token, where the definition of what constitutes a By the same token, the definition of what constitutes a "severe" or a
"severe" or a "significant" violation depends on configuration "significant" violation depends on configuration settings or context.
settings or context. The configuration of such settings or context The configuration of such settings or context needs to be specially
needs to be specially secured. Also, the configuration must be bound secured. Also, the configuration must be bound to the metrics being
to the metrics being maintained. Thus, it will be clear which maintained. Thus, it will be clear which configuration setting was
configuration setting was in effect when those metrics were being in effect when those metrics were being assessed. An attacker that
assessed. An attacker that can tamper with such configuration can tamper with such configuration settings will render the
settings will render the corresponding metrics useless (in the best corresponding metrics useless (in the best case) or misleading (in
case) or misleading (in the worst case). the worst case).
9. Acknowledgments
The authors greatly appreciate review and comments by Bjørn Ivar
Teigen and Christian Jacquenet.
10. References
10.1. Informative References
[I-D.ietf-teas-ietf-network-slices] 9. Informative References
Farrel, A., Drake, J., Rokui, R., Homma, S., Makhijani,
K., Contreras, L. M., and J. Tantsura, "A Framework for
Network Slices in Networks Built from IETF Technologies",
Work in Progress, Internet-Draft, draft-ietf-teas-ietf-
network-slices-25, 14 September 2023,
<https://datatracker.ietf.org/doc/html/draft-ietf-teas-
ietf-network-slices-25>.
[IANA-PM-Registry] [IANA-PM-Registry]
IANA, "IANA Registry of Performance Metrics", March 2020, IANA, "Performance Metrics",
<https://www.iana.org/assignments/performance-metrics/ <https://www.iana.org/assignments/performance-metrics>.
performance-metrics.xhtml>.
[ITU.G.826] [ITU.G.826]
ITU-T, "End-to-end error performance parameters and ITU-T, "End-to-end error performance parameters and
objectives for international, constant bit-rate digital objectives for international, constant bit-rate digital
paths and connections", ITU-T G.826, December 2002. paths and connections", ITU-T G.826, December 2002.
[RFC2863] McCloghrie, K. and F. Kastenholz, "The Interfaces Group [RFC2863] McCloghrie, K. and F. Kastenholz, "The Interfaces Group
MIB", RFC 2863, DOI 10.17487/RFC2863, June 2000, MIB", RFC 2863, DOI 10.17487/RFC2863, June 2000,
<https://www.rfc-editor.org/info/rfc2863>. <https://www.rfc-editor.org/info/rfc2863>.
skipping to change at page 13, line 15 skipping to change at line 543
[RFC8911] Bagnulo, M., Claise, B., Eardley, P., Morton, A., and A. [RFC8911] Bagnulo, M., Claise, B., Eardley, P., Morton, A., and A.
Akhter, "Registry for Performance Metrics", RFC 8911, Akhter, "Registry for Performance Metrics", RFC 8911,
DOI 10.17487/RFC8911, November 2021, DOI 10.17487/RFC8911, November 2021,
<https://www.rfc-editor.org/info/rfc8911>. <https://www.rfc-editor.org/info/rfc8911>.
[RFC8912] Morton, A., Bagnulo, M., Eardley, P., and K. D'Souza, [RFC8912] Morton, A., Bagnulo, M., Eardley, P., and K. D'Souza,
"Initial Performance Metrics Registry Entries", RFC 8912, "Initial Performance Metrics Registry Entries", RFC 8912,
DOI 10.17487/RFC8912, November 2021, DOI 10.17487/RFC8912, November 2021,
<https://www.rfc-editor.org/info/rfc8912>. <https://www.rfc-editor.org/info/rfc8912>.
Contributors' Addresses [RFC9543] Farrel, A., Ed., Drake, J., Ed., Rokui, R., Homma, S.,
Makhijani, K., Contreras, L., and J. Tantsura, "A
Framework for Network Slices in Networks Built from IETF
Technologies", RFC 9543, DOI 10.17487/RFC9543, February
2024, <https://www.rfc-editor.org/info/rfc9543>.
Acknowledgments
The authors greatly appreciate review and comments by Bjørn Ivar
Teigen and Christian Jacquenet.
Contributors
Liuyan Han Liuyan Han
China Mobile China Mobile
32 XuanWuMenXi Street 32 XuanWuMenXi Street
Beijing Beijing
100053 100053
China China
Email: hanliuyan@chinamobile.com Email: hanliuyan@chinamobile.com
Mohamed Boucadair Mohamed Boucadair
skipping to change at page 14, line 4 skipping to change at line 587
Greg Mirsky Greg Mirsky
Ericsson Ericsson
Email: gregimirsky@gmail.com Email: gregimirsky@gmail.com
Joel Halpern Joel Halpern
Ericsson Ericsson
Email: joel.halpern@ericsson.com Email: joel.halpern@ericsson.com
Xiao Min Xiao Min
ZTE Corp. ZTE Corp.
Email: xiao.min2@zte.com.cn Email: xiao.min2@zte.com.cn
Alexander Clemm Alexander Clemm
Futurewei
2330 Central Expressway
Santa Clara, CA 95050
United States of America
Email: ludwig@clemm.org Email: ludwig@clemm.org
John Strassner John Strassner
Futurewei Futurewei
2330 Central Expressway 2330 Central Expressway
Santa Clara, CA 95050 Santa Clara, CA 95050
United States of America United States of America
Email: strazpdj@gmail.com Email: strazpdj@gmail.com
Jerome Francois Jerome Francois
Inria and University of Luxembourg Inria and University of Luxembourg
615 Rue du Jardin Botanique 615 Rue du Jardin Botanique
54600 Villers-les-Nancy 54600 Villers-les-Nancy
France France
Email: jerome.francois@inria.fr Email: jerome.francois@inria.fr
 End of changes. 84 change blocks. 
252 lines changed or deleted 231 lines changed or added
