| [[osmux]] |
| = OSmux: reduction of SAT uplink costs by protocol optimizations |
| |
| == Problem |
| |
| In satellite-based GSM systems, transmission on the back-haul is relatively |
| expensive. Such SAT uplinks are usually billed on a pay-per-byte basis, so |
| reducing the number of bytes transferred significantly reduces the cost of |
| the uplink. In such an environment, even seemingly small protocol |
| optimizations, e.g. message batching and trunking, can result in significant |
| cost reduction. |
| |
| This is true not only for speech codec frames, but also for the constant |
| background load caused by the signalling link (A protocol). Optimizations in |
| this protocol are applicable to both VSAT back-haul (best-effort background IP) |
| as well as Inmarsat based links (QoS with guaranteed bandwidth). |
| |
| == Proposed solution |
| |
| In order to reduce the bandwidth consumption, this document proposes to develop |
| a multiplex protocol that will be used to proxy voice and signalling traffic |
| through the SAT links. |
| |
| === Voice |
| |
| For the voice case, we propose a protocol that provides: |
| |
| * Batching: multiple codec frames are put into one single packet on the |
| sender side to reduce the protocol header overhead. This batch is then sent |
| as one RTP/UDP/IP packet. Currently, AMR 5.9 codec frames are transported in |
| an RTP/UDP/IP protocol stack. This means there are 15 bytes of speech codec |
| frame, plus a 2 byte RTP payload header, plus the RTP (12 bytes), UDP (8 |
| bytes) and IP (20 bytes) overhead. This means we have 40 bytes of overhead |
| for 17 bytes of payload. |
| |
| * Trunking: in case of multiple concurrent voice calls, each of them will |
| generate one speech codec frame every 20ms. Instead of sending only codec |
| frames of one voice call in a given IP packet, we can 'interleave' or trunk |
| the codec frames of multiple calls into one IP packet. This further |
| increases the IP packet size and thus improves the payload/overhead ratio. |
| |
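The overhead figures quoted for the batching item can be checked with a few lines of arithmetic. This is only an illustrative sketch; the constant and function names are hypothetical and the sizes are the ones given in the text for AMR 5.9 over RTP/UDP/IP.

```python
# Header and payload sizes in bytes, as quoted in the text above.
IP_HDR, UDP_HDR, RTP_HDR = 20, 8, 12
AMR_PAYLOAD_HDR = 2   # RTP payload header for AMR
AMR_5_9_FRAME = 15    # one AMR 5.9 speech codec frame


def rtp_overhead_and_payload():
    """Return (overhead, payload) in bytes for one AMR 5.9 frame in RTP/UDP/IP."""
    overhead = IP_HDR + UDP_HDR + RTP_HDR
    payload = AMR_5_9_FRAME + AMR_PAYLOAD_HDR
    return overhead, payload


def overhead_share():
    """Fraction of every transmitted byte that is protocol header."""
    overhead, payload = rtp_overhead_and_payload()
    return overhead / (overhead + payload)


if __name__ == "__main__":
    overhead, payload = rtp_overhead_and_payload()
    print(f"{overhead} bytes overhead for {payload} bytes payload")
    print(f"header share of each packet: {overhead_share():.1%}")
```

With one codec frame per packet, roughly 70% of the bytes on the link are headers, which is the share that batching and trunking aim to reduce.
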
| Both techniques should be applied without noticeable impact in terms of user |
| experience. As the satellite back-haul has very high round trip time (several |
| hundred milliseconds), adding some more delay is not going to make things |
| significantly worse. |
| |
| For the batching, the idea consists of batching multiple codec frames on the |
| sender side. A batching factor (B) of '4' means that we will send 4 codec |
| frames in one underlying protocol packet. The additional delay of the batching |
| can be computed as (B-1)*20ms, as 20ms is the duration of one codec frame. |
| Existing experimentation has shown that a batching factor of 4 to 8 (causing a |
| delay of 60ms to 140ms) is acceptable and does not cause significant quality |
| degradation. |
| |
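The delay formula above, (B-1)*20ms, can be expressed as a small helper. The function name is hypothetical and only serves to illustrate the arithmetic.

```python
FRAME_DURATION_MS = 20  # duration of one speech codec frame


def batching_delay_ms(batch_factor):
    """Additional delay introduced by batching `batch_factor` codec frames.

    The first frame of a batch has to wait for the remaining
    (batch_factor - 1) frames before the packet can be sent.
    """
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return (batch_factor - 1) * FRAME_DURATION_MS


if __name__ == "__main__":
    for b in (1, 4, 8):
        print(f"B={b}: {batching_delay_ms(b)} ms additional delay")
```
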
| The main requirements for such voice RTP proxy are: |
| |
| * Always batch codec frames of multiple simultaneous calls into a single UDP |
| message. |
| |
| * Batch a configurable number of codec frames of the same call into one UDP |
| message. |
| |
| * Make sure to properly reconstruct timing at receiver (non-bursty but |
| one codec frame every 20ms). |
| |
| * Implementation in libosmo-netif to make sure it can be used |
| in osmo-bts (towards osmo-bsc), osmo-bsc (towards osmo-bts and |
| osmo-bsc_nat) and osmo-bsc_nat (towards osmo-bsc) |
| |
| * Primary application will be with osmo-bsc connected via satellite link to |
| osmo-bsc_nat. |
| |
| * Make sure to properly deal with SID (silence insertion descriptor) frames |
| in case of DTX. |
| |
| * Make sure to transmit and properly re-construct the M (marker) bit of |
| the RTP header, as it is used in AMR. |
| |
| * The primary use case is the AMR codec, so it is probably not worth wasting |
| an extra payload byte on indicating the codec type (amr/hr/fr/efr). If we |
| can add the codec type somewhere without growing the packet, we'll do it. |
| Otherwise, we'll skip this. |
| |
| === Signalling |
| |
| Signalling uses SCCP/IPA/TCP/IP stacking. Considering SCCP as payload, this |
| adds 3 (IPA) + 20 (TCP) + 20 (IP) = 43 bytes overhead for every signalling |
| message, plus of course the 40-byte-sized TCP ACK sent in the opposite |
| direction. |
| |
| While trying to look for alternatives, we consider that none of the standard IP |
| layer 4 protocols are suitable for this application. We detail the reasons |
| why: |
| |
| * TCP is a streaming protocol aimed at maximizing the throughput of a stream |
| within the constraints of the underlying transport layer. This feature is |
| not really required for the low-bandwidth and low-pps GSM signalling. |
| Moreover, TCP is stream oriented and does not conserve message boundaries. |
| As such, the IPA header has to serve as a boundary between messages in the |
| stream. Moreover, assuming a generally quite idle signalling link, a pure |
| TCP ACK (without any data segment) is very likely to be sent in response |
| to each message. |
| |
| * Raw IP or UDP is not a real alternative, as neither recovers lost |
| packets. |
| |
| * SCTP preserves message boundaries and allows for multiple streams |
| (multiplexing) within one connection, but it has too much overhead. |
| |
| For that reason, we propose the use of LAPD for this task. This protocol was |
| originally specified to be used on top of E1 links for the A interface, which |
| do not exhibit any noticeable latency. LAPD handles packet loss (albeit not |
| as well as TCP does) and copes with packet re-ordering. |
| |
| LAPD has a very small header (3-5 octets) compared to TCP's 20 bytes. Even if |
| LAPD is put inside UDP, the combination of 11 to 13 octets still saves a |
| noticeable number of bytes per packet. Moreover, LAPD has been modified for |
| less reliable interfaces such as the GSM Um interface (LAPDm), as well as for |
| the use in satellite systems (LAPsat in ETSI GMR). |
| |
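The per-message header sizes discussed in this section can be compared side by side. The sketch below only reproduces the arithmetic from the text (IPA, TCP, IP, UDP and LAPD header sizes); the function names are hypothetical, and it deliberately ignores acknowledgements and any OSmux framing.

```python
# Header sizes in bytes, as quoted in the text above.
IPA_HDR, TCP_HDR, IP_HDR, UDP_HDR = 3, 20, 20, 8
LAPD_HDR_MIN, LAPD_HDR_MAX = 3, 5
PURE_TCP_ACK = 40  # IP + TCP header of an ACK carrying no data


def ipa_tcp_ip_overhead():
    """Overhead added to each SCCP message by the IPA/TCP/IP stack."""
    return IPA_HDR + TCP_HDR + IP_HDR


def lapd_in_udp_header(lapd_hdr):
    """Combined size of the UDP header and a LAPD header of `lapd_hdr` octets."""
    return UDP_HDR + lapd_hdr


if __name__ == "__main__":
    print("IPA/TCP/IP overhead per message:", ipa_tcp_ip_overhead())
    print("plus pure TCP ACK in the opposite direction:", PURE_TCP_ACK)
    for lapd in (LAPD_HDR_MIN, LAPD_HDR_MAX):
        combined = lapd_in_udp_header(lapd)
        print(f"LAPD ({lapd}) in UDP: {combined} octets, "
              f"{TCP_HDR - combined} fewer than the TCP header alone")
```
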
| == OSmux protocol |
| |
| The OSmux protocol is the core of our proposed solution. This protocol operates |
| over UDP or, alternatively, over raw IP. The designated default UDP port number |
| and IP protocol type have not yet been decided. |
| |
| Every OSmux message starts with a control octet. The control octet contains a |
| 2-bit Field Type (FT) and its location starts on the 2nd bit for backward |
| compatibility with older versions (used to be 3 bits). The FT defines the |
| structure of the remaining header as well as the payload. |
| |
| The following FT values are assigned: |
| |
| * FT == 0: LAPD Signalling |
| * FT == 1: AMR Codec |
| * FT == 2: Dummy |
| * FT == 3: Reserved for Future Use |
| |
| There can be any number of OSmux messages batched up in one underlying packet. |
| In this case, the multiple OSmux messages are simply concatenated, i.e. the |
| OSmux header control octet directly follows the last octet of the payload of the |
| previous OSmux message. |
| |
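Extracting the Field Type from the control octet can be sketched as follows. This assumes the usual RFC diagram convention in which bit 0 is the most significant bit of the octet, so that the 2-bit FT occupies the second and third bits; the constant and function names are hypothetical.

```python
# Field Type values as assigned in the list above.
FT_LAPD_SIGNALLING = 0
FT_AMR_CODEC = 1
FT_DUMMY = 2
FT_RESERVED = 3


def field_type(control_octet):
    """Return the 2-bit Field Type of an OSmux control octet.

    Assumption: bit 0 of the diagrams is the most significant bit, so the
    FT sits in bits 6..5 when counting from the least significant bit.
    """
    if not 0 <= control_octet <= 0xFF:
        raise ValueError("control octet must fit in one byte")
    return (control_octet >> 5) & 0x03


if __name__ == "__main__":
    for octet in (0x00, 0x20, 0xA0, 0x40):
        print(f"0x{octet:02x} -> FT {field_type(octet)}")
```

A receiver would look at this value first and then pick the matching header layout for the rest of the message.
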
| |
| === LAPD Signalling (0) |
| |
|      0                   1                   2                   3 |
|      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |X|FT |X X X X X|   PL-LENGTH   |     LAPD header + payload     | |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |
| Field Type (FT): 2 bits:: |
| The Field Type allocated for LAPD signalling is "0". |
| |
| This frame type is not yet supported inside OsmoCom and may be subject to |
| change in future versions of the protocol. |
| |
| |
| === AMR Codec (1) |
| |
| This OSmux packet header is used to transport one or more RTP-AMR packets for a |
| specific RTP stream identified by the Circuit ID field. |
| |
|      0                   1                   2                   3 |
|      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |M|FT | CTR |F|Q| Red. TS/SeqNR |  Circuit ID   |AMR FT |AMR CMR| |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |
| Marker (M): 1 bit:: |
| This is a 1:1 mapping from the RTP Marker (M) bit as specified in RFC3550 |
| Section 5.1 (RTP) as well as RFC3267 Section 4.1 (RTP-AMR). In AMR, the Marker |
| is used to indicate the beginning of a talk-spurt, i.e. the end of a silence |
| period. In case more than one AMR frame from the specific stream is batched into |
| this OSmux header, it is guaranteed that the first AMR frame is the first in the |
| talkspurt. |
| |
| Field Type (FT): 2 bits:: |
| The Field Type allocated for AMR codec is "1". |
| |
| Frame Counter (CTR): 3 bits:: |
| Provides the number of batched AMR payloads (starting 0) after the header. For |
| instance, if there are 2 AMR payloads batched, CTR will be "1". |
| |
| AMR-F (F): 1 bit:: |
| This is a 1:1 mapping from the AMR F field in RFC3267 Section 4.3.2. In case |
| there are multiple AMR codec frames with different F bit batched together, we |
| only use the last F and ignore any previous F. |
| |
| AMR-Q (Q): 1 bit:: |
| This is a 1:1 mapping from the AMR Q field (Frame quality indicator) in RFC3267 |
| Section 4.3.2. In case there are multiple AMR codec frames with different Q bit |
| batched together, we only use the last Q and ignore any previous Q. |
| |
| Circuit ID Code (CIC): 8 bits:: |
| Identifies the Circuit (Voice call), which in RTP is identified by {srcip, |
| srcport, dstip, dstport, ssrc}. |
| |
| Reduced/Combined Timestamp and Sequence Number (RCTS): 8 bits:: |
| Resembles a combination of the RTP timestamp and sequence number. In the GSM |
| system, one speech codec frame is generated every 20ms. Thus, there is no |
| need to have independent timestamp and sequence numbers (related to an 8kHz |
| clock) as specified in AMR-RTP. |
| |
| AMR Frame Type (AMR-FT): 4 bits:: |
| This is a mapping from the AMR FT field (Frame type index) in RFC3267 Section |
| 4.3.2. The length of each codec frame needs to be determined from this field. It |
| is thus guaranteed that all frames for a specific stream in an OSmux batch are |
| of the same AMR type. |
| |
| AMR Codec Mode Request (AMR-CMR): 4 bits:: |
| The RTP AMR payload header as specified in RFC3267 contains a 4-bit CMR field. |
| Rather than transporting it in a separate octet, we squeeze it in the lower four |
| bits of the last octet. In case there are multiple AMR codec frames with |
| different CMR, we only use the last CMR and ignore any previous CMR. |
| |
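The four header octets described above can be packed and unpacked with plain bit operations. This is a minimal sketch, not the libosmo-netif implementation: the class and field names are hypothetical, bit 0 of the diagram is assumed to be the most significant bit, and CTR is given the three bit positions drawn in the diagram.

```python
from dataclasses import dataclass

FT_AMR_CODEC = 1  # Field Type value for AMR Codec messages


@dataclass
class AmrHeader:
    marker: int      # M, 1 bit
    ctr: int         # number of batched AMR payloads minus one, 3 bits
    amr_f: int       # F, 1 bit
    amr_q: int       # Q, 1 bit
    rcts: int        # reduced timestamp/sequence number, 8 bits
    circuit_id: int  # CIC, 8 bits
    amr_ft: int      # AMR frame type index, 4 bits
    amr_cmr: int     # AMR codec mode request, 4 bits

    def pack(self) -> bytes:
        """Serialize into the 4 header octets (no range checks in this sketch)."""
        first = ((self.marker << 7) | (FT_AMR_CODEC << 5) | (self.ctr << 2)
                 | (self.amr_f << 1) | self.amr_q)
        last = (self.amr_ft << 4) | self.amr_cmr
        return bytes([first, self.rcts, self.circuit_id, last])

    @classmethod
    def unpack(cls, data: bytes) -> "AmrHeader":
        """Parse the first 4 octets of `data` as an AMR Codec header."""
        if len(data) < 4:
            raise ValueError("AMR Codec header needs 4 octets")
        first, rcts, circuit_id, last = data[:4]
        if (first >> 5) & 0x03 != FT_AMR_CODEC:
            raise ValueError("not an AMR Codec message")
        return cls(marker=first >> 7, ctr=(first >> 2) & 0x07,
                   amr_f=(first >> 1) & 0x01, amr_q=first & 0x01,
                   rcts=rcts, circuit_id=circuit_id,
                   amr_ft=last >> 4, amr_cmr=last & 0x0F)


if __name__ == "__main__":
    hdr = AmrHeader(marker=1, ctr=3, amr_f=0, amr_q=1,
                    rcts=42, circuit_id=7, amr_ft=2, amr_cmr=15)
    raw = hdr.pack()
    print(raw.hex())
    print(AmrHeader.unpack(raw))
```
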
| ==== Additional considerations |
| |
| * It can be assumed that all OSmux frames of type AMR Codec contain at least 1 |
| AMR frame. |
| * Given a batch factor of N frames (N>1), it cannot be assumed that the number |
| of AMR frames in any OSmux frame will always be N, due to some restrictions |
| mentioned above. For instance, a sender can decide to send before queueing the |
| expected N frames due to timing issues, or to conform with the restriction |
| that the first AMR frame in the batch must be the first in the talkspurt |
| (Marker M bit). |
| |
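Because the number of AMR frames per OSmux message can vary, a receiver has to derive each message length from CTR and AMR FT while walking a packet. The following is a sketch under stated assumptions: the function name is hypothetical, bit 0 of the diagram is taken as the most significant bit, and the frame sizes are the octet-aligned AMR speech frame sizes (the text itself only quotes 15 bytes for AMR 5.9).

```python
# Assumed size in bytes of one AMR frame per frame type index
# (0..7 are the speech modes 4.75 to 12.2 kbit/s, 8 is SID).
AMR_FRAME_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31, 8: 5}
HEADER_LEN = 4
FT_AMR_CODEC = 1


def split_amr_messages(packet):
    """Return a list of (header, frames) for concatenated AMR Codec messages."""
    messages = []
    offset = 0
    while offset < len(packet):
        header = packet[offset:offset + HEADER_LEN]
        if len(header) < HEADER_LEN:
            raise ValueError("truncated OSmux header")
        if (header[0] >> 5) & 0x03 != FT_AMR_CODEC:
            raise ValueError("this sketch only handles AMR Codec messages")
        n_frames = ((header[0] >> 2) & 0x07) + 1   # CTR starts at 0
        frame_len = AMR_FRAME_BYTES.get(header[3] >> 4)
        if frame_len is None:
            raise ValueError("unsupported AMR frame type")
        start = offset + HEADER_LEN
        end = start + n_frames * frame_len
        if end > len(packet):
            raise ValueError("truncated AMR payload")
        frames = [packet[i:i + frame_len] for i in range(start, end, frame_len)]
        messages.append((header, frames))
        offset = end   # the next control octet follows directly
    return messages


if __name__ == "__main__":
    # Two calls trunked into one packet: 2 frames of FT 2, then 1 frame of FT 3.
    packet = (bytes([0x24, 0, 1, 0x20]) + bytes(30)
              + bytes([0x20, 0, 2, 0x30]) + bytes(17))
    for header, frames in split_amr_messages(packet):
        print("circuit", header[2], "frames", len(frames), "bytes each", len(frames[0]))
```
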
| |
| === Dummy (2) |
| |
| This kind of frame is used for NAT traversal. If a peer is behind a NAT, its |
| source port specified in SDP will be a private port not accessible from the |
| outside. Before other peers are able to send any packet to it, they require the |
| mapping between the private and the public port to be set by the firewall, |
| otherwise the firewall will most probably drop the incoming messages or send |
| them to a wrong destination. The firewall in most cases won't create a mapping |
| until the peer behind the NAT sends a packet to the peer residing outside. |
| |
| In this scenario, if the peer behind the NAT is expecting to receive but never |
| transmit audio, no packets will ever reach it. To solve this, the peer sends |
| dummy packets to let the firewall create the port mapping. When the other peers |
| receive this dummy packet, they can infer the relation between the original |
| private port and the public port and start sending packets to it. |
| |
| When opening a connection, the peer is expected to send dummy packets until it |
| starts sending real audio, at which point dummy packets are not needed anymore. |
| |
|      0                   1                   2                   3 |
|      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
|     |X|FT | CTR |X X|X X X X X X X X|  Circuit ID   |AMR FT |X X X X| |
|     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| |
| Field Type (FT): 2 bits:: |
| The Field Type allocated for dummy frames is "2". |
| |
| Frame Counter (CTR): 3 bits:: |
| Provides the number of dummy batched AMR payloads (starting 0) after the header. |
| For instance, if there are 2 AMR payloads batched, CTR will be "1". |
| |
| Circuit ID Code (CIC): 8 bits:: |
| Identifies the Circuit (Voice call), which in RTP is identified by {srcip, |
| srcport, dstip, dstport, ssrc}. |
| |
| AMR Frame Type (AMR-FT): 4 bits:: |
| This field must contain any valid value described in the AMR FT field (Frame |
| type index) in RFC3267 Section 4.3.2. |
| |
| ==== Additional considerations |
| |
| * After the header, additional padding needs to be allocated to conform with CTR |
| and AMR FT fields. For instance, if CTR is 0 and AMR FT is AMR 6.7, a padding |
| of 17 bytes is to be allocated after the header. |
| |
| * On reception of this kind of OSmux frame, it's usually enough for the reader |
| to discard the header plus the calculated padding and keep operating. |
| |
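The padding rule for dummy frames can be sketched as follows. Several details here are assumptions rather than part of the specification: the names are hypothetical, the don't-care (X) bits and the padding bytes are set to zero, bit 0 of the diagram is taken as the most significant bit, and the frame sizes are the octet-aligned AMR frame sizes per frame type index.

```python
# Assumed size in bytes of one AMR frame per frame type index
# (0..7 are the speech modes 4.75 to 12.2 kbit/s, 8 is SID).
AMR_FRAME_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31, 8: 5}
FT_DUMMY = 2
HEADER_LEN = 4


def dummy_padding_len(ctr, amr_ft):
    """Padding bytes that must follow a dummy header with this CTR and AMR FT."""
    return (ctr + 1) * AMR_FRAME_BYTES[amr_ft]


def build_dummy(circuit_id, amr_ft, ctr=0):
    """Build a dummy OSmux message: 4-octet header followed by zero padding."""
    first = (FT_DUMMY << 5) | (ctr << 2)
    header = bytes([first, 0x00, circuit_id, amr_ft << 4])
    return header + bytes(dummy_padding_len(ctr, amr_ft))


def skip_dummy(packet, offset=0):
    """Return the offset of the message following a dummy message at `offset`."""
    ctr = (packet[offset] >> 2) & 0x07
    amr_ft = packet[offset + 3] >> 4
    return offset + HEADER_LEN + dummy_padding_len(ctr, amr_ft)


if __name__ == "__main__":
    msg = build_dummy(circuit_id=5, amr_ft=3)   # frame type index 3 is AMR 6.7
    print(len(msg), "bytes in total,", dummy_padding_len(0, 3), "of them padding")
    print("next message starts at offset", skip_dummy(msg))
```
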
| |
| == Evaluation: Expected traffic savings |
| |
| The following figure shows the traffic saving (in %) depending on the number |
| of concurrent calls (assuming trunking but no batching at all): |
| ---- |
| [Plot: traffic savings (%, 0 to 100) versus concurrent calls (0 to 8), |
| batch factor 1] |
| ---- |
| |
| The results show a saving of 15.79% with only one concurrent call, which |
| quickly improves with more concurrent calls (due to trunking). |
| |
| We also provide the expected results by batching 4 messages for a single call: |
| ---- |
| [Plot: traffic savings (%, 0 to 100) versus concurrent calls (0 to 8), |
| batch factor 4] |
| ---- |
| |
| The results show a saving of 56.68% with only one concurrent call. Trunking |
| slightly improves the situation with more concurrent calls. |
| |
| We also provide the figure with batching factor of 8: |
| ---- |
| [Plot: traffic savings (%, 0 to 100) versus concurrent calls (0 to 8), |
| batch factor 8] |
| ---- |
| |
| That shows very little improvement compared to batching 4 messages, while a |
| higher batching factor risks degrading user experience. Thus, we consider a |
| batching factor of 3 or 4 adequate. |
| |
| == Other proposed follow-up works |
| |
| The following sections describe features that can be considered in the medium |
| term for inclusion in the OSmux infrastructure. They will be considered for |
| future proposals as extensions to this work. Therefore, they are NOT included |
| in this proposal. |
| |
| === Encryption |
| |
| Voice streams within OSmux can be encrypted in a similar manner to SRTP |
| (RFC3711). The only potential problem is the use of a reduced sequence number, |
| as it wraps in (20ms * 2^8 * B), i.e. 5.12s to 40.96s. However, as the |
| receiver knows at which rate the codec frames are generated at the sender, it |
| should be able to compute how much time has passed using its own timebase. |
| |
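The quoted wrap times of 5.12s and 40.96s follow from an 8-bit counter, 20ms codec frames and a batching factor B between 1 and 8. A short sketch of that arithmetic, with a hypothetical function name:

```python
FRAME_DURATION_MS = 20   # one codec frame every 20ms
RCTS_BITS = 8            # width of the reduced timestamp/sequence number


def rcts_wrap_seconds(batch_factor):
    """Time until the reduced sequence number wraps, in seconds."""
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return FRAME_DURATION_MS * (2 ** RCTS_BITS) * batch_factor / 1000


if __name__ == "__main__":
    for b in (1, 4, 8):
        print(f"B={b}: wraps after {rcts_wrap_seconds(b):.2f} s")
```
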
| Another alternative can be the use of DTLS (RFC 6347) that can be used to |
| secure datagram traffic using TLS facilities (libraries like openssl and |
| gnutls already support this). |
| |
| === Multiple OSmux messages in one packet |
| |
| In case there is already at least one active voice call, there will be |
| regular transmissions of voice codec frames. Depending on the batching |
| factor (4 to 8), they will be sent every 80ms to 160ms (B * 20ms). The size |
| even of a batched (and/or trunked) codec message is still much lower than |
| the MTU. |
| |
| Thus, any signalling (related or unrelated to the call causing the codec |
| stream) can just be piggy-backed to the packets containing the voice |
| codec frames. |