[[osmux]]
= OSmux: reducing SAT uplink costs through protocol optimizations

== Problem

In satellite-based GSM systems, transmission over the back-haul is relatively
expensive. Such SAT uplinks are usually billed on a pay-per-byte basis, so
reducing the number of bytes transferred significantly reduces the cost of the
uplink. In such an environment, even seemingly small protocol optimizations,
e.g. message batching and trunking, can result in significant cost reduction.

This is true not only for speech codec frames, but also for the constant
background load caused by the signalling link (A protocol). Optimizations in
this protocol are applicable to both VSAT back-haul (best-effort background IP)
and Inmarsat-based links (QoS with guaranteed bandwidth).

== Proposed solution

In order to reduce bandwidth consumption, this document proposes to develop
a multiplex protocol that will be used to proxy voice and signalling traffic
through the SAT links.

=== Voice

For the voice case, we propose a protocol that provides:

* Batching: putting multiple codec frames on the sender side into one single
  packet to reduce the protocol header overhead. This batch is then sent as
  one RTP/UDP/IP packet. Currently, AMR 5.9 codec frames are transported in
  an RTP/UDP/IP protocol stack. This means there are 15 bytes of speech codec
  frame, plus a 2-byte RTP payload header, plus the RTP (12 bytes), UDP
  (8 bytes) and IP (20 bytes) overhead. This means we have 40 bytes of
  overhead for 17 bytes of payload.

* Trunking: in case of multiple concurrent voice calls, each of them will
  generate one speech codec frame every 20ms. Instead of sending only codec
  frames of one voice call in a given IP packet, we can 'interleave' or trunk
  the codec frames of multiple calls into one IP packet. This further
  increases the IP packet size and thus improves the payload/overhead ratio.

Both techniques should be applied without noticeable impact in terms of user
experience. As the satellite back-haul has a very high round trip time (several
hundred milliseconds), adding some more delay is not going to make things
significantly worse.

For the batching, the idea is to collect multiple codec frames on the sender
side. A batching factor (B) of '4' means that we will send 4 codec frames in
one underlying protocol packet. The additional delay of the batching can be
computed as (B-1)*20ms, as 20ms is the duration of one codec frame. Existing
experimentation has shown that a batching factor of 4 to 8 (causing a delay of
60ms to 140ms) is acceptable and does not cause significant quality
degradation.

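As a rough illustration of the numbers above, the following Python sketch
computes the additional batching delay (B-1)*20ms and the header bytes
attributable to each codec frame when B frames share one set of headers. The
function and constant names are ours, and reusing the 40-byte RTP/UDP/IP header
unchanged is a simplifying assumption for illustration, not the OSmux wire
format.

```python
FRAME_MS = 20               # duration of one speech codec frame
HEADER_BYTES = 20 + 8 + 12  # IP + UDP + RTP headers, as listed above
PAYLOAD_BYTES = 2 + 15      # RTP payload header + AMR 5.9 codec frame


def batching_delay_ms(batch_factor: int) -> int:
    """Additional delay caused by batching: (B-1)*20ms."""
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return (batch_factor - 1) * FRAME_MS


def header_bytes_per_frame(batch_factor: int) -> float:
    """Header bytes per codec frame if B frames share one set of headers
    (assumption: the 40-byte header is simply shared, see text)."""
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return HEADER_BYTES / batch_factor


if __name__ == "__main__":
    for b in (1, 4, 8):
        print(b, batching_delay_ms(b), header_bytes_per_frame(b))
```

With a batching factor of 4, the delay is 60ms and each frame carries a quarter
of the header cost it would carry on its own.
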
The main requirements for such a voice RTP proxy are:

* Always batch codec frames of multiple simultaneous calls into a single UDP
  message.

* Batch a configurable number of codec frames of the same call into one UDP
  message.

* Make sure to properly reconstruct timing at the receiver (non-bursty, but
  one codec frame every 20ms).

* Implementation in libosmo-netif to make sure it can be used
  in osmo-bts (towards osmo-bsc), osmo-bsc (towards osmo-bts and
  osmo-bsc_nat) and osmo-bsc_nat (towards osmo-bsc).

* Primary application will be with osmo-bsc connected via satellite link to
  osmo-bsc_nat.

* Make sure to properly deal with SID (Silence Insertion Descriptor) frames
  in case of DTX.

* Make sure to transmit and properly reconstruct the M (marker) bit of
  the RTP header, as it is used in AMR.

* The primary use case is the AMR codec, so it is probably not worth wasting
  an extra payload byte on indicating the codec type (amr/hr/fr/efr). If we
  can add the codec type somewhere without growing the packet, we'll do it.
  Otherwise, we'll skip this.

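The timing requirement from the list above can be sketched as follows. This is
a minimal illustration, not the libosmo-netif implementation: the function name
`playout_times` is hypothetical, and we assume the receiver itself chooses the
playout time of the first frame of a batch.

```python
FRAME_MS = 20  # one codec frame every 20ms


def playout_times(first_playout_ms: int, frames: list) -> list:
    """Spread the frames of one received batch over time.

    Returns (playout time in ms, frame) pairs so that the frames leave the
    receiver one every 20ms instead of as a burst.
    """
    return [(first_playout_ms + i * FRAME_MS, frame)
            for i, frame in enumerate(frames)]


if __name__ == "__main__":
    for when, frame in playout_times(100, [b"f0", b"f1", b"f2", b"f3"]):
        print(when, frame)
```
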
=== Signalling

Signalling uses an SCCP/IPA/TCP/IP stack. Considering SCCP as payload, this
adds 3 (IPA) + 20 (TCP) + 20 (IP) = 43 bytes of overhead for every signalling
message, plus of course the 40-byte TCP ACK sent in the opposite direction.

While looking for alternatives, we concluded that none of the standard IP
layer 4 protocols is suitable for this application, for the following
reasons:

* TCP is a streaming protocol aimed at maximizing the throughput of a stream
  within the constraints of the underlying transport layer. This feature is
  not really required for the low-bandwidth and low-pps GSM signalling.
  Moreover, TCP is stream oriented and does not preserve message boundaries.
  As such, the IPA header has to serve as a boundary between messages in the
  stream. Finally, on a generally quite idle signalling link, a pure TCP ACK
  (without any data segment) is very likely to be sent.

* Raw IP or UDP is not a real option either, as neither recovers lost
  packets.

* SCTP preserves message boundaries and allows for multiple streams
  (multiplexing) within one connection, but it has too much overhead.

For that reason, we propose the use of LAPD for this task. This protocol was
originally specified to be used on top of E1 links for the Abis interface,
which do not exhibit any noticeable latency. LAPD handles packet loss (albeit
not as well as TCP does) and copes with packet re-ordering.

LAPD has a very small header (3-5 octets) compared to TCP's 20 bytes. Even if
LAPD is put inside UDP, the combination of 11 to 13 octets still saves a
noticeable number of bytes per packet. Moreover, LAPD has been modified for
less reliable interfaces such as the GSM Um interface (LAPDm), as well as for
use in satellite systems (LAPsat in ETSI GMR).

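The header arithmetic of this section can be written down as a small Python
sketch. It only adds up the header sizes quoted in the text. The function names
are ours, and the OSmux control octet and length field that would precede LAPD
are deliberately left out.

```python
IP_HDR = 20
TCP_HDR = 20
UDP_HDR = 8
IPA_HDR = 3
LAPD_HDR_MIN = 3
LAPD_HDR_MAX = 5


def ipa_tcp_overhead() -> int:
    """Per-message overhead of the current IPA/TCP/IP stack."""
    return IPA_HDR + TCP_HDR + IP_HDR


def lapd_udp_overhead(lapd_hdr: int) -> int:
    """Per-message overhead of LAPD carried in UDP/IP."""
    if not LAPD_HDR_MIN <= lapd_hdr <= LAPD_HDR_MAX:
        raise ValueError("LAPD header is 3 to 5 octets")
    return lapd_hdr + UDP_HDR + IP_HDR


if __name__ == "__main__":
    print("IPA/TCP/IP:", ipa_tcp_overhead())
    print("LAPD/UDP/IP:", lapd_udp_overhead(LAPD_HDR_MIN),
          "to", lapd_udp_overhead(LAPD_HDR_MAX))
```
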
== OSmux protocol

The OSmux protocol is the core of our proposed solution. This protocol operates
over UDP or, alternatively, over raw IP. The designated default UDP port number
and IP protocol type have not yet been decided.

Every OSmux message starts with a control octet. The control octet contains a
2-bit Field Type (FT), which starts at the 2nd bit for backward compatibility
with older versions (where it used to be 3 bits wide). The FT defines the
structure of the remaining header as well as the payload.

The following FT values are assigned:

* FT == 0: LAPD Signalling
* FT == 1: AMR Codec
* FT == 2: Dummy
* FT == 3: Reserved for Future Use

There can be any number of OSmux messages batched up in one underlying packet.
In this case, the multiple OSmux messages are simply concatenated, i.e. the
OSmux header control octet directly follows the last octet of the payload of the
previous OSmux message.

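A minimal Python sketch of how a receiver might extract the Field Type from the
control octet is shown below. It assumes that bit 0 in the header diagrams of
this document is the most significant bit of the octet (the usual RFC drawing
convention), so that the FT occupies the second and third most significant
bits. The names `FieldType` and `field_type` are ours.

```python
from enum import IntEnum


class FieldType(IntEnum):
    LAPD_SIGNALLING = 0
    AMR_CODEC = 1
    DUMMY = 2
    RESERVED = 3


def field_type(control_octet: int) -> FieldType:
    """Extract the 2-bit FT that follows the first bit of the control octet."""
    if not 0 <= control_octet <= 0xFF:
        raise ValueError("control octet must fit into 8 bits")
    return FieldType((control_octet >> 5) & 0x03)


if __name__ == "__main__":
    print(field_type(0x20).name)
    print(field_type(0x40).name)
```
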
=== LAPD Signalling (0)

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X|FT |X X X X X|   PL-LENGTH   |     LAPD header + payload     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field Type (FT): 2 bits::
The Field Type allocated for LAPD signalling is "0".

This frame type is not yet supported inside Osmocom and may be subject to
change in future versions of the protocol.

=== AMR Codec (1)

This OSmux packet header is used to transport one or more RTP-AMR packets for a
specific RTP stream identified by the Circuit ID field.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|M|FT | CTR |F|Q| Red. TS/SeqNR |  Circuit ID   |AMR FT |AMR CMR|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Marker (M): 1 bit::
This is a 1:1 mapping from the RTP Marker (M) bit as specified in RFC3550
Section 5.1 (RTP) as well as RFC3267 Section 4.1 (RTP-AMR). In AMR, the Marker
is used to indicate the beginning of a talk-spurt, i.e. the end of a silence
period. In case more than one AMR frame from the specific stream is batched into
this OSmux header, it is guaranteed that the first AMR frame is the first in the
talk-spurt.

Field Type (FT): 2 bits::
The Field Type allocated for AMR codec is "1".

Frame Counter (CTR): 3 bits::
Provides the number of batched AMR payloads (starting at 0) after the header.
For instance, if there are 2 AMR payloads batched, CTR will be "1".

AMR-F (F): 1 bit::
This is a 1:1 mapping from the AMR F field in RFC3267 Section 4.3.2. In case
there are multiple AMR codec frames with different F bits batched together, we
only use the last F and ignore any previous F.

AMR-Q (Q): 1 bit::
This is a 1:1 mapping from the AMR Q field (Frame quality indicator) in RFC3267
Section 4.3.2. In case there are multiple AMR codec frames with different Q bits
batched together, we only use the last Q and ignore any previous Q.

Circuit ID Code (CIC): 8 bits::
Identifies the Circuit (Voice call), which in RTP is identified by {srcip,
srcport, dstip, dstport, ssrc}.

Reduced/Combined Timestamp and Sequence Number (RCTS): 8 bits::
Resembles a combination of the RTP timestamp and sequence number. In the GSM
system, speech codec frames are generated at a rate of one every 20ms. Thus,
there is no need to have independent timestamp and sequence numbers (related
to an 8kHz clock) as specified in AMR-RTP.

AMR Frame Type (AMR-FT): 4 bits::
This is a mapping from the AMR FT field (Frame type index) in RFC3267 Section
4.3.2. The length of each codec frame needs to be determined from this field. It
is thus guaranteed that all frames for a specific stream in an OSmux batch are
of the same AMR type.

AMR Codec Mode Request (AMR-CMR): 4 bits::
The RTP AMR payload header as specified in RFC3267 contains a 4-bit CMR field.
Rather than transporting it in a separate octet, we squeeze it into the lower
four bits of the last octet. In case there are multiple AMR codec frames with
different CMR, we only use the last CMR and ignore any previous CMR.

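The field layout above can be sketched in Python as a pair of pack/unpack
helpers. This is an illustration only, not the libosmo-netif code: the class
name `AmrOsmuxHeader` and its attribute names are ours, bit 0 of the diagram is
assumed to be the most significant bit of the first octet, and CTR is taken as
3 bits wide, as drawn in the diagram.

```python
from dataclasses import dataclass

FT_AMR = 1  # Field Type value assigned to AMR Codec


@dataclass
class AmrOsmuxHeader:
    marker: bool     # M
    ctr: int         # number of batched AMR payloads minus one
    amr_f: bool      # F
    amr_q: bool      # Q
    seq: int         # reduced timestamp / sequence number
    circuit_id: int  # CIC
    amr_ft: int      # AMR frame type index
    amr_cmr: int     # AMR codec mode request

    def pack(self) -> bytes:
        first = ((int(self.marker) << 7) | (FT_AMR << 5)
                 | ((self.ctr & 0x07) << 2)
                 | (int(self.amr_f) << 1) | int(self.amr_q))
        last = ((self.amr_ft & 0x0F) << 4) | (self.amr_cmr & 0x0F)
        return bytes([first, self.seq & 0xFF, self.circuit_id & 0xFF, last])

    @classmethod
    def unpack(cls, data: bytes) -> "AmrOsmuxHeader":
        if len(data) < 4:
            raise ValueError("AMR OSmux header needs 4 octets")
        first, seq, circuit_id, last = data[:4]
        if (first >> 5) & 0x03 != FT_AMR:
            raise ValueError("not an AMR Codec OSmux message")
        return cls(marker=bool(first >> 7), ctr=(first >> 2) & 0x07,
                   amr_f=bool(first & 0x02), amr_q=bool(first & 0x01),
                   seq=seq, circuit_id=circuit_id,
                   amr_ft=last >> 4, amr_cmr=last & 0x0F)


if __name__ == "__main__":
    hdr = AmrOsmuxHeader(True, 1, False, True, 7, 3, 2, 15)
    print(hdr.pack().hex())
    print(AmrOsmuxHeader.unpack(hdr.pack()))
```
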
==== Additional considerations

* It can be assumed that all OSmux frames of type AMR Codec contain at least 1
  AMR frame.
* Given a batch factor of N frames (N>1), it cannot be assumed that the number
  of AMR frames in any OSmux frame will always be N, due to some restrictions
  mentioned above. For instance, a sender can decide to send before queueing the
  expected N frames due to timing issues, or to conform with the restriction
  that the first AMR frame in the batch must be the first in the talk-spurt
  (Marker M bit).

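Since the number of frames per message may vary, a receiver has to derive the
length of each AMR Codec message from CTR and AMR FT. The sketch below does
that and splits a packet of concatenated AMR Codec messages. The frame sizes
are the AMR speech bits per mode rounded up to whole octets (AMR 5.9 yields the
15 bytes mentioned earlier). Whether the payload is carried in exactly this
form is an assumption on our side, as are all names used.

```python
# AMR frame type index -> octets per frame (speech bits rounded up),
# index 8 being the SID frame. Assumed payload format, see text.
AMR_FRAME_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31,
                   8: 5}
HEADER_LEN = 4
FT_AMR = 1


def amr_message_length(header: bytes) -> int:
    """Total length of one AMR Codec OSmux message, header included."""
    ctr = (header[0] >> 2) & 0x07
    amr_ft = header[3] >> 4
    if amr_ft not in AMR_FRAME_BYTES:
        raise ValueError("unsupported AMR frame type %d" % amr_ft)
    return HEADER_LEN + (ctr + 1) * AMR_FRAME_BYTES[amr_ft]


def split_amr_messages(packet: bytes) -> list:
    """Split one underlying packet into its concatenated AMR messages."""
    messages = []
    offset = 0
    while offset < len(packet):
        header = packet[offset:offset + HEADER_LEN]
        if len(header) < HEADER_LEN:
            raise ValueError("truncated OSmux header")
        if (header[0] >> 5) & 0x03 != FT_AMR:
            raise ValueError("only AMR Codec messages are handled here")
        length = amr_message_length(header)
        if offset + length > len(packet):
            raise ValueError("truncated OSmux payload")
        messages.append(packet[offset:offset + length])
        offset += length
    return messages
```

A batch of two AMR 5.9 frames therefore occupies 4 + 2*15 = 34 octets, and the
next control octet follows immediately after it.
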
=== Dummy (2)

This kind of frame is used for NAT traversal. If a peer is behind a NAT, the
source port it specifies in SDP will be a private port that is not accessible
from the outside. Before other peers are able to send any packet to it, the
mapping between the private and the public port must have been set up by the
firewall; otherwise the firewall will most probably drop the incoming messages
or send them to a wrong destination. In most cases, the firewall won't create
a mapping until the peer behind the NAT sends a packet to the peer residing
outside.

In this scenario, if the peer behind the NAT is expecting to receive but never
transmit audio, no packets will ever reach it. To solve this, the peer sends
dummy packets to let the firewall create the port mapping. When the other peers
receive such a dummy packet, they can infer the relation between the original
private port and the public port and start sending packets to it.

When opening a connection, the peer is expected to send dummy packets until it
starts sending real audio, at which point dummy packets are no longer needed.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X|FT | CTR |X X|X X X X X X X X|  Circuit ID   |AMR FT |X X X X|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field Type (FT): 2 bits::
The Field Type allocated for Dummy is "2".

Frame Counter (CTR): 3 bits::
Provides the number of dummy batched AMR payloads (starting at 0) after the
header. For instance, if there are 2 AMR payloads batched, CTR will be "1".

Circuit ID Code (CIC): 8 bits::
Identifies the Circuit (Voice call), which in RTP is identified by {srcip,
srcport, dstip, dstport, ssrc}.

AMR Frame Type (AMR-FT): 4 bits::
This field must contain a valid value of the AMR FT field (Frame type
index) as described in RFC3267 Section 4.3.2.

==== Additional considerations

* After the header, additional padding needs to be allocated to conform with
  the CTR and AMR FT fields. For instance, if CTR is 0 and AMR FT is AMR 6.7,
  a padding of 17 bytes is to be allocated after the header.

* On reception of this kind of OSmux frame, it's usually enough for the reader
  to discard the header plus the calculated padding and keep operating.

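The padding rule can be illustrated with a short Python sketch that builds a
Dummy frame. The function name `build_dummy` is hypothetical, the unused (X)
bits and the padding octets are set to zero purely as an assumption, and the
frame sizes are the AMR speech bits per mode rounded up to whole octets.

```python
# AMR frame type index -> octets per frame (speech bits rounded up),
# index 8 being the SID frame. Assumed payload format, see text.
AMR_FRAME_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31,
                   8: 5}
FT_DUMMY = 2


def build_dummy(circuit_id: int, amr_ft: int, ctr: int = 0) -> bytes:
    """Build a Dummy OSmux frame: 4-octet header plus padding."""
    if amr_ft not in AMR_FRAME_BYTES:
        raise ValueError("unsupported AMR frame type %d" % amr_ft)
    if not 0 <= ctr <= 7:
        raise ValueError("CTR must fit into 3 bits")
    first = (FT_DUMMY << 5) | (ctr << 2)
    header = bytes([first, 0x00, circuit_id & 0xFF, amr_ft << 4])
    # Padding sized as if (ctr + 1) real AMR frames followed the header.
    padding = bytes((ctr + 1) * AMR_FRAME_BYTES[amr_ft])
    return header + padding


if __name__ == "__main__":
    frame = build_dummy(circuit_id=5, amr_ft=3)
    print(len(frame), frame[:4].hex())
```

For CTR 0 and frame type index 3 (AMR 6.7), the frame is 4 + 17 = 21 octets
long.
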
== Evaluation: Expected traffic savings

The following figure shows the traffic savings (in %) depending on the number
of concurrent calls (assuming trunking but no batching at all):
----
                               Traffic savings (%)
  100 ++-------+-------+--------+--------+-------+--------+-------+-------++
      +        +       +        +        +       +   batch factor 1 **E*** +
      |                                                                    |
   80 ++                                                                  ++
      |                                                                    |
      |                                                                    |
      |                                                       ****E********E
   60 ++                             ****E*******E********E***            ++
      |                       **E****                                      |
      |                   ****                                             |
   40 ++              *E**                                                ++
      |             **                                                     |
      |           **                                                       |
      |         **                                                         |
   20 ++       E                                                          ++
      |                                                                    |
      +        +       +        +        +       +        +       +        +
    0 ++-------+-------+--------+--------+-------+--------+-------+-------++
      0        1       2        3        4       5        6       7        8
                                 Concurrent calls
----

The results show a saving of 15.79% with only one concurrent call, which
quickly improves with more concurrent calls (due to trunking).

We also provide the expected results when batching 4 messages for a single call:
----
                               Traffic savings (%)
  100 ++-------+-------+--------+--------+-------+--------+-------+-------++
      +        +       +        +        +       +   batch factor 4 **E*** +
      |                                                                    |
   80 ++                                                                  ++
      |                                                                    |
      |                                                                    |
      |                     ****E********E*******E********E*******E********E
   60 ++           ****E****                                              ++
      |        E***                                                        |
      |                                                                    |
   40 ++                                                                  ++
      |                                                                    |
      |                                                                    |
      |                                                                    |
   20 ++                                                                  ++
      |                                                                    |
      +        +       +        +        +       +        +       +        +
    0 ++-------+-------+--------+--------+-------+--------+-------+-------++
      0        1       2        3        4       5        6       7        8
                                 Concurrent calls
----

The results show a saving of 56.68% with only one concurrent call. Trunking
slightly improves the situation with more concurrent calls.

We also provide the figure for a batching factor of 8:
----
                               Traffic savings (%)
  100 ++-------+-------+--------+--------+-------+--------+-------+-------++
      +        +       +        +        +       +   batch factor 8 **E*** +
      |                                                                    |
   80 ++                                                                  ++
      |                                                                    |
      |                                               ****E*******E********E
      |            ****E********E********E*******E****                     |
   60 ++       E***                                                       ++
      |                                                                    |
      |                                                                    |
   40 ++                                                                  ++
      |                                                                    |
      |                                                                    |
      |                                                                    |
   20 ++                                                                  ++
      |                                                                    |
      +        +       +        +        +       +        +       +        +
    0 ++-------+-------+--------+--------+-------+--------+-------+-------++
      0        1       2        3        4       5        6       7        8
                                 Concurrent calls
----

This shows very little improvement compared to batching 4 messages, while the
larger batch risks degrading user experience. Thus, we consider a batching
factor of 3 or 4 to be adequate.

== Other proposed follow-up works

The following sections describe features that can be considered in the mid-term
for inclusion in the OSmux infrastructure. They will be considered for future
proposals as extensions to this work. Therefore, they are NOT included in
this proposal.

=== Encryption

Voice streams within OSmux can be encrypted in a similar manner to SRTP
(RFC3711). The only potential problem is the use of a reduced sequence number,
as it wraps in (20ms * 2^8 * B), i.e. 5.12s to 40.96s. However, as the
receiver knows at which rate the codec frames are generated at the sender, it
should be able to compute how much time has passed using its own timebase.

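The wrap-around interval quoted above follows from the 8-bit sequence number
(2^8 = 256 values), assuming it advances once per batch of B frames of 20ms
each. A minimal Python sketch of this arithmetic, with a function name of our
choosing:

```python
FRAME_MS = 20
SEQ_VALUES = 2 ** 8  # 8-bit reduced sequence number


def seq_wrap_seconds(batch_factor: int) -> float:
    """Time after which the reduced sequence number wraps around."""
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return FRAME_MS * SEQ_VALUES * batch_factor / 1000


if __name__ == "__main__":
    for b in (1, 4, 8):
        print(b, seq_wrap_seconds(b))
```
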
Another alternative is DTLS (RFC 6347), which secures datagram traffic using
TLS facilities (libraries like openssl and gnutls already support this).

=== Multiple OSmux messages in one packet

In case there is already at least one active voice call, there will be
regular transmissions of voice codec frames. Depending on the batching
factor, they will be sent every B*20ms, i.e. every 80ms to 160ms for a
batching factor of 4 to 8. Even the size of a batched (and/or trunked) codec
message is still much lower than the MTU.

Thus, any signalling (related or unrelated to the call causing the codec
stream) can just be piggy-backed onto the packets containing the voice
codec frames.