[[osmux]]
= OSmux: reducing SAT uplink costs through protocol optimizations

== Problem

In the case of satellite-based GSM systems, transmission on the back-haul is
relatively expensive. Such SAT uplinks are usually billed on a pay-per-byte
basis. Thus, reducing the number of bytes transferred would significantly
reduce the cost of such uplinks. In such an environment, even seemingly small
protocol optimizations, e.g. message batching and trunking, can result in
significant cost reduction.

This is true not only for speech codec frames, but also for the constant
background load caused by the signalling link (A protocol). Optimizations in
this protocol are applicable to both VSAT back-haul (best-effort background IP)
and Inmarsat-based links (QoS with guaranteed bandwidth).

== Proposed solution

In order to reduce the bandwidth consumption, this document proposes to develop
a multiplex protocol that will be used to proxy voice and signalling traffic
through the SAT links.

=== Voice

For the voice case, we propose a protocol that provides:

* Batching: putting multiple codec frames into one single packet on the
  sender side to reduce the protocol header overhead. This batch is then sent
  as one RTP/UDP/IP packet. Currently, AMR 5.9 codec frames are transported
  in an RTP/UDP/IP protocol stack. This means there are 15 bytes of speech
  codec frame, plus a 2-byte RTP payload header, plus the RTP (12 bytes),
  UDP (8 bytes) and IP (20 bytes) overhead. This means we have 40 bytes of
  overhead for 17 bytes of payload.

* Trunking: in case of multiple concurrent voice calls, each of them will
  generate one speech codec frame every 20ms. Instead of sending only codec
  frames of one voice call in a given IP packet, we can 'interleave' or trunk
  the codec frames of multiple calls into one IP packet. This further
  increases the IP packet size and thus improves the payload/overhead ratio.
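
The overhead figures quoted for batching can be checked with a few lines of
arithmetic. The following is only an illustrative sketch: the function name is
made up for this example, and it assumes (as a simplification that is not part
of the proposal) that all frames in a packet share one 40-byte RTP/UDP/IP
header set while each frame keeps its 17 bytes of payload.

```python
# Header sizes in bytes, as quoted in the text.
RTP_HDR, UDP_HDR, IP_HDR = 12, 8, 20
# AMR 5.9: 15 bytes of speech plus a 2-byte RTP payload header.
AMR_5_9_PAYLOAD = 15 + 2


def overhead_share(frames_per_packet: int) -> float:
    """Fraction of a packet that is spent on RTP/UDP/IP headers.

    Simplifying assumption (not from the proposal): one header set is
    shared by all frames in the packet and each frame costs 17 bytes.
    """
    if frames_per_packet < 1:
        raise ValueError("a packet carries at least one frame")
    header = RTP_HDR + UDP_HDR + IP_HDR
    payload = frames_per_packet * AMR_5_9_PAYLOAD
    return header / (header + payload)


for frames in (1, 4, 8):
    print(f"{frames} frame(s) per packet: {overhead_share(frames):.1%} overhead")
```

With one frame per packet the headers make up 40 of 57 bytes. The share drops
as more frames are carried behind the same headers, which is the effect both
batching and trunking exploit.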

Both techniques should be applied without noticeable impact in terms of user
experience. As the satellite back-haul has a very high round trip time (several
hundred milliseconds), adding some more delay is not going to make things
significantly worse.

For the batching, the idea consists of batching multiple codec frames on the
sender side. A batching factor (B) of '4' means that we will send 4 codec
frames in one underlying protocol packet. The additional delay of the batching
can be computed as (B-1)*20ms, as 20ms is the duration of one codec frame.
Existing experimentation has shown that a batching factor of 4 to 8 (causing a
delay of 60ms to 140ms) is acceptable and does not cause significant quality
degradation.
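
The delay formula can be written down directly. This is a minimal sketch; the
function and constant names are chosen for illustration only.

```python
FRAME_MS = 20  # duration of one codec frame, in milliseconds


def batching_delay_ms(batch_factor: int) -> int:
    """Additional delay caused by batching: (B - 1) * 20 ms."""
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return (batch_factor - 1) * FRAME_MS


for b in (1, 4, 8):
    print(f"B={b}: {batching_delay_ms(b)} ms additional delay")
```

The first frame of a batch has to wait for the remaining B-1 frames, which is
why a batching factor of 1 adds no delay at all.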

The main requirements for such a voice RTP proxy are:

* Always batch codec frames of multiple simultaneous calls into a single UDP
  message.

* Batch a configurable number of codec frames of the same call into one UDP
  message.

* Make sure to properly reconstruct timing at the receiver (non-bursty, but
  one codec frame every 20ms).

* Implementation in libosmo-netif to make sure it can be used
  in osmo-bts (towards osmo-bsc), osmo-bsc (towards osmo-bts and
  osmo-bsc_nat) and osmo-bsc_nat (towards osmo-bsc).

* The primary application will be with osmo-bsc connected via satellite link
  to osmo-bsc_nat.

* Make sure to properly deal with SID (silence descriptor) frames in case
  of DTX.

* Make sure to transmit and properly reconstruct the M (marker) bit of
  the RTP header, as it is used in AMR.

* The primary use case is the AMR codec, so it is probably not worth spending
  an extra payload byte on indicating the codec type (amr/hr/fr/efr). If we
  can add the codec type somewhere without growing the packet, we'll do it.
  Otherwise, we'll skip this.

=== Signalling

Signalling uses SCCP/IPA/TCP/IP stacking. Considering SCCP as payload, this
adds 3 (IPA) + 20 (TCP) + 20 (IP) = 43 bytes overhead for every signalling
message, plus of course the 40-byte-sized TCP ACK sent in the opposite
direction.
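
As a rough illustration of what this means on the wire, the sketch below adds
up the bytes for one signalling message. The function name and the
`acked_separately` switch are assumptions made for this example; the header
sizes are the ones quoted above.

```python
IPA_HDR, TCP_HDR, IP_HDR = 3, 20, 20  # bytes, as quoted in the text
PURE_ACK = TCP_HDR + IP_HDR           # TCP ACK without data segment: 40 bytes


def signalling_bytes_on_wire(sccp_len: int, acked_separately: bool = True) -> int:
    """Bytes crossing the link for one SCCP message over IPA/TCP/IP.

    Counts the message itself and, optionally, the pure ACK that the peer
    sends back when it has no data of its own to piggy-back it on.
    """
    total = IPA_HDR + TCP_HDR + IP_HDR + sccp_len
    if acked_separately:
        total += PURE_ACK
    return total


# A hypothetical 20-byte SCCP message, with and without a separate ACK.
print(signalling_bytes_on_wire(20))
print(signalling_bytes_on_wire(20, acked_separately=False))
```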

While looking for alternatives, we concluded that none of the standard IP
layer 4 protocols are suitable for this application, for the following
reasons:

* TCP is a streaming protocol aimed at maximizing the throughput of a stream
  within the constraints of the underlying transport layer. This feature is
  not really required for the low-bandwidth and low-pps GSM signalling.
  Moreover, TCP is stream oriented and does not preserve message boundaries.
  As such, the IPA header has to serve as a boundary between messages in the
  stream. Finally, assuming a generally quite idle signalling link, a pure
  TCP ACK (without any data segment) is very likely to occur.

* Raw IP or UDP as an alternative is not a real option, as it does not
  recover lost packets.

* SCTP preserves message boundaries and allows for multiple streams
  (multiplexing) within one connection, but it has too much overhead.

For that reason, we propose the use of LAPD for this task. This protocol was
originally specified to be used on top of E1 links for the A interface, which
do not exhibit any noticeable latency. LAPD recovers from packet loss (albeit
not as well as TCP does) and copes with packet re-ordering.

LAPD has a very small header (3-5 octets) compared to TCP's 20 bytes. Even if
LAPD is put inside UDP, the combination of 11 to 13 octets still saves a
noticeable number of bytes per packet. Moreover, LAPD has been modified for less
reliable interfaces such as the GSM Um interface (LAPDm), as well as for the
use in satellite systems (LAPsat in ETSI GMR).
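
The 11 to 13 octets follow from adding the 8-byte UDP header to the LAPD
header. A small sketch of that comparison, with function names invented for
illustration and the saving measured against the bare 20-byte TCP header only:

```python
UDP_HDR, TCP_HDR = 8, 20  # bytes


def lapd_over_udp_header(lapd_hdr: int) -> int:
    """Combined header size of LAPD carried inside UDP."""
    if not 3 <= lapd_hdr <= 5:
        raise ValueError("LAPD header is 3 to 5 octets")
    return UDP_HDR + lapd_hdr


def saving_vs_tcp(lapd_hdr: int) -> int:
    """Bytes saved per packet compared to the TCP header alone."""
    return TCP_HDR - lapd_over_udp_header(lapd_hdr)


for size in (3, 4, 5):
    print(f"LAPD header {size}: {lapd_over_udp_header(size)} octets, "
          f"saves {saving_vs_tcp(size)} bytes")
```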

== OSmux protocol

The OSmux protocol is the core of our proposed solution. This protocol operates
over UDP or, alternatively, over raw IP. The designated default UDP port number
and IP protocol type have not yet been decided.

Every OSmux message starts with a control octet. The control octet contains a
2-bit Field Type (FT), which starts at the 2nd bit for backward compatibility
with older versions (where it used to be 3 bits). The FT defines the
structure of the remaining header as well as the payload.

The following FT values are assigned:

* FT == 0: LAPD Signalling
* FT == 1: AMR Codec
* FT == 2: Dummy
* FT == 3: Reserved for Future Use
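
Extracting the FT from the control octet could look like the sketch below. It
assumes that bits are numbered from the most significant bit, as is usual for
protocol header diagrams, so that the FT occupies the 2nd and 3rd bit of the
octet. The names `field_type` and `FT_NAMES` are illustrative only.

```python
# FT values as assigned in this document.
FT_NAMES = {
    0: "LAPD Signalling",
    1: "AMR Codec",
    2: "Dummy",
    3: "Reserved for Future Use",
}


def field_type(control_octet: int) -> int:
    """Return the 2-bit Field Type of an OSmux control octet.

    Assumption: bit 0 is the most significant bit, so the FT sits in
    bits 1 and 2 and is obtained by shifting right by 5.
    """
    if not 0 <= control_octet <= 0xFF:
        raise ValueError("control octet must fit in one byte")
    return (control_octet >> 5) & 0x03


octet = 0b0_01_000_0_0  # M=0, FT=1, remaining bits zero
print(FT_NAMES[field_type(octet)])
```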

There can be any number of OSmux messages batched up in one underlying packet.
In this case, the multiple OSmux messages are simply concatenated, i.e. the
OSmux header control octet directly follows the last octet of the payload of the
previous OSmux message.


=== LAPD Signalling (0)

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field Type (FT): 2 bits::
The Field Type allocated for LAPD signalling is "0".

This frame type is not yet supported inside Osmocom and may be subject to
change in future versions of the protocol.

=== AMR Codec (1)

This OSmux packet header is used to transport one or more RTP-AMR packets for a
specific RTP stream identified by the Circuit ID field.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |M|FT | CTR |F|Q| Red. TS/SeqNR |  Circuit ID   |AMR FT |AMR CMR|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Marker (M): 1 bit::
This is a 1:1 mapping from the RTP Marker (M) bit as specified in RFC3550
Section 5.1 (RTP) as well as RFC3267 Section 4.1 (RTP-AMR). In AMR, the Marker
is used to indicate the beginning of a talk-spurt, i.e. the end of a silence
period. In case more than one AMR frame from the specific stream is batched into
this OSmux header, it is guaranteed that the first AMR frame is the first in the
talk-spurt.

Field Type (FT): 2 bits::
The Field Type allocated for AMR codec is "1".

Frame Counter (CTR): 3 bits::
Provides the number of batched AMR payloads (starting at 0) after the header.
For instance, if there are 2 AMR payloads batched, CTR will be "1".

AMR-F (F): 1 bit::
This is a 1:1 mapping from the AMR F field in RFC3267 Section 4.3.2. In case
there are multiple AMR codec frames with different F bits batched together, we
only use the last F and ignore any previous F.

AMR-Q (Q): 1 bit::
This is a 1:1 mapping from the AMR Q field (Frame quality indicator) in RFC3267
Section 4.3.2. In case there are multiple AMR codec frames with different Q bits
batched together, we only use the last Q and ignore any previous Q.

Circuit ID Code (CIC): 8 bits::
Identifies the Circuit (Voice call), which in RTP is identified by {srcip,
srcport, dstip, dstport, ssrc}.

Reduced/Combined Timestamp and Sequence Number (RCTS): 8 bits::
Resembles a combination of the RTP timestamp and sequence number. In the GSM
system, speech codec frames are generated every 20ms. Thus, there is no
need to have independent timestamp and sequence numbers (related to an 8kHz
clock) as specified in AMR-RTP.

AMR Frame Type (AMR-FT): 4 bits::
This is a mapping from the AMR FT field (Frame type index) in RFC3267 Section
4.3.2. The length of each codec frame needs to be determined from this field. It
is thus guaranteed that all frames for a specific stream in an OSmux batch are
of the same AMR type.

AMR Codec Mode Request (AMR-CMR): 4 bits::
The RTP AMR payload header as specified in RFC3267 contains a 4-bit CMR field.
Rather than transporting it in a separate octet, we squeeze it in the lower four
bits of the last octet. In case there are multiple AMR codec frames with
different CMR, we only use the last CMR and ignore any previous CMR.
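
To make the field layout concrete, the following sketch unpacks the four
header octets into named fields. It is not the libosmo-netif implementation:
the class and function names are invented here, bit 0 is assumed to be the
most significant bit of each octet, and CTR is taken as the three bits that
the header diagram reserves for it.

```python
from typing import NamedTuple


class OsmuxAmrHeader(NamedTuple):
    marker: int      # M, 1 bit
    ft: int          # Field Type, 2 bits
    ctr: int         # Frame Counter, 3 bits (number of payloads minus one)
    amr_f: int       # AMR F bit
    amr_q: int       # AMR Q bit
    rcts: int        # reduced/combined timestamp and sequence number
    circuit_id: int  # Circuit ID Code
    amr_ft: int      # AMR frame type index, upper 4 bits of last octet
    amr_cmr: int     # AMR codec mode request, lower 4 bits of last octet


def parse_amr_header(data: bytes) -> OsmuxAmrHeader:
    """Split the first four octets of an OSmux AMR message into fields."""
    if len(data) < 4:
        raise ValueError("an OSmux AMR header needs 4 octets")
    ctrl, rcts, cid, amr = data[0], data[1], data[2], data[3]
    return OsmuxAmrHeader(
        marker=ctrl >> 7,
        ft=(ctrl >> 5) & 0x03,
        ctr=(ctrl >> 2) & 0x07,
        amr_f=(ctrl >> 1) & 0x01,
        amr_q=ctrl & 0x01,
        rcts=rcts,
        circuit_id=cid,
        amr_ft=amr >> 4,
        amr_cmr=amr & 0x0F,
    )


# M=1, FT=1, CTR=1 (two payloads), F=0, Q=1, RCTS=42, CIC=7, FT idx 2, CMR 15
hdr = parse_amr_header(bytes([0b1_01_001_0_1, 42, 7, 0x2F]))
print(hdr)
```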

==== Additional considerations

* It can be assumed that all OSmux frames of type AMR Codec contain at least 1
  AMR frame.
* Given a batch factor of N frames (N>1), it cannot be assumed that the number
  of AMR frames in any OSmux frame will always be N, due to some restrictions
  mentioned above. For instance, a sender can decide to send before queueing
  the expected N frames due to timing issues, or to conform with the
  restriction that the first AMR frame in the batch must be the first in the
  talk-spurt (Marker M bit).

=== Dummy (2)

This kind of frame is used for NAT traversal. If a peer is behind a NAT, its
source port specified in SDP will be a private port not accessible from the
outside. Before other peers are able to send any packet to it, they require the
mapping between the private and the public port to be set by the firewall,
otherwise the firewall will most probably drop the incoming messages or send
them to the wrong destination. The firewall in most cases won't create a mapping
until the peer behind the NAT sends a packet to the peer residing outside.

In this scenario, if the peer behind the NAT is expecting to receive but never
transmit audio, no packets will ever reach it. To solve this, the peer sends
dummy packets to let the firewall create the port mapping. When the other peers
receive this dummy packet, they can infer the relation between the original
private port and the public port and start sending packets to it.

When opening a connection, the peer is expected to send dummy packets until it
starts sending real audio, at which point dummy packets are not needed anymore.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |X|FT | CTR |X X|X X X X X X X X|  Circuit ID   |AMR FT |X X X X|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field Type (FT): 2 bits::
The Field Type allocated for dummy frames is "2".

Frame Counter (CTR): 3 bits::
Provides the number of dummy batched AMR payloads (starting at 0) after the
header. For instance, if there are 2 AMR payloads batched, CTR will be "1".

Circuit ID Code (CIC): 8 bits::
Identifies the Circuit (Voice call), which in RTP is identified by {srcip,
srcport, dstip, dstport, ssrc}.

AMR Frame Type (AMR-FT): 4 bits::
This field must contain any valid value described in the AMR FT field (Frame
type index) in RFC3267 Section 4.3.2.

==== Additional considerations

* After the header, additional padding needs to be allocated to conform with
  the CTR and AMR FT fields. For instance, if CTR is 0 and AMR FT is AMR 6.7,
  a padding of 17 bytes is to be allocated after the header.

* On reception of this kind of OSmux frame, it's usually enough for the reader
  to discard the header plus the calculated padding and keep operating.
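
The padding rule can be sketched as follows. The size table lists the
octet-aligned payload size for each AMR frame type index, derived from the
frame bit counts in RFC3267. Everything else is an assumption made for this
example and not mandated by the text: the function names, the zero value used
for the "X" bits and for the padding, and bit 0 being the most significant bit.

```python
# Octet-aligned AMR payload size per frame type index
# (0-7: 4.75 to 12.2 kbit/s speech modes, 8: SID).
AMR_FT_BYTES = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20, 6: 26, 7: 31, 8: 5}
OSMUX_HDR_LEN = 4
FT_DUMMY = 2


def dummy_padding_len(ctr: int, amr_ft: int) -> int:
    """Padding after the header: (CTR + 1) payloads of the given AMR type."""
    return (ctr + 1) * AMR_FT_BYTES[amr_ft]


def build_dummy(circuit_id: int, amr_ft: int, ctr: int = 0) -> bytes:
    """Build a dummy frame; 'X' bits and padding are zero by assumption."""
    ctrl = (FT_DUMMY << 5) | ((ctr & 0x07) << 2)
    header = bytes([ctrl, 0x00, circuit_id & 0xFF, (amr_ft & 0x0F) << 4])
    return header + bytes(dummy_padding_len(ctr, amr_ft))


def skip_dummy(buf: bytes) -> bytes:
    """Discard one leading dummy frame and return the bytes that follow."""
    ctr = (buf[0] >> 2) & 0x07
    amr_ft = buf[3] >> 4
    return buf[OSMUX_HDR_LEN + dummy_padding_len(ctr, amr_ft):]


frame = build_dummy(circuit_id=5, amr_ft=3)  # index 3 is AMR 6.7
print(len(frame) - OSMUX_HDR_LEN, "bytes of padding")
```

A receiver that only needs to keep its parser in sync can call something like
`skip_dummy()` and continue with the next concatenated OSmux message.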

== Evaluation: Expected traffic savings

The following figure shows the traffic saving (in %) depending on the number
of concurrent calls (assuming trunking but no batching at all):

_Figure: traffic savings (%, axis from 0 to 100) versus concurrent calls (axis
from 0 to 8), batch factor 1._

The results show a saving of 15.79% with only one concurrent call, which
quickly improves with more concurrent calls (due to trunking).

We also provide the expected results when batching 4 messages for a single call:

_Figure: traffic savings (%, axis from 0 to 100) versus concurrent calls (axis
from 0 to 8), batch factor 4._

The results show a saving of 56.68% with only one concurrent call. Trunking
slightly improves the situation with more concurrent calls.

We also provide the figure with a batching factor of 8:

_Figure: traffic savings (%, axis from 0 to 100) versus concurrent calls (axis
from 0 to 8), batch factor 8._

This shows very little improvement compared to batching 4 messages, while we
risk degrading the user experience. Thus, we consider a batching factor
of 3 or 4 to be adequate.

== Other proposed follow-up works

The following sections describe features that can be considered in the mid-run
to be included in the OSmux infrastructure. They will be considered for future
proposals as extensions to this work. Therefore, they are NOT included in
this proposal.

=== Encryption

Voice streams within OSmux can be encrypted in a similar manner to SRTP
(RFC3711). The only potential problem is the use of a reduced sequence number,
as it wraps in (20ms * 2^8 * B), i.e. 5.12s to 40.96s for B from 1 to 8.
However, as the receiver knows the rate at which the codec frames are generated
at the sender, it should be able to compute how much time has passed using its
own timebase.
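
The wrap-around interval follows from the 8-bit width of the reduced sequence
number, which has 2^8 = 256 distinct values. A minimal sketch, with names
chosen for illustration and assuming the counter advances once per batch of B
frames:

```python
FRAME_MS = 20  # one codec frame every 20 ms
SEQ_BITS = 8   # width of the reduced sequence number


def seq_wrap_seconds(batch_factor: int) -> float:
    """Time until the reduced sequence number wraps: 20ms * 2^8 * B."""
    if batch_factor < 1:
        raise ValueError("batch factor must be at least 1")
    return FRAME_MS * (2 ** SEQ_BITS) * batch_factor / 1000


for b in (1, 4, 8):
    print(f"B={b}: wraps after {seq_wrap_seconds(b):.2f} s")
```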

Another alternative is DTLS (RFC 6347), which can be used to secure datagram
traffic using TLS facilities (libraries like OpenSSL and GnuTLS already
support this).

=== Multiple OSmux messages in one packet

In case there is already at least one active voice call, there will be
regular transmissions of voice codec frames. Depending on the batching
factor, they will be sent every 70ms to 140ms. Even a batched (and/or
trunked) codec message is still much smaller than the MTU.

Thus, any signalling (related or unrelated to the call causing the codec
stream) can just be piggy-backed to the packets containing the voice
codec frames.
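
How much room is left for piggy-backed signalling can be estimated with a
rough sketch like the one below. It assumes a 1500-byte MTU (the text does not
name a value), OSmux carried over UDP/IP, one 4-byte OSmux AMR header per call
and 15-byte AMR 5.9 frames. The function names are illustrative only.

```python
IP_HDR, UDP_HDR, OSMUX_HDR = 20, 8, 4  # bytes
MTU = 1500  # assumed Ethernet-style MTU, not specified in the text


def osmux_packet_len(calls: int, batch_factor: int, frame_bytes: int = 15) -> int:
    """Size of one UDP/IP packet trunking `calls` streams of batched frames."""
    per_call = OSMUX_HDR + batch_factor * frame_bytes
    return IP_HDR + UDP_HDR + calls * per_call


def spare_bytes(calls: int, batch_factor: int, frame_bytes: int = 15) -> int:
    """Bytes left below the assumed MTU for piggy-backed signalling."""
    return MTU - osmux_packet_len(calls, batch_factor, frame_bytes)


for calls, b in ((1, 4), (8, 4), (8, 8)):
    print(f"{calls} call(s), B={b}: {osmux_packet_len(calls, b)} bytes used, "
          f"{spare_bytes(calls, b)} spare")
```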