core/conv: implement optimized Viterbi decoder

Add a separate, faster convolution decoding implementation for rates
up to N=4 and constraint lengths of K=5 and K=7, which covers the
most GSM code uses. The decoding algorithm exploits the symmetric
structure of the Viterbi add-compare-select (ACS) operation - commonly
known as the ACS butterfly. This shift-register optimization can be
found in the well-known text by Dave Forney.

Forney, G.D., "The Viterbi Algorithm," Proc. of the IEEE, March 1973.

Implementation is non-architecture specific and improves performance on
x86 as well as ARM processors. Existing API is unchanged with optimized
code being called internally for supported codes.

The original code was relicensed under GPLv2-or-later with permission
of copyright holder - Tom Tsou.

Change-Id: I74d355274b4176a7d924f91ef3c96912ce338fb2
diff --git a/src/conv.c b/src/conv.c
index f13deef..79b3a7c 100644
--- a/src/conv.c
+++ b/src/conv.c
@@ -238,6 +238,11 @@
 
 #define MAX_AE 0x00ffffff
 
+/* Forward declaration for accerlated decoding with certain codes */
+int
+osmo_conv_decode_acc(const struct osmo_conv_code *code,
+                     const sbit_t *input, ubit_t *output);
+
 void
 osmo_conv_decode_init(struct osmo_conv_decoder *decoder,
                       const struct osmo_conv_code *code, int len, int start_state)
@@ -606,6 +611,10 @@
 	struct osmo_conv_decoder decoder;
 	int rv, l;
 
+	/* Use accelerated implementation for supported codes */
+	if ((code->N <= 4) && ((code->K == 5) || (code->K == 7)))
+		return osmo_conv_decode_acc(code, input, output);
+
 	osmo_conv_decode_init(&decoder, code, 0, 0);
 
 	if (code->term == CONV_TERM_TAIL_BITING) {