== Troubleshooting

=== Format: YAML, and its Drawbacks

The general configuration format is YAML. The stock Python YAML parser has
several drawbacks, chiefly that it offers too many complex features and
alternative ways of formatting a configuration. At the time of writing, however,
YAML seems to be the only widely used configuration format that combines simple,
human-readable formatting with nested structuring. It is recommended to use only
the exact YAML subset seen in this manual, in case osmo-gsm-tester moves to a
less bloated parser in the future.

Careful: if a configuration item consists of digits and starts with a zero, you
need to quote it, or it may be interpreted as an octal integer! Please do not
use octal notation on purpose; it is not an intentionally provided feature.
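
The quoting rule above can be checked mechanically. The following is a minimal
sketch, not part of osmo-gsm-tester: the helper names `needs_quoting` and
`as_octal` are hypothetical, and the rule it encodes (all digits, more than one
digit, leading zero) is a conservative assumption about which values are at
risk.

```python
import re

# Assumption: a value is "at risk" if it consists only of digits, has more
# than one digit, and starts with a zero.
_LEADING_ZERO_DIGITS = re.compile(r"0[0-9]+")


def needs_quoting(value: str) -> bool:
    """Return True if the unquoted scalar could be read as an octal integer."""
    return _LEADING_ZERO_DIGITS.fullmatch(value) is not None


def as_octal(value: str) -> int:
    """Return the integer an octal-aware parser would produce for value."""
    return int(value, 8)
```

For instance, `needs_quoting("0123")` is true, and `as_octal("0123")` shows that
an octal-aware parser would turn the unquoted value into the integer 83, whereas
the quoted form `'0123'` stays a string.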

=== {app-name} not running but resources still allocated

The <<state_dir,reserved_resources.state>> file is used to keep the shared state
of the resources allocated by any {app-name} instance. Each running {app-name}
instance is responsible for de-allocating the resources it used before exiting.
In general, upon receiving a shutdown action (e.g. 'CTRL+C', 'SIGINT', a Python
exception, etc.), {app-name} handles the situation properly and de-allocates the
resources before the process exits. Similarly, {app-name} also takes care of
terminating all the child processes it manages before exiting itself.
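
The release-before-exit behaviour described above can be pictured with a small
Python sketch. It does not show the actual {app-name} implementation:
`ResourcePool` and `reserved` are hypothetical names, and the in-memory set only
stands in for the shared state file.

```python
import contextlib


class ResourcePool:
    """Hypothetical stand-in for the shared reserved-resources state."""

    def __init__(self):
        self.reserved = set()

    def reserve(self, name):
        self.reserved.add(name)

    def release(self, name):
        self.reserved.discard(name)


@contextlib.contextmanager
def reserved(pool, names):
    """Reserve the named resources and always release them on the way out."""
    taken = []
    try:
        for name in names:
            pool.reserve(name)
            taken.append(name)
        yield taken
    finally:
        # Runs on normal exit, on exceptions and on KeyboardInterrupt (CTRL+C).
        for name in taken:
            pool.release(name)
```

Because the release happens in a `finally` clause, it runs for every shutdown
path that Python code gets a chance to observe. It does not run if the process
is killed without warning.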

However, under some circumstances {app-name} is unable to de-allocate the
resources, and they remain allocated for subsequent {app-name} instances that
try to use them. This usually happens when {app-name} is terminated the hard
way. The main causes are the {app-name} process receiving a 'SIGKILL' signal
('kill -9 $pid'), which cannot be caught, or the entire host being shut down
improperly.
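
That 'SIGKILL' cannot be caught is easy to verify from Python on a POSIX host.
The sketch below is only an illustration: `can_install_handler` is a
hypothetical helper, and it assumes it is called from the main thread, as the
`signal` module requires.

```python
import signal


def can_install_handler(signum) -> bool:
    """Return True if this process may install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda signo, frame: None)
    except OSError:
        # The kernel refuses handlers for uncatchable signals such as SIGKILL.
        return False
    if previous is not None:
        # Put the original handler back so the probe has no lasting effect.
        signal.signal(signum, previous)
    return True
```

On Linux, `can_install_handler(signal.SIGINT)` is true while
`can_install_handler(signal.SIGKILL)` is false, which is why no cleanup code can
run once 'kill -9' has been issued.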

As a notable example, 'SIGKILL' is known to be sent to {app-name} when it runs
under a Jenkins shell script and either of the following happens:

- A user presses the red cross icon in the Jenkins UI to terminate the running
  job.
- The connection between the Jenkins master (UI) and the Jenkins slave running
  the job is lost.

Once this situation is reached, two steps are needed:

- Gain console access to the <<install_main_unit,Main Unit>> and manually clean
  or completely remove the 'reserved_resources.state' file in the
  <<state_dir,state_dir>>. In general it is a good idea to make sure that no
  {app-name} instance is running at all and then remove all files in the
  <<state_dir,state_dir>>, since {app-name} could theoretically have been killed
  while writing a file, leaving it with corrupt content.
- Gain console access to the <<install_main_unit,Main Unit>> and each of the
  <<install_slave_unit,Slave Units>> and kill any long-running processes left
  hanging there which may have been started by {app-name}. Common examples are
  'tcpdump', 'osmo-\*', 'srs*', etc.
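
The two recovery steps above could be partly scripted. The following is a
minimal sketch under stated assumptions, not a tool shipped with {app-name}:
`clear_state_dir` and `leftover_candidates` are hypothetical helpers, the state
directory path must be supplied by the operator, and the process names are
expected to come from whatever process listing is available on the unit.

```python
import fnmatch
from pathlib import Path

# Name patterns taken from the list above; extend as needed for a given setup.
LEFTOVER_PATTERNS = ("tcpdump", "osmo-*", "srs*")


def clear_state_dir(state_dir):
    """Delete every regular file directly inside state_dir; return their names."""
    removed = []
    for entry in sorted(Path(state_dir).iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed


def leftover_candidates(process_names, patterns=LEFTOVER_PATTERNS):
    """Return the process names that match any of the leftover patterns."""
    return [
        name for name in process_names
        if any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
    ]
```

Only run `clear_state_dir` after confirming that no {app-name} instance is
active. `leftover_candidates` merely narrows down the list; whether a matching
process was really started by {app-name} still has to be judged before killing
it.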