== Troubleshooting

=== Format: YAML, and its Drawbacks

The general configuration format used is YAML. The stock Python YAML parser
has several drawbacks, chiefly too many complex features and alternative
ways of formatting the same configuration. At the time of writing, however,
YAML seems to be the only widely used configuration format that offers simple,
human-readable formatting as well as nested structuring. It is recommended to
use only the exact YAML subset seen in this manual, in case osmo-gsm-tester
moves to a less bloated parser in the future.

Careful: if a configuration item consists only of digits and starts with a
zero, you need to quote it, or it may be interpreted as an integer in octal
notation! Please do not rely on octal notation on purpose; it is not an
intentionally provided feature.

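The octal pitfall can be illustrated without a YAML parser. The sketch below
assumes the YAML 1.1 style rule that an unquoted scalar made of a leading zero
followed only by octal digits is read as an integer. The helper names
`needs_quoting` and `octal_value` are hypothetical and not part of
osmo-gsm-tester.

```python
import re

# Assumption: YAML 1.1 style octal integers, i.e. a leading zero followed
# only by the digits 0-7 (sign and underscore variants are ignored here).
_OCTAL_LIKE = re.compile(r"0[0-7]+")


def needs_quoting(scalar: str) -> bool:
    """Return True if an unquoted scalar would be read as an octal integer."""
    return _OCTAL_LIKE.fullmatch(scalar) is not None


def octal_value(scalar: str) -> int:
    """Return the integer an octal-aware parser would produce for the scalar."""
    return int(scalar, 8)


if __name__ == "__main__":
    for item in ("0123", "123", "0900"):
        if needs_quoting(item):
            print(f"{item}: quote it, otherwise it is read as {octal_value(item)}")
        else:
            print(f"{item}: not octal-like")
```

For instance, an unquoted `0123` would end up as the integer 83, while the
quoted form `'0123'` stays a four-character string.
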
=== {app-name} not running but resources still allocated

The <<state_dir,reserved_resources.state>> file is used to keep the shared
state of the resources allocated by any {app-name} instance. Each running
{app-name} instance is responsible for de-allocating the resources it uses
before exiting. In general, upon receiving a shutdown action (e.g. 'CTRL+C',
'SIGINT', a Python exception), {app-name} handles the situation properly and
de-allocates the resources before the process exits. Similarly, {app-name} also
takes care of terminating all the child processes it manages before exiting
itself.

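The shutdown behaviour described above can be sketched in plain Python. This is
a minimal illustration and not the {app-name} implementation: the
`ResourcePool` class, the `reserved` context manager and the resource names are
hypothetical. It only shows that a `try`/`finally` block runs both for 'CTRL+C'
(`KeyboardInterrupt`) and for ordinary exceptions.

```python
from contextlib import contextmanager


class ResourcePool:
    """Hypothetical stand-in for the shared reserved-resources state."""

    def __init__(self):
        self.reserved = set()

    def reserve(self, name):
        self.reserved.add(name)

    def release(self, name):
        self.reserved.discard(name)


@contextmanager
def reserved(pool, names):
    """Reserve resources and release them again however the block exits."""
    for name in names:
        pool.reserve(name)
    try:
        yield pool
    finally:
        # Runs on normal exit, on exceptions and on KeyboardInterrupt (SIGINT),
        # but never after SIGKILL, which ends the process immediately.
        for name in names:
            pool.release(name)


def run_interrupted(pool):
    """Simulate a test run that is interrupted with CTRL+C."""
    try:
        with reserved(pool, ["bts-0", "modem-1"]):
            raise KeyboardInterrupt
    except KeyboardInterrupt:
        pass
    return pool.reserved
```

After `run_interrupted()` returns, the pool is empty again, because the
`finally` clause released both hypothetical resources.
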
However, under some circumstances {app-name} is unable to de-allocate the
resources, and they remain allocated for subsequent {app-name} instances that
try to use them. This situation is usually reached when {app-name} is
terminated the hard way. The main causes are the {app-name} process receiving a
'SIGKILL' signal ('kill -9 $pid'), which cannot be caught, or the entire host
being shut down improperly.

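Why 'SIGKILL' leaves no chance for cleanup can be checked with a short probe.
The sketch assumes a POSIX system and that it runs in the main thread of the
Python process; the function name `can_install_handler` is hypothetical.

```python
import signal


def can_install_handler(signum):
    """Return True if the process may install its own handler for signum."""
    try:
        previous = signal.signal(signum, lambda num, frame: None)
    except OSError:
        # The kernel rejects handlers for SIGKILL (and SIGSTOP).
        return False
    if previous is not None:
        # Restore whatever handler was active before the probe.
        signal.signal(signum, previous)
    return True


if __name__ == "__main__":
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        print(sig.name, can_install_handler(sig))
```

Since no handler can be registered for 'SIGKILL', no de-allocation code gets a
chance to run when that signal arrives.
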
As a notable example, 'SIGKILL' is known to be sent to {app-name} when it runs
under a Jenkins shell script and either of the following happens:

- The user presses the red cross icon in the Jenkins UI to terminate the
  running job.
- The connection between the Jenkins master (UI) and the Jenkins slave running
  the job is lost.

Once this situation is reached, two steps are needed:

- Gain console access to the <<install_main_unit,Main Unit>> and manually clean
  or completely remove the 'reserved_resources.state' file in the
  <<state_dir,state_dir>>. In general it is a good idea to make sure no
  {app-name} instance is running at all and then completely remove all files in
  <<state_dir,state_dir>>, since {app-name} could theoretically have been killed
  while writing a file, leaving it with corrupt content.
- Gain console access to the <<install_main_unit,Main Unit>> and each of the
  <<install_slave_unit,Slave Units>> and kill any hanging long-running
  processes there that may have been started by {app-name}. Common processes in
  this list include 'tcpdump', 'osmo-\*', 'srs*', etc.
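
The two recovery steps above could be assisted by a small helper script. The
following is only a sketch under several assumptions: it is not shipped with
{app-name}, all function names are hypothetical, it relies on the Linux `/proc`
layout, it removes only regular files directly inside the given directory, and
the path of the state directory has to be supplied by the operator.

```python
import fnmatch
from pathlib import Path

# Process name patterns taken from the list above.
STALE_PATTERNS = ("tcpdump", "osmo-*", "srs*")


def is_stale_candidate(process_name):
    """Return True if a process name matches one of the known patterns."""
    return any(fnmatch.fnmatchcase(process_name, pat) for pat in STALE_PATTERNS)


def find_stale_candidates(proc_root="/proc"):
    """List (pid, name) pairs of matching processes (Linux /proc assumed)."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if is_stale_candidate(name):
            found.append((int(entry.name), name))
    return found


def clear_state_dir(state_dir):
    """Remove every regular file directly inside state_dir; return their names."""
    removed = []
    for entry in sorted(Path(state_dir).iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

`find_stale_candidates()` only reports processes. Whether to kill them is best
decided by hand, and `clear_state_dir()` should only be called once no
{app-name} instance is running.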