RuntimeError: can't start new thread / Exception in thread

I’ve got multiple scanners deployed with the Greenbone community containers.
For a while now I’ve been experiencing problems (which I think are related) that cause scans to fail.

The ospd-openvas (version 22.6.1) container reports these errors:

----------------------------------------
Exception occurred during processing of request from 
Traceback (most recent call last):
  File "/usr/lib/python3.11/socketserver.py", line 317, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.11/socketserver.py", line 705, in process_request
    t.start()
  File "/usr/lib/python3.11/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

and

Exception in thread Thread-2 (accepter):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 194, in accepter
    t.start()
  File "/usr/lib/python3.11/threading.py", line 957, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 814, in _callmethod
    conn = self._tls.connection
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
 - 
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/ospd/ospd.py", line 546, in handle_client_stream
    self.handle_command(data, stream)
  File "/usr/local/lib/python3.11/dist-packages/ospd/ospd.py", line 1065, in handle_command
    response = command.handle_xml(tree)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ospd/command/command.py", line 487, in handle_xml
    self._daemon.check_scan_process(scan_id)
  File "/usr/local/lib/python3.11/dist-packages/ospd/ospd.py", line 1289, in check_scan_process
    status = self.get_scan_status(scan_id)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ospd/ospd.py", line 723, in get_scan_status
    status = self.scan_collection.get_status(scan_id)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/ospd/scan.py", line 352, in get_status
    status = self.scans_table.get(scan_id, {}).get('status', None)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 2, in get
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 818, in _callmethod
    self._connect()
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 805, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 507, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 751, in answer_challenge
    message = connection.recv_bytes(256)         # reject large message
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 215, in recv_bytes
    buf = self._recv_bytes(maxlength)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError

OSP also reports: (400) Fatal error

I’ve checked the system CPU usage, thread count, and limits, but as far as I can tell there is no significant increase and no limit is being reached.

I’ve searched the internet and found threads about potential issues with certain Python versions in combination with libseccomp, where users reported that their thread problems (in other software) went away after downgrading libseccomp or Python, but I couldn’t relate this to Greenbone’s setup.

Does anyone experience this issue or know where to look?

Can you share more details about your setup: how did you install Greenbone, and what exactly is the use case you are trying to achieve? :thinking:

Sure! I’ve used the docker-compose file from Greenbone as a base, and I talk directly to the OSPD socket with XML commands according to the ospd documentation (I don’t use gvmd). I use the images compiled by Greenbone.
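
Roughly, the interaction looks like this (just a minimal sketch; the socket path is specific to my volume mounts, and I’m assuming ospd closes the connection once the full response has been written):

# One OSP XML command per connection over the ospd-openvas Unix socket.
# The socket path is an assumption; adjust it to your deployment.
import socket

OSPD_SOCKET = "/run/ospd/ospd-openvas.sock"

def osp_command(xml: str, timeout: float = 30.0) -> str:
    """Send a single OSP command and return the raw XML response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(OSPD_SOCKET)
        sock.sendall(xml.encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection, response is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode()

if __name__ == "__main__":
    # <get_version/> is a cheap health check; <get_scans/> lists known scans.
    print(osp_command("<get_version/>"))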

I’m trying to run vulnerability scans for a /24 network, and in general this goes well. The issue occurs randomly after the scanner has run successfully for many days; for example, it might run scans successfully for 21 days and then suddenly start giving this error. The only solution is to reboot the setup (including the host).

The host has 2 vCPU cores and 4 GB RAM. The issue is experienced on both Hyper-V and VMware hypervisors.

The issue occurs after a while: a scan may be running at 20% progress and then suddenly hit this error. Not sure if this is related, but I also see: nasl_pread: Failed to fork child process (Resource temporarily unavailable)

I would assume that the culprit in this case is resource exhaustion. Perhaps some cache is not being cleared and eventually builds up, or some threads are stuck and not exiting properly. The error itself can be caused by reaching a configured thread limit or by resource exhaustion. It’s interesting that you need to reboot the host as well.

Maybe you can use top, count the lines of ps -eLf, or attach a debugger to track how many threads are running and how much RAM is used. Maybe there is some information in syslog. Otherwise, I guess this is not an issue with ospd-openvas itself. Feel free to share the code or commands you are using for this continuous scanning operation and perhaps someone will notice an issue.
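
Something like this rough helper could log the system-wide thread count and available memory over time (illustration only, not part of ospd-openvas; it reads the same numbers that ps -eLf and free report):

# Log the total number of threads and the available memory by reading /proc.
import glob
import time

def total_threads() -> int:
    """Sum the Threads: field of every /proc/<pid>/status."""
    count = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as fh:
                for line in fh:
                    if line.startswith("Threads:"):
                        count += int(line.split()[1])
                        break
        except OSError:
            pass  # the process exited while we were reading it
    return count

def mem_available_kb() -> int:
    """Return MemAvailable from /proc/meminfo (in kB)."""
    with open("/proc/meminfo") as fh:
        for line in fh:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return -1

if __name__ == "__main__":
    while True:
        print(f"threads={total_threads()} mem_available_kb={mem_available_kb()}")
        time.sleep(60)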

4 GB of RAM is also the absolute minimum, and more RAM is recommended.

The documentation might get updated in the future.

Thank you for your answers, @rippledj and @cfi.

I’ve increased the RAM and CPU and will monitor if this resolves the issue.

While I understand this may solve the problem, I have a question about the behavior.
The documentation states 2 vCPUs and 4 GB RAM as the minimum (even if this may be increased in the future).

I would expect the scanner to throttle itself so that it does not exceed these resources, and, if it does exceed them, to throw a clear error message saying so. I understand this may not be a feature right now, but I’m curious how you look at it (it’s not meant as an attack ;)). I’m willing to put some effort into this, if possible, to make it a bit more stable.

We have several mitigations on the scanner side. One of these is scan queueing. If you start a lot of scans, you should see corresponding info messages that not enough RAM is available and that scans are queued. If you want to get technical, you can adjust the limits for the scan queue in /etc/ospd/ospd.conf.
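
For illustration, the queue-related part of /etc/ospd/ospd.conf could look roughly like this (example values only; please double-check the option names against the ospd-openvas documentation for your version):

[OSPD - openvas]
# Maximum number of scans running in parallel; further scans are queued.
max_scans = 2
# Minimum free memory (in MB) required before a queued scan is started.
min_free_mem_scan_queue = 1000
# Maximum number of scans allowed to wait in the queue.
max_queued_scans = 10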

Relatedly, there are other scanner limits that you can tweak; see min_free_mem and max_sysload in the scanner preferences. Alternatively, these can be set in /etc/openvas/openvas.conf.
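
A corresponding sketch for /etc/openvas/openvas.conf (again, example values only; roughly, the scanner pauses starting new work while free memory is below or system load is above these thresholds):

# Do not start new work while less than this much memory (in MB) is free.
min_free_mem = 1000
# Do not start new work while the system load average is above this value.
max_sysload = 2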

There are some remaining problems, for example the vHost and CGI caching described in my post here. These could be classified as bugs, and we are currently working on a new scanner generation to solve these.

Last but not least, other system services besides the scanner may use large amounts of system resources, for example gvmd. This can happen at short notice, giving the scanner little time to react. We are aware of this; however, improvements are not trivial in this case.

In any case, if you want to help out, feel free to contact our developers via our GitHub projects!

Also, you can set the maximum number of concurrently scanned hosts and concurrently executed VTs per host in the task dialog.
