I have a large, sparsely populated scan environment. One particular scan is of a /21 range, hitting 187 hosts. The scan is running out of memory. The Linux oom killer is killing Redis.
The scan is limited to 10 simultaneous hosts, with 3 concurrent NVTs per host.
Our OpenVAS environment is a bit old (presently 21.4.1); I need to have the team upgrade the software.
The machine is presently running on 16 cores and 24GB of RAM.
How do I tune this? I would rather not break up the scan into smaller scans: the distribution of hosts across the subnets is not even, so I can’t just use /24s. I would have to dig, analyze, and fiddle with lists of hosts, losing broad coverage of the “unknowns” on the network.
Why is Redis using so much RAM? Is there something that can be further tuned?
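For a rough sense of scale, here is my back-of-envelope budget under the scan limits above. The reserved-memory figure is my own illustrative guess, not anything from the OpenVAS documentation:

```python
# Back-of-envelope: how much memory each concurrent scanner process can
# use before the host runs out. Scan limits come from the settings above;
# reserved_gb is an illustrative guess for OS + gvmd + Redis baseline.

max_hosts = 10      # simultaneous hosts in the scan
max_checks = 3      # concurrent NVTs per host
total_ram_gb = 24   # scanner RAM
reserved_gb = 4     # guess: OS, ospd-openvas, gvmd, Redis baseline

concurrent_procs = max_hosts * max_checks          # 30 concurrent checks
budget_mb = (total_ram_gb - reserved_gb) * 1024 / concurrent_procs

print(f"{concurrent_procs} concurrent NVT processes")
print(f"~{budget_mb:.0f} MB each before memory is exhausted")
```

Even under those limits there is several hundred MB of headroom per check, so a handful of processes (or Redis itself) must be using far more than their share.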
I’m continuing to troubleshoot this, and I think I’ve identified something odd in our environment.
The software on many of our production servers responds to arbitrary endpoints with HTTP 200 and API usage information. This means we’re seeing a lot of traffic from the servers, as every URL responds with many KB of info.
I don’t know whether Redis stores this information; if so, the responses would be enormous and could explain why we exhaust 22GB+ of RAM when scanning.
Still, any tips would be helpful. I’ve resorted to tcpdump and ps auxww for analysis, as the logs are not giving me anything to work with and the Redis traces are too large to work with.
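One way around unwieldy Redis traces is to sample key sizes directly and see whether entries holding those multi-KB HTTP responses dominate. A minimal sketch, assuming the redis-py client and the common ospd-openvas socket path (adjust both for your setup):

```python
def rank_by_size(sizes, top=10):
    """Pure helper: given a {key: bytes} mapping, return the largest keys."""
    return sorted(sizes.items(), key=lambda kv: -kv[1])[:top]

def sample_redis_keys(socket_path="/run/redis-openvas/redis.sock", db=1):
    """Sample keys from one scan DB and size each with MEMORY USAGE.
    Requires redis-py and a live scanner Redis; run on the scanner host."""
    import redis  # imported here so the pure helper above works anywhere
    r = redis.Redis(unix_socket_path=socket_path, db=db)
    return {k: (r.memory_usage(k) or 0) for k in r.scan_iter(count=1000)}

# On the scanner host, something like:
# for key, nbytes in rank_by_size(sample_redis_keys()):
#     print(nbytes, key)
```

If a few keys account for most of the memory, their names should point at the host/port/NVT combination that is ballooning.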
Why don’t you go with a professional setup?
You seem to be using this enormous setup for corporate purposes while depending on voluntary support here…
Your software is end of life and end of support. With a Greenbone Appliance you would get migration support to a supported version, as well as help with complex issues.
I don’t think of this as being dependent on a community; it’s being part of one.
We’re considering the commercial solution for enhanced signatures and more up-to-date software, but if we can’t resolve stability issues, we’ll have to compare other scanner software as an alternative.
As for the size of our environment: we’ve narrowed it down to seeing OOM with a scan of a single /24, on a scanner with 24GB of RAM. Whether we run it on 21.4 or 22.5, we see the same issue.
For the most part though the software works… we just seem to have a problematic subnet.
Unfortunately it is not possible to determine, based on this information alone, if and how this causes problems with OpenVAS or Redis. You can create an issue at Issues · greenbone/openvas-scanner · GitHub; maybe you can exchange the required information with our developers there.
We cannot guarantee that this is solvable with the current generation of OpenVAS, however. But we are currently working on the next generation, which should hopefully address such problems.
For further performance tuning you can also check the man page of openvas regarding the configuration file options min_free_mem, max_hosts, max_checks, and the man page of ospd-openvas for the configuration file options min_free_mem_scan_queue, max_scans, max_queued_scans and scaninfo_store_time.
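For reference, those options live in two configuration files. The paths and values below are illustrative starting points for a memory-constrained scanner, not recommendations from the man pages:

```
# /etc/openvas/openvas.conf (see `man openvas`)
min_free_mem = 1000        # keep this much memory free before new checks
max_hosts = 5              # simultaneous hosts per scan
max_checks = 2             # concurrent NVTs per host

# /etc/gvm/ospd-openvas.conf (see `man ospd-openvas`)
[OSPD - openvas]
min_free_mem_scan_queue = 1000   # memory required to start a queued scan
max_scans = 1                    # scans running at once
max_queued_scans = 10            # waiting scans before new ones are rejected
scaninfo_store_time = 1          # retention of finished scan info (see man page for units)
```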
Last but not least, with our commercial solution it may also be possible to distribute the scan load by employing sensor appliances.
Thanks for the tips. I don’t think the CGI caching is making much of a difference though. The page coming back from the servers is a statistics page, so every hit is unique and will always be a cache miss.
Removing the port used by the statistics page and splitting the scan into 8 separate scans chained together with alerts has allowed me to get a full scan result (less the bad service)
Still, I shouldn’t have needed to split up the scans like this as although it’s a big network, it’s a very sparse network. It’s difficult to determine which hosts, ports, or signatures are causing problems.
Earlier it was suggested that I use the commercial version. If it has tools to troubleshoot scan performance on a per-signature, per-host way, it might well be worth it.
I’ll dig through those docs and try to narrow down the cause. From my other posts on this topic, I think I’m getting closer to figuring it out.
scaninfo_store_time doesn’t show up in the openvas man page, only in the ospd-openvas one. It seems to affect the retention of finished scans. I’ll have a look at the scan and memory limits; maybe I can keep the scanner from being OOM killed.
When time permits, I’m going to try out log_whole_attack.
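For anyone following along: log_whole_attack is an openvas.conf option (per `man openvas`). My understanding is that it logs every plugin launched against every host, which should make per-signature timing visible at the cost of very large logs:

```
# /etc/openvas/openvas.conf
log_whole_attack = yes
```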
I had the same issue; reducing the number of simultaneously scanned hosts fixed it. I have 90 hosts that I was scanning 20 at a time on a 16-core VM. I added RAM and swap first, but only when I reduced the limit to 10 simultaneous hosts did the scan pass. The task had run fine with 20 simultaneous hosts for ~2 years.