OOM Killing Redis on Large Scan with OpenVAS

mgjk · March 15, 2023, 11:50am

I have a large, sparsely populated scan environment. One particular scan covers a /21 range and hits 187 hosts. The scan is running out of memory, and the Linux OOM killer is killing Redis.

The scan is limited to 10 simultaneous hosts, with 3 concurrent NVTs per host.

Our OpenVAS environment is a bit old (presently 21.4.1); I need to have the team upgrade the software.

The machine is presently running on 16 cores and 24GB of RAM.

How do I tune this? I would rather not break the scan up into smaller scans. The hosts are not evenly distributed across the subnets, so I can’t just use /24s; I would have to dig, analyze, and fiddle with lists of hosts, losing broad coverage of the “unknowns” on the network.

Why is Redis using so much RAM? Is there something that can be further tuned?

Martin · March 15, 2023, 4:07pm

Hello mgjk, and welcome to the Greenbone community!

The problem you describe is not easy to solve as it can have several root causes, from known issues to usage behaviour. In particular, there can be problems with vHosts and CGI caching.

In general, we recommend the following:

Prevent overloading the system by adjusting the usage:
- Do not start all scan tasks at once; use schedules to start them at intervals
- Reconfigure scan targets to include fewer hosts; split the hosts into more targets and tasks instead
- Do not run or schedule feed updates at times when scan tasks are running or scheduled to run
- Do not view or download large reports while scan tasks are running

Disable vHost expansion for scans that cause problems:
- Clone and edit the scan config in use
- Set the scanner preference expand_vhosts to 0 and save the change

Disable CGI caching for scans that cause problems:
- Clone and edit the scan config in use
- Browse to the VT family Settings
- Edit the VT Global Variable Settings (OID: 1.3.6.1.4.1.25623.1.0.12288)
- Set the preference Disable caching of web pages during CGI scanning to Yes and save the change

Last but not least, if you think that you can narrow the problem down to a specific host and/or vulnerability test, please open an issue for the scanner at https://github.com/greenbone/openvas-scanner/issues or, for a vulnerability test, post in the Vulnerability Tests category of the Greenbone Community Forum!

mgjk · March 20, 2023, 11:16am

Thank you so much for the tips. Is there a recommended way to increase log verbosity, e.g. to figure out exactly which plugins were launched when resource usage started to get excessive?

What in Redis would be using so much memory? Is that what the vHost expansion and CGI caching suggestions are trying to address?
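One way to approach the question of what is occupying Redis is to rank keys by their reported memory footprint. The following is only a minimal sketch: `largest_keys` is a hypothetical helper, and it assumes a client object that offers `scan_iter` and `memory_usage` the way the redis-py client does. It has not been checked against how OpenVAS lays out its knowledge base across Redis databases.

```python
def largest_keys(client, top_n=10, match="*"):
    """Return the top_n (key, bytes) pairs, largest first.

    `client` is assumed to behave like a redis-py client:
    scan_iter() walks keys incrementally (SCAN) and
    memory_usage() wraps the MEMORY USAGE command.
    """
    sizes = []
    for key in client.scan_iter(match=match, count=1000):
        size = client.memory_usage(key)
        if size is None:
            # The key disappeared between SCAN and MEMORY USAGE.
            continue
        name = key.decode(errors="replace") if isinstance(key, bytes) else str(key)
        sizes.append((name, size))
    sizes.sort(key=lambda pair: pair[1], reverse=True)
    return sizes[:top_n]
```

With redis-py installed, this could be called as `largest_keys(redis.Redis(unix_socket_path="/run/redis-openvas/redis.sock", db=1))`. Both the socket path and the database number are assumptions here and depend on the local installation. Walking every key adds load of its own, so it is probably best tried on a small scan first.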

mgjk · August 28, 2023, 10:27am

I’m continuing to troubleshoot this, and I think I’ve identified something odd in our environment.

The software on many of our production servers responds to arbitrary endpoints with HTTP 200 and API usage information. This means we’re seeing a lot of traffic from the servers, as every URL responds with many KB of info.

I don’t know whether Redis stores this information. If it does, the stored responses would be enormous and could explain why we exhaust 22GB+ of RAM when scanning.

Still, any tips would be helpful. I’ve resorted to tcpdump and ps auxww for analysis, as the logs are not giving me anything to work with and the Redis traces are too large to work with.
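The suspicion above, that every probed URL returns a multi-KB body, can be put into rough numbers. This is a back-of-envelope sketch only. Every input value is hypothetical, and whether Redis holds responses for all 187 hosts at once or only for the hosts currently being scanned is an assumption the reader has to settle for their own setup.

```python
GIB = 1024 ** 3


def estimate_cached_bytes(hosts_in_memory, web_ports_per_host,
                          cached_urls_per_port, avg_response_bytes):
    """Raw size of stored response bodies, ignoring any Redis overhead."""
    return (hosts_in_memory * web_ports_per_host
            * cached_urls_per_port * avg_response_bytes)


# Hypothetical inputs: 10 hosts in flight (the task's simultaneous-host limit),
# 4 web ports each, 20,000 distinct URLs stored per port, 16 KiB per response.
estimate = estimate_cached_bytes(
    hosts_in_memory=10,
    web_ports_per_host=4,
    cached_urls_per_port=20_000,
    avg_response_bytes=16 * 1024,
)
print(f"{estimate / GIB:.1f} GiB of raw response bodies")
```

The product grows linearly in each factor, so a service that answers every URL with a large body can plausibly dominate memory even on a sparse network.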

Lukas · August 29, 2023, 11:54am

Why don’t you go with a professional setup?
You seem to be using this enormous setup for corporate purposes while depending on voluntary support here …
Your software is end of life and out of support. With a Greenbone Appliance you would get migration support to a supported version as well as help with complex issues.

mgjk · August 30, 2023, 3:46pm

Hello Lukas,

This is a community forum. I’m here to:

- try to find answers to my problems
- share those answers and solutions

I don’t think this is being dependent on a community; it’s being part of one.

We’re considering the commercial solution for enhanced signatures and more up-to-date software, but if we can’t resolve stability issues, we’ll have to compare other scanner software as an alternative.

As for the hugeness of our environment, we’ve narrowed it down to seeing OOM with a scan of a /24, on a scanner with 24GB of RAM. Whether we run it on 21.4 or 22.5 we’re seeing the same issue.

For the most part though the software works… we just seem to have a problematic subnet.

We’ll figure it out eventually.

Martin · August 31, 2023, 9:53am

Unfortunately, it is not possible to determine if and how this causes problems with OpenVAS or Redis based on this information alone. You can create an issue at https://github.com/greenbone/openvas-scanner/issues; maybe you can exchange the required information with our developers there.

We cannot guarantee that this is solvable with the current generation of OpenVAS, however. We are currently working on the next generation, which should hopefully address such problems.

For further performance tuning you can also check the man page of openvas regarding the configuration file options min_free_mem, max_hosts, max_checks, and the man page of ospd-openvas for the configuration file options min_free_mem_scan_queue, max_scans, max_queued_scans and scaninfo_store_time.
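Before changing any of the openvas options named above, it can help to see what is currently in effect. The sketch below assumes the settings are available as plain `key = value` lines, for example from the openvas configuration file. `parse_settings` and `tuning_summary` are hypothetical helper names, the sample values are illustrative only, and treating `max_hosts * max_checks` as an upper bound on parallel checks is an assumption about how the two limits combine.

```python
TUNING_KEYS = ("min_free_mem", "max_hosts", "max_checks")


def parse_settings(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def tuning_summary(text):
    """Pick out the tuning keys; a missing key is reported as None."""
    settings = parse_settings(text)
    summary = {key: settings.get(key) for key in TUNING_KEYS}
    hosts, checks = summary["max_hosts"], summary["max_checks"]
    if hosts is not None and checks is not None:
        # Assumption: checks per host multiply with hosts scanned in parallel.
        summary["max_parallel_checks"] = int(hosts) * int(checks)
    return summary


SAMPLE = """\
# illustrative values only, not recommendations
max_hosts = 10
max_checks = 3
min_free_mem = 2048
"""

print(tuning_summary(SAMPLE))
```

The ospd-openvas options mentioned in the same paragraph live in a separate configuration file and are not covered by this sketch.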

Last but not least, with our commercial solution it may also be possible to distribute the scan load by employing sensor appliances.

cfi · September 4, 2023, 1:41pm

In such cases the Redis KB might indeed include a lot of “cached” HTTP responses by default. Information on how to disable this caching for a specific task / target was given previously by @Martin above.

Note that disabling the caching of web pages might:

- slow down the scan itself
- put additional load on the target host

as the internal cache isn’t used anymore and additional, repeated TCP / HTTP requests to the target are made.

mgjk · September 11, 2023, 3:05pm

Thanks for the tips. I don’t think the CGI caching is making much of a difference though. The page coming back from the servers is a statistics page, so every hit is unique and will always be a cache miss.

Removing the port used by the statistics page and splitting the scan into 8 separate scans chained together with alerts has allowed me to get a full scan result (less the bad service).
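How the eight scans were carved out is not stated. One possible way to get eight targets from a /21 without maintaining host lists by hand is to split it into its eight /24 ranges with the standard library's `ipaddress` module. The network `10.20.0.0/21` and the host addresses below are hypothetical placeholders.

```python
import ipaddress


def split_network(cidr, new_prefix=24):
    """Split a network into equally sized subnets of the given prefix length."""
    return list(ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix))


def group_hosts(hosts, subnets):
    """Map each subnet to the known hosts that fall inside it."""
    groups = {net: [] for net in subnets}
    for host in hosts:
        addr = ipaddress.ip_address(host)
        for net in subnets:
            if addr in net:
                groups[net].append(host)
                break
    return groups


subnets = split_network("10.20.0.0/21")  # hypothetical range
known_hosts = ["10.20.0.5", "10.20.0.9", "10.20.6.200"]  # hypothetical hosts
for net, members in group_hosts(known_hosts, subnets).items():
    print(net, len(members))
```

Using whole CIDR ranges as targets keeps the "unknowns" in scope, while the per-subnet host counts show how unevenly the load is spread across the resulting tasks.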

Still, I shouldn’t have needed to split up the scans like this: although it’s a big network, it’s a very sparse one. It’s difficult to determine which hosts, ports, or signatures are causing problems.

Earlier it was suggested that I use the commercial version. If it has tools to troubleshoot scan performance on a per-signature, per-host basis, it might well be worth it.

mgjk · September 11, 2023, 3:50pm

I appreciate the advice and all the help here.

I’ll dig through those docs and try to narrow down the cause. From my other posts on this topic, I think I’m getting closer to figuring it out.

scaninfo_store_time doesn’t show up in the openvas man page, only in the ospd-openvas man page. It seems to affect the retention of scans. I’ll have a look at the scan and memory limits; maybe I can keep from being OOM killed.

When time permits, I’m going to try out log_whole_attack.

Thanks again everyone,

cfi · September 12, 2023, 6:32am

The NASL-based “caching” shouldn’t be confused with a caching mechanism like the one used in, e.g., a browser. If a VT calls, for example:

res1 = http_get_cache( item:"/", port:port );
res2 = http_get_cache( item:"/foo", port:port );

both responses will be cached (meaning two entries with the “full response” will be created in the Redis KB), no matter whether the responses are the same or not.

And for special environments like the one mentioned previously, disabling this mechanism as described above (i.e. disabling the “caching” of HTTP pages in Redis) should decrease the Redis usage and thus lower the RAM usage during scans.
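The behaviour described above, one stored entry per requested URL regardless of content, can be illustrated with a toy model. This is not the scanner's implementation: `ToyResponseCache` and `fake_fetch` are hypothetical names, and a plain dictionary stands in for the Redis KB.

```python
class ToyResponseCache:
    """Stores one full response per (port, item), keyed by the request only."""

    def __init__(self):
        self._entries = {}

    def get(self, port, item, fetch):
        key = (port, item)
        if key not in self._entries:
            # Identical bodies are not detected; each request gets its own entry.
            self._entries[key] = fetch(item)
        return self._entries[key]

    def entry_count(self):
        return len(self._entries)

    def stored_bytes(self):
        return sum(len(body) for body in self._entries.values())


calls = []


def fake_fetch(item):
    """Pretend server that answers every URL with the same usage text."""
    calls.append(item)
    return b"usage: see /api/help\n" * 100


cache = ToyResponseCache()
res1 = cache.get(80, "/", fake_fetch)
res2 = cache.get(80, "/foo", fake_fetch)
res1_again = cache.get(80, "/", fake_fetch)  # served from the cache
print(cache.entry_count(), cache.stored_bytes())
```

In this model the stored volume scales with the number of distinct URLs requested, not with how different the responses are, which is consistent with a server that answers every path with HTTP 200 filling the store quickly.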

warnaud · September 18, 2023, 6:56am

I had the same issue; reducing the number of simultaneously scanned hosts fixed it. I have 90 hosts that I was scanning 20 at a time on a 16-core VM. I added RAM and swap first, but the scan only passed once I reduced the number to 10 simultaneous hosts. The task had run fine with 20 simultaneous hosts for ~2 years.

bricks · September 18, 2023, 11:52am

Maybe this message can help also?

kayapo · September 19, 2023, 12:00pm

Deeper investigation is needed, but I think this should be useful for you.