Openvasmd occasionally hangs at 100% CPU when resuming a stopped task

I observed this behavior after upgrading from OpenVAS8 to OpenVAS9 (OpenVAS Manager 7.0.3).

Resuming stopped tasks usually works as intended, but sometimes openvasmd hangs at 100% CPU for hours.
I examined a few such hangs in gdb and their backtraces look similar:

#0  0x00007fb408c34b42 in do_fcntl (fd=8, cmd=6, arg=0x7ffe43cc1160) at ../sysdeps/unix/sysv/linux/fcntl.c:39
#1  0x00007fb408c34c19 in __libc_fcntl (fd=<optimized out>, cmd=<optimized out>) at ../sysdeps/unix/sysv/linux/fcntl.c:88
#2  0x00007fb4086782ad in ?? () from /usr/lib/x86_64-linux-gnu/
#3  0x00007fb408678436 in ?? () from /usr/lib/x86_64-linux-gnu/
#4  0x00007fb40869c522 in ?? () from /usr/lib/x86_64-linux-gnu/
#5  0x00007fb4086db49c in ?? () from /usr/lib/x86_64-linux-gnu/
#6  0x00007fb4086e1ec7 in sqlite3_step () from /usr/lib/x86_64-linux-gnu/
#7  0x00000000004e71d9 in sql_exec_internal (retry=retry@entry=1, stmt=0xb21c000)
#8  0x000000000047effc in sqlv (retry=retry@entry=1, sql=sql@entry=0x572000 "DELETE FROM report_counts WHERE report = %llu   AND \"user\" = %llu   AND override = %d   AND min_qod = %d", 
#9  0x000000000047f178 in sql (sql=sql@entry=0x572000 "DELETE FROM report_counts WHERE report = %llu   AND \"user\" = %llu   AND override = %d   AND min_qod = %d")
#10 0x00000000004b184d in report_cache_counts (report=232, clear_original=1, clear_overridden=1, users_where=<optimized out>)
#11 0x00000000004b1ae6 in trim_partial_report (report=<optimized out>)
#12 0x0000000000472813 in run_task_prepare_report (task=2, report_id=0x7ffe43cc1e20, from=<optimized out>, run_status=<optimized out>, last_stopped_report=0x7ffe43cc18b0)
#13 0x0000000000477447 in run_otp_task (task=2, scanner=1, from=1, report_id=0x7ffe43cc1e20)
#14 0x000000000047886c in run_task (task_id=0x8 <error: Cannot access memory at address 0x8>, task_id@entry=0xb230c50 "7bce3e90-fff9-4c81-b78b-04eac24c931a", report_id=0x7ffe43cc1e20, from=1)
#15 0x000000000047938a in resume_task (task_id=0xb230c50 "7bce3e90-fff9-4c81-b78b-04eac24c931a", report_id=0x7ffe43cc1e20)
#16 0x0000000000444c30 in omp_xml_handle_end_element (context=0x8, element_name=0x3affb80 <command_data> "P\f#\v", user_data=0xb2189d0, error=0xffffffffffffffff)
#17 0x00007fb40839a2a7 in ?? () from /lib/x86_64-linux-gnu/
#18 0x00007fb40839b105 in g_markup_parse_context_parse () from /lib/x86_64-linux-gnu/
#19 0x000000000046e881 in process_omp_client_input ()

I stepped through it a bit in gdb, and execution never seems to return from sql_exec_internal(): it keeps calling sqlite3_step() in a loop, and sqlite3_step() always returns SQLITE_BUSY. At the same time, I can usually resume other tasks without a problem (I guess they go through the same function).
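To illustrate what SQLITE_BUSY means here, below is a minimal sketch in Python (not the manager's C code) that reproduces the condition: one connection holds the write lock, and a second connection's DELETE fails right away because its busy timeout is set to 0. The function name demo_busy and the one-column table layout are made up for the example; only the table name report_counts and the report number 232 are borrowed from the backtrace.

```python
import os
import sqlite3
import tempfile


def demo_busy(db_path):
    """Return the error text a second writer gets while a first one holds the write lock."""
    # isolation_level=None: autocommit mode, so transactions are controlled explicitly.
    holder = sqlite3.connect(db_path, isolation_level=None)
    holder.execute("CREATE TABLE IF NOT EXISTS report_counts (report INTEGER)")
    holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    holder.execute("INSERT INTO report_counts VALUES (232)")

    # timeout=0 disables waiting, so the lock conflict surfaces immediately
    # instead of being retried.
    blocked = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        blocked.execute("DELETE FROM report_counts WHERE report = 232")
        message = None  # no conflict was seen
    except sqlite3.OperationalError as exc:
        message = str(exc)  # SQLITE_BUSY is reported as an OperationalError
    finally:
        blocked.close()
        holder.execute("ROLLBACK")  # release the write lock
        holder.close()
    return message


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_busy(os.path.join(tmp, "tasks.db")))
```

If the holder never commits or rolls back, a caller that simply retries on SQLITE_BUSY will spin forever, which would match the loop I see in gdb.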

The category originally chosen for this post was the Community Edition (GCE) category, which is about the downloadable, ready-to-use virtual machine.

Based on your post, it seems you have your own installation, either built from source or installed via third-party repositories. For such installations, the Source Edition (GSE) category needs to be chosen.

I have moved this question to the correct category for now.

Have you checked for any database locks? If SQLITE_BUSY is returned, it might point you to your issue.

I'm not sure what the best way to test that would be.
Check which processes have a file descriptor open on tasks.db and try to debug those?
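
A rough sketch of that check, assuming a Linux host with /proc mounted: it scans /proc/<pid>/fd and reports every PID that has the given file open. The helper name pids_with_file_open is made up for this example, and the real location of tasks.db depends on the installation, so the path is passed on the command line. Processes owned by other users are only visible when this runs as root.

```python
import os
import sys


def pids_with_file_open(path):
    """Return sorted PIDs that have `path` open, found via /proc/<pid>/fd (Linux only)."""
    target = os.path.realpath(path)
    found = set()
    if not os.path.isdir("/proc"):
        return []  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                # Each fd entry is a symlink to the file the process has open.
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    found.add(int(entry))
                    break
            except OSError:
                continue  # fd was closed while we were scanning
    return sorted(found)


if __name__ == "__main__":
    # Usage: python3 who_has_open.py <path to tasks.db>
    for pid in pids_with_file_open(sys.argv[1]):
        print(pid)
```

Each PID it prints could then be attached to with gdb to see which one is holding the lock.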

Maybe use PostgreSQL instead of SQLite; there is a full stack of tools and how-tos for debugging PostgreSQL.