"Couldn't fork OSP scan handler" error upgrading to 22.4

Hi folks,

I’m late to upgrading from 21.4 to 22.4 (built as per Building 22.4 from Source - Greenbone Community Documentation)

Old in current production:

Greenbone Vulnerability Manager 21.4.5
Manager DB revision 242

New in test:

Greenbone Vulnerability Manager 22.4.2
Manager DB revision 250

We use the python-gvm GMP client (https://python-gvm.readthedocs.io/en/latest/) to control our scans.

Testing against the new 22.4 server we have several still-born status = New tasks. The gvmd.log shows this error:

md manage:WARNING:2023-05-01 09h27.26 UTC:117652: Couldn't fork OSP scan handler
event task:MESSAGE:2023-05-01 09h27.26 UTC:117652: Task 2023-05-01 09:27:24.175111 - 7506 - 3.3.3.3 - Discovery (d3178774-99ef-4eb9-9c33-9d6b68e0ab0b) could not be started by admin

A bit about our GMP client processes

Setting up and starting a task involves several consecutive requests. Processing the results of a completed task can involve dozens of requests.

We don’t create a new connection and authenticate for every individual request. We reuse a connection over bursts of requests and then disconnect when idle. This idle disconnect is ~6s.

Each worker process handles one task at a time. When a task X completes a worker may immediately pick up start processing a task Y.

Our client code is separated from the connection code and the worker isn’t aware of when it’s reusing a connection vs creating a new one.

My naive debugging

Without trying to understand the code I put some extra debugging into manage.c and manage_sql.c. The problem appeared to be in manage_sql.c create_current_report(…)

int
create_current_report (task_t task, char **report_id, task_status_t status)
{
  char *id;

  g_warning("create_current_report() 1, global_current_report: %lld", global_current_report);

  assert (global_current_report == (report_t) 0);

  if (global_current_report) {
    g_warning("create_current_report() 2, have global_current_report so returning -1");
    return -1;
  }

  if (report_id == NULL) {
    g_warning("create_current_report() 3, report_id == NULL, setting report_id");
    report_id = &id;
  }

  /* Generate report UUID. */
  g_warning("create_current_report() 4, creating UUID");

  *report_id = gvm_uuid_make ();

  g_warning("create_current_report() 5, UUID report_id: %s", *report_id);

  if (*report_id == NULL) {
    g_warning("create_current_report() 6, no report_id, returning -1");
    return -2;
  }

  /* Create the report. */
  g_warning("create_current_report() 7, creating report");
  global_current_report = make_report (task, *report_id, status);

  g_warning("create_current_report() 8, new global_current_report: %lld", global_current_report);

  set_report_scheduled (global_current_report);

  g_warning("create_current_report() 9, report scheduled, all good");

  return 0;
}

Debugging results

Below are interleaved excerpts from our python-gvm worker log (CLIENT), and the 22.4.2 gvmd.log (GVMD).

Client worker process [732] creates, starts and processes a CVE task. In create_current_report(…) global_current_report is set to 24.

/* Create the report. */
global_current_report = make_report (task, *report_id, status);  // Is set to 24
GVMD   event task:MESSAGE:2023-05-01 09h15.30 UTC:28171: Status of task 2023-05-01 08:03:36.663820 - 7507 - 1.1.1.1 - OpenVAS (9be77392-d252-4ab5-a5b6-9d0a4c1f0217) has changed to Done
CLIENT 2023-05-01 09:15:47,311 [732] INFO: OpenVasService.py (212): [a.com / 1.1.1.1] Creating CVE task on target: 0ef9a3b3-5b89-4fa9-a378-04a497f42f2f
GVMD   event task:MESSAGE:2023-05-01 09h15.47 UTC:117652: Status of task  (c176bae0-4bba-4afb-a2a9-dde82bf33da7) has changed to New
GVMD   event task:MESSAGE:2023-05-01 09h15.47 UTC:117652: Task 2023-05-01 08:03:36.663820 - 7507 - 1.1.1.1 - CVE (c176bae0-4bba-4afb-a2a9-dde82bf33da7) has been created by admin
CLIENT 2023-05-01 09:15:47,568 [732] INFO: OpenVasService.py (224): [a.com / 1.1.1.1] Starting task: c176bae0-4bba-4afb-a2a9-dde82bf33da7
GVMD   md manage:WARNING:2023-05-01 09h15.48 UTC:117652: create_current_report() 1, global_current_report: 0
GVMD   md manage:WARNING:2023-05-01 09h15.48 UTC:117652: create_current_report() 4, creating UUID
GVMD   md manage:WARNING:2023-05-01 09h15.48 UTC:117652: create_current_report() 5, UUID report_id: 2f7493e2-4081-4ca2-9177-31a45ab80fda
GVMD   md manage:WARNING:2023-05-01 09h15.48 UTC:117652: create_current_report() 7, creating report
GVMD   md manage:WARNING:2023-05-01 09h15.48 UTC:117652: create_current_report() 8, new global_current_report: 24
GVMD   md manage:WARNING:2023-05-01 09h15.48 UTC:117652: create_current_report() 9, report scheduled, all good
GVMD   event task:MESSAGE:2023-05-01 09h15.48 UTC:117652: Status of task 2023-05-01 08:03:36.663820 - 7507 - 1.1.1.1 - CVE (c176bae0-4bba-4afb-a2a9-dde82bf33da7) has changed to Requested
GVMD   event task:MESSAGE:2023-05-01 09h15.48 UTC:117652: Task 2023-05-01 08:03:36.663820 - 7507 - 1.1.1.1 - CVE (c176bae0-4bba-4afb-a2a9-dde82bf33da7) has been requested to start by admin
GVMD   event task:MESSAGE:2023-05-01 09h15.48 UTC:117689: Status of task 2023-05-01 08:03:36.663820 - 7507 - 1.1.1.1 - CVE (c176bae0-4bba-4afb-a2a9-dde82bf33da7) has changed to Running
GVMD   event task:MESSAGE:2023-05-01 09h15.48 UTC:117689: Status of task 2023-05-01 08:03:36.663820 - 7507 - 1.1.1.1 - CVE (c176bae0-4bba-4afb-a2a9-dde82bf33da7) has changed to Done
CLIENT 2023-05-01 09:15:49,064 [732] INFO: InProcessProcessTarget.py (41): [OpenVasModule, DomainWithIpTarget(a.com, 1.1.1.1)] Completed with result: data

Client worker process [906] creates, starts and processes another CVE task. global_current_report is set to 25.

GVMD   event task:MESSAGE:2023-05-01 09h24.24 UTC:34175: Status of task 2023-05-01 08:03:36.690061 - 7507 - 2.2.2.2 - OpenVAS (58d835dd-1ac6-4580-b5eb-70c5d891021e) has changed to Done
CLIENT 2023-05-01 09:24:55,232 [906] INFO: OpenVasService.py (212): [b.com / 2.2.2.2] Creating CVE task on target: 0bb36bb5-d6d5-4386-adc9-534eac32afbc
GVMD   event task:MESSAGE:2023-05-01 09h24.55 UTC:129123: Status of task  (4a7ed1e0-8415-4214-b62f-85a4317b614f) has changed to New
GVMD   event task:MESSAGE:2023-05-01 09h24.55 UTC:129123: Task 2023-05-01 08:03:36.690061 - 7507 - 2.2.2.2 - CVE (4a7ed1e0-8415-4214-b62f-85a4317b614f) has been created by admin
CLIENT 2023-05-01 09:24:55,618 [906] INFO: OpenVasService.py (224): [b.com / 2.2.2.2] Starting task: 4a7ed1e0-8415-4214-b62f-85a4317b614f
GVMD   md manage:WARNING:2023-05-01 09h24.56 UTC:129123: create_current_report() 1, global_current_report: 0
GVMD   md manage:WARNING:2023-05-01 09h24.56 UTC:129123: create_current_report() 4, creating UUID
GVMD   md manage:WARNING:2023-05-01 09h24.56 UTC:129123: create_current_report() 5, UUID report_id: af6f8c38-26cc-4451-b57a-bb6fac77c53b
GVMD   md manage:WARNING:2023-05-01 09h24.56 UTC:129123: create_current_report() 7, creating report
GVMD   md manage:WARNING:2023-05-01 09h24.56 UTC:129123: create_current_report() 8, new global_current_report: 25
GVMD   md manage:WARNING:2023-05-01 09h24.56 UTC:129123: create_current_report() 9, report scheduled, all good
GVMD   event task:MESSAGE:2023-05-01 09h24.56 UTC:129123: Status of task 2023-05-01 08:03:36.690061 - 7507 - 2.2.2.2 - CVE (4a7ed1e0-8415-4214-b62f-85a4317b614f) has changed to Requested
GVMD   event task:MESSAGE:2023-05-01 09h24.56 UTC:129123: Task 2023-05-01 08:03:36.690061 - 7507 - 2.2.2.2 - CVE (4a7ed1e0-8415-4214-b62f-85a4317b614f) has been requested to start by admin
GVMD   event task:MESSAGE:2023-05-01 09h24.56 UTC:129137: Status of task 2023-05-01 08:03:36.690061 - 7507 - 2.2.2.2 - CVE (4a7ed1e0-8415-4214-b62f-85a4317b614f) has changed to Running
GVMD   event task:MESSAGE:2023-05-01 09h24.56 UTC:129137: Status of task 2023-05-01 08:03:36.690061 - 7507 - 2.2.2.2 - CVE (4a7ed1e0-8415-4214-b62f-85a4317b614f) has changed to Done
CLIENT 2023-05-01 09:24:57,112 [906] INFO: InProcessProcessTarget.py (41): [OpenVasModule, DomainWithIpTarget(b.com, 2.2.2.2)] Completed with result: data

The original worker [732] creates and tries to start an OpenVAS Discovery scan. This is > 10 mins after the last task this client process worked on, so it should be a new connection. But I don’t know… maybe consider that it’s somehow the same one?

This time create_current_report(…) fails because global_current_report is still set to 24!

if (global_current_report) return -1;  // Is still 24?!
CLIENT 2023-05-01 09:27:24,175 [732] INFO: PortListCache.py (42): [c.com / 3.3.3.3] Get or create port list: T:25,465,587
GVMD   event port_list:MESSAGE:2023-05-01 09h27.24 UTC:117652: Port List T:25,465,587 (eeb67496-d135-4560-819e-4d10a7ed0146) has been created by admin
CLIENT 2023-05-01 09:27:24,182 [732] INFO: OpenVasService.py (198): [c.com / 3.3.3.3] Creating target 3.3.3.3 with ports: eeb67496-d135-4560-819e-4d10a7ed0146
GVMD   event target:MESSAGE:2023-05-01 09h27.24 UTC:117652: Target 2023-05-01 09:27:24.175111 - 7506 - 3.3.3.3 (8765541c-0923-4ae2-8140-53735c8ccad3) has been created by admin
CLIENT 2023-05-01 09:27:25,187 [732] INFO: OpenVasService.py (212): [c.com / 3.3.3.3] Creating Discovery task on target: 8765541c-0923-4ae2-8140-53735c8ccad3
GVMD   event task:MESSAGE:2023-05-01 09h27.25 UTC:117652: Status of task  (d3178774-99ef-4eb9-9c33-9d6b68e0ab0b) has changed to New
GVMD   event task:MESSAGE:2023-05-01 09h27.25 UTC:117652: Task 2023-05-01 09:27:24.175111 - 7506 - 3.3.3.3 - Discovery (d3178774-99ef-4eb9-9c33-9d6b68e0ab0b) has been created by admin
CLIENT 2023-05-01 09:27:25,705 [732] INFO: OpenVasService.py (224): [c.com / 3.3.3.3] Starting task: d3178774-99ef-4eb9-9c33-9d6b68e0ab0b
GVMD   md manage:WARNING:2023-05-01 09h27.26 UTC:117652: run_osp_scan_get_report() calling create_current_report(), from: 0, resume_report: 0
GVMD   md manage:WARNING:2023-05-01 09h27.26 UTC:117652: create_current_report() 1, global_current_report: 24
GVMD   md manage:WARNING:2023-05-01 09h27.26 UTC:117652: create_current_report() 2, have global_current_report so returning -1
GVMD   md manage:WARNING:2023-05-01 09h27.26 UTC:117652: run_osp_scan_get_report() create report failed, returning -1
GVMD   md manage:WARNING:2023-05-01 09h27.26 UTC:117652: Couldn't fork OSP scan handler
GVMD   event task:MESSAGE:2023-05-01 09h27.26 UTC:117652: Task 2023-05-01 09:27:24.175111 - 7506 - 3.3.3.3 - Discovery (d3178774-99ef-4eb9-9c33-9d6b68e0ab0b) could not be started by admin
CLIENT 2023-05-01 09:27:26,184 [732] ERROR: OpenVasService.py (226): [c.com / 3.3.3.3] Failed to start task: GvmServerError Internal error, 500
CLIENT 2023-05-01 09:27:26,184 [732] INFO: InProcessProcessTarget.py (41): [OpenVasModule, DomainWithIpTarget(c.com, 3.3.3.3)] Completed with result: error type ErrorType.INTERNAL

I’m going to do a bit more digging but thought now was the time to get in touch with you. I admit I haven’t tried to understand how global_current_report is used.

Do you have any insights or suggestions?

Note the we have the same client code running and working without problem against the older 21.4 GVM.

Does this look like a bug in the GVM to have a “stale” global_current_report?

Does this look like a problem with how I’m using the GMP?

Thank you very much for your help!

After putting this aside for a while I’ve come back to it. Below is my current thinking.

  • The gvmd processes are long-lived on the server. I haven’t looked at the code that spawns/manages these but assume this is working as intended. I.e. if a client disconnects the server process remains and is available for handling future client requests.
  • My “Couldn’t fork OSP scan handler” errors occur on gvmd processes that previously handled a “CVE scanner” request. I have not seen the error after an “OpenVAS scanner” request.
  • My hypothesis is the CVE scanner request isn’t cleaning up after itself and leaves the gvmd process in a “corrupt” state.

Looking at the code this seems possible. Below is a comparison of fork_osp_scan_handler() and fork_cve_scan_handler() (with edits and comments).

Notice that the CVE code path doesn’t seem to zero out the global_current_report and current_scanner_task globals in the parent process.

fork_osp_scan_handler (...)
{
  ...
  if (run_osp_scan_get_report (task, from, &report_id))
    return -1;

  // run_osp_scan_get_report() has called create_current_report() 
  // and set global_current_report in the parent process
  ...
  switch (fork ())
    {
      case 0:
        // Child
        ...
        break;
      case -1:
        /* Parent, failed to fork. */
        global_current_report = 0;
        current_scanner_task = 0;
        ...
        return -9;
      default:
        /* Parent, successfully forked. */
        global_current_report = 0;
        current_scanner_task = 0;
        ...
        return 0;
    }
  ...
  // Child process continues, performs scan and exit()s
}
fork_cve_scan_handler (...)
{
  ...
  if (create_current_report (task, &report_id, TASK_STATUS_REQUESTED))
    return -1;

  // create_current_report() has set global_current_report in the parent process
  ...
  switch (fork ())
    {
      case 0:
        // Child
        ...
      case -1:
        /* Parent, failed to fork. */
        global_current_report = (report_t) 0;
        // Does NOT null out current_scanner_task
        ...
        return -9;
      default:
        /* Parent, successfully forked. */
        // Does NOT null out global_current_report or current_scanner_task
        return 0;
    }
  ...
  // Child process continues, creates CVE report and exit()s
}

My apologies about the amount of information here, but I thought it better to keep this up-to-date while I work through it!

My next step is to add lines nulling out these globals in the CVE code path and see if that solves things. If it does fix things then it’s a mystery why I didn’t see this problem in 21.4 (though I DID use a PPA rather than built from source).

Some questions:

  • Does my hypothesis and proposed fix make sense?
  • Is there something else I’m missing? (I’m more inclined to think I’m doing something wrong than that I’ve discovered a bug!)

Thanks again.

While i’m not involved into any of these topics / development two notes:

  • This is a great information collection :+1:
  • If there is actually a bug somewhere it would be recommended to start a conversation with the development team via an opened issue at https://github.com/greenbone/gvmd as it is not guaranteed that this posting is getting noticed
1 Like

Thanks for the advice!

I opened an issue here: https://github.com/greenbone/gvmd/issues/1979

1 Like

Marked as solved: the code fix described in https://github.com/greenbone/gvmd/issues/1979 works for me. I intend to make a PR, so this could end up in a future gvmd release.

2 Likes

For tracking purposes the link to the relevant PR, thanks a lot for your contribution.

https://github.com/greenbone/gvmd/pull/1989

1 Like