CWE View for Benchmarking

Illustrative Scenario

Per the Top25 Dataset, some CVEs have more than one CWE.

How do we score this scenario?

  • Benchmark: CWE-79, CWE-74 (yellow arrows)
    • Benchmark is the known-good CWE mappings for a given CVE.
  • Prediction: CWE-79, CWE-89, CWE-352 (orange arrows)
    • Prediction is the assigned CWE mappings for a given CVE from a tool/service.


Figure: CWEs after Top 25 remapping

See Interactive CWE diagram.

  • ✅ CWE-79 was correctly assigned
  • ❓ CWE-89 was assigned; it is more specific than CWE-74.
    • Is this wrong or more right?
    • Does this get partial credit or no credit?
  • ❌ CWE-352 was incorrectly assigned
    • Is there a penalty for a wrong guess?

Similarly, how would we score the reverse case:

  • Benchmark: CWE-79, CWE-89, CWE-352
  • Prediction: CWE-79, CWE-74
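
Before bringing the hierarchy into play, a plain exact-match comparison gives a baseline. The following minimal sketch (a simple set precision/recall, shown for illustration only and not part of the benchmark itself) makes the ambiguity concrete: CWE-89 earns no credit in the first scenario even though it is more specific than CWE-74.

```python
def exact_match_scores(benchmark: set[str], prediction: set[str]) -> dict[str, float]:
    """Precision/recall/F1 over exact CWE IDs for a single CVE (no hierarchy)."""
    true_positives = benchmark & prediction
    precision = len(true_positives) / len(prediction) if prediction else 0.0
    recall = len(true_positives) / len(benchmark) if benchmark else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Scenario 1: Benchmark {CWE-79, CWE-74}, Prediction {CWE-79, CWE-89, CWE-352}
print(exact_match_scores({"CWE-79", "CWE-74"}, {"CWE-79", "CWE-89", "CWE-352"}))
# -> precision 0.33, recall 0.50 (CWE-89 gets no credit at all)

# Scenario 2: Benchmark {CWE-79, CWE-89, CWE-352}, Prediction {CWE-79, CWE-74}
print(exact_match_scores({"CWE-79", "CWE-89", "CWE-352"}, {"CWE-79", "CWE-74"}))
# -> precision 0.50, recall 0.33
```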

Questions

Do we reward proximity, i.e. a prediction that is close to, but not exactly, the Benchmark CWE(s)?

Which View determines proximity, e.g. View-1003 or View-1000?

Do we penalize wrong assignments, i.e. predictions far from any Benchmark CWE(s)?

Are classification errors in the upper levels of the hierarchy (e.g. assigning a CWE from the wrong Pillar) more severe than those at deeper levels (e.g. assigning a Base CWE when the correct mapping is its Variant)?

Do we reward specificity, i.e. for two CWEs equally close to the Benchmark CWE(s), do we reward the more specific CWE over the more general one?

Benchmark CWE View Selection

Info

The most specific accurate CWE should be assigned to a CVE, i.e. the assignment should not be limited by Views that contain only a subset of CWEs, e.g. View-1003.

  • Therefore, the benchmark should support all CWEs.

Tip

Research View (View-1000) will be used as the CWE View to base benchmark scoring on.

See details on different views and why Research View (View-1000) is the best view to use.

Benchmark Scoring Algorithm

This section is an overview of different CWE benchmarking scoring approaches.

Data Points

Info

A CVE can have one or more CWEs, i.e. it is a multi-label classification problem.

The MITRE CWE list is a rich document (~2800 pages) containing detailed information on each CWE.

The RelatedNatureEnumerations form a view-dependent (e.g. CWE-1000) Directed Acyclic Graph (DAG):

  • 1309 ChildOf/ParentOf
  • 141 CanPrecede/CanFollow
  • 13 Requires/RequiredBy
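
As an illustration (not part of the benchmark itself), the view-specific relationships could be loaded into a directed graph. The `relations` tuples below are hypothetical stand-ins for data parsed from the MITRE CWE XML, and the edges shown are illustrative only.

```python
import networkx as nx

# Hypothetical (child, parent, nature) tuples standing in for data parsed
# from the MITRE CWE XML; the edges shown are illustrative only.
relations = [
    ("CWE-79", "CWE-74", "ChildOf"),
    ("CWE-89", "CWE-74", "ChildOf"),   # in View-1000 the real path goes via intermediate nodes
    ("CWE-74", "CWE-707", "ChildOf"),
]

hierarchy = nx.DiGraph()               # edges point from child to parent
for child, parent, nature in relations:
    if nature == "ChildOf":            # ignore CanPrecede/Requires for scoring
        hierarchy.add_edge(child, parent)

assert nx.is_directed_acyclic_graph(hierarchy)   # the view forms a DAG
```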

CVE information (Description and References) may leave some ambiguity as to the best CWE.


Scoring System: Overview

The goal is to assign a numeric score that:

  • Rewards exact matches highly.
  • Rewards partial matches based on hierarchical proximity (parent-child relationships).
  • Penalizes wrong assignments.
  • Provides a standardized range (e.g. 0 to 1) for easy comparison.
  • Is based on the CWE hierarchy.
  • Reinforces good mapping behavior.
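
As a shape check against these goals, here is a minimal per-CVE scorer sketch. It is an assumption for illustration, not the chosen algorithm; `proximity_credit` is a hypothetical function that returns 1.0 for an exact match, a value in (0, 1) for hierarchically close CWEs, and 0.0 for unrelated ones, along the lines of the Proximity definition below.

```python
from typing import Callable

def score_cve(
    benchmark: set[str],
    prediction: set[str],
    proximity_credit: Callable[[str, str], float],
) -> float:
    """Symmetric best-match credit over two CWE sets, normalized to [0, 1]."""
    if not benchmark or not prediction:
        return 0.0
    # Each prediction is credited against its closest benchmark CWE, so
    # wrong guesses (credit ~0) dilute the score instead of being free.
    precision_like = sum(
        max(proximity_credit(p, b) for b in benchmark) for p in prediction
    ) / len(prediction)
    # Each benchmark CWE must also be covered by some prediction.
    recall_like = sum(
        max(proximity_credit(p, b) for p in prediction) for b in benchmark
    ) / len(benchmark)
    return (precision_like + recall_like) / 2
```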

Definitions & Assumptions

  • Ground Truth (Benchmark): One or more known-good CWEs per CVE.
  • Prediction: One or more CWEs assigned by the tested solution.
  • Hierarchy: Relationships defined by MITRE’s CWE hierarchy (RelatedNatureEnumerations).
  • Proximity: Measured as shortest hierarchical path (edge distance) between two CWEs for a given View.
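
A minimal sketch of how Proximity could be computed, assuming networkx and a hand-built fragment of the View-1000 ChildOf hierarchy (the edges are illustrative only; a real implementation would load the full view from the MITRE CWE XML):

```python
import networkx as nx

# Illustrative fragment of the View-1000 ChildOf hierarchy, undirected here
# so that distance can be measured in either direction.
view_1000 = nx.Graph()
view_1000.add_edges_from([("CWE-79", "CWE-74"), ("CWE-89", "CWE-74")])

def proximity(cwe_a: str, cwe_b: str, view: nx.Graph) -> int | None:
    """Edge distance between two CWEs in the given view, or None if unrelated."""
    try:
        return nx.shortest_path_length(view, cwe_a, cwe_b)
    except (nx.NodeNotFound, nx.NetworkXNoPath):
        return None

print(proximity("CWE-89", "CWE-74", view_1000))   # 1 -> near miss, candidate for partial credit
print(proximity("CWE-89", "CWE-79", view_1000))   # 2 -> siblings via CWE-74
```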

Interesting Cases

Evaluation Measures for Hierarchical Classification: a unified view and novel approaches (June 2013) presents the interesting cases that arise when evaluating hierarchical classifiers.

Figure: Interesting cases when evaluating hierarchical classifiers. Nodes surrounded by circles are the true classes, while nodes surrounded by rectangles are the predicted classes.

Scoring Approaches

The following Scoring Approaches are detailed:

  1. Path-based, using CWE graph distance over ChildOf relationships
    1. Path based Proposal
    2. See Critique
  2. Hierarchical Classification Standards
    1. A survey of Hierarchical Classification Standards
    2. A proposal with Worked Examples
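
To make the second family concrete, the hierarchical precision/recall/F-measure discussed in the survey above compares label sets after augmenting each label with all of its ancestors. The sketch below hard-codes illustrative ancestor sets (intermediate nodes omitted); a real implementation would derive them from the View-1000 graph.

```python
# Illustrative ancestor sets; a real implementation would derive these from
# the View-1000 ChildOf graph.
ANCESTORS = {
    "CWE-79": {"CWE-74", "CWE-707"},
    "CWE-89": {"CWE-74", "CWE-707"},
    "CWE-74": {"CWE-707"},
    "CWE-352": {"CWE-345", "CWE-693"},
}

def augment(labels: set[str]) -> set[str]:
    """Return the labels together with all of their ancestors."""
    out = set(labels)
    for label in labels:
        out |= ANCESTORS.get(label, set())
    return out

def hierarchical_prf(benchmark: set[str], prediction: set[str]) -> tuple[float, float, float]:
    t_aug, p_aug = augment(benchmark), augment(prediction)
    overlap = len(t_aug & p_aug)
    h_p = overlap / len(p_aug) if p_aug else 0.0
    h_r = overlap / len(t_aug) if t_aug else 0.0
    h_f = 2 * h_p * h_r / (h_p + h_r) if (h_p + h_r) else 0.0
    return h_p, h_r, h_f

# Illustrative scenario: Benchmark {CWE-79, CWE-74}, Prediction {CWE-79, CWE-89, CWE-352}
print(hierarchical_prf({"CWE-79", "CWE-74"}, {"CWE-79", "CWE-89", "CWE-352"}))
# CWE-89 now earns partial credit through its shared ancestors with CWE-74.
```

With these illustrative ancestor sets, the first scenario scores hP = 0.5 and hR = 1.0 (versus 0.33 and 0.5 under exact matching), which is one concrete way of giving CWE-89 partial credit while still letting CWE-352 drag down precision.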

Tip

The Hierarchical Classification Standards approach looks the most promising.