CWE View for Benchmarking¶
Illustrative Scenario¶
Per the Top25 Dataset, some CVEs have more than one CWE.
How do we score this scenario?
- Benchmark: CWE-79, CWE-74 (yellow arrows)
    - The Benchmark is the known-good CWE mapping(s) for a given CVE.
- Prediction: CWE-79, CWE-89, CWE-352 (orange arrows)
    - The Prediction is the CWE mapping(s) assigned to a given CVE by a tool/service.
- ✅ CWE-79 was correctly assigned.
- ❓ CWE-89 was assigned. This is more specific than CWE-74 (CWE-89 is a descendant of CWE-74 in the hierarchy).
    - Is this wrong, or is it more right?
    - Does this get partial credit or no credit?
- ❌ CWE-352 was incorrectly assigned.
    - Is there a penalty for a wrong guess?
Similarly, how would we score the reverse case?
- Benchmark: CWE-79, CWE-89, CWE-352
- Prediction: CWE-79, CWE-74
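One way to make these questions concrete is to first categorize each predicted CWE as an exact match, a hierarchical relative (ancestor or descendant) of a Benchmark CWE, or unrelated, before deciding how much credit each category earns. The sketch below is illustrative only: the `CHILD_OF` map is a hand-coded fragment of the View-1000 ChildOf relationships for the CWEs in this scenario, not the full CWE graph.

```python
# Illustrative only: hand-coded fragment of View-1000 ChildOf links (child -> parent)
# covering the CWEs in the scenario above.
CHILD_OF = {
    "CWE-79": "CWE-74",    # XSS is a descendant of Injection
    "CWE-89": "CWE-943",   # SQL Injection is a child of CWE-943...
    "CWE-943": "CWE-74",   # ...which is a child of Injection
    "CWE-352": "CWE-345",  # CSRF is a child of Insufficient Verification of Data Authenticity
}

def ancestors(cwe: str) -> set[str]:
    """All ancestors of a CWE, found by walking ChildOf links upward."""
    found = set()
    while cwe in CHILD_OF:
        cwe = CHILD_OF[cwe]
        found.add(cwe)
    return found

def categorize(benchmark: set[str], prediction: list[str]) -> dict[str, str]:
    """Label each predicted CWE relative to the Benchmark set."""
    labels = {}
    for p in prediction:
        if p in benchmark:
            labels[p] = "exact"
        elif any(b in ancestors(p) or p in ancestors(b) for b in benchmark):
            labels[p] = "related"    # ancestor or descendant of a Benchmark CWE
        else:
            labels[p] = "unrelated"
    return labels

print(categorize({"CWE-79", "CWE-74"}, ["CWE-79", "CWE-89", "CWE-352"]))
# {'CWE-79': 'exact', 'CWE-89': 'related', 'CWE-352': 'unrelated'}
```

Swapping the two arguments covers the reverse case above: the more general CWE-74, predicted against the more specific Benchmark CWE-89, comes out as `related` rather than `exact`, and the scoring question becomes how much credit `related` earns.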
Question
- Do we reward proximity? i.e. a prediction that is close to, but not exactly, the Benchmark CWE(s).
- Which View determines proximity? e.g. View-1003, View-1000.
- Do we penalize wrong assignments? i.e. predictions far from any Benchmark CWE(s).
- Are classification errors in the upper levels of the hierarchy (e.g. assigning a CWE from the wrong Pillar) more severe than those at deeper levels (e.g. confusing a Base CWE with one of its Variants)?
- Do we reward specificity? i.e. for 2 CWEs in proximity to the Benchmark CWE(s), do we reward the more specific CWE over the more general one?
Benchmark CWE View Selection¶
Info
The most specific accurate CWE should be assigned to a CVE, i.e. the assignment should not be limited to Views that contain only a subset of CWEs, e.g. View-1003.
- Therefore, the benchmark should support all CWEs.
Tip
Research View (View-1000) will be used as the CWE View to base benchmark scoring on.
See details on different views and why Research View (View-1000) is the best view to use.
Benchmark Scoring Algorithm¶
This section is an overview of different CWE benchmarking scoring approaches.
Data Points¶
Info
A CVE can have one or more CWEs, i.e. it is a multi-label classification problem.
The MITRE CWE List is a rich document containing detailed information on CWEs (~2800 pages).
The RelatedNatureEnumerations form a view-dependent (e.g. View-1000) Directed Acyclic Graph (DAG):
- 1309 ChildOf/ParentOf
- 141 CanPrecede/CanFollow
- 13 Requires/RequiredBy
CVE info (Description and References) may lead to some ambiguity as to the best CWE.
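As a sketch of how these relations might be represented for scoring, the fragment below loads (source, target, relation) tuples into a directed graph and extracts the ChildOf subgraph. The tuples are a small hand-picked sample standing in for the full edge set parsed from the CWE List download for a given View; `networkx` is used for convenience.

```python
import networkx as nx

# Hand-picked sample of relation tuples (source, target, relation).
# A real pipeline would parse the full set (ChildOf/ParentOf, CanPrecede/CanFollow,
# Requires/RequiredBy) from the CWE List XML/CSV for the chosen View.
RELATIONS = [
    ("CWE-79", "CWE-74", "ChildOf"),
    ("CWE-89", "CWE-943", "ChildOf"),
    ("CWE-943", "CWE-74", "ChildOf"),
    ("CWE-352", "CWE-345", "ChildOf"),
]

graph = nx.DiGraph()
for source, target, relation in RELATIONS:
    graph.add_edge(source, target, relation=relation)

# The ChildOf subgraph is the hierarchy that proximity-based scoring walks.
child_of = nx.DiGraph(
    (u, v) for u, v, d in graph.edges(data=True) if d["relation"] == "ChildOf"
)
print(nx.is_directed_acyclic_graph(child_of))  # True for a well-formed View
```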
Scoring System: Overview¶
The goal is to assign a numeric score that:
- Rewards exact matches highly.
- Rewards partial matches based on hierarchical proximity (parent-child relationships).
- Penalizes wrong assignments.
- Provides a standardized range (e.g. 0 to 1) for easy comparison.
- Is based on the CWE hierarchy.
- Reinforces good mapping behavior.
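For example, one minimal way to satisfy these properties is to map the hierarchical distance between a predicted and a Benchmark CWE to a score in [0, 1]: distance 0 (exact match) scores 1.0, nearby CWEs score progressively less, and disconnected (wrong) CWEs score 0. The linear decay and the `max_distance` cut-off below are illustrative assumptions, not part of any agreed scoring standard.

```python
def distance_to_score(distance: int | None, max_distance: int = 3) -> float:
    """Map a hierarchical edge distance to a score in [0, 1].

    distance=None means the predicted CWE is not connected to any Benchmark
    CWE in the chosen View (a wrong assignment), which scores 0.
    The linear decay and max_distance cut-off are illustrative choices.
    """
    if distance is None or distance > max_distance:
        return 0.0
    return 1.0 - distance / (max_distance + 1)

print(distance_to_score(0))  # 1.0  exact match
print(distance_to_score(1))  # 0.75 direct parent or child
print(distance_to_score(5))  # 0.0  too far away / wrong
```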
Definitions & Assumptions¶
- Ground Truth (Benchmark): One or more known-good CWEs per CVE.
- Prediction: One or more CWEs assigned by the tested solution.
- Hierarchy: Relationships defined by MITRE’s CWE hierarchy (RelatedNatureEnumerations).
- Proximity: Measured as the shortest hierarchical path (edge distance) between two CWEs for a given View.
    - If there is more than one path, the Primary Path is used.
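A minimal sketch of that proximity measure, assuming a `PARENTS` map of each CWE to its ChildOf parents in View-1000 (hand-coded here for the scenario CWEs) and ignoring Primary Path tie-breaking: the distance is the shortest path over ChildOf edges, treated as undirected so that parent, child and sibling relationships all yield finite distances.

```python
from collections import deque

# Assumed adjacency: each CWE mapped to its ChildOf parents in View-1000.
# Hand-coded for the scenario CWEs; a real implementation builds this from the CWE List.
PARENTS = {
    "CWE-79": ["CWE-74"],
    "CWE-89": ["CWE-943"],
    "CWE-943": ["CWE-74"],
    "CWE-352": ["CWE-345"],
}

def hierarchy_distance(a: str, b: str) -> int | None:
    """Shortest edge distance between two CWEs over ChildOf links,
    walking both up and down the hierarchy. Returns None if unreachable."""
    # Build an undirected adjacency list from the parent map.
    neighbours: dict[str, set[str]] = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)

    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

print(hierarchy_distance("CWE-89", "CWE-74"))   # 2 (CWE-89 -> CWE-943 -> CWE-74)
print(hierarchy_distance("CWE-79", "CWE-79"))   # 0 (exact match)
print(hierarchy_distance("CWE-352", "CWE-74"))  # None (not connected in this fragment)
```

A per-prediction score can then be obtained by feeding this distance into a mapping such as the `distance_to_score` sketch above.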
Interesting Cases¶
Evaluation Measures for Hierarchical Classification: a unified view and novel approaches (June 2013) presents the interesting cases that arise when evaluating hierarchical classifiers.
Scoring Approaches¶
The following Scoring Approaches are detailed:
- Path Based, using CWE graph distance based on ChildOf
    - Path based Proposal
    - See Critique
- Hierarchical Classification Standards
Tip
Hierarchical Classification Standards look the most promising.
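As a flavour of what the hierarchical classification standards look like, one of the measures discussed in the paper above is set-based hierarchical precision, recall and F1 (hP/hR/hF): both the Benchmark and the Prediction are augmented with all of their ancestors in the View, and ordinary set precision/recall are taken over the augmented sets. The sketch below is a minimal version of that idea; the `PARENTS` map is again a hand-coded fragment for the scenario CWEs.

```python
# Hand-coded ChildOf fragment (child -> parents) for the scenario CWEs.
PARENTS = {
    "CWE-79": ["CWE-74"],
    "CWE-89": ["CWE-943"],
    "CWE-943": ["CWE-74"],
    "CWE-352": ["CWE-345"],
}

def with_ancestors(cwes: set[str]) -> set[str]:
    """Augment a set of CWEs with all of their ancestors in the hierarchy."""
    augmented, frontier = set(cwes), list(cwes)
    while frontier:
        node = frontier.pop()
        for parent in PARENTS.get(node, []):
            if parent not in augmented:
                augmented.add(parent)
                frontier.append(parent)
    return augmented

def hierarchical_prf(benchmark: set[str], prediction: set[str]) -> tuple[float, float, float]:
    """Hierarchical precision, recall and F1 over ancestor-augmented sets."""
    b, p = with_ancestors(benchmark), with_ancestors(prediction)
    overlap = len(b & p)
    precision = overlap / len(p)
    recall = overlap / len(b)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Scenario above: Benchmark {CWE-79, CWE-74}, Prediction {CWE-79, CWE-89, CWE-352}.
print(hierarchical_prf({"CWE-79", "CWE-74"}, {"CWE-79", "CWE-89", "CWE-352"}))
# ≈ (0.33, 1.0, 0.5)
```

Ancestor augmentation is what gives hierarchically close predictions partial credit: they share augmented nodes with the Benchmark, while unrelated predictions such as CWE-352 only add unmatched nodes and drag precision down.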