SOUP in the real world — where the inventory actually breaks

Audited a Class C SaMD team last quarter where the SOUP inventory listed 47 items. Their `package-lock.json` had 3,412 resolved packages. I pointed at the gap. The QA lead said "we only track direct dependencies." Reviewer guidance doesn't agree. Clause 8.1.2 of IEC 62304 is clear: every SOUP item in the delivered system gets identified, not just the ones you imported by name.

That conversation happens in some form on almost every SaMD audit I participate in. The inventory isn't wrong because the team is careless. It's wrong because the mechanism for staying current is manual, the dependency tree updates faster than any human process, and nobody has authority over which libraries a developer can `npm install` on a Tuesday.

This is what SOUP management actually looks like at scale — where it breaks, where 483s show up, and what has to be structural rather than documentary.

What IEC 62304 actually requires

IEC 62304 doesn't gather SOUP into one section; the requirements are spread across clauses 5.3.3–5.3.4 (functional, performance, and environment requirements for each SOUP item), 7.1.2–7.1.3 (risk evaluation and anomaly review), 6.1 (evaluating SOUP upgrades during maintenance), and 8.1.2 (identification under configuration management). For Class B and Class C software — where most clinically consequential SaMD sits — taken together they require:

  • Identification of all SOUP items with unique version identifiers (version number, commit hash, release tag — not "latest," not "main").
  • Documentation of the functional and performance requirements your system places on each SOUP item. Not the SOUP item's general capabilities. What you specifically rely on it to do.
  • Risk evaluation per item: how could it fail, what hazardous situation does that create, what's the severity of harm.
  • Anomaly list relevant to your system's safety — open issues in the dependency's tracker, CVEs against your pinned version, documented limitations that matter in your use case.
  • Suitability testing where the safety class of the software item using the SOUP warrants it.

Class A software gets lighter requirements — identification and version tracking without full risk evaluation. Most SaMD with clinical consequence doesn't live there, so the full set applies.

What counts as SOUP — and the grey zones

SOUP is any software item not developed under your own lifecycle and quality system. The obvious cases: open-source libraries (React, NumPy, PyTorch, OpenSSL), commercial components, operating system libraries, cloud SDKs, pre-trained ML models, third-party medical device software components.

Not SOUP: software your team developed under your QMS with full records.

The grey zones are where teams accumulate debt.

Contract-developed components. If the contractor followed your procedures and you hold the development records, not SOUP. If they delivered a black box with a commit, SOUP.

Internal libraries from other product teams. Technically developed under your QMS, but often not documented with IEC 62304 rigor. Treat as SOUP unless the development records demonstrate full lifecycle compliance.

Forked open-source you modified. Still SOUP. The fact that you patched a bug doesn't convert the upstream into your QMS-controlled software.

LLM API endpoints. The API itself — the hosted model behind OpenAI, Anthropic, Google — isn't SOUP in the traditional sense because you don't embed it. But its outputs drive behaviour in your system, and under current FDA AI/ML draft guidance, the provider-supplied model card and performance characterization belong in your design record. Expect this grey zone to harden into clearer expectations in the 2026–2027 guidance cycle.

  • Is this software? No → not SOUP; no action.
  • Yes → Does it run on the device or affect a device safety function? No → SOUP candidate; evaluate its risk contribution.
  • Yes → Does it contribute to a safety function per IEC 62304? No → Class A: document in inventory, monitor for updates.
  • Yes → Safety Class B or C? B → Class B: anomaly list, version pinned. C → Class C: full documentation, anomaly management, change control.
SOUP identification decision tree per IEC 62304. The documentation burden scales with the safety class of the software that uses it — not the SOUP itself.

Where the SOUP inventory actually breaks

The breakdown isn't usually at initial inventory. It's during months three through twenty-four of development, when the team is shipping features and nobody is watching the dependency tree.

Transitive dependencies. A direct import of one package pulls in 40 transitive ones. None get added to the SOUP list unless the process captures them. Six months in, the lock file has 2,000 packages and the inventory still lists 60. This is the most common 483 pattern I see on SaMD inspections.
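That gap is easy to measure mechanically. A minimal sketch, assuming an npm v2/v3 `package-lock.json` (where the `packages` map keys are `node_modules/...` paths) and an inventory kept as a list of `{name, version}` records — both of those shapes are assumptions about your tooling:

```python
import json

def resolved_packages(lockfile_path):
    """Collect every resolved (name, version) pair from an npm v2/v3
    package-lock.json, including transitive and nested dependencies."""
    with open(lockfile_path) as f:
        lock = json.load(f)
    resolved = set()
    for path, meta in lock.get("packages", {}).items():
        if not path:  # "" is the root project itself, not a dependency
            continue
        name = path.split("node_modules/")[-1]
        resolved.add((name, meta.get("version", "unknown")))
    return resolved

def inventory_gap(lockfile_path, soup_inventory):
    """Packages present in the build that the SOUP inventory never mentions."""
    inventoried = {(item["name"], item["version"]) for item in soup_inventory}
    return resolved_packages(lockfile_path) - inventoried
```

Run in CI, a non-empty gap fails the build before it fails the audit.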

Dependency drift between builds. Developer installs a library locally to prototype. Commits the change. CI builds succeed. SOUP inventory doesn't get updated because nobody noticed. The submission build uses a dependency that was never evaluated for risk. Reviewer finds the discrepancy in the SBOM.

Version bumps that look minor. 4.2.1 to 4.2.2 looks cosmetic. Under IEC 62304, it is still a change to a SOUP item, and Clauses 6.1 and 7.4.2 require it to be evaluated before it ships. In an agile team shipping weekly, that's 30+ SOUP change events per quarter. Manual tracking breaks at 5 per quarter.
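Detecting those events is a small diff, not a process. A sketch, assuming each inventory snapshot is a plain `{name: version}` dict captured at build time:

```python
def soup_change_events(previous, current):
    """Diff two {name: version} snapshots into added / bumped / removed
    events. Each event is what triggers a documented change evaluation."""
    events = []
    for name, version in current.items():
        if name not in previous:
            events.append(("added", name, None, version))
        elif previous[name] != version:
            events.append(("bumped", name, previous[name], version))
    for name, version in previous.items():
        if name not in current:
            events.append(("removed", name, version, None))
    return events
```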

Build-tool-only packages. Teams often list runtime dependencies only. Build tools, linters, test frameworks — excluded. Some are legitimately out of scope. Some get pulled into the build pipeline in ways that affect delivered software. A code-generation tool that produces parts of your runtime is SOUP for your runtime. Distinction matters, and reviewers check.

ML framework minor releases. PyTorch 2.3 to 2.4 changes numerical precision behaviour in several operators. Model output distributions can shift by a few hundredths of a percent — below the noise floor of integration tests, but enough to move a clinical performance claim. If your performance requirement is "≥94% sensitivity on the validation set" and the rebuild produces 93.94%, you just failed. Without a SOUP change event tying it back to the PyTorch bump, you won't know why.

Post-submission 483 patterns worth knowing

Three patterns show up repeatedly in recent FDA post-submission observations on SOUP.

Inventory-to-SBOM mismatch. FDA cybersecurity premarket guidance (2023) expects an SBOM in CycloneDX or SPDX format. The SBOM is generated from the build. The SOUP inventory is maintained by the QA team. When the two disagree, the reviewer flags it. Fix: generate the inventory from the same build that generates the SBOM, not from a parallel spreadsheet.
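One way to enforce that in CI is to reconcile the two records on every build and fail on any mismatch. A sketch, assuming a CycloneDX JSON SBOM (top-level `components` array with `name` and `version`) and an inventory kept as `{name, version}` records:

```python
import json

def sbom_components(sbom_path):
    """Extract (name, version) pairs from a CycloneDX JSON SBOM."""
    with open(sbom_path) as f:
        sbom = json.load(f)
    return {(c["name"], c.get("version", "unknown"))
            for c in sbom.get("components", [])}

def reconcile(sbom_path, inventory):
    """Return (missing_from_inventory, stale_in_inventory) so the build
    can fail whenever the two records disagree."""
    built = sbom_components(sbom_path)
    listed = {(i["name"], i["version"]) for i in inventory}
    return built - listed, listed - built
```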

Anomaly list staleness. "Known anomalies" as of 18 months ago. Observation cites that CVEs published in the intervening period aren't reflected. Fix: automated CVE feed tied to the pinned versions in your inventory, with a documented review cadence.
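The feed itself can be as simple as querying OSV.dev for each pinned version. A sketch that builds the request bodies for OSV's public `/v1/querybatch` endpoint (the inventory field names here are assumptions):

```python
def osv_query(name, version, ecosystem):
    """Body for one OSV.dev /v1/query call: known vulnerabilities
    affecting an exact pinned version of a package."""
    return {"package": {"name": name, "ecosystem": ecosystem},
            "version": version}

def osv_batch(inventory):
    """One /v1/querybatch body covering every pinned item in the inventory."""
    return {"queries": [osv_query(i["name"], i["version"], i["ecosystem"])
                        for i in inventory]}
```

POST the batch body on a scheduled job, diff the results against the recorded anomaly list, and the "documented review cadence" becomes a ticket queue instead of a calendar reminder.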

Missing suitability evidence for Class C SOUP. Clause 5.3.3 requires you to specify the functional and performance requirements your system places on each SOUP item, and reviewers expect verification evidence that those requirements are met for SOUP used in Class B or C software items. Observation: no test records for the SOUP item's functional and performance requirements in the system. Fix: every Class B/C SOUP item has an integration-test suite exercising the functional requirements your system relies on, run in CI with results archived.
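What such a suitability test looks like: one test per functional requirement the system places on the SOUP item, not a tour of the library. A minimal illustration using Python's stdlib `json` module as a stand-in SOUP item (the requirement wording is hypothetical):

```python
import json

# Hypothetical functional requirement placed on the SOUP item:
# "round-trips UTF-8 clinical payloads without loss or mutation".
def test_soup_json_roundtrips_utf8_payload():
    payload = {"patient_note": "température 38.5 °C", "code": "E11.9"}
    assert json.loads(json.dumps(payload)) == payload
```

In practice this runs under pytest on every CI build, and the archived results are the suitability record.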

The inventory that actually scales

Fields per item that matter. Anything less is inadequate at Class B/C. Anything more is ceremony.

  • Title and source. Package name plus registry. PyPI, npm, Maven, commercial vendor, GitHub repo.
  • Unique version identifier. Exact version, commit hash, or release tag. Not "latest." Not "main."
  • Functional requirements in your system. What your product specifically relies on it to do. "Provides HTTPS client functionality used for secure data transmission to the clinical dashboard" — not "HTTP library."
  • Safety class of the software item using it. Inherits the scrutiny level of its consumer.
  • Known anomalies. Open tracker issues relevant to your use case, CVEs against your version, limitations you've encountered.
  • Risk evaluation. Failure mode, hazardous situation, severity, risk controls.
  • Last reviewed date. Triggers the review cadence for anomaly list refresh.
| Component | Function | Version | Supplier | Safety function? | Anomaly list | Last reviewed |
|---|---|---|---|---|---|---|
| React 18.2.0 | Frontend UI | 18.2.0 (pinned) | Meta (OSS) | No (UI only) | N/A (Class A) | 2025-11-14 |
| TensorFlow 2.13 | ML inference | 2.13.0 (pinned) | Google (OSS) | Yes (inference) | Required | 2025-11-14 |
| libusb 1.0.26 | USB I/O | 1.0.26 (pinned) | OSS | Yes (USB I/O) | Required | 2025-09-03 |
Minimum viable SOUP inventory — six fields per item. Every field has a regulatory purpose: version pinning creates reproducibility, anomaly list status drives your safety argument, last reviewed date triggers update evaluation.
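As a schema, those fields fit in a single record type. An illustrative sketch (the field names and the class API are assumptions, not a normative schema):

```python
from dataclasses import dataclass, field

@dataclass
class SoupItem:
    """One row of a SOUP inventory; each field backs a regulatory purpose."""
    name: str
    source: str                  # registry or vendor, e.g. "npm", "PyPI"
    version: str                 # exact pin: version, release tag, or commit hash
    functional_requirements: str # what YOUR system relies on it to do
    consumer_safety_class: str   # "A", "B", or "C": class of the software item using it
    known_anomalies: list = field(default_factory=list)
    risk_evaluation: str = ""
    last_reviewed: str = ""      # ISO date, drives the review cadence

    def needs_full_evaluation(self):
        """Documentation burden scales with the consumer's safety class."""
        return self.consumer_safety_class in ("B", "C")
```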

ML frameworks as SOUP — where the standard didn't anticipate today

PyTorch, TensorFlow, scikit-learn, ONNX Runtime — all SOUP. Pre-trained weights your product uses or fine-tunes from — also SOUP, with additional expectations from FDA's evolving AI/ML guidance.

The failure mode specific to ML frameworks: numerical precision changes in minor releases. Between PyTorch 2.1 and 2.2, several operators changed their default precision on certain hardware paths. Model output distributions shifted measurably in benchmarks. Not enough to break obvious tests. Enough to invalidate clinical performance claims tied to the pre-update build.

For clinically consequential SaMD, treat every minor version bump of the inference framework as a SOUP change event requiring model performance re-verification against the original clinical evaluation dataset. Not a full re-study. A statistical comparison against the locked reference distribution, with a pre-specified tolerance.
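The comparison itself is mechanical once the reference metrics and tolerances are locked. A sketch, assuming per-metric values (sensitivity, specificity, and so on) with pre-specified absolute tolerances:

```python
def within_tolerance(reference_metrics, rebuilt_metrics, tolerances):
    """Compare a rebuild's metrics against the locked clinical reference
    using pre-specified absolute tolerances. Returns the metrics that
    breach tolerance; an empty list means the framework bump passes."""
    breaches = []
    for metric, ref_value in reference_metrics.items():
        delta = abs(rebuilt_metrics[metric] - ref_value)
        if delta > tolerances[metric]:
            breaches.append((metric, ref_value, rebuilt_metrics[metric], delta))
    return breaches
```

A breach doesn't automatically block the release; it opens a SOUP change event with the evidence already attached.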

Pre-trained model weights are a special case. Not software in the traditional sense. Produces output that drives clinical decisions. Under current FDA draft guidance on AI/ML SaMD, the model card, the data governance documentation, and the performance characterization for any third-party model component belong in your design record. Your PCCP framework intersects with the SOUP process for these items.

Risk-stratified update management

"Update all dependencies to latest" without SOUP change evaluation is non-compliant. "Freeze all dependencies" accumulates CVEs and functional debt. Neither scales.

What works: tier your SOUP items by the safety class of the software consuming them. Class C-consuming SOUP gets full change evaluation on every version bump — read the release notes, evaluate changes against your functional requirements, flag anomaly list changes, run the suitability test suite. Class A-consuming SOUP gets CVE-only review on a monthly cadence. Build-only tooling gets quarterly review.
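Encoded as configuration, the tiering is auditable and stays under management approval. A sketch mirroring the cadences above (the Class B scope is an assumption; the text leaves it unspecified):

```python
REVIEW_POLICY = {
    # consumer safety class -> (trigger, scope); illustrative tiers, to be
    # documented and approved within your own quality system
    "C": ("every version bump", "full change evaluation + suitability suite"),
    "B": ("every version bump", "release notes + anomaly list review"),
    "A": ("monthly", "CVE-only review"),
    "build-only": ("quarterly", "CVE-only review"),
}

def review_action(consumer_class):
    """Default to the strictest tier when the consumer class is unknown."""
    return REVIEW_POLICY.get(consumer_class, REVIEW_POLICY["C"])
```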

This tiering has to be documented and have management approval. It's a risk-based argument, and reviewers will test it. "We update differently based on consumer class" is defensible. "We don't update non-critical dependencies" is not.

Keeping the record current is a build-time problem

The most common SOUP management failure isn't initial inventory. It's maintenance. Dev teams add dependencies continuously. The SOUP registry updates via human memory. By submission, the inventory is missing items that shipped to production 14 months ago.

Manual tracking doesn't scale past 50 items. Build-time automation scales.

MANKAIND integrates SOUP tracking with the engineering record. Dependencies are surfaced from your build configuration and connected to the software items that use them. Risk evaluations sit alongside the design decisions they support. When a version changes, the platform connects that change event to affected software items and the verification evidence that needs review. The SOUP inventory is current because it's rendered from the build, not maintained as a parallel spreadsheet.

The SBOM and the SOUP inventory become the same record, viewed through different lenses. That's what keeps the SOUP identification evidence intact at 3,000 packages rather than 47.

See how MANKAIND handles this

30-minute demo. Bring your hardest design controls question.