

IEC 62366 usability engineering — where your UEF will actually get torn apart

Three weeks ago, another 510(k) came back with a usability deficiency. Class II infusion pump. Third additional-information (AI) request, this one on the HRUS list. No new findings — the reviewer just wanted to know why the risk file listed seven hazardous situations tied to use error and the UEF's summative scenario set only covered four.

You've seen this letter. Or you will.

IEC 62366-1:2015 plus A1:2020 is clean on paper. Use specification, user interface specification, use scenarios, HRUS derived from the ISO 14971 file, formative evaluation, summative evaluation, Usability Engineering File. In practice, the standard doesn't fail. What fails is the connection between the UEF and the rest of your engineering record. Four issues cause most of the pushback CDRH has sent out since 2020, and I'll walk through them in the order they tend to surface.

HRUS derivation is where most UEFs get torn apart

Clause 5.4 says identify hazard-related use scenarios. Clauses 5.5 and 5.9 say select them for summative evaluation and then cover them. Annex C of 62366-1 says the link to ISO 14971 runs both ways — hazards in the risk file should appear in the HRUS list, and observed use errors in summative testing should flow back into the risk file.

The gap usually lives here: hazards in the risk file came out of an FMEA conducted by the risk team. The HRUS list came out of task analysis run by the human factors consultancy. Different teams, different frameworks, different source documents. Overlap is usually 60–80%. The missing 20–40% is where the deficiency letters live.

What's missing, specifically: multi-action hazard pathways (the operator acknowledged the alarm AND silenced it AND didn't re-check the dose), edge-case user populations (paediatric caregivers, cognitively fatigued night-shift staff), and environmental conditions a quiet usability lab doesn't reproduce (an ambulance with the siren running, a low-light emergency). Task analysis catches the obvious use errors. FMEA catches the obvious hazards. Neither alone catches the combinatorial use-related hazards where real patients actually get hurt.

The fix isn't more testing. It's a single derivation artefact that traces every hazard in the 14971 file to either an HRUS, a rationale for why use error isn't a pathway, or a non-usability risk control. When CDRH reads that derivation table and can see the closure logic, the AI requests stop. When they read "we ran a task analysis," they keep asking.
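To make that closure logic concrete, here's a minimal sketch of the sort of automated check a derivation table supports. The file name, column names, and disposition categories are mine, not the standard's:

```python
# Hypothetical closure check over a hazard-to-HRUS derivation table.
# Column names and disposition categories are illustrative, not prescribed by 62366-1.
import csv

ALLOWED_DISPOSITIONS = {
    "HRUS",                    # hazard traced to a hazard-related use scenario
    "NO_USE_ERROR_PATHWAY",    # documented rationale: use error is not a pathway
    "NON_USABILITY_CONTROL",   # risk controlled outside the user interface
}

def closure_gaps(derivation_csv: str) -> list[str]:
    """Return hazard IDs from the 14971 file that lack a valid disposition or reference."""
    gaps = []
    with open(derivation_csv, newline="") as f:
        for row in csv.DictReader(f):
            disposition = row.get("disposition", "").strip()
            reference = row.get("reference", "").strip()  # HRUS ID, rationale doc, or control ID
            if disposition not in ALLOWED_DISPOSITIONS or not reference:
                gaps.append(row["hazard_id"])
    return gaps

if __name__ == "__main__":
    open_items = closure_gaps("hazard_derivation.csv")
    print("Closure gaps:", open_items if open_items else "none")
```

The point isn't the script; it's that the derivation table is structured enough that closure can be checked mechanically instead of reconstructed from memory during review.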

Use error vs anticipated misuse vs reasonably foreseeable misuse

62366-1 Definition 3.21 on use error, the 14971 concept of reasonably foreseeable misuse, and the manufacturer's own framing of anticipated misuse don't always line up. This matters because the classification determines what you have to design for versus what you can disclaim.

Example from a case I worked on in late 2025: an adult dosing pump used on a paediatric patient with a paediatric-specific disposable that had no operator-facing affordance distinguishing it from the adult configuration. The manufacturer framed this as off-label misuse — out of scope. CDRH framed it as a use error — a design gap. Six months of argument. Eventually the manufacturer implemented software-enforced disposable recognition and the submission cleared.

Simple rule I use now: if the operator could reasonably do the wrong thing without realising they're doing it, it's a use error, and the design has to address it. If they would have to actively override or bypass to misuse the device, you can argue misuse. Reviewers won't always agree with you, but at least the argument is clean.

The "15 users" number — when it's enough and when it's not

FDA's 2016 Final Guidance settled on 15 participants per distinct user group. AAMI HE75:2009 (updated 2019) is the statistical source: 15 observations give roughly 97% probability of catching a use error that occurs at a true 20% frequency.

Fine for detection of common errors. Not fine when the error frequency is 2% and the severity is catastrophic. That's where the number 15 starts failing.
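The arithmetic behind both claims is the binomial detection probability: the chance of seeing a use error at least once across n independent participants is 1 - (1 - p)^n, where p is the true per-user frequency. A quick sketch, with the rates chosen as examples:

```python
# Probability of observing a use error at least once across n participants,
# assuming independent participants and a true per-user error frequency p.
def detection_probability(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

print(round(detection_probability(0.20, 15), 3))  # 0.965  (the "roughly 97%" figure)
print(round(detection_probability(0.02, 15), 3))  # 0.261  (a 2% error is usually missed)
print(round(detection_probability(0.02, 30), 3))  # 0.455  (doubling to 30 users is still under 50%)
```

Which is why, for rare but catastrophic errors, the justification has to lean on severity and mitigation rather than on a detection statistic.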

For Class III implants, closed-loop therapeutics, autonomous surgical assistance, and high-severity diagnostic software, I've been seeing PMA submissions from 2024 onward with 25–30 users per group. CDRH isn't writing it into guidance, but their AI requests on implantable therapeutic device PMAs now routinely ask for sample size justification against severity, not just frequency. If your use error could kill someone and you tested with 15 users, you're building an uphill argument.

User group stratification matters more than total N. If you're selling to ICU nurses and general-practice nurses, and they have different training and different typical environments, 15 combined doesn't satisfy summative — it's 15 per group, or a pooled design justified against a use specification narrow enough to make pooling defensible. The manufacturers who lose this argument are the ones who treated user group definition as a marketing question. It's a human factors engineering question.

What formative is actually for

62366-1 expects formative findings to demonstrably influence the design; clause 5.8 ties formative evaluation to user interface design and implementation. That's the part most teams skip.

Typical failure: three formatives with five users each, cosmetic UI tweaks after each round, then a summative presented as the capstone. When CDRH reads the UEF, they look for substantive design changes traceable to formative findings. If they don't find them, the summative reads as a single validation event with no fallback — and when the summative surfaces anything unexpected, you have no history of iterative improvement to stand on.

Put it this way: if your formative reports don't show renamed controls, restructured workflows, added confirmations, or safety-relevant redesigns, the formative didn't do its job. Or you didn't capture that it did. Either way, the UEF is weaker for it.

AI-enabled interfaces break the summative-is-final assumption

This is the live problem. FDA's 2022 draft on applying human factors to AI-enabled devices and the 2024 FDA-Health Canada-MHRA joint guiding principles on ML transparency both flag it: 62366 assumes the interface you validated is the interface users will see. For adaptive AI, that's not true.

A summative captures usability at the moment of testing. If the model updates — through a PCCP modification, through distribution shift, through an adversarial input — the interface behaves differently. The UEF still describes the device you validated; the device in the field is subtly different.

Operational consequence: every PCCP modification needs a usability implication check. Does the change affect what the user sees, how they interpret output, how they respond to alerts or recommendations? If yes, some abbreviated formative re-evaluation before deployment. The PCCP should spell this out with the same specificity as the performance testing section. Otherwise the UEF becomes historical documentation describing a product that doesn't exist in the field.
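For illustration, here's what a minimal usability-impact screen for a PCCP modification could look like. The field names and trigger questions are hypothetical, paraphrasing the three checks above; they're not language from any guidance:

```python
# Hypothetical usability-impact screen for a PCCP modification.
# Field names and trigger questions are illustrative only.
from dataclasses import dataclass

@dataclass
class PccpChange:
    description: str
    changes_displayed_output: bool       # does the user see anything different?
    changes_output_interpretation: bool  # does the meaning of what they see shift?
    changes_alert_behaviour: bool        # do alerts or recommendations fire differently?

def usability_reevaluation_required(change: PccpChange) -> bool:
    """True if the change touches anything the user sees, interprets, or acts on."""
    return (change.changes_displayed_output
            or change.changes_output_interpretation
            or change.changes_alert_behaviour)

change = PccpChange(
    description="Retrained triage model; risk score now shown as a category rather than a number",
    changes_displayed_output=True,
    changes_output_interpretation=True,
    changes_alert_behaviour=False,
)
print(usability_reevaluation_required(change))  # True: abbreviated formative before deployment
```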

I'm personally aware of at least six 510(k) AI submissions in 2025 where this was flagged. The ones that handled it well wrote a usability change trigger protocol into the PCCP. The ones that didn't accumulated AI requests that added 90+ days to review.

The real problem: UEF drift

Here's what actually goes wrong. The same use-related hazard appears in four documents — the risk management file, the IEC 62304 software requirements, the IEC 60601-1-6 usability evidence, the UEF. Each document is internally consistent. Across documents, the hazard is worded slightly differently, the severity differs by one level, the risk control reference points at a different section. Each document is maintained by a different team in a different system.

Reviewers read across these files. Every time. They find the inconsistencies. The defence has to be reconstructed in real time from somebody's memory of what was decided 18 months ago. Sometimes the defence works. Sometimes it doesn't. Either way, weeks of timeline disappear.

The only real fix is a UEF that is a view into an integrated engineering record rather than a standalone document. Same hazard, same risk control, same requirement, same use scenario, same summative evidence — all one object, viewed through different lenses for different audiences. That's the architecture MANKAIND is built around. Not because it's elegant, but because every other approach accumulates the drift problem and eventually pays for it in review.
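As an illustration of the one-object-many-lenses idea, here's a minimal sketch. The field names are hypothetical and this is not MANKAIND's actual data model:

```python
# Minimal sketch: a use-related hazard as a single record that different
# documents render rather than restate. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class UseRelatedHazard:
    hazard_id: str
    description: str               # one wording, reused everywhere
    severity: str                  # one severity level, reused everywhere
    hrus_ids: list[str] = field(default_factory=list)          # UEF lens
    risk_control_ids: list[str] = field(default_factory=list)  # ISO 14971 lens
    requirement_ids: list[str] = field(default_factory=list)   # IEC 62304 lens
    summative_evidence_ids: list[str] = field(default_factory=list)

def risk_file_row(h: UseRelatedHazard) -> dict:
    """The hazard as the risk management file would show it."""
    return {"hazard": h.hazard_id, "severity": h.severity, "controls": h.risk_control_ids}

def uef_row(h: UseRelatedHazard) -> dict:
    """The same hazard as the UEF would show it; wording and severity cannot drift."""
    return {"hazard": h.hazard_id, "severity": h.severity,
            "scenarios": h.hrus_ids, "evidence": h.summative_evidence_ids}
```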

Frequently asked questions about IEC 62366

What is IEC 62366?

IEC 62366-1:2015 (amended 2020) is the international standard that defines the usability engineering process for medical devices. It governs how manufacturers identify use-related hazards, design the user interface, and validate that the device can be used safely by its intended users — required for FDA recognition and EU MDR conformity.

What is the difference between IEC 62366:2007 and IEC 62366-1:2015?

IEC 62366-1:2015 split the original standard into process requirements (Part 1) and guidance for usability engineering implementation (IEC/TR 62366-2). The 2015 revision tightened the linkage between usability engineering and ISO 14971 risk management via the concept of hazard-related use scenarios (HRUS), and clarified distinctions between formative and summative evaluation.

Is IEC 62366 mandatory for FDA?

FDA recognises IEC 62366-1:2015+A1:2020 as a consensus standard. Declaration of Conformity to the standard is not mandatory, but FDA guidance on Applying Human Factors and Usability Engineering to Medical Devices expects a usability engineering process aligned with 62366. For devices with high use-error risk, FDA typically asks to see the Usability Engineering File during review.

How does IEC 62366 relate to ISO 14971?

Hazard-related use scenarios identified under IEC 62366 feed directly into the ISO 14971 risk management file. Use errors are risk sources. Risk controls for use-related hazards are implemented in the user interface design, which is then validated through summative usability evaluation. The two standards are structurally interdependent — you cannot execute one without the other.

How many users are needed for summative usability evaluation?

FDA guidance and AAMI HE75 typically expect a minimum of 15 participants per distinct user group in summative evaluation. The rationale is statistical: with 15 users, there is roughly a 97% chance of observing at least once any use error that occurs at ≥20% frequency. For devices with multiple user groups (clinicians, patients, caregivers), each group needs its own 15 participants.

See how MANKAIND handles this

30-minute demo. Bring your hardest design controls question.