Most healthcare organisations evaluate AI products the way they evaluate imaging equipment: by reading the spec sheet. But AI is not a static device. Its performance depends on the data it was trained on, how it handles edge cases, and whether it has been tested on patients who look like yours. Accepting AI tools on vendor assertion alone has produced most of the high-profile clinical AI failures of the past five years.
The question isn’t whether the AI works. It’s whether you can verify that it works — in your environment, for your patients.
The THRIVE™ Technical assessment maps your institution’s capability to demand training data transparency, interrogate algorithmic behaviour, require local validation, and make procurement decisions grounded in evidence rather than marketing.
Core capabilities
Training Data Representativeness
Demographic, clinical, and equipment diversity of the data used to train and validate the AI relative to the intended patient population.
Training Data Quality & Provenance
Completeness, accuracy, and documented lineage of data assets used to train, validate, and update the AI system.
Annotation Quality
Reliability and consistency of ground-truth labels, including documented inter-observer variability assessment.
Algorithm Transparency
(New) Vendor disclosure of model architecture, decision logic, known failure modes, and confidence-scoring methodology at a level sufficient for clinical interrogation.
Model Robustness & Performance Characterisation
(New) Documented behaviour under out-of-distribution inputs, demographic stress-testing, adversarial conditions, and edge-case clinical presentations.
Local Validation & Commissioning Demand
(Expanded) Structured capacity to require local validation of AI systems, and to validate, fine-tune, and commission them against locally defined clinical performance thresholds before clinical release.
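As an illustration of what commissioning against locally defined thresholds might look like in practice, the sketch below checks a local validation sample against minimum sensitivity and specificity targets. It is not part of THRIVE™ itself. The names `ConfusionCounts`, `wilson_lower_bound`, and `meets_local_thresholds` are hypothetical, the counts and thresholds are made-up values, and comparing the lower bound of a 95% Wilson score interval (rather than the point estimate) is an assumed design choice that makes small local samples harder to pass.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ConfusionCounts:
    """Counts from a local validation sample with reference-standard labels."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion (z=1.96 ~ 95%)."""
    if total == 0:
        return 0.0  # no evidence at all: treat as failing any threshold
    p = successes / total
    denom = 1 + z**2 / total
    centre = p + z**2 / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return (centre - margin) / denom


def meets_local_thresholds(
    counts: ConfusionCounts,
    min_sensitivity: float,
    min_specificity: float,
) -> dict:
    """Compare interval lower bounds against locally defined thresholds."""
    sens_lb = wilson_lower_bound(counts.tp, counts.tp + counts.fn)
    spec_lb = wilson_lower_bound(counts.tn, counts.tn + counts.fp)
    return {
        "sensitivity_lower_bound": sens_lb,
        "specificity_lower_bound": spec_lb,
        "release": sens_lb >= min_sensitivity and spec_lb >= min_specificity,
    }


# Illustrative numbers only: 100 positive and 200 negative local cases.
local_sample = ConfusionCounts(tp=90, fp=20, tn=180, fn=10)
result = meets_local_thresholds(local_sample, min_sensitivity=0.80, min_specificity=0.80)
print(result)
```

A real commissioning protocol would also need to define the reference standard, the sampling frame, and subgroup-level checks; this sketch covers only the final threshold comparison.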
How Technical connects
Technical readiness defines what evidence you demand. Evaluation readiness (E) determines whether you can actually test against it.
Without technical due diligence, vendor evaluation capability (V) defaults to procurement on price rather than performance.
THRIVE™'s Technical facet is the organisational mirror of what VERA evaluates at the product level — the demand side of the same evidence.
What the literature says
“For AI to be used effectively for health, existing biases in healthcare services and systems based on race, ethnicity, age, and gender, that are encoded in data used to train algorithms, must be overcome.”
— WHO, Ethics & Governance of AI for Health (2021)
“LLMs have achieved excellent performance on medical licensing exams, yet these tests fail to assess many skills necessary for deployment in a realistic clinical decision-making environment, including gathering information, adhering to guidelines, and integrating into clinical workflows.”
— Hager et al., Nature Medicine (2024)
“Model design is suited to the available data and supports the active mitigation of known risks, like overfitting, performance degradation, and security risks.”
— FDA, Good Machine Learning Practice (2021)