BMJ. 2026 Jun 2:393:e087812. doi: 10.1136/bmj-2025-087812.
The increasing availability of large, routinely collected datasets creates many opportunities to answer more questions about health and disease, and to do so faster. These opportunities are exciting, but without the necessary expertise, well intentioned researchers can unwittingly fall into traps that make their work indistinguishable from that of less well meaning researchers. This article describes important challenges that arise in the analysis of routinely collected data and offers mitigation strategies to improve the trustworthiness of experimental and observational studies, whether they aim at explanation or prediction and whether they use classical statistical methods or AI algorithms. It also provides a roadmap to help researchers analyse routinely collected data in ways that produce more reliable estimates.