Using routinely collected data for research purposes: challenges and mitigation strategies

Sabine Hoffmann, Tim Morris, Moritz Herrmann, Georg Heinze, Laure Wynants, Ben Van Calster, Bernd Bischl, Matthias Schmid, Pamela A Shaw, Tim Mathes, Florian Naudet, Frank E Harrell, Mona Niethammer, Samuel Muli, Sebastian Zimmer, Farhad Bakhtiary, David Leistner, P Christian Schulze, Daniel Sedding, Philipp Lurz, Tienush Rassaf, Malte Kelm, Stephan Baldus, Johann Bauersachs, Georg Nickenig, Holger Thiele, Enzo Lüsebrink
BMJ. 2026 Jun 2:393:e087812. doi: 10.1136/bmj-2025-087812.

The increasing availability of large routinely collected datasets presents many possibilities to answer more questions about health and disease, and at a faster pace. These opportunities are exciting, but, without the necessary expertise, well-intentioned researchers can unwittingly fall into traps that make their work indistinguishable from that of less well-meaning researchers. This article describes important challenges that arise in the analysis of routinely collected data and offers mitigation strategies to improve the trustworthiness of experimental and observational studies that aim at explanation or prediction, whether they use classical statistical methods or AI algorithms. It also provides a roadmap to support researchers in conducting analyses of routinely collected data that produce more reliable estimates.