Development, Evaluation, and Assessment of Large Language Models (DEAL) Checklist: A Technical Report

    Satvik Tripathi, Dana Alkhulaifat, M.D., Florence X. Doo, M.D., Pranav Rajpurkar, Ph.D., Rafe McBeth, Ph.D., Dania Daye, M.D., Ph.D., and Tessa S. Cook, M.D., Ph.D.

    Abstract

    Large language models (LLMs) have advanced artificial intelligence research in medicine, especially in natural language processing tasks. However, LLM research practices are still maturing, which challenges the transparency, reproducibility, and rigor of reported methods. Standardized reporting is critical to reliable scientific communication and evaluation. This article introduces the Development, Evaluation, and Assessment of Large Language Models (DEAL) checklist, designed to guide authors and reviewers in reporting LLM studies. The checklist comprises two pathways: DEAL-A, tailored to studies that develop or fine-tune models, and DEAL-B, suited to applied research that uses pretrained models with minimal modification. Each pathway addresses critical elements such as model specifications, data-handling practices, training procedures, evaluation metrics, and transparency standards. By providing a comprehensive structure for documenting LLM research and clear guidance on critical reporting elements, the DEAL checklist aims to make studies accessible and reproducible, facilitate peer review, and encourage best practices, ultimately supporting the reliable advancement of LLM technologies.