Imaging Pearls ❯ Deep Learning ❯ Deep Learning and Musculoskeletal Apps

  • “AI capabilities present opportunities for clinical applications in spine imaging: early detection improving patient outcomes, enhanced diagnosis and surgical planning, improved patient experience with faster acquisition times, and assisting radiologists’ efficiency with end-to-end tools for automation.”
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press) 
  • “AI enables automated vertebral, disc, and canal segmentation even with noise, artifacts, and anatomic variations. Models trained on large data sets can identify complex features, enhancing diagnosis and classification of fractures, tumors, or degeneration. Specialized algorithms and variational autoencoders may reconstruct images from undersampled/noisy data, resulting in faster acquisition times while preserving diagnostic accuracy.”
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press)
  • “AI can reduce MR imaging (MRI) and CT image noise, using DL techniques learning noise patterns in data. DL noise reduction algorithm improved signal-to-noise ratio (SNR) of lumbar spine (LS) MRI scans up to 30%.27 DL reconstruction algorithms can enhance MR image resolution by learning relationships between different coordinates in the image, significantly improving LS MR image quality. LS CT studies show DL-enhanced images have significantly lower noise compared with the original scan. Reducing radiation dose in LS CT scans is an important direction in image reconstruction from fewer data points with high-quality images acquired at doses up to 72% lower than standard of care (SOC).”
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press)
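The SNR gains quoted above reduce to a simple ratio of mean signal intensity to the spread of background noise. A minimal sketch in Python; the array sizes, intensities, and noise levels below are illustrative assumptions, not data from the study:

```python
import numpy as np

# Toy illustration of the SNR metric discussed above: mean signal intensity
# divided by the standard deviation of background noise. All values synthetic.
def snr(signal_roi, noise_roi):
    return float(np.mean(signal_roi) / np.std(noise_roi))

rng = np.random.default_rng(0)
signal = np.full(1000, 100.0)                 # uniform "tissue" intensity
noise_original = rng.normal(0.0, 10.0, 1000)  # background noise, sigma = 10
noise_denoised = rng.normal(0.0, 7.5, 1000)   # after denoising, sigma reduced 25%

snr_orig = snr(signal, noise_original)
snr_dl = snr(signal, noise_denoised)
improvement = (snr_dl - snr_orig) / snr_orig  # fractional SNR gain
```

In this toy setup, reducing the noise sigma by a quarter raises SNR by roughly a third; the 30% gain reported for lumbar spine MRI is of similar magnitude.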
  • “DL-assisted radiologists had superior or equivalent interobserver agreement for all stenosis gradings compared with unassisted radiologists. DL-assisted general and in-training radiologists improved their interobserver agreement for four class NF stenosis, κ = 0.71 and 0.70 (DL) versus 0.39 without DL, respectively (both P < .001).73 DL assistance can streamline report generation, which involves image review and a separate text input into a reporting module. DL assistance can detect regions of interest (ROIs), grade stenosis, and automatically generate a sentence directly into the reporting module. The radiologist can change and control the DL-assisted predictions before a report is generated, important for safety and patient preference. Strategic “one-click” solutions integrated within the normal radiologist workflow will be prerequisite for successful implementation.”
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press)
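The interobserver agreement quoted above is Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. A minimal sketch; the four-class stenosis grades below are invented for illustration:

```python
from collections import Counter

# Cohen's kappa: chance-corrected interobserver agreement, the statistic
# behind the values quoted above (0.71 with DL assistance vs. 0.39 without).
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Four-class stenosis grades (0-3) from two hypothetical readers.
reader1 = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
reader2 = [0, 1, 2, 3, 1, 1, 0, 3, 2, 2]
kappa = cohens_kappa(reader1, reader2)  # raw agreement 0.8; kappa is lower
```

Because two of the ten grades disagree, raw agreement is 0.8, but the chance-corrected kappa comes out lower, near the DL-assisted value quoted above.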
  • “AI potentially increases efficiency, reducing time-consuming tasks and assisting radiologists in specific diagnoses with a goal to provide comprehensive automated image analysis. Characterizing lesions and identifying lesions that might be missed are of great benefit, allowing earlier diagnoses, reporting, and treatment. Combining clinical data with image analysis may permit improved and more personalized treatment decision making. Although challenges remain, AI technology continues to rapidly progress with great potential to improve patient care and outcomes.”  
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press)
  •   --Artificial intelligence (AI) models can assist in diagnosing spine pathologies such as degenerative disease, tumor, infection, and fracture, with increasing sensitivity and specificity when used in parallel with radiologists’ input and supervision.
    --AI models integrated into automated reading workflow can significantly decrease interpretation time. This is most evident for radiologists in training.  
    --AI models can improve SNR and reduce artifacts inherent in rapid spine imaging protocols resulting in diagnostic quality imaging with 40-50% reduction in image acquisition time.  
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press)
  •     --AI models may assist in ‘pre-screening’ routine imaging examinations and flagging suspicious cases before review by radiologists. This allows added value through worklist prioritization.
    --Clinically deploying AI models may assist in semiautomated reporting under radiologist supervision to provide more consistent and objective reporting    
    Practical Applications of Artificial Intelligence in Spine Imaging: A Review
    Upasana Upadhyay Bharadwaj, et al.
    Radiol Clin N Am - (2023) (in press)
  • Background Advances have been made in the use of artificial intelligence (AI) in the field of diagnostic imaging, particularly in the detection of fractures on conventional radiographs. Studies looking at the detection of fractures in the pediatric population are few. The anatomical variations and evolution according to the child’s age require specific studies of this population. Failure to diagnose fractures early in children may lead to serious consequences for growth.
    Objective To evaluate the performance of an AI algorithm based on deep neural networks toward detecting traumatic appendicular fractures in a pediatric population. To compare sensitivity, specificity, positive predictive value and negative predictive value of different readers and the AI algorithm.
    Materials and methods This retrospective study conducted on 878 patients younger than 18 years of age evaluated conventional radiographs obtained after recent non-life-threatening trauma. All radiographs of the shoulder, arm, elbow, forearm, wrist, hand, leg, knee, ankle and foot were evaluated. The diagnostic performance of a consensus of radiology experts in pediatric imaging (reference standard) was compared with those of pediatric radiologists, emergency physicians, senior residents and junior residents. The predictions made by the AI algorithm and the annotations made by the different physicians were compared.
    Comparison of diagnostic performance of a deep learning algorithm, emergency physicians, junior radiologists and senior radiologists in the detection of appendicular fractures in children
    Idriss Gasmi et al.  
    Pediatric Radiology 2023 (in press)
  • Results The algorithm predicted 174 fractures out of 182, corresponding to a sensitivity of 95.6%, a specificity of 91.64% and a negative predictive value of 98.76%. The AI predictions were close to that of pediatric radiologists (sensitivity 98.35%) and that of senior residents (95.05%) and were above those of emergency physicians (81.87%) and junior residents (90.1%). The algorithm identified 3 (1.6%) fractures not initially seen by pediatric radiologists.
    Conclusion This study suggests that deep learning algorithms can be useful in improving the detection of fractures in children.
    Comparison of diagnostic performance of a deep learning algorithm, emergency physicians, junior radiologists and senior radiologists in the detection of appendicular fractures in children
    Idriss Gasmi et al.  
    Pediatric Radiology 2023 (in press)
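The metrics in the abstract above follow directly from confusion-matrix counts. A quick check in Python: the fracture counts (174 of 182 detected) are taken from the abstract, while the non-fracture split is a back-calculated approximation, since only the derived percentages are reported:

```python
# Diagnostic metrics behind the abstract above. TP/FN (174 of 182 fractures
# detected) come from the reported results; TN/FP are back-calculated
# approximations, since the abstract gives only the derived percentages.
def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def npv(tn, fn):  # negative predictive value
    return tn / (tn + fn)

tp, fn = 174, 8    # fractures detected vs. missed
tn, fp = 638, 58   # assumed split of the ~696 non-fracture cases

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
neg_pred = npv(tn, fn)
```

With these counts, sensitivity is 174/182 ≈ 95.6% and NPV 638/646 ≈ 98.8%, matching the reported figures; specificity lands near the quoted 91.64%.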
  • “Trauma is a major indication for emergency room visits in adults and in children. Fractures remain a leading cause of diagnostic errors potentially associated with negative sequelae in children if the growth cartilage is affected. Assisting clinicians and nonspecialized radiologists in detecting and localizing fractures on digital radiographs could prevent medical errors, particularly outside standard working hours. The objective of this study was first to evaluate the diagnostic performance of an AI algorithm in detecting peripheral fractures in a pediatric population. Second, to compare the diagnostic performance of readers with different levels of experience, ranging from pediatric radiologists to senior and junior residents in radiology and emergency physicians.”
    Comparison of diagnostic performance of a deep learning algorithm, emergency physicians, junior radiologists and senior radiologists in the detection of appendicular fractures in children
    Idriss Gasmi et al.  
    Pediatric Radiology 2023 (in press)
  • “Conversely, 1.6% of all fractures (3 fractures) were detected by the algorithm but were not initially seen by the pediatric radiologist, which suggests a significant contribution to daily practice. A gain in sensitivity for the diagnosis of skeletal fractures can be expected, particularly for general radiologists. The algorithm also showed superiority, in terms of sensitivity, over the emergency physicians (sensitivity 81.87%), which suggests a real benefit in the daily use of the algorithm by pediatric emergency physicians.”
    Comparison of diagnostic performance of a deep learning algorithm, emergency physicians, junior radiologists and senior radiologists in the detection of appendicular fractures in children
    Idriss Gasmi et al.  
    Pediatric Radiology 2023 (in press)
  • “This study, evaluating the diagnostic performance of an AI algorithm in detecting appendicular fractures in children, has demonstrated a high sensitivity and NPV compared to the reference standard. The PPV and kappa coefficient were lower, but algorithms are constantly improving. It may be hypothesized that AI algorithms can assist radiologists and emergency physicians to detect appendicular fractures in children. AI algorithms may save time and provide a diagnostic aid in the detection of bone fractures in children. Less experienced readers may benefit the most, but even senior pediatric radiologists may find an AI algorithm useful to improve their time efficiency in the localization of appendicular fractures.”
    Comparison of diagnostic performance of a deep learning algorithm, emergency physicians, junior radiologists and senior radiologists in the detection of appendicular fractures in children
    Idriss Gasmi et al.  
    Pediatric Radiology 2023 (in press)
  • “Workflow prioritization based on pathology detection, injury or disease severity grading and classification, quantitative visualization, and auto-population of structured reports were identified as high-value tasks. Respondents overwhelmingly indicated a need for explainable and verifiable tools (87%) and the need for transparency in the development process (80%). Most respondents did not feel that AI would reduce the need for emergency radiologists in the next two decades (72%) or diminish interest in fellowship programs (58%). Negative perceptions pertained to potential for automation bias (23%), over-diagnosis (16%), poor generalizability (15%), negative impact on training (11%), and impediments to workflow (10%).”
    A survey of ASER members on artificial intelligence in emergency radiology: trends, perceptions, and expectations
    Anjali Agrawal et al.
    Emergency Radiology 2023 (in press)
  • “ASER member respondents are in general optimistic about the impact of AI in the practice of emergency radiology and its impact on the popularity of emergency radiology as a subspecialty. The majority expect to see transparent and explainable AI models with the radiologist as the decision-maker.”
    A survey of ASER members on artificial intelligence in emergency radiology: trends, perceptions, and expectations
    Anjali Agrawal et al.
    Emergency Radiology 2023 (in press)
  • “Just over half of respondents among the ASER membership currently use commercial AI tools in their practice. Two thirds of respondents who currently use AI tools feel that they improve quality of care, and most find themselves disagreeing with AI predictions in 5–20% of studies. Concerns and apprehensions pertaining to overdiagnosis and generalization to their local patient populations are shared by over half of end-users. The majority of respondents expect to see transparent and explainable AI tools with the onus of the final decision with the radiologist.”
    A survey of ASER members on artificial intelligence in emergency radiology: trends, perceptions, and expectations
    Anjali Agrawal et al.
    Emergency Radiology 2023 (in press)
  • Background: As the number of conventional radiographic examinations in pediatric emergency departments increases, so, too, does the number of reading errors by radiologists.
    Objective: The aim of this study is to investigate the ability of artificial intelligence (AI) to improve the detection of fractures by radiologists in children and young adults.
    Materials and methods: A cohort of 300 anonymized radiographs performed for the detection of appendicular fractures in patients ages 2 to 21 years was collected retrospectively. The ground truth for each examination was established after an independent review by two radiologists with expertise in musculoskeletal imaging. Discrepancies were resolved by consensus with a third radiologist. Half of the 300 examinations showed at least 1 fracture. Radiographs were read by three senior pediatric radiologists and five radiology residents in the usual manner and then read again immediately after with the help of AI.
    Assessment of an artificial intelligence aid for the detection of appendicular skeletal fractures in children and young adults by senior and junior radiologists
    Toan Nguyen et al.
    Pediatric Radiology (2022) 52:2215–2226
  • Results: The mean sensitivity for all groups was 73.3% (110/150) without AI; it increased significantly by almost 10% (P<0.001) to 82.8% (125/150) with AI. For junior radiologists, it increased by 10.3% (P<0.001) and for senior radiologists by 8.2% (P=0.08). On average, there was no significant change in specificity (from 89.6% to 90.3% [+0.7%, P=0.28]); for junior radiologists, specificity increased from 86.2% to 87.6% (+1.4%, P=0.42) and for senior radiologists, it decreased from 95.1% to 94.9% (-0.2%, P=0.23). The stand-alone sensitivity and specificity of the AI were, respectively, 91% and 90%.
    Conclusion: With the help of AI, sensitivity increased by an average of 10% without significantly decreasing specificity in fracture detection in a predominantly pediatric population.
     Assessment of an artificial intelligence aid for the detection of appendicular skeletal fractures in children and young adults by senior and junior radiologists
    Toan Nguyen et al.
    Pediatric Radiology (2022) 52:2215–2226
  • “We have shown that the diagnostic performance of junior and senior radiologists for fracture detection from conventional radiographs can be improved with the assistance of AI. The study confirms that AI is suitable for bone fracture detection in clinical practice even for young children. A prospective evaluation in a setting closer to the real-life scenario should be considered.”
    Assessment of an artificial intelligence aid for the detection of appendicular skeletal fractures in children and young adults by senior and junior radiologists
    Toan Nguyen et al.
    Pediatric Radiology (2022) 52:2215–2226
  • “Second, our study was retrospective in nature, with readers in artificial reading conditions, which could affect their reading. Moreover, the performance of readers was assessed solely on their ability to make decisions from the radiograph alone, without any of the clinical information or medical history that can be crucial in decision-making, creating a context bias. This same limitation applies to the radiologists who determined the ground truth, as they also worked without clinical information. Clinical information could have increased the sensitivity and specificity of readers and would have been more akin to daily practice. Furthermore, in everyday practice, indications are diverse and do not concern only trauma. Finally, reading with AI immediately after reading without AI could have introduced some bias. A study with clinical information, two separate phases and a washout period in between should be considered to remove these biases.”
    Assessment of an artificial intelligence aid for the detection of appendicular skeletal fractures in children and young adults by senior and junior radiologists
    Toan Nguyen et al.
    Pediatric Radiology (2022) 52:2215–2226
  • Purpose: To conduct a prospective observational study across 12 U.S. hospitals to evaluate real-time performance of an interpretable artificial intelligence (AI) model to detect COVID-19 on chest radiographs.
    Materials and Methods: A total of 95 363 chest radiographs were included in model training, external validation, and real-time validation. The model was deployed as a clinical decision support system, and performance was prospectively evaluated. There were 5335 total real-time predictions and a COVID-19 prevalence of 4.8% (258 of 5335). Model performance was assessed with use of receiver operating characteristic analysis, precision-recall curves, and F1 score. Logistic regression was used to evaluate the association of race and sex with AI model diagnostic accuracy. To compare model accuracy with the performance of board-certified radiologists, a third dataset of 1638 images was read independently by two radiologists.
    Conclusion: AI-based tools have not yet reached full diagnostic potential for COVID-19 and underperform compared with radiologist prediction.
    Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study
    Ju Sun, et al.
    Radiology: Artificial Intelligence 2022; 4(4):e210217
  • Summary
    This 12-site prospective study characterizes the real-time performance of an artificial intelligence–based diagnostic tool for COVID-19, which may serve as an adjunct to, but not as a replacement for, clinical decision-making in the diagnosis of COVID-19.  
    Key Points
    •  The COVID-19 artificial intelligence (AI) diagnostic tool achieved an area under the receiver operating characteristic curve of 0.70 on real-time validation.
    • At equity and subgroup analysis, the AI tool demonstrated improved diagnostic capabilities in participants with more severe disease and in non-White participants, improved sensitivity in men, and improved specificity in women during real-time and external validations.
    •  The COVID-19 AI diagnostic system had significantly lower accuracy (63.5%) compared with radiologists (radiologist 1 = 67.8% correct, radiologist 2 = 68.6% correct; McNemar P < .001).
    Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study
    Ju Sun, et al.
    Radiology: Artificial Intelligence 2022; 4(4):e210217
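The radiologist comparison above uses McNemar's test, which considers only the discordant pairs: images one reader classified correctly and the other did not. A sketch of the exact version; the discordant counts below are hypothetical, as the abstract reports only the resulting P value:

```python
from math import comb

# Exact two-sided McNemar test, as used above to compare the AI tool with each
# radiologist on the same images. b and c are the discordant-pair counts.
def mcnemar_exact_p(b, c):
    n = b + c
    k = min(b, c)
    # Binomial tail probability at p = 0.5, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordance: radiologist right / AI wrong on 120 images,
# AI right / radiologist wrong on 50 (the paper reports only P < .001).
p_value = mcnemar_exact_p(120, 50)
```

A large imbalance in the discordant pairs, as in this hypothetical split, drives the P value far below .001, consistent with the significant accuracy gap reported.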
  • “In conclusion, AI-based diagnostic tools may serve as an adjunct to, but not a replacement for, clinical decision-making concerning COVID-19 diagnosis, which largely hinges on exposure history, signs, and symptoms. Although AI-based tools have not yet reached full diagnostic potential in COVID-19, they may still offer valuable information to clinicians when taken into consideration along with clinical signs and symptoms.”
    Performance of a Chest Radiograph AI Diagnostic Tool for COVID-19: A Prospective Observational Study
    Ju Sun, et al.
    Radiology: Artificial Intelligence 2022; 4(4):e210217
  • Background: Cinematic Rendering (CR) is a recently introduced post-processing three-dimensional (3D) visualization imaging tool. The aim of this study was to assess its clinical value in the preoperative planning of deep inferior epigastric artery perforator (DIEP) or muscle-sparing transverse rectus abdominis myocutaneous (MS-TRAM) flaps, and to compare it with maximum intensity projection (MIP) images. The study presents the first application of CR for perforator mapping prior to autologous breast reconstruction.
    Conclusion: The current study serves as an explorative study, showing first experiences with CR in abdominal-based autologous breast reconstruction. In addition to MIP images, CR might improve the surgeon’s understanding of the individual’s anatomy. Future studies are required to compare CR with other 3D visualization tools and its possible effects on operative parameters.
    The third dimension in perforator mapping—Comparison of Cinematic Rendering and maximum intensity projection in abdominal-based autologous breast reconstruction
    Journal of Plastic, Reconstructive & Aesthetic Surgery 75 (2022) 536–543  
  • “CR is a promising 3D visualization technique and can assist surgeons in understanding the patient’s anatomy with all details. There is a continuous development of the underlying algorithms, and the use of artificial intelligence might help to refine the tool in newer software versions. Future studies may consider comparison of CR with other 3D visualization tools. Moreover, the technique might aid the understanding of anatomy in different fields of reconstructive surgery, for example, in the treatment of complex tissue defects or hand surgery. Future studies are needed to investigate the use of CR in other free flap options and its possible positive effects concerning operative parameters, for example, flap harvest time or the occurrence of intraoperative complications.”
    The third dimension in perforator mapping—Comparison of Cinematic Rendering and maximum intensity projection in abdominal-based autologous breast reconstruction
    Journal of Plastic, Reconstructive & Aesthetic Surgery 75 (2022) 536–543 
  • Background: Patients with fractures are a common emergency presentation and may be misdiagnosed at radiologic imaging. An increasing number of studies apply artificial intelligence (AI) techniques to fracture detection as an adjunct to clinician diagnosis.
    Purpose: To perform a systematic review and meta-analysis comparing the diagnostic performance in fracture detection between AI and clinicians in peer-reviewed publications and the gray literature (ie, articles published on preprint repositories).
    Conclusion: Artificial intelligence (AI) and clinicians had comparable reported diagnostic performance in fracture detection, suggesting that AI technology holds promise as a diagnostic adjunct in future clinical practice.
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Rachel Y. L. Kuo et al.
    Radiology 2022; 304:50–62
  • Summary
    Artificial intelligence is noninferior to clinicians in terms of diagnostic performance in fracture detection, showing promise as a useful diagnostic tool.
    Key Results
    • In a systematic review and meta-analysis of 42 studies (37 studies with radiography and five studies with CT), the pooled diagnostic performance from the use of artificial intelligence (AI) to detect fractures had a sensitivity of 92% and 91% and specificity of 91% and 91%, on internal and external validation, respectively.
    • Clinician performance had comparable performance to AI in fracture detection (sensitivity 91%, 92%; specificity 94%, 94%).
    • Only 13 studies externally validated results, and only one study evaluated AI performance in a prospective clinical trial.
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Rachel Y. L. Kuo et al.
    Radiology 2022; 304:50–62
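The pooled estimates above come from a hierarchical bivariate model. As a much simpler illustration of the idea of pooling, per-study counts can be aggregated directly; the counts below are invented, and this naive approach ignores between-study variance:

```python
# Naive pooled sensitivity: sum raw counts across studies. The meta-analysis
# above used a hierarchical bivariate model, which properly accounts for
# between-study variance; per-study counts here are invented for illustration.
studies = [
    {"tp": 90, "fn": 10},   # study 1: sensitivity 0.90
    {"tp": 45, "fn": 5},    # study 2: sensitivity 0.90
    {"tp": 180, "fn": 12},  # study 3: sensitivity ~0.94
]

total_tp = sum(s["tp"] for s in studies)
total_fn = sum(s["fn"] for s in studies)
pooled_sensitivity = total_tp / (total_tp + total_fn)  # weighted toward larger studies
```

Summing counts implicitly weights each study by its size, which is why the pooled value here sits closer to the largest study's sensitivity.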
  • “Future research should seek to externally validate algorithms in prospective clinical settings and provide a fair comparison with relevant clinicians: for example, providing clinicians with routine clinical detail. External validation and evaluation of algorithms in prospective randomized clinical trials is a necessary next step toward clinical deployment. Current artificial intelligence (AI) is designed as a diagnostic adjunct and may improve workflow through screening or prioritizing images on worklists and highlighting regions of interest for a reporting radiologist. AI may also improve diagnostic certainty through acting as a “second reader” for clinicians or as an interim report prior to radiologist interpretation. However, it is not a replacement for the clinical workflow, and clinicians must understand AI performance and exercise judgement in interpreting algorithm output. We advocate for transparent reporting of study methods and results as crucial to AI integration. By addressing these areas for development, deep learning has potential to streamline fracture diagnosis in a way that is safe and sustainable for patients and health care systems.”
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Rachel Y. L. Kuo et al.
    Radiology 2022; 304:50–62
  • “The status of AI in medical imaging in the next 10 years will depend on regulatory policy, reimbursement models, success in the incorporation of AI into routine workflow, development and adoption of standards versus platforms for AI applications, and the level of success in generalizing deep learning algorithms to different machines, geographies, and diverse patient populations to minimize bias.”
    Future Directions in Artificial Intelligence
    Babak Saboury, Michael Morris, MD, Eliot Siegel
    Radiol Clin N Am 59 (2021) 1085–1095
  • Background: Artificial Intelligence (AI)/Machine Learning (ML) applications have been proven efficient to improve diagnosis, to stratify risk, and to predict outcomes in many respective medical specialties, including in orthopaedics.
    Challenges and Discussion: Regarding hip and knee reconstruction surgery, AI/ML have not made it yet to clinical practice. In this review, we present sound AI/ML applications in the field of hip and knee degenerative disease and reconstruction. From osteoarthritis (OA) diagnosis and prediction of its advancement, clinical decision-making, identification of hip and knee implants to prediction of clinical outcome and complications following a reconstruction procedure of these joints, we report how AI/ML systems could facilitate data-driven personalized care for our patients.
    Applications of artificial intelligence and machine learning for the hip and knee surgeon: current state and implications for the future
    Christophe Nich et al.
    International Orthopaedics (2022) 46:937–944
  • “In a near future, AI/ML will probably provide the orthopaedic surgeon with key tools in an increasingly data-driven and data-dependent world. As the amount of patient-related data continues to grow, it is becoming evident that medical decisions will increasingly have recourse to AI/ML. The latter will need to be incorporated into the daily practice, with the help of automated algorithms for computers. Also, it is probable that advanced ML systems will overcome the problem of missing data. Advances in unsupervised learning will enable far greater characterization of patient’s risk factors for complications or failure following hip or knee reconstruction. Ultimately, this will lead to better surgical technique selection, improved outcomes, and lower healthcare costs.”
    Applications of artificial intelligence and machine learning for the hip and knee surgeon: current state and implications for the future
    Christophe Nich et al.
    International Orthopaedics (2022) 46:937–944

  • “Hip fractures are a major cause of morbidity and mortality in the elderly, and incur high health and social care costs. Given projected population ageing, the number of incident hip fractures is predicted to increase globally. As fracture classification strongly determines the chosen surgical treatment, differences in fracture classification influence patient outcomes and treatment costs. We aimed to create a machine learning method for identifying and classifying hip fractures, and to compare its performance to experienced human observers. We used 3659 hip radiographs, classified by at least two expert clinicians. The machine learning method was able to classify hip fractures with 19% greater accuracy than humans, achieving overall accuracy of 92%.”
    Machine learning outperforms clinical experts in classification of hip fractures
    E. A. Murphy et al.
    Scientific Reports (Nature) (2022) 12:2058 
  • "In this work, we have demonstrated that a trained neural network can classify hip fractures with 19% increased accuracy compared to human observers with experience of hip fracture classification in a clinical setting. In the work presented here, we used as ground truth the classification of 3,659 hip radiographs by at least two (and up to five) experts to achieve consensus. Thus, this analysis is a prototype only and a more extensive study is needed before this approach can be fully transformed to a clinical application. We envisage that this approach could be used clinically and aid in the diagnosis and in the treatment of patients who sustain hip fractures.”
    Machine learning outperforms clinical experts in classification of hip fractures
    E. A. Murphy et al.
    Scientific Reports (Nature) (2022) 12:2058 
  • Background: Patients with fractures are a common emergency presentation and may be misdiagnosed at radiologic imaging. An increasing number of studies apply artificial intelligence (AI) techniques to fracture detection as an adjunct to clinician diagnosis.
    Purpose: To perform a systematic review and meta-analysis comparing the diagnostic performance in fracture detection between AI and clinicians in peer-reviewed publications and the gray literature (ie, articles published on preprint repositories).  
    Materials and Methods: A search of multiple electronic databases between January 2018 and July 2020 (updated June 2021) was performed that included any primary research studies that developed and/or validated AI for the purposes of fracture detection at any imaging modality and excluded studies that evaluated image segmentation algorithms. Meta-analysis with a hierarchical model to calculate pooled sensitivity and specificity was used. Risk of bias was assessed by using a modified Prediction Model Study Risk of Bias Assessment Tool, or PROBAST, checklist.
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Kuo RYL et al.
    Radiology 2022; 000:1–13 • https://doi.org/10.1148/radiol.211785 
  • Results: Included for analysis were 42 studies, with 115 contingency tables extracted from 32 studies (55 061 images). Thirty-seven studies identified fractures on radiographs and five studies identified fractures on CT images. For internal validation test sets, the pooled sensitivity was 92% (95% CI: 88, 93) for AI and 91% (95% CI: 85, 95) for clinicians, and the pooled specificity was 91% (95% CI: 88, 93) for AI and 92% (95% CI: 89, 92) for clinicians. For external validation test sets, the pooled sensitivity was 91% (95% CI: 84, 95) for AI and 94% (95% CI: 90, 96) for clinicians, and the pooled specificity was 91% (95% CI: 81, 95) for AI and 94% (95% CI: 91, 95) for clinicians. There were no statistically significant differences between clinician and AI performance. There were 22 of 42 (52%) studies that were judged to have high risk of bias. Meta-regression identified multiple sources of heterogeneity in the data, including risk of bias and fracture type.  
    Conclusion: Artificial intelligence (AI) and clinicians had comparable reported diagnostic performance in fracture detection, suggesting that AI technology holds promise as a diagnostic adjunct in future clinical practice.  
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Kuo RYL et al.
    Radiology 2022; 000:1–13 • https://doi.org/10.1148/radiol.211785 
  • Summary  
    Artificial intelligence is noninferior to clinicians in terms of diagnostic performance in fracture detection, showing promise as a useful diagnostic tool.  
    Key Results  
    • In a systematic review and meta-analysis of 42 studies (37 studies with radiography and five studies with CT), the pooled diagnostic performance from the use of artificial intelligence (AI) to detect fractures had a sensitivity of 92% and 91% and specificity of 91% and 91%, on internal and external validation, respectively.
    • Clinicians had performance comparable to AI in fracture detection (sensitivity 91%, 92%; specificity 94%, 94%).
    • Only 13 studies externally validated results, and only one study evaluated AI performance in a prospective clinical trial.  
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Kuo RYL et al.
    Radiology 2022; 000:1–13 • https://doi.org/10.1148/radiol.211785 
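    The pooled sensitivity and specificity above are computed from 2×2 contingency tables (fracture present or absent versus reader call positive or negative). A minimal Python sketch of the two metrics, using invented counts rather than the study’s data:

```python
# Sensitivity/specificity from a 2x2 contingency table, as pooled in the
# meta-analysis. Counts below are invented for illustration only.

def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: fractures correctly flagged / all fractures."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True-negative rate: normal studies correctly cleared / all normals."""
    return tn / (tn + fp)

# Hypothetical table: 92 of 100 fractures detected, 91 of 100 normals cleared
tp, fn, tn, fp = 92, 8, 91, 9
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # sensitivity = 0.92
print(f"specificity = {specificity(tn, fp):.2f}")  # specificity = 0.91
```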
  • “Current artificial intelligence (AI) is designed as a diagnostic adjunct and may improve workflow through screening or prioritizing images on worklists and highlighting regions of interest for a reporting radiologist. AI may also improve diagnostic certainty through acting as a “second reader” for clinicians or as an interim report prior to radiologist interpretation. However, it is not a replacement for the clinical workflow, and clinicians must understand AI performance and exercise judgement in interpreting algorithm output. We advocate for transparent reporting of study methods and results as crucial to AI integration. By addressing these areas for development, deep learning has potential to streamline fracture diagnosis in a way that is safe and sustainable for patients and health care systems.”
    Artificial Intelligence in Fracture Detection: A Systematic Review and Meta-Analysis
    Kuo RYL et al.
    Radiology 2022; 000:1–13 • https://doi.org/10.1148/radiol.211785
  • Background: Proximal femoral fractures are an important clinical and public health issue associated with substantial morbidity and early mortality. Artificial intelligence might offer improved diagnostic accuracy for these fractures, but typical approaches to testing of artificial intelligence models can underestimate the risks of artificial intelligence-based diagnostic systems.
    Methods: We present a preclinical evaluation of a deep learning model intended to detect proximal femoral fractures in frontal x-ray films in emergency department patients, trained on films from the Royal Adelaide Hospital (Adelaide, SA, Australia). This evaluation included a reader study comparing the performance of the model against five radiologists (three musculoskeletal specialists and two general radiologists) on a dataset of 200 fracture cases and 200 non-fractures (also from the Royal Adelaide Hospital), an external validation study using a dataset obtained from Stanford University Medical Center, CA, USA, and an algorithmic audit to detect any unusual or unexpected model behaviour.  
    Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study
    Lauren Oakden-Rayner et al.
    www.thelancet.com/digital-health Published online April 5, 2022 https://doi.org/10.1016/S2589-7500(22)00004-8  
  • Findings: In the reader study, the area under the receiver operating characteristic curve (AUC) for the performance of the deep learning model was 0·994 (95% CI 0·988–0·999) compared with an AUC of 0·969 (0·960–0·978) for the five radiologists. This strong model performance was maintained on external validation, with an AUC of 0·980 (0·931–1·000). However, the preclinical evaluation identified barriers to safe deployment, including a substantial shift in the model operating point on external validation and an increased error rate on cases with abnormal bones (eg, Paget’s disease).  
    Interpretation: The model outperformed the radiologists tested and maintained performance on external validation, but showed several unexpected limitations during further testing. Thorough preclinical evaluation of artificial intelligence models, including algorithmic auditing, can reveal unexpected and potentially harmful behaviour even in high-performance artificial intelligence systems, which can inform future clinical testing and deployment decisions.  
    Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study
    Lauren Oakden-Rayner et al.
    www.thelancet.com/digital-health Published online April 5, 2022 https://doi.org/10.1016/S2589-7500(22)00004-8  
  • Added value of this study:  This study presents a thorough preclinical evaluation of a medical artificial intelligence system (trained to detect proximal femoral fractures on plain film imaging). Despite high performance of the model, which outperformed human experts in the task of proximal femoral fracture detection, an evaluation including algorithmic auditing showed unexpected and potentially harmful algorithmic behaviour.  
    Implications of all the available evidence:  Thorough evaluation of artificial intelligence systems, including algorithmic auditing, can identify barriers to safe artificial intelligence deployment that might not be appreciated during standard preclinical testing and which could cause significant harm. Regulators, medical governance bodies, and professional groups should consider the need for more comprehensive preclinical testing of artificial intelligence before clinical deployment.  
    Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study
    Lauren Oakden-Rayner et al.
    www.thelancet.com/digital-health Published online April 5, 2022 https://doi.org/10.1016/S2589-7500(22)00004-8  
  • “We note that although our model shows high performance, and does not appear to deviate from human performance in prespecified subgroups, it does still make the occasional inhuman error (eg, misdiagnosing a highly displaced fracture). We also note on saliency mapping that although the model reproduces some recognizable aspects of human practice (eg, it appears to pay attention to Shenton’s line), the visualizations nonetheless raise concerns about the regions that are not highlighted in the heatmaps. In particular, the saliency maps almost never show strong activity along the outer region of the femoral neck, even in cases where the cortex in this area is clearly disrupted.”
    Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study
    Lauren Oakden-Rayner et al.
    www.thelancet.com/digital-health Published online April 5, 2022 https://doi.org/10.1016/S2589-7500(22)00004-8  
  • "Our study evaluated a high-performance proximal femoral fracture detection deep learning model, which outperforms highly trained clinical specialists in diagnostic conditions, as well as other clinical readers in normal clinical conditions. The performance of the artificial intelligence system was maintained when applied to an external validation sample, and a thorough analysis of the behaviour of the artificial intelligence system shows that it is mostly consistent with that of human experts. We also characterized the occasional aberrant or unexpected behaviour of the artificial intelligence model which could inform future clinical testing protocols. We next intend to test our model in a clinical environment, in the form of an interventional randomised controlled trial.”  
    Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study
    Lauren Oakden-Rayner et al.
    www.thelancet.com/digital-health Published online April 5, 2022 https://doi.org/10.1016/S2589-7500(22)00004-8  
  • "Our study had a number of limitations. First, the deep learning model itself is limited by being unable to act on cases with implanted metalwork (although our system is able to automatically identify these cases and exclude them from analysis). Second, the sample size of the MRMC study was limited by the availability of readers; we determined a total dataset of 400 cases (200 positive and 200 negative cases) was as many as we could reasonably expect the readers to review, and only five radiologists reviewed the cases under diagnostic conditions as defined in the local standards of practice.”  
    Validation and algorithmic audit of a deep learning system for the detection of proximal femoral fractures in patients in the emergency department: a diagnostic accuracy study
    Lauren Oakden-Rayner et al.
    www.thelancet.com/digital-health Published online April 5, 2022 https://doi.org/10.1016/S2589-7500(22)00004-8  
  • Objectives: To develop and validate machine learning models to distinguish between benign and malignant bone lesions and compare the performance to radiologists.
    Results: The best machine learning model was based on an artificial neural network (ANN) combining both radiomic and demographic information achieving 80% and 75% accuracy at 75% and 90% sensitivity with 0.79 and 0.90 AUC on the internal and external test set, respectively. In comparison, the radiology residents achieved 71% and 65% accuracy at 61% and 35% sensitivity while the radiologists specialized in musculoskeletal tumor imaging achieved an 84% and 83% accuracy at 90% and 81% sensitivity, respectively.  
    Conclusions: An ANN combining radiomic features and demographic information showed the best performance in distinguishing between benign and malignant bone lesions. The model showed lower accuracy compared to specialized radiologists, while accuracy was higher or similar compared to residents.  
    Development and evaluation of machine learning models based on X-ray radiomics for the classification and differentiation of malignant and benign bone tumors  
    Claudio E. von Schacky et al.
    European Radiology 2022 • https://doi.org/10.1007/s00330-022-08764-w
  • “In this study, machine learning models based on radiomics and demographic information were developed and validated to distinguish between benign and malignant bone lesions on radiographs and compared to radiologists on an external test set. Overall, machine learning models using the combination of radiomics and demographic information showed a higher diagnostic accuracy than machine learning models using radiomics or demographic information only. The best model was based on an ANN that used both radiomics and demographic information. On an external test set, this model demonstrated lower accuracy compared to radiologists specialized in musculoskeletal tumor imaging, while accuracy was higher or similar compared to radiology residents.”
    Development and evaluation of machine learning models based on X-ray radiomics for the classification and differentiation of malignant and benign bone tumors  
    Claudio E. von Schacky et al.
    European Radiology 2022 • https://doi.org/10.1007/s00330-022-08764-w
  • “In conclusion, a machine learning model using both radiomic features and demographic information was developed that showed high accuracy and discriminatory power for the distinction between benign and malignant bone tumors on radiographs of patients that underwent biopsy. The best model was based on an ANN that used both radiomics and demographic information resulting in an accuracy higher or similar compared to radiology residents. A model such as this may enhance diagnostic decision-making especially for radiologists or physicians with limited experience and may therefore improve the diagnostic work up of bone tumors.”  
    Development and evaluation of machine learning models based on X-ray radiomics for the classification and differentiation of malignant and benign bone tumors  
    Claudio E. von Schacky et al.
    European Radiology 2022 • https://doi.org/10.1007/s00330-022-08764-w
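    The best-performing model above combined radiomic features with demographic information before classification. As a hedged sketch of that general idea only (the feature names, values, and encoding below are hypothetical; the authors’ actual radiomics pipeline and ANN are not reproduced), the two sources can be flattened into a single numeric vector for a downstream classifier:

```python
# Hypothetical sketch: concatenating radiomic features with demographic
# information into one feature vector. All names and values are invented.

def build_feature_vector(radiomics: dict, demographics: dict) -> list:
    """Flatten both sources into one ordered numeric vector."""
    radiomic_keys = sorted(radiomics)        # e.g. texture/shape/intensity features
    demographic_keys = sorted(demographics)  # e.g. age, numerically encoded sex
    return ([radiomics[k] for k in radiomic_keys] +
            [demographics[k] for k in demographic_keys])

# Invented example values
radiomics = {"glcm_contrast": 0.42, "mean_intensity": 118.0, "sphericity": 0.77}
demographics = {"age": 54.0, "sex_male": 1.0}
vec = build_feature_vector(radiomics, demographics)
print(vec)  # [0.42, 118.0, 0.77, 54.0, 1.0]
```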
  • Background: The interpretation of radiographs suffers from an ever-increasing workload in emergency and radiology departments, while missed fractures represent up to 80% of diagnostic errors in the emergency department.  
    Purpose: To assess the performance of an artificial intelligence (AI) system designed to aid radiologists and emergency physicians in the detection and localization of appendicular skeletal fractures.
    Conclusion: The artificial intelligence aid provided a gain of sensitivity (8.7% increase) and specificity (4.1% increase) without loss of reading speed.  
    Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study  
    Loïc Duron et al.  
    Radiology 2021; 300:120–129 
    Materials and Methods: The AI system was previously trained on 60 170 radiographs obtained in patients with trauma. The radiographs were randomly split into 70% training, 10% validation, and 20% test sets. Between 2016 and 2018, 600 adult patients in whom multiview radiographs had been obtained after a recent trauma, with or without one or more fractures of shoulder, arm, hand, pelvis, leg, and foot, were retrospectively included from 17 French medical centers. Radiographs with quality precluding human interpretation or containing only obvious fractures were excluded. Six radiologists and six emergency physicians were asked to detect and localize fractures with (n = 300) and fractures without (n = 300) the aid of software highlighting boxes around AI-detected fractures. Aided and unaided sensitivity, specificity, and reading times were compared by means of paired Student t tests after averaging of performances of each reader.
    Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study  
    Loïc Duron et al.  
    Radiology 2021; 300:120–129 
    Results: A total of 600 patients (mean age ± standard deviation, 57 years ± 22; 358 women) were included. The AI aid improved the sensitivity of physicians by 8.7% (95% CI: 3.1, 14.2; P = .003 for superiority) and the specificity by 4.1% (95% CI: 0.5, 7.7; P < .001 for noninferiority) and reduced the average number of false-positive fractures per patient by 41.9% (95% CI: 12.8, 61.3; P = .02) in patients without fractures and the mean reading time by 15.0% (95% CI: −30.4, 3.8; P = .12). Finally, stand-alone performance of a newer release of the AI system was greater than that of all unaided readers, including skeletal expert radiologists, with an area under the receiver operating characteristic curve of 0.94 (95% CI: 0.92, 0.96).
    Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study  
    Loïc Duron et al.  
    Radiology 2021; 300:120–129 
  •  Summary  
    The artificial intelligence aid improved the sensitivity and specificity of radiologists and emergency physicians in the localization of appendicular fractures on radiographs, with no additional reading time.  
    Key Results  
    • The artificial intelligence (AI) aid, which highlighted potential fractures on full-resolution radiographs, improved the sensitivity (8.7% increase, P = .006) and specificity (4.1% increase, P = .03) of emergency doctors and radiologists in the diagnosis of appendicular fractures.
    • The stand-alone area under the receiver operating characteristic curve, requiring that the AI system detect the precise locations of all fractures on an examination, was .94 with a newer release of the AI system.  
    Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study  
    Loïc Duron et al.  
    Radiology 2021; 300:120–129 

  • "Our study had several limitations. First, readers and the AI system were assessed on their ability to make decisions based on image analysis alone, without knowledge about the findings from the patients’ physical examination or their medical history, creating a context bias. Clinical data can be crucial in making decisions; however, in our experience radiologists often lack relevant clinical data. Second, a Hawthorne effect may have affected the performances of readers, that is, a modification of their behavior in response to their awareness of being observed for the research project, leading, for instance, to a more thorough reading than in clinical practice. Similarly, cognitive biases related to the emergency setting could not be replicated in a retrospective study.”
    Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study  
    Loïc Duron et al.  
    Radiology 2021; 300:120–129 
  • "In conclusion, we showed that a deep learning algorithm aided emergency physicians and radiologists in improving their diagnostic performance and boosting their time efficiency in the localization of all appendicular bone fractures on plain radiographs. The algorithm improved as updates were made, which bodes well for helping physicians cope with the increasing workload more effectively, and an evaluation in future prospective studies will be needed.”
    Assessment of an AI Aid in Detection of Adult Appendicular Skeletal Fractures by Emergency Physicians and Radiologists: A Multicenter Cross-sectional Diagnostic Study  
    Loïc Duron et al.  
    Radiology 2021; 300:120–129 
  • “Artificial intelligence (AI) has the potential to affect every step of the radiology workflow, but the AI application that has received the most press in recent years is image interpretation, with numerous articles describing how AI can help detect and characterize abnormalities as well as monitor disease response. Many AI-based image interpretation tasks for musculoskeletal (MSK) pathologies have been studied, including the diagnosis of bone tumors, detection of osseous metastases, assessment of bone age, identification of fractures, and detection and grading of osteoarthritis. This article explores the applications of AI for image interpretation of MSK pathologies.”
    Pattern Recognition in Musculoskeletal Imaging Using Artificial Intelligence
    Natalia Gorelik, Jaron Chong, Dana J. Lin
    Semin Musculoskelet Radiol 2020;24:38–49.
  • “Artificial intelligence (AI) has the potential to affect every step of the radiology workflow from ordering with clinical decision support, examination scheduling and protocoling, image acquisition and reconstruction, radiation dose estimation and reduction, quality control, optimization of automatic image display with hanging protocols, worklist management with prioritization of urgent or abnormal studies, integration of radiologic data with clinical data, quantitative image analysis, structured reporting, delivery of results to the referring physician, to billing and coding.”
    Pattern Recognition in Musculoskeletal Imaging Using Artificial Intelligence
    Natalia Gorelik, Jaron Chong, Dana J. Lin
    Semin Musculoskelet Radiol 2020;24:38–49.
  • “The proficiency of AI applications in pattern recognition holds great promise for improving patient care through achieving higher diagnostic accuracy, better predicting individual outcomes, and increasing radiologists’ efficiency, which is essential in light of the ever-increasing imaging volumes in both absolute number of examinations as well as the amount of data per study. In this article we reviewed how pattern recognition in MSK imaging using AI could facilitate the diagnosis of bone tumors, detection of bone metastases, evaluation of pediatric bone age, identification of fractures, labeling of images, and assessment of OA. Future research will no doubt further expand on the variety of MSK pathologies that can be addressed with AI-based solutions. As this field continues to evolve, radiology researchers, societies, and industry will collaborate to tackle the challenges ahead to improve radiology, technology, and patient care.”
    Pattern Recognition in Musculoskeletal Imaging Using Artificial Intelligence
    Natalia Gorelik, Jaron Chong, Dana J. Lin
    Semin Musculoskelet Radiol 2020;24:38–49.
  • “The advent of AI in radiology lends a quantitative lens to the imaging practice to create more value for the patient and the referring physicians. We anticipate that integration of AI tools with BI&A will continue to rise at a rapid pace, particularly as demands for quality and efficiency grow, and our imaging informatics infrastructure grows increasingly complex. Specifically, the most salient growth will depend on the guidance of national professional societies such as the ACR to align AI development along appropriate standards, and the fastest business AI development is likely to arise from the pressure points along various regulatory drivers such as merit-based incentive payments and APMs.”
    From Data to Value: How Artificial Intelligence Augments the Radiology Business to Create Value
    Teresa Martin-Carreras, Po-Hao Chen
    Semin Musculoskelet Radiol 2020;24:65–73.

  • “AI implemented poorly risks pushing humanity to the margins; done wisely, AI can free up physicians’ cognitive and emotional space for patients, and shift the focus away from transactional tasks to personalized care. The challenge will be for humans to have the wisdom and willingness to discern AI’s optimal role in twenty-first century healthcare, and to determine when it strengthens and when it undermines human healing.”
    Ten Ways Artificial Intelligence Will Transform Primary Care
    Steven Y. Lin, Megan R. Mahoney, Christine A. Sinsky
    J Gen Intern Med 34(8):1626–30
  • “The use of AI has the potential to greatly enhance every component of the imaging value chain. From assessing the appropriateness of imaging orders to helping predict patients at risk for fracture, AI can increase the value that musculoskeletal imagers provide to their patients and to referring clinicians by improving image quality, patient centricity, imaging efficiency, and diagnostic accuracy.”
    Artificial Intelligence in Musculoskeletal Imaging: Current Status and Future Directions
    Gyftopoulos S et al.
    AJR 2019; 213:1–8
  • “Several studies have shown promising results of using ML to determine bone age. Using datasets from two separate children’s hospitals, Larson et al. found that their deep CNN was able to estimate skeletal maturity with accuracy comparable to that of an expert radiologist as well as to that of existing automated bone age software. Tajmir et al. showed that AI-assisted radiologist interpretation performed better than AI alone, a radiologist alone, or a pooled cohort of experts, by increasing accuracy and decreasing variability and the root-mean-square error. Their findings suggest that the most optimal use of AI for determination of bone age may be in combination with a radiologist’s interpretation.”
    Artificial Intelligence in Musculoskeletal Imaging: Current Status and Future Directions
    Gyftopoulos S et al.
    AJR 2019; 213:1–8
  • “Radiomics is an emerging field in medicine that is based on the extraction of diverse quantitative characteristics from images and the use of these characteristics for data mining and pattern identification. These data can then be used with other patient information to better characterize and predict disease processes. ML techniques have led to a rapid expansion of the potential of radiomics to impact clinical care. For instance, the description of a sarcoma diagnosed on MRI will typically include estimates of tumor size, shape, and enhancement pattern. ML-driven algorithms can also identify and collect other characteristics that are not easily appreciated on images (e.g., texture analysis, image intensity histograms, and image voxel relationships) and can lead to more precise treatment.”
    Artificial Intelligence in Musculoskeletal Imaging: Current Status and Future Directions
    Gyftopoulos S et al.
    AJR 2019; 213:1–8
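    The passage above mentions texture analysis and image intensity histograms as radiomic characteristics that ML can extract. As a rough illustration only, the sketch below computes a few simple first-order intensity features (mean, standard deviation, histogram entropy) from a synthetic region of interest; the `roi` array and the tiny `first_order_features` helper are hypothetical stand-ins, not the authors' pipeline.

    ```python
    # Illustrative sketch: simple first-order "radiomic" features from a 2D
    # region-of-interest array. Synthetic data; not the study's method.
    import numpy as np

    def first_order_features(roi: np.ndarray, bins: int = 32) -> dict:
        """Return a few first-order intensity statistics for a region of interest."""
        counts, _ = np.histogram(roi, bins=bins)
        p = counts / counts.sum()        # normalized intensity histogram
        p = p[p > 0]                     # drop empty bins before taking logs
        return {
            "mean": float(roi.mean()),
            "std": float(roi.std()),
            "entropy": float(-(p * np.log2(p)).sum()),  # histogram (Shannon) entropy
        }

    rng = np.random.default_rng(0)
    roi = rng.normal(loc=100.0, scale=15.0, size=(64, 64))  # synthetic ROI intensities
    features = first_order_features(roi)
    ```

    In practice such features would be computed within a segmented lesion and combined with shape and higher-order texture descriptors before data mining.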

  • AI and MR of the Knee
  • Purpose: To investigate the feasibility of using a deep learning–based approach to detect an anterior cruciate ligament (ACL) tear within the knee joint at MRI by using arthroscopy as the reference standard.
    Results: The sensitivity and specificity of the ACL tear detection system at the optimal threshold were 0.96 and 0.96, respectively. In comparison, the sensitivity of the clinical radiologists ranged between 0.96 and 0.98, while the specificity ranged between 0.90 and 0.98. There was no statistically significant difference in diagnostic performance between the ACL tear detection system and clinical radiologists at P < .05. The area under the ROC curve for the ACL tear detection system was 0.98, indicating high overall diagnostic accuracy.
    Conclusion: There was no significant difference between the diagnostic performance of the ACL tear detection system and clinical radiologists for determining the presence or absence of an ACL tear at MRI.
    Fully Automated Diagnosis of Anterior Cruciate Ligament Tears on Knee MR Images by Using Deep Learning
    Fang Liu et al.
    Radiology: Artificial Intelligence 2019; 1(3):e180091 • https://doi.org/10.1148/ryai.2019180091
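  • The abstract above reports the detection system's operating point as sensitivity 0.96 and specificity 0.96 against arthroscopy. As a minimal sketch of how those two metrics are defined from a binary confusion matrix, the counts below are hypothetical (chosen only so that both metrics come out to 0.96), not the study's data:

    ```python
    # Sensitivity and specificity from confusion-matrix counts, with
    # arthroscopy as the reference standard. Hypothetical counts only.
    def sensitivity_specificity(tp, fn, tn, fp):
        sens = tp / (tp + fn)   # true-positive rate among arthroscopy-proven tears
        spec = tn / (tn + fp)   # true-negative rate among intact ligaments
        return sens, spec

    sens, spec = sensitivity_specificity(tp=48, fn=2, tn=48, fp=2)
    # With these made-up counts, both metrics equal 0.96, matching the
    # operating point the abstract reports.
    ```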
  • Summary
    * There was no statistically significant difference between the anterior cruciate ligament (ACL) tear detection system and clinical radiologists with varying levels of experience for determining the presence or absence of a full-thickness ACL tear using sagittal proton density–weighted and fat-suppressed T2-weighted fast spin-echo MR images.
    Key Points
    * There was no significant difference between the diagnostic performance of a fully automated deep learning–based diagnosis system and clinical radiologists for detecting a full-thickness anterior cruciate ligament (ACL) tear at MRI.
    * Sensitivity and specificity of the ACL tear detection system at the optimal threshold were 0.96 and 0.96, respectively; the sensitivity of the clinical radiologists ranged between 0.96 and 0.98 and specificity ranged between 0.90 and 0.98.
    Fully Automated Diagnosis of Anterior Cruciate Ligament Tears on Knee MR Images by Using Deep Learning
    Fang Liu et al.
    Radiology: Artificial Intelligence 2019; 1(3):e180091 • https://doi.org/10.1148/ryai.2019180091

  • “This study showed that a deep learning model can be trained to detect wrist fractures in radiographs with diagnostic accuracy similar to that of senior subspecialized orthopedic surgeons. Additionally, this study showed that, when emergency medicine clinicians are provided with the assistance of the trained model, their ability to detect wrist fractures can be significantly improved, thus diminishing diagnostic errors and also improving the clinicians’ efficiency.”
    Deep neural network improves fracture detection by clinicians
    Lindsey R et al.
    PNAS | November 6, 2018 | vol. 115 | no. 45 | 11591–11596
  • “This study shows that deep learning models offer potential for subspecialized clinicians (without machine learning experience) to teach computers how to emulate their diagnostic expertise and thereby help patients on a global scale. Although teaching the model is a laborious process requiring collecting thousands of radiographs and carefully labeling them, making a prediction using the trained model takes less than a second on a modern computer.”
    Deep neural network improves fracture detection by clinicians
    Lindsey R et al.
    PNAS | November 6, 2018 | vol. 115 | no. 45 | 11591–11596
  • “Historically, computer-assisted detection (CAD) in radiology has failed to achieve improvements in diagnostic accuracy, decreasing clinician sensitivity and leading to unnecessary further diagnostic tests. With the advent of deep learning approaches to CAD, there is great excitement about its application to medicine, yet there is little evidence demonstrating improved diagnostic accuracy in clinically-relevant applications. We trained a deep learning model to detect fractures on radiographs with a diagnostic accuracy similar to that of senior subspecialized orthopedic surgeons. We demonstrate that when emergency medicine clinicians are provided with the assistance of the trained model, their ability to accurately detect fractures significantly improves.”
    Deep neural network improves fracture detection by clinicians
    Robert Lindsey et al.
    Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-11596
  • In this work, we developed a deep neural network to detect and localize fractures in radiographs. We trained it to accurately emulate the expertise of 18 senior subspecialized orthopedic surgeons by having them annotate 135,409 radiographs. We then ran a controlled experiment with emergency medicine clinicians to evaluate their ability to detect fractures in wrist radiographs with and without the assistance of the deep learning model. The average clinician’s sensitivity was 80.8% (95% CI, 76.7–84.1%) unaided and 91.5% (95% CI, 89.3–92.9%) aided, and specificity was 87.5% (95% CI, 85.3–89.5%) unaided and 93.9% (95% CI, 92.9–94.9%) aided. The average clinician experienced a relative reduction in misinterpretation rate of 47.0% (95% CI, 37.4–53.9%).
    Deep neural network improves fracture detection by clinicians
    Robert Lindsey et al.
    Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-11596
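  • The headline statistic above is a relative reduction in error: (unaided error − aided error) / unaided error. As a rough illustration, the sketch below applies that formula to the miss rate implied by the reported sensitivities; note that the paper's pooled 47.0% figure combines false negatives and false positives across cases, so this single-metric value differs and is illustrative only.

    ```python
    # Relative reduction in an error rate, applied to the miss rate implied
    # by the reported sensitivities. Illustrative arithmetic only; the
    # paper's 47.0% figure pools both error types.
    def relative_reduction(err_unaided: float, err_aided: float) -> float:
        return (err_unaided - err_aided) / err_unaided

    miss_unaided = 1 - 0.808   # unaided sensitivity 80.8% -> 19.2% of fractures missed
    miss_aided = 1 - 0.915     # aided sensitivity 91.5% -> 8.5% of fractures missed
    reduction = relative_reduction(miss_unaided, miss_aided)  # roughly 0.56
    ```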
  • “The significant improvements in diagnostic accuracy that we observed in this study show that deep learning methods are a mechanism by which senior medical specialists can deliver their expertise to generalists on the front lines of medicine, thereby providing substantial improvements to patient care.”
    Deep neural network improves fracture detection by clinicians
    Robert Lindsey et al.
    Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-11596
  • Misinterpretation of radiographs may have grave consequences, resulting in complications including malunion with restricted range of motion, posttraumatic osteoarthritis, and joint collapse, the latter of which may require joint replacement. Misdiagnoses are also the primary cause of malpractice claims or litigation. There are multiple factors that can contribute to radiographic misinterpretations of fractures by clinicians, including physician fatigue, lack of subspecialized expertise, and inconsistency among reading physicians.
    Deep neural network improves fracture detection by clinicians
    Robert Lindsey et al.
    Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-11596
  • “The approach of this investigation is to apply machine learning algorithms trained by experts in the field to less experienced clinicians (who are at particular risk for diagnostic errors yet responsible for primary patient care and triage) to improve both their performance and efficiency. The learning model presented in this study mitigates these factors. It does not become fatigued, it always provides a consistent read, and it gains subspecialized expertise by being provided with labeled radiographs from human experts.”
    Deep neural network improves fracture detection by clinicians
    Robert Lindsey et al.
    Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-11596
  • Thus, we speculate that, someday, technology may permit any patient whose clinician has computer access to receive the same high-quality radiographic interpretations as those received by the patients of senior subspecialized experts.
    Deep neural network improves fracture detection by clinicians
    Robert Lindsey et al.
    Proc Natl Acad Sci U S A. 2018 Nov 6;115(45):11591-11596
  • “We have shown that radiological scores can be predicted to an excellent standard using only the disc-specific assessments as a reference set. The proposed method is quite general, and although we have implemented it here for sagittal T2 scans, it could easily be applied to T1 scans or axial scans, and for radiological features not studied here or indeed to any medical task where label/grading might be available only for a small region or a specific anatomy of an image. One benefit of automated reading is to produce a numerical signal score that would provide a scale of degeneration and so avoid an arbitrary categorization into artificial grades.”
    Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist
    Jamaludin A et al.
    Eur Spine J 2018; DOI 10.1007/s00586-017-4956-3
  • “Automation of radiological grading is now on par with human performance. The system can be beneficial in aiding clinical diagnoses in terms of objectivity of gradings and the speed of analysis. It can also draw the attention of a radiologist to regions of degradation. This objectivity and speed is an important stepping stone in the investigation of the relationship between MRIs and clinical diagnoses of back pain in large cohorts.”
    Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist
    Jamaludin A et al.
    Eur Spine J 2018; DOI 10.1007/s00586-017-4956-3
  • The process in a flow chart
  • One of the biggest potential bottlenecks that could inhibit or derail AI development and adoption in health care is the availability of sufficient quantities of high-quality data in standardized formats. As noted earlier, information today is highly fragmented and spread across the industry, residing in diverse, mostly uncoordinated repositories like electronic medical records, laboratory and imaging systems, physician notes, and health-insurance claims. Merging this information into large, integrated databases, which is required to empower AI to develop the deep understanding of diseases and their cures, is difficult.
    Artificial Intelligence- The Next Digital Frontier
    McKinsey Global Institute (2017)
  • FDA Statement
    The OsteoDetect software is a computer-aided detection and diagnostic software that uses an artificial intelligence algorithm to analyze two-dimensional X-ray images for signs of distal radius fracture, a common type of wrist fracture. The software marks the location of the fracture on the image to aid the provider in detection and diagnosis.
  • FDA Statement
    OsteoDetect analyzes wrist radiographs using machine learning techniques to identify and highlight regions of distal radius fracture during the review of posterior-anterior (front and back) and medial-lateral (sides) X-ray images of adult wrists. OsteoDetect is intended to be used by clinicians in various settings, including primary care, emergency medicine, urgent care and specialty care, such as orthopedics. It is an adjunct tool and is not intended to replace a clinician’s review of the radiograph or his or her clinical judgment.
  • FDA Approval Statement (AIDOC)
  • "Deep learning–based approaches have the potential to maximize diagnostic performance for detecting cartilage degeneration and acute cartilage injury within the knee joint while reducing subjectivity, variability, and errors due to distraction and fatigue associated with human interpretation."
    Deep Learning Approach for Evaluating Knee MR Images: Achieving High Diagnostic Performance for Cartilage Lesion Detection
    Fang Liu et al.
    Radiology 2018 (in press)
  • Skeletal bone age assessment is a common clinical practice to investigate endocrine, genetic, and growth disorders in children. It is generally performed by radiological examination of the left hand by using either the Greulich and Pyle (G&P) method or the Tanner-Whitehouse (TW) method. However, both clinical procedures show several limitations, from the examination effort of radiologists to (most importantly) significant intra- and inter-operator variability. To address these problems, several automated approaches (especially relying on the TW method) have been proposed; nevertheless, none of them has been proved able to generalize to different races, age ranges and genders. In this paper, we propose and test several deep learning approaches to assess skeletal bone age automatically; the results showed an average discrepancy between manual and automatic evaluation of about 0.8 years, which is state-of-the-art performance.
    Deep learning for automated skeletal bone age assessment in X-ray images
    Spampinato C et al.
    Med Image Anal. 2017 Feb;36:41-51

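  • The headline metric in the abstract above is the average discrepancy, in years, between manual and automated bone age estimates. As a small sketch of that computation, the age values below are made up (the study reports its ~0.8-year discrepancy on real data):

    ```python
    # Mean absolute discrepancy between manual reads and model estimates,
    # in years. The values are hypothetical, for illustration only.
    def mean_abs_discrepancy(manual, automated):
        return sum(abs(m - a) for m, a in zip(manual, automated)) / len(manual)

    manual = [5.0, 8.5, 12.0, 15.5]      # radiologist bone age reads (years)
    automated = [5.6, 7.9, 12.9, 14.6]   # hypothetical model estimates (years)
    mad = mean_abs_discrepancy(manual, automated)  # 0.75 years for these values
    ```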
  • “In this paper, we propose and test several deep learning approaches to assess skeletal bone age automatically; the results showed an average discrepancy between manual and automatic evaluation of about 0.8 years, which is state-of-the-art performance. Furthermore, this is the first automated skeletal bone age assessment work tested on a public dataset and for all age ranges, races and genders, for which the source code is available, thus representing an exhaustive baseline for future research in the field. Beside the specific application scenario, this paper aims at providing answers to more general questions about deep learning on medical images: from the comparison between deep-learned features and manually-crafted ones, to the usage of deep-learning methods trained on general imagery for medical problems, to how to train a CNN with few images.”
    Deep learning for automated skeletal bone age assessment in X-ray images
    Spampinato C et al.
    Med Image Anal. 2017 Feb;36:41-51
  • “An automated machine learning computer system was created to detect, anatomically localize, and categorize vertebral compression fractures at high sensitivity and with a low false-positive rate, as well as to calculate vertebral bone density, on CT images.”
    Vertebral Body Compression Fractures and Bone Density: Automated Detection and Classification on CT Images
    Burns JE et al.
    Radiology (in press)
  • “Sensitivity for detection or localization of compression fractures was 95.7% (201 of 210; 95% confidence interval [CI]: 87.0%, 98.9%), with a false-positive rate of 0.29 per patient. Additionally, sensitivity was 98.7% and specificity was 77.3% at case-based receiver operating characteristic curve analysis.”
    Vertebral Body Compression Fractures and Bone Density: Automated Detection and Classification on CT Images
    Burns JE et al.
    Radiology (in press)
  • “This system performed with 95.7% sensitivity in fracture detection and localization to the correct vertebral level, with a low false-positive rate. There was a high level of overall agreement (95%) for compression morphology and 68% overall agreement for severity categorization relative to radiologist classification.”
    Vertebral Body Compression Fractures and Bone Density: Automated Detection and Classification on CT Images
    Burns JE et al.
    Radiology (in press)
  • * A fully automated machine learning software system with which to detect, localize, and classify compression fractures and determine the bone density of thoracic and lumbar vertebral bodies on CT images was developed and validated.
    * The computer system has a sensitivity of 95.7% in the detection of compression fractures and in the localization of these fractures to the correct vertebrae, with a false-positive rate of 0.29 per patient.
    * The accuracy of this computer system in fracture classification by Genant type was 95% (weighted κ = 0.90).
    Vertebral Body Compression Fractures and Bone Density: Automated Detection and Classification on CT Images
    Burns JE et al.
    Radiology (in press)
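  • The detection figures quoted above reduce to simple ratios. As a back-of-envelope check, the sketch below computes sensitivity as detected/total fractures (201 of 210) and a false-positive burden per patient; the patient count used for the latter is hypothetical, chosen only to reproduce the 0.29-per-patient rate the paper reports directly:

    ```python
    # Back-of-envelope arithmetic for the quoted detection figures.
    detected, total = 201, 210           # fractures localized to the correct level
    sensitivity = detected / total       # about 0.957, as reported

    false_positives, n_patients = 29, 100  # hypothetical counts giving the same rate
    fp_per_patient = false_positives / n_patients  # 0.29 false positives per patient
    ```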
