We present a comprehensive multi-center evaluation of 20 state-of-the-art bone segmentation models on 1,000 clinically resampled CT/CBCT scans. Our study shows that image sharpness, smaller isotropic voxels, and neutral orientation significantly improve segmentation performance, whereas metallic osteosynthesis and anatomical complexity significantly degrade it. These findings highlight the gap between benchmark performance and real-world clinical applicability and emphasize the need for robustness evaluation in socio-technical systems.