Vision-Language Model for Medical Images and Reports

By combining language and vision, large-scale vision-language models could unlock exciting possibilities for future applications.

For instance, we could get more powerful image representations by leveraging the rich information in free-text reports.

Furthermore, these models can be trained to generate descriptive captions for medical images, facilitating automated radiology report generation.