Upload images and get captions. Use the captions to generate speech. Translate the captions to other languages.
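
The three steps above can be sketched as one small pipeline. This is only an illustrative outline, not the actual implementation: `process_image`, `CaptionResult`, and the `captioner`, `synthesizer`, and `translator` callables are hypothetical names introduced here, and whichever captioning, text-to-speech, and translation models are really used would be plugged in as those callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class CaptionResult:
    """Hypothetical container for everything produced from one image."""
    caption: str
    audio: bytes
    translations: Dict[str, str] = field(default_factory=dict)


def process_image(
    image: bytes,
    captioner: Callable[[bytes], str],        # assumed: image bytes -> caption text
    synthesizer: Callable[[str], bytes],      # assumed: caption text -> audio bytes
    translator: Callable[[str, str], str],    # assumed: (text, language code) -> text
    target_languages: Iterable[str] = (),
) -> CaptionResult:
    """Caption an image, then derive speech and translations from the caption."""
    caption = captioner(image)
    audio = synthesizer(caption)
    translations = {lang: translator(caption, lang) for lang in target_languages}
    return CaptionResult(caption=caption, audio=audio, translations=translations)


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any model installed.
    result = process_image(
        b"raw image bytes",
        captioner=lambda img: "a dog on a beach",
        synthesizer=lambda text: text.encode("utf-8"),
        translator=lambda text, lang: f"[{lang}] {text}",
        target_languages=["fr", "de"],
    )
    print(result.caption)
    print(result.translations)
```

Passing the models in as callables keeps the flow visible: speech and translation both depend only on the caption text, so either step can be swapped or skipped without touching the captioning step.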