Multimodal AI Basics: Combining Text, Image, and Audio with LLMs