Creating Social Media Images on the Fly
Combine LLM creativity with image generation APIs to build a social media factory.
Dwizi Team
Editorial
Social media is a visual medium. A tweet with an image gets 150% more retweets than one without. A blog post without a header image is invisible.
But most AI agents are text-only. They can write a witty caption, but they can't take the photo. This forces a human to step in, go to Unsplash, find a generic stock photo, and upload it. The automation chain breaks.
The Solution: The Artist Tool
We can give our agent access to an image generation model (like DALL-E 3). This allows the agent to not just write about a concept, but to visualize it.
The Implementation
We wrap the OpenAI Image API in a Dwizi tool.
/**
 * Generates an image based on a prompt.
 *
 * Description for LLM: "Generate an image to accompany a social media post. Be descriptive."
 */
type Input = {
  prompt: string;
  // Sizes supported by DALL-E 3.
  size?: "1024x1024" | "1792x1024" | "1024x1792";
};

export default async function generateImage(args: Input) {
  const apiKey = Deno.env.get("OPENAI_API_KEY");
  if (!apiKey) throw new Error("Missing OPENAI_API_KEY");

  const { prompt, size = "1024x1024" } = args;

  // We call the DALL-E 3 endpoint.
  // Note: This API call can take 10-15 seconds.
  // Dwizi handles this long-running request gracefully.
  const res = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "dall-e-3",
      prompt,
      n: 1,
      size,
    }),
  });

  const data = await res.json();

  // Report API errors to the LLM instead of throwing, so it can rephrase and retry.
  if (!res.ok || data.error) {
    return { success: false, error: data.error?.message ?? `HTTP ${res.status}` };
  }

  // We return the URL.
  // The LLM can now embed this URL in a Markdown image tag.
  return {
    success: true,
    url: data.data[0].url,
    revisedPrompt: data.data[0].revised_prompt,
  };
}
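The tool's comments mention embedding the returned URL in a Markdown image tag. Here is a minimal sketch of that step. The `toMarkdownImage` helper and the `ImageResult` type are hypothetical names for illustration, not part of Dwizi or the OpenAI API; the type only mirrors the object the tool returns.

```typescript
// Mirrors the object returned by the tool above (an assumption for this sketch).
type ImageResult =
  | { success: true; url: string; revisedPrompt?: string }
  | { success: false; error: string };

// Hypothetical helper: turns a tool result into a Markdown image tag.
function toMarkdownImage(result: ImageResult, altText: string): string {
  if (!result.success) {
    // Fall back to plain text so the post can still go out without an image.
    return `(image unavailable: ${result.error})`;
  }
  // Square brackets would end the alt text early, so strip them.
  const safeAlt = altText.replace(/[\[\]]/g, "");
  return `![${safeAlt}](${result.url})`;
}
```

In practice the LLM usually writes the image tag itself. A helper like this is mostly useful when the final post is assembled in code rather than by the model.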
The Creativity Loop
What makes this powerful is that the LLM is also the Art Director.
User: "Write a tweet about how coffee fuels coding."
Agent Thought Process:
- Copywriting: "Coffee: The process of turning caffeine into code. ☕️💻"
- Art Direction: "I need an image that represents this. Maybe a cyberpunk coffee cup with matrix code flowing out of the steam."
- Action: Calls generate_image("Cyberpunk coffee cup, glowing green matrix code in the steam, dark background, neon lights").
- Result: It gets a stunning, unique image URL.
- Final Output: It posts the text AND the image together.
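The steps above can be sketched as a small pipeline. This is an illustration under assumptions, not how Dwizi or any agent runtime actually orchestrates tool calls: `createPost`, `Deps`, `writeCaption`, and `directArt` are hypothetical names, and the two LLM steps are injected as plain functions so the flow is easy to follow and test.

```typescript
// Result shape assumed to match the image tool's return value.
type ToolResult =
  | { success: true; url: string }
  | { success: false; error: string };

// Hypothetical dependencies: two LLM steps plus the image tool.
type Deps = {
  writeCaption: (topic: string) => Promise<string>;
  directArt: (caption: string) => Promise<string>;
  generateImage: (args: { prompt: string }) => Promise<ToolResult>;
};

type Post = { text: string; imageUrl?: string };

async function createPost(topic: string, deps: Deps): Promise<Post> {
  // 1. Copywriting: draft the caption.
  const text = await deps.writeCaption(topic);
  // 2. Art direction: turn the caption into a descriptive image prompt.
  const prompt = await deps.directArt(text);
  // 3. Action: call the image tool.
  const image = await deps.generateImage({ prompt });
  // 4. Final output: attach the image if generation succeeded, else text only.
  return image.success ? { text, imageUrl: image.url } : { text };
}
```

Falling back to a text-only post when image generation fails is a design choice for this sketch. A real agent might instead rephrase the prompt and retry.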
The Execution
The agent becomes a full-stack content creator. It doesn't just suggest ideas; it produces the final asset, ready for publishing.