Creating Social Media Images on the Fly

Social media is a visual medium. One widely cited figure is that a tweet with an image gets 150% more retweets than one without. A blog post without a header image is easy to scroll past.

But most AI agents are text-only. They can write a witty caption, but they can't take the photo. This forces a human to step in, go to Unsplash, find a generic stock photo, and upload it. The automation chain breaks.

The Solution: The Artist Tool

We can give our agent access to an image generation model (like DALL-E 3). This allows the agent to not just write about a concept, but to visualize it.

The Implementation

We wrap the OpenAI Image API in a Dwizi tool.

/**
 * Generates an image based on a prompt.
 * 
 * Description for LLM: "Generate an image to accompany a social media post. Be descriptive."
 */

type Input = {
  prompt: string;
  // DALL-E 3 accepts only these three sizes; "512x512" is rejected by the API.
  size?: "1024x1024" | "1792x1024" | "1024x1792";
};

export default async function generateImage(args: Input) {
  const apiKey = Deno.env.get("OPENAI_API_KEY");
  if (!apiKey) throw new Error("Missing OPENAI_API_KEY");

  const { prompt, size = "1024x1024" } = args;

  // We call the DALL-E 3 endpoint.
  // Note: This API call can take 10-15 seconds. 
  // Dwizi handles this long-running request gracefully.
  const res = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "dall-e-3",
      prompt,
      n: 1,
      size,
    }),
  });

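  const data = await res.json();

  // Report API failures (bad key, rate limit, rejected prompt) back to the LLM
  // instead of throwing, so the agent can retry or fall back to a text-only post.
  if (!res.ok || data.error) {
    return { success: false, error: data.error?.message ?? `HTTP ${res.status}` };
  }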

  // We return the URL.
  // The LLM can now embed this URL in a Markdown image tag: ![Alt](url)
  return {
    success: true,
    url: data.data[0].url,
    revisedPrompt: data.data[0].revised_prompt
  };
}
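
The comment near the end of the tool mentions embedding the URL in a Markdown image tag. A minimal sketch of that step follows, assuming the result shape returned by the tool above. The helper name `toMarkdownImage` is hypothetical and is not part of Dwizi or the OpenAI API.

```typescript
// Mirrors the two result shapes the tool above can return.
type ImageResult =
  | { success: true; url: string; revisedPrompt?: string }
  | { success: false; error: string };

// Hypothetical helper: turns a tool result into a Markdown image tag.
function toMarkdownImage(result: ImageResult, alt: string): string {
  if (!result.success) {
    // Keep the post usable even when image generation failed.
    return `(image unavailable: ${result.error})`;
  }
  // Escape square brackets so the alt text cannot close the tag early.
  const safeAlt = alt.replace(/[\[\]]/g, "\\$&");
  return `![${safeAlt}](${result.url})`;
}
```

Returning a placeholder string on failure is one possible choice; an agent could just as well drop the image and post the text alone.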

The Creativity Loop

What makes this powerful is that the LLM is also the Art Director.

User: "Write a tweet about how coffee fuels coding."

Agent Thought Process:

Copywriting: "Coffee: The process of turning caffeine into code. ☕️💻"
Art Direction: "I need an image that represents this. Maybe a cyberpunk coffee cup with matrix code flowing out of the steam."
Action: Calls generateImage({ prompt: "Cyberpunk coffee cup, glowing green matrix code in the steam, dark background, neon lights" }).
Result: It gets back the URL of a unique image.
Final Output: It posts the text AND the image together.
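
The steps above can be sketched in code. This is an illustration under stated assumptions, not how Dwizi or any particular agent framework orchestrates tool calls: `composePost`, `ImageGenerator`, and `Post` are hypothetical names, and the image tool is passed in as a function so the flow can be exercised with a stub.

```typescript
type ImageResult =
  | { success: true; url: string; revisedPrompt?: string }
  | { success: false; error: string };

// Hypothetical signature for any image tool, such as the one defined earlier.
type ImageGenerator = (args: { prompt: string }) => Promise<ImageResult>;

type Post = { text: string; imageUrl?: string };

// Copywriting and art direction come from the LLM; this function only
// performs the "Action" and "Final Output" steps.
async function composePost(
  caption: string,
  artDirection: string,
  generate: ImageGenerator,
): Promise<Post> {
  const result = await generate({ prompt: artDirection });
  // Fall back to a text-only post if the image could not be generated.
  return result.success
    ? { text: caption, imageUrl: result.url }
    : { text: caption };
}
```

Injecting the generator keeps the loop independent of any one image model, so the same flow works if the underlying tool is swapped out.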

The Execution

The agent becomes a full-stack content creator. It doesn't just suggest ideas; it produces the final asset, ready for publishing.
