Hacker News

Show HN: Imagent – agentic image/video/speech generation

4 points by unliftedq | 3 comments

Imagent gives AI agents the ability to generate images, video, and speech as a first-class step in their workflows, behind a single interface that hides the differences between providers and models.

ankurchrungoo

I have been working on something similar, though also quite different, and I find this interesting. Do you foresee the cost of video generation, for example, being managed by users through their own API keys configured for each platform?

unliftedq

I built this tool to solve my own pain points with agentic automation:

1. Existing CLI solutions are provider-specific (minimax cli, chatgpt cli, etc.), and other providers have no built-in CLI support at all.

2. With local CLIs or scripts, generation results and history are not tracked. When I want to generate a similar image, I have to keep the prompts in a notebook. Now, with imagent, I can simply remix any prompt from the library.

3. A CLI is the best fit for agent automation. I can use it to generate slides, blog illustrations, website assets, etc. With it, I can even generate videos with hyperframes/remotions, complete with illustrations and speech audio. It is all done by the agent; I don't need to create the images or audio myself.

4. Agents aren't aware of the differences and limitations between models. By maintaining catalogs, imagent lets the agent discover what is available, choose the best option for its needs, and call it through a unified interface.

So the cost of video generation is not my focus at this point; what I care about is automation. If we give agents this ability, what can they create for us automatically? Not vibe coding, but vibe creation.
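To make points 2 and 4 concrete, here is a rough Python sketch of a model catalog with a unified call and a remixable history. This is not imagent's actual API: `ModelEntry`, `Catalog`, `discover`, `generate`, `remix`, the provider and model names, and the prompt-length limit are all hypothetical, and the "providers" are stand-in functions that just return bytes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ModelEntry:
    """One catalog row (hypothetical shape, not imagent's schema)."""
    provider: str
    model: str
    modality: str                      # "image", "video", or "speech"
    max_prompt_chars: int              # example of a per-model limitation
    generate: Callable[[str], bytes]   # stand-in for a real provider call


@dataclass
class Catalog:
    entries: list[ModelEntry] = field(default_factory=list)
    history: list[dict] = field(default_factory=list)

    def register(self, entry: ModelEntry) -> None:
        self.entries.append(entry)

    def discover(self, modality: str) -> list[ModelEntry]:
        """Let the agent see which models exist for a modality."""
        return [e for e in self.entries if e.modality == modality]

    def generate(self, modality: str, prompt: str) -> bytes:
        """Unified call: pick the first registered model that fits the prompt."""
        candidates = [
            e for e in self.discover(modality)
            if len(prompt) <= e.max_prompt_chars
        ]
        if not candidates:
            raise LookupError(f"no {modality} model accepts this prompt")
        entry = candidates[0]
        data = entry.generate(prompt)
        # Track every generation so a prompt can be remixed later.
        self.history.append(
            {"modality": modality, "model": entry.model, "prompt": prompt}
        )
        return data

    def remix(self, index: int, extra: str) -> bytes:
        """Reuse a past prompt from the history with an added variation."""
        past = self.history[index]
        return self.generate(past["modality"], f'{past["prompt"]}, {extra}')


# Fake providers for illustration only; they echo the prompt as bytes.
catalog = Catalog()
catalog.register(ModelEntry("provider_a", "image-small", "image", 50,
                            lambda p: b"a:" + p.encode()))
catalog.register(ModelEntry("provider_b", "image-large", "image", 500,
                            lambda p: b"b:" + p.encode()))
catalog.register(ModelEntry("provider_b", "tts-basic", "speech", 200,
                            lambda p: b"tts:" + p.encode()))

result = catalog.generate("image", "a lighthouse at dusk")
remixed = catalog.remix(0, "watercolor style")
```

The agent only ever calls `discover` and `generate`; which provider runs, and whether the prompt fits that model's limits, is decided from the catalog data rather than hard-coded in the agent.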

ankurchrungoo

Makes sense, thanks for the detailed reply!