Hacker News
Show HN: PageAgent, A GUI agent that lives inside your web app
Hi HN,
I'm building PageAgent, an open-source (MIT) library that embeds an AI agent directly into your frontend.
I built this because I believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.
Currently, most AI agents operate from external clients or server-side programs, effectively leaving web development out of the AI ecosystem. I'm experimenting with an "inside-out" paradigm instead. By dropping the library into a page, you get a client-side agent that interacts natively with the live DOM tree and inherits the user's active session out of the box, which works perfectly for SPAs.
To handle cross-page tasks, I built an optional browser extension that acts as a "bridge". This allows the web-page agent to control the entire browser with explicit user authorization. Instead of a desktop app controlling your browser, your web app is empowered to act as a general agent that can navigate the broader web.
I'd love to start a conversation about the viability of this architecture, and what you all think about the future of in-app general agents. Happy to answer any questions!
simon_luv_pho
- GitHub: https://github.com/alibaba/page-agent
- Live Demo (No sign-up): https://alibaba.github.io/page-agent/ (you can drag the bookmarklet from here to try it on other sites)
- Browser Extension: https://chromewebstore.google.com/detail/page-agent-ext/akld...
I'd be really interested in feedback on the security model of giving client-side agents extension-bridge access, and I'm happy to take questions on the implementation!
mentalgear
Appreciate the transparency, but maybe you could add some (preferably European) alternatives?
simon_luv_pho
The free testing LLM is Qwen hosted by Aliyun. Qwen and DeepSeek are the only ones I can afford to offer for free. It's just there to lower the try-out barrier; please DO NOT rely on it.
The library itself does NOT include any backend service. Your data only goes to the LLM API you configured.
I tested it on local Ollama models and it works fine.
general_reveal
The only thing I can think of is you had the AI rewrite and embed selectors on the entire build file and work with that?
simon_luv_pho
It uses a similar process to `browser-use`, but entirely inside the web page. A script parses the live HTML, strips it down to its semantic essentials (HTML dehydration), and indexes every interactive element. That snapshot goes to the LLM, which returns actions referencing elements by index. The agent then simulates mouse/keyboard events on those elements via JS.
This works best on pages with proper semantic HTML and accessibility markup. You can test it right now on any page using the bookmarklet on the homepage (unless that page's CSP blocks script injection, of course).
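To make the dehydrate-and-index step concrete, here is a minimal sketch. It is not PageAgent's actual code: `SimpleNode`, `dehydrate`, `resolveTarget`, the tag list, and the snapshot line format are all hypothetical, and the tree is plain data rather than the real DOM so the idea can be read in isolation.

```typescript
// Simplified stand-in for a DOM element (assumption: plain data, not the real DOM API).
interface SimpleNode {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: SimpleNode[];
}

// Tags treated as interactive in this sketch; a real implementation would likely
// also consider ARIA roles, tabindex, visibility, and similar signals.
const INTERACTIVE_TAGS = new Set(["a", "button", "input", "select", "textarea"]);

// Attributes kept in the snapshot because they carry meaning for the model.
const KEPT_ATTRS = ["type", "name", "placeholder", "aria-label", "role", "href"];

interface Snapshot {
  lines: string[];        // compact text that would be sent to the LLM
  elements: SimpleNode[]; // position in this array is the element's index
}

// Walk the tree depth-first, keep plain text, and give every interactive
// element an index while dropping presentational attributes (class, style, ...).
function dehydrate(root: SimpleNode): Snapshot {
  const snapshot: Snapshot = { lines: [], elements: [] };
  const visit = (node: SimpleNode): void => {
    const attrs = node.attrs ?? {};
    const interactive = INTERACTIVE_TAGS.has(node.tag) || attrs["role"] === "button";
    if (interactive) {
      const index = snapshot.elements.length;
      snapshot.elements.push(node);
      const kept = KEPT_ATTRS
        .filter((name) => attrs[name] !== undefined)
        .map((name) => ` ${name}="${attrs[name]}"`)
        .join("");
      snapshot.lines.push(`[${index}]<${node.tag}${kept}>${node.text ?? ""}</${node.tag}>`);
    } else if (node.text) {
      snapshot.lines.push(node.text);
    }
    for (const child of node.children ?? []) visit(child);
  };
  visit(root);
  return snapshot;
}

// Hypothetical shape of an action returned by the LLM.
interface Action {
  type: "click" | "input";
  index: number;
  text?: string;
}

// Map an action's index back to the element it refers to.
function resolveTarget(snapshot: Snapshot, action: Action): SimpleNode {
  const target = snapshot.elements[action.index];
  if (target === undefined) throw new Error(`No element at index ${action.index}`);
  return target;
}
```

In a browser, the resolved element would be a real DOM node, and the final step would be something like `element.dispatchEvent(new MouseEvent("click", { bubbles: true }))`. Referring to elements by index keeps the model's output short and avoids asking it to produce CSS selectors.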
dzink
simon_luv_pho
The free testing LLM endpoint is hosted on Alibaba Cloud because I happen to have some company quota to spend, but it's not part of the library. Bring your own LLM and there is zero data transmission to Alibaba or anywhere else you haven't configured yourself.
I highly recommend using it with a local Ollama setup.
pscanf
I'm particularly impressed by the bookmark "trick" to install it on a page. Despite having spent 15 years developing for the browser, I had somehow missed that feature of the bookmarks bar. But awesome UX for people to try out the tool. Congrats!
simon_luv_pho
Bookmarklets are such an underrated feature. It's super convenient to inject and test scripts on any page. Seemed like the perfect low-friction entry point for people to try it out.
Spent some time on that UX because the concept is a bit hard to explain. Glad it worked!
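For anyone who hasn't used bookmarklets: a bookmarklet is a bookmark whose URL starts with `javascript:`, so clicking it runs that code in the current page. A rough sketch of building one that injects a script follows. `makeBookmarklet` and the example URL are hypothetical, not the actual PageAgent bookmarklet.

```typescript
// Builds a `javascript:` URL that, when saved as a bookmark and clicked,
// appends a <script> tag pointing at `scriptUrl` to the current page.
function makeBookmarklet(scriptUrl: string): string {
  const body =
    "(function(){" +
    "var s=document.createElement('script');" +
    // JSON.stringify yields a safely quoted JavaScript string literal.
    "s.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.head.appendChild(s);" +
    "})();";
  // Percent-encode the body so it survives being stored as a URL.
  return "javascript:" + encodeURIComponent(body);
}

// Hypothetical script location, for illustration only.
const exampleBookmarklet = makeBookmarklet("https://example.com/agent.js");
```

Pages with a strict Content Security Policy may refuse to load the injected script, which is the limitation mentioned earlier in the thread.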
Mnexium
simon_luv_pho
CloakHQ
the "inside your own browser" angle is actually the right intuition here. a real user's browser has built up a consistent fingerprint profile across sessions. the moment you run an agent in a context where those signals differ from that baseline, you're detectable. curious whether you've run into this on sites with aggressive bot detection, or whether the use case has mostly been internal/enterprise apps where that's not a concern?
MeteorMarc
simon_luv_pho
jauntywundrkind
> Collect and query content from tabs, bookmarks, and history - your AI research companion. FolioLM helps you collect sources from tabs, bookmarks, and history, then query and transform that content using AI.
https://github.com/PaulKinlan/NotebookLM-Chrome https://chromewebstore.google.com/detail/foliolm/eeejhgacmlh...