Hacker News
Launch HN: Sitefire (YC W26) – Automating actions to improve AI visibility
We’ve been working together for years and have backgrounds in RL/optimization at Stanford and software engineering. We came to this idea after speaking with marketing teams who were seeing declining traffic due to Google’s AI Overviews and didn’t know what to do.
This space can feel esoteric. Many case studies, few actual studies. Constant battle against myths (e.g. you need a llms.txt vs. you don't need a llms.txt) and "GEO hacks". We try to be more data-driven. We also try to be bolder and build a system that not only monitors but actually improves traffic from AI search.
While Google performs a single search, AI search engines expand the user prompt into 3-10 fan-out queries. The sourced pages are ranked using a classified algorithm similar to Reciprocal Rank Fusion (RRF). Finally, the LLMs skim the pages and decide what snippets to cite. Our goal is making sure brands have the right content that makes it through this funnel.
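For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the textbook formula: each page scores the sum of 1 / (k + rank) over the result lists it appears in. The ranking that AI search engines actually use is not public, so this only illustrates the general idea. The constant k = 60 is the value commonly used in the literature, and the URLs and the function name are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of URLs into one list (textbook RRF).

    rankings: one ranked list of URLs per fan-out query, best result first.
    k: smoothing constant; 60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical results for three fan-out queries of a single user prompt.
fanout_results = [
    ["a.com/pricing", "b.com/review", "c.com/blog"],
    ["b.com/review", "a.com/pricing", "d.com/docs"],
    ["b.com/review", "c.com/blog", "a.com/pricing"],
]

fused = reciprocal_rank_fusion(fanout_results)
print(fused)
```

In this toy example, the page that ranks well across several fan-out queries ends up ahead of a page that ranks first for only one of them, which is why covering the fan-out queries matters and not just the original prompt.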
Here is how sitefire works:
- The user defines a set of prompts they want to monitor. These are synthetic prompts - we generate them based on SEO keywords and their monthly search volume.
- We submit these prompts to ChatGPT, Gemini, Google AI Mode, etc. on a daily basis and capture the answers. We extract fan-out queries, sourced pages, citations, and brand mentions.
- For each topic, our agents analyze which web pages are sourced and cited the most, and why. They also consider similar pages that you already have.
- Based on the diagnosis, our content agents draft improvements or create new pages, and push them directly to the client’s CMS.
- We integrate with the client’s network logs and Google Analytics to monitor the increase in AI bot requests and human referrals to their page.
This system is continuously updated, so it always shows which content works, and how to adapt the existing sitemap. For one client that used sitefire to optimize their blog, the AI-optimized articles increased their AI bot requests from ~200/day to ~570/day within ten days.
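As a rough illustration of the bot-request monitoring, here is a minimal sketch that counts AI crawler hits per day in access logs. It assumes the common "combined" log format and matches a short, non-exhaustive list of user-agent tokens. The function name and the sample log lines are hypothetical, and this is not necessarily how sitefire does it.

```python
import re
from collections import Counter

# Assumed, non-exhaustive list of AI crawler user-agent tokens.
AI_BOT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Combined log format: capture the day from [dd/Mon/yyyy:...] and the last
# quoted field on the line, which is the user agent.
LOG_LINE = re.compile(r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\].*"(?P<agent>[^"]*)"\s*$')


def count_ai_bot_requests(lines):
    """Return a Counter mapping day -> number of requests from known AI bots."""
    per_day = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and any(token in match.group("agent") for token in AI_BOT_TOKENS):
            per_day[match.group("day")] += 1
    return per_day


# Hypothetical sample log lines.
sample_lines = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /blog/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.6 - - [01/Mar/2026:10:05:00 +0000] "GET /blog/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.7 - - [02/Mar/2026:09:00:00 +0000] "GET /pricing HTTP/1.1" 200 256 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.8 - - [02/Mar/2026:09:30:00 +0000] "GET /pricing HTTP/1.1" 200 256 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
]

counts = count_ai_bot_requests(sample_lines)
print(dict(counts))
```

Matching on user-agent strings alone is easy to spoof, so a real setup would likely also verify the crawlers' published IP ranges.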
A risk we recognize is that AI-generated content is filling brands’ websites with slop. Whilst it’s still early days and we don’t claim to have figured everything out yet, our intention is to mitigate this by focusing the content on specific, unique information: real product capabilities, real pricing, honest comparisons. The clients still review every page before it goes live, so they can ensure the content is true to their brand.
Some clients use our platform themselves. For others we act more like an agency, automating steps as we go. The goal is for sitefire to run mostly on its own, with clients approving changes via Slack, Claude or their CMS.
Here's a video demo: https://screen.studio/share/fw7VQQak
If you'd like to try what we've built so far, sign up at https://sitefire.ai.
onecommit
vincko
After that, content is actually assessed by the model. This paper tried different strategies to improve performance for this last step: https://arxiv.org/pdf/2311.09735. Adding statistics, sources, and original data are all strategies that we apply.
In classic SEO, creating more and more content leads to "cannibalization". Generally this hurts performance of all overlapping content so much that it is not worth it.
Gobhanu
vincko
There are other data sources we want to enable in the future, like Cloudflare.
yunyu
vincko
In our view Profound and Airops are aimed at existing marketing teams. Our goal is to be more hands-off, so you don't need a team. With many of our clients we act more like an agency, communicating via Slack and automating step by step. That's the experience we want to create. We aren't there yet though.
debarshri
vincko
Our view on Peec is that it is an analytics solution. They recently launched an actions feature, but it does not take any actions (yet). Creating content takes a lot of resources. And agencies are expensive.
As an analytics solution it is a good option.
ceejayoz
vincko
We think about it like this: all of these agents will be most useful to users if they provide valuable answers. So they will be looking for valuable content for grounding their answer.
There are exploits, you can overfit on whatever they currently use as an objective function. But those tend to be temporary. So in the long run, valuable content will win. That's what we aim to create. It's a fine line.
ceejayoz
This is a bald assertion.
vincko
I do share doubts about the latter.
ceejayoz
Yes; the customer here is the site using it, not Google end users, who'll tend to accept whatever's the top search result even if it's deeply wrong or complete slop.
The wellbeing of search users isn't really the priority here, right?
vincko
Let me try to rephrase the line of thinking:
To maximize value to the end user, the [AI search] models generally aim to be helpful. The companies building these models [OpenAI, etc.] are incentivized to make the model use helpful content.
Our goal is to be aligned with their objective function long term. And that incentivizes us to create helpful content.
Not all of this is a given. We don't know for sure how it will play out. There will always be ways to game the system. But we think those will get fixed over time.
Edit: added some clarifications on what I mean by "models"
ceejayoz
> To maximize value to the paying customer, the models generally aim to be seen as helpful by Google's algorithm. The companies building these models are incentivized to make the model seem to use helpful content.
SEO does the same thing; appearing useful to Google is more important than actually being useful to Google's visitors.
a13n
vahar
vincko
For those types of agents, prompt tracking is less accurate since the context of the queries is so large. But it's still relevant to understand what web searches they tend to perform and whether you show up in those.
That's another reason why we want to integrate other data sources, especially network logs.