Hacker News
An Introduction to YOLO26
pzo
yeldarb
RF-DETR is both faster and more accurate and truly open source with an Apache 2.0 license: https://github.com/roboflow/rf-detr
Full disclosure: I’m one of the co-founders of Roboflow (we made RF-DETR, wrote this blog post, and are a sub-licensor of Ultralytics’ models.)
MrGLaDOS
Misleading marketing statement.
The catch is that for image resolutions >= 700x700 pixels (most production use cases), the Roboflow license is actually PML 1.0 instead of Apache 2.0: https://github.com/roboflow/rf-detr#license
yeldarb
Regardless, you can do whatever resolution you want with the Apache 2.0 model. Just change the config at runtime; it was trained to be resolution agnostic.
You are correct that we also released larger models with a larger backbone under a different, non open-source license.
krapht
Citation needed? It looks like 2XL goes up to 800x800 pixel inputs. This isn't the dealbreaker you say it is: all pipelines benefit from thoughtful cropping and rescaling before inference.
MrGLaDOS
Rescaling is fine for some purposes but not for all. For many domain-specific (often less common and oddly dimensioned) objects, downscaling will severely reduce recall. There is a reason that Roboflow slaps a license that is not open source on those specific architectures.
In some cases tiled inferencing (for example with https://github.com/obss/sahi ) might do the job.
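To make the tiling idea concrete, here is a minimal sketch of the bookkeeping behind tiled inference. It is a generic illustration, not SAHI's API: the function names `tile_windows` and `shift_box`, the 640-pixel tile size, and the 20% overlap are all assumptions chosen for the example. The detector itself is left out; you would run it on each crop and map the boxes back.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in full-image pixels


def tile_windows(width: int, height: int, tile: int = 640,
                 overlap: float = 0.2) -> List[Window]:
    """Cover a width x height image with overlapping square tiles.

    Hypothetical helper: `tile` and `overlap` are illustrative defaults.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, round(tile * (1 - overlap)))

    def starts(size: int) -> List[int]:
        # A single tile is enough when the image fits inside it.
        if size <= tile:
            return [0]
        offsets = list(range(0, size - tile, stride))
        # Always add a final tile flush with the far edge.
        offsets.append(size - tile)
        return offsets

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def shift_box(box: Window, window: Window) -> Window:
    """Map a box predicted in tile coordinates back to full-image coordinates."""
    x0, y0, _, _ = window
    bx0, by0, bx1, by1 = box
    return (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)
```

After shifting, detections from neighbouring tiles overlap in the shared margin, so a real pipeline would still need a merge step such as non-maximum suppression across tiles.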
yeldarb
All of the models, including the Apache 2.0 ones, can be configured to go higher than 800x800. The difference between the ones with the PML license and the Apache 2.0 ones is the backbone, not the resolution.
I'd suggest you read the ICLR paper[1] which shows clearly the difference between the backbones at various latencies in Figure 1.
> For many domain-specific (often less common and odd dimensioned) objects, downscaling will severely reduce recall.
We released an entire paper[2] at NeurIPS about the long-tail transferability of models across a multitude of domains and evaluated RF-DETR against that benchmark. The Apache 2.0 model is Pareto optimal over the larger PML model at latencies less than the XL size.
(I'm one of the co-founders of Roboflow and worked on RF-DETR and RF100-VL.)
[1] https://arxiv.org/abs/2511.09554
[2] https://arxiv.org/abs/2505.20612
esquire_900
teruakohatu
That said, many of the claimed improvements in this model are efficiency related.
Onavo
yfontana
m00dy
geuis
deviation
I then tried training it on a lot of sample images from a 3D point & shoot game, and was quite disappointed in how it performed.
Has anyone else experimented with it recently? How well does it work as a base model for training custom classifiers? And with hardware growth in the last ~5 years, is it suitable to run in parallel with games which are graphically intensive?
speedgoose
If you want to detect objects and speed is important so you can’t use an LLM architecture, you can give it a try too.
Alles
larodi
Meanwhile their very own Peter Skalski already does a super job with a host of write-ups and examples of all YOLO sorts and is well respected.
Tepix
Is there a demo like that available for YOLO26?
alex_duf
ktallett
maelito
Joel_Mckay
Running machine vision outside in the sun or weather can get tricky. There is also a limited supply of BS a firm can shovel before some bystander ends up dead. =3
MaxikCZ
What are you trying to accomplish with those questions? Are you genuinely asking, or just baiting? If the former, didn't the answers to your previous question make it clear that your question makes less sense than you might assume?
maelito
I wish new models coupled with an LLM were capable of estimating the size of features on the map, e.g. the size of a car in meters, so that speed could be derived from an understanding of the world. But I have found no resource doing this.
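For what it's worth, the arithmetic behind that idea can be sketched without any learned world model, under strong assumptions: a roughly top-down or fixed camera where the pixel scale is constant across the region of interest, and a reference object of known real-world length (a 4.5 m car is an assumed figure here, not a measured one). The function names are hypothetical and the detector or tracker that supplies the pixel coordinates is not shown.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a tracked object


def meters_per_pixel(known_length_m: float, known_length_px: float) -> float:
    """Scale factor from a reference object of known real-world length."""
    if known_length_px <= 0:
        raise ValueError("known_length_px must be positive")
    return known_length_m / known_length_px


def speed_kmh(p0: Point, p1: Point, dt_s: float, m_per_px: float) -> float:
    """Speed from two positions of the same object, dt_s seconds apart.

    Assumes a constant scale over the path, i.e. no perspective distortion.
    """
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    dist_px = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    return dist_px * m_per_px / dt_s * 3.6  # m/s to km/h


# Example: an assumed 4.5 m car spans 90 px, so the scale is 0.05 m/px.
scale = meters_per_pixel(4.5, 90.0)
# The car's centre moves 100 px in 0.5 s: 5 m in 0.5 s.
example_speed = speed_kmh((200.0, 300.0), (300.0, 300.0), 0.5, scale)
```

With perspective in the scene the constant-scale assumption breaks down, and you would need a calibrated homography or depth estimate instead.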