I wonder how the power consumption compares. I’d expect the classic CNN to be cheaper just because it is more specialized.
> The allure of skipping dataset collection, annotation, and training is too enticing not to waste a few evenings testing.
How does annotation work? Do you actually have to mark every pixel of “the thing,” or does the training process just accept images with “a thing” somewhere inside them and learn to ignore all the “not the thing” stuff that tends to show up? If it’s the latter, maybe Gemini with its mediocre bounding boxes could be used as an infinitely un-bore-able annotator instead.
If that works, you could use the LLMs for the first few thousand cases, then use those annotations to train an efficient supervised model and switch to it.
That way it would be both efficient and cost-effective.
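A minimal sketch of the bootstrapping step, under assumptions: the LLM's detections arrive as pixel-coordinate boxes with a confidence score (the record layout, field names, and 0.5 threshold here are all hypothetical), and the target is YOLO-style normalized label lines that a small supervised detector could train on.

```python
# Hypothetical sketch: turn LLM-generated bounding boxes into YOLO-format
# label lines, keeping only boxes the LLM was reasonably confident about.
# The input record shape ({"box": ..., "conf": ...}) is an assumption.

def to_yolo_line(box, img_w, img_h, class_id=0):
    """Convert an (x_min, y_min, x_max, y_max) pixel box to a YOLO label:
    class_id, then x_center, y_center, width, height normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

def filter_and_convert(llm_detections, img_w, img_h, min_conf=0.5):
    """Drop low-confidence LLM boxes, convert the rest to label lines."""
    return [to_yolo_line(d["box"], img_w, img_h)
            for d in llm_detections if d.get("conf", 0) >= min_conf]

detections = [
    {"box": (100, 50, 300, 250), "conf": 0.9},  # kept
    {"box": (0, 0, 20, 20), "conf": 0.3},       # dropped: low confidence
]
print(filter_and_convert(detections, img_w=640, img_h=480))
```

Once a few thousand images are labeled this way, training the small model itself is standard supervised detection; the only LLM-specific part is this conversion plus whatever confidence filtering keeps the noisy boxes out.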