> "I didn’t ask for a robot to consume every blog post and piece of code I ever wrote and parrot it back so that some hack could make money off of it."
I have to say this reads a bit hollow to me, and perhaps a little bit shallow.
If the content this guy created could be scraped and usefully regurgitated by an LLM, then that same hack, before LLMs, could have simply searched, found the content, and profited from it all the same. And probably could have done so without much more thought than is required to use the LLM. The only real difference introduced by the LLM is that the purpose of the scraping is different from that of a search engine.
But let's get rid of the loaded term "hack" and be a little less emotional about the complaint. Really, the author published some works, presumably so that people could consume that content: without first knowing who was going to consume it or for what purpose.
It seems to me what the author is really complaining about is that the reward from the consuming party has been displaced from himself to whoever owns the LLM. The outcome of consumption and use hasn't changed... only who got credit for the original work has.
Now I'm not suggesting that this is an invalid complaint, but trying to avoid saying, "I posted this for my benefit"... be that commercial (ads?) or even just public recognition... is a bit disingenuous.
If you poured your knowledge, experience, and creativity into some content for others to consume and someone else took that content as their own... just be forthright about what you really lost, and don't disparage the consumer. The consumers haven't changed; it's just that middlemen are now reaping your rewards.
Well then... let's eliminate any due process and Fourth Amendment protections, maybe requiring something sensible like "officer suspicion", or maybe just a program of "random" searches... you know, keep everybody on their toes. I also bet that real crime (whatever that means) goes down...
Just because something works doesn't make it right. Personally, giving up what the law is supposed to protect (individual rights) in the name of the law is something I can only see as a fool's bargain.
It's interesting. I wonder how much large-company dysfunction is derailing these things.
Recently Hytale, a would-be Minecraft successor, released in early access. That project started around a decade ago as something of an indie project, was purchased by Riot, then cancelled by Riot, then recently sold back to the original project people... who basically undid a lot of fruitless work done by Riot and, as I said, have now released it in early access. A well-received early access, as far as I can tell.
I wonder why we don't see more indie games, and more capable new developers, rising to challenge what look like dysfunctional incumbents?
I'll be the first to admit that I don't know anything about that industry, but it seems like there's space to make progress for newcomers.
30+ years in AAA game dev and the dysfunction is pervasive at large companies. If the C-suite could visualize the diminishing returns and massive opportunity costs of their own mismanagement, they’d have no choice but to fire themselves and deploy their golden parachutes.
That the current US administration is behaving today not terribly unlike the UK regime of the late 18th century is not a convincing argument for the UK reasserting its historic imperial ambitions in North America or elsewhere.
Furthermore, the native populations of those western territories would probably like to have a word over UK & European rights to assert such privileges in the first place.
Tried jj several weeks ago... and absolutely love it.
A life non-goal for me is becoming proficient in a version control system, and Git, insofar as I've been able to tell, demands that you become proficient in an uncomfortably large subset of its features to get through any workload, even in the simplest realistic cases.
jj did take some getting used to, but after a couple of days it was all sorted, and actions which terrified me in Git felt natural and safe using jj. The kinds of things that required me to go back to the Git documentation (or Stack Overflow, or some blog posts) to be sure I was holding it right... in jj they come easily and naturally.
That jj offers sufficient power under a simple interface to get through the day... while staying compatible with those who use Git... makes it a no-brainer for me.
Until someone changes the test to be four days, or two, but doesn't update the test name.
Ultimately, disciplined practice requirements rely on disciplined practices to succeed. You can move the place where the diligence needs to be applied, but in the end the idea that comments can lose their meaning isn't that different from other non-functional, communicative elements also being subject to neglect. I don't mean to suggest that a longish test title wouldn't be more likely to be maintained, but even with that long name you are losing some context that is better expressed, person-to-person, using sentences or even paragraphs.
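To make the drift concrete, here's a minimal hypothetical sketch (all names and values invented):

    from datetime import timedelta

    # Hypothetical: the grace period was three days when this test was
    # written; someone later changed it to four without renaming the test.
    GRACE_PERIOD = timedelta(days=4)

    def test_invoice_is_overdue_after_three_days():
        # The assertion is correct; the name above now quietly lies.
        assert GRACE_PERIOD == timedelta(days=4)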
I had first-hand experience with this, oddly enough, also working on an accounting system (OK, an ERP system... working the accounting bits). I was hired to bring certain deficient/neglected accounting methodologies up to a reasonable standard and to implement a specialized inventory tracking capability. But the system was 20 years old and the original people behind the design and development had left the building.

I have a pretty strong understanding of accounting and inventory management practices, and ERP norms in general, but there were instances where what the system was doing didn't make sense, and there were no explanations (i.e. comments) as to why those choices had been made. The accounting rules written in code were easier to deal with, but when we got to certain record life-cycle decisions, like the life-cycle evolution of credit memo transactions, the domain begins to shift from "what does this mean from an accounting perspective?", where generally accepted rules are likely to apply, to "what did the designers of the application have in mind for the application's controls?". Sure, I could see what the code did and could duplicate it, but I couldn't understand it... not without doing a significant amount of detective work and speculation. The software developers that worked for the company that made this software were in the same boat: they had no idea why certain decisions were taken, whether those decisions were right or wrong, or the consequences of changes... nor did they really care (part of the reason I was brought in).

Even an out-of-date comment, one that didn't reflect the code as it evolved, would still have provided insight into the original intent. I know as well as you that code comments are often neglected as things change, and I don't take it for granted that understanding the comments is sufficient for knowing what a piece of code does... but understanding the mind of the author, or the last authors, does have value and would have helped multiple times during that project.
When I see these kinds of discussions I'm always reminded of one of my favorite papers: Peter Naur's "Programming as Theory Building" (https://pages.cs.wisc.edu/~remzi/Naur.pdf). In my mind, comments that were at least good and right at one time can give me a sense of the theory behind the application, even if they cannot tell me exactly how things work today.
With code in good shape, I think I prefer having unnamed tests: instead you read the test itself to see what it's doing and why it matters.
However, I've also done code archaeology, and same thing: old, inaccurate comments were one of the few remaining things with any idea of what the code was supposed to do, and they got me enough pointers to company-external docs to figure out the right stuff.
Wiki links, issue links, etc. had all been deleted. Same with the commit history, and the tests hadn't worked in >5 years and had also been deleted.
The best comments on that code were the ones describing TODOs and bugs that existed, rather than what the code did do: stream-of-consciousness comments and jokes.
What I've left for future archaeologists of my code is detailed error messages about what went wrong and what somebody needs to do to fix that error.
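For illustration, the sort of message I mean (a hypothetical sketch; the file names and remedies are invented):

    import os

    def load_config(path: str) -> str:
        # The point is the shape of the message: what failed, the likely
        # cause, and what somebody needs to do next.
        if not os.path.exists(path):
            raise FileNotFoundError(
                f"Config file not found at {path!r}. This usually means the "
                "deploy script never ran 'init-config'. Fix: copy "
                "config.example.toml to that path, or set CONFIG_PATH to an "
                "existing file."
            )
        with open(path) as f:
            return f.read()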
Yep. I saw the title and got excited... this is a particular problem area where I think these things can be very effective. There are so many data-entry-class tasks which don't require huge knowledge or judgement... just clean parsing and putting the result into a more machine-digestible form.
I don't know... this sort of area, while not nearly as sexy as video production or coding (etc.), seems like one where reaching a better-than-human performance level should be easier for these kinds of workloads.
Nick Zentner, a geology lecturer at Central Washington University, takes a particular subject and does a relatively deep, discussion-oriented dive into it over the course of 26 sub-topics... his "A to Z" series. In these he does a couple of streamed shows a week and includes links to relevant papers and resources. At the end of each session is a viewer Q&A for those watching live. Almost an online continuing-education course...
Of central importance to the first half of the current Alaska series is a recent paper by geologist Robert S. Hildebrand titled "The enigmatic Tintina–Rocky Mountain Trench fault: a hidden solution to the BajaBC controversy?"
What's great about these series is that he'll get a number of the geologists writing these papers involved in one way or another: either contributing interviews or talks specifically for the video series, or, as in the case of the Hildebrand-centric work in the current series, the author himself watching the stream and participating in the live chat with the other viewers, answering questions and the like.
Hell yeah!!! Huuuuge Nick Zentner fan here, he's the entire reason I even knew about it. I'm a PNW resident and love attending his lectures in April. If you can make it, please do!
You might have missed the big H2 section in the article:
"Recommendation: Stick with sequences, integers, and big integers"
After that then, yes, UUIDv7 over UUIDv4.
This article is a little older. PostgreSQL didn't have native support, so, yeah, you needed an extension. Today, PostgreSQL 18 has been released with UUIDv7 support... so the extension isn't necessary, though the extension does make the claim:
"[!NOTE] As of Postgres 18, there is a built in uuidv7() function, however it does not include all of the functionality below."
What those features are, and whether this extension adds more cruft than value on PostgreSQL 18, I can't tell. But I expect that the vast majority of users just won't need it any more.
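For intuition about why v7 behaves better in indexes, here's a rough Python sketch of the RFC 9562 v7 layout; this is illustrative, not PostgreSQL's actual implementation:

    import os
    import time
    import uuid

    def uuidv7_sketch() -> uuid.UUID:
        # 48-bit millisecond timestamp up front: values sort roughly by
        # creation time, which is friendlier to b-tree indexes than random v4.
        unix_ts_ms = time.time_ns() // 1_000_000
        rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 random bits
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
        # layout: timestamp | version (0b0111) | rand_a | variant (0b10) | rand_b
        value = (unix_ts_ms << 80) | (0x7 << 76) | (rand_a << 64) | (0x2 << 62) | rand_b
        return uuid.UUID(int=value)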
Especially in larger systems, how does one solve the issue of reaching the max value of an integer in their database? Sure, for unsigned bigint that's hard to achieve, but regular ints? Apps quickly outgrow those.
OK... but that concern seems a bit artificial... if bigints are appropriate, use them. If the table won't get to bigint sizes, don't. I've even used smallint for some tables I knew were going to be very limited in size. But I wouldn't worry about smallint's very limited range when other tables required more records: I'd just use int or bigint for those other tables as appropriate. The reality is that, unless I'm doing something very specific where the number of bytes will matter, I just use bigint. Yes, I'm probably being wasteful, but in the cases where those several extra bytes per record are really going to add up... I probably need bigint anyway, and in the cases where bigint isn't going to matter, the extra bytes are relatively small in aggregate. The consistency of simply using one type has value in itself.
And for those using ints as keys... you'd be surprised how many databases in the wild won't come close to consuming that many IDs or are for workloads where that sort of volume isn't even aspirational.
Now, to be fair, I'm usually in the UUID camp and am using UUIDv7 in my current designs. I think the parent article makes good points, but I'm after a different set of trade-offs where UUIDs are worth their overhead. Your mileage and use-cases may vary.
I don't know, I use whatever scales best, and that would be a close-to-infinite scaling key. The performance compromise is probably zeroed out the moment you have to migrate your database to a different one supporting the current scale of the product. That's for software that has to scale. Whole different story for stuff that doesn't have to grow, obviously. I am in the UUID camp too, but I don't care whether it's v4 or v7.
It's not like there are dozens of options and you constantly have to switch. You just have to estimate whether, at maximum growth, your table will have 32 thousand, 2 billion, or 9 quintillion entries. And even if you go with 9 quintillion for all cases, you still use half the space of a UUID.
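The back-of-the-envelope arithmetic behind those three figures, assuming PostgreSQL-style signed integer key types:

    print(2**15 - 1)  # smallint max: 32_767 (2 bytes)
    print(2**31 - 1)  # integer max: 2_147_483_647 (4 bytes)
    print(2**63 - 1)  # bigint max: ~9.2 quintillion (8 bytes)
    # A UUID is 16 bytes, so even bigint everywhere is half the key width.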
UUIDv4 is great for when you add sharding, and UUIDs in general prevent issues with mixing IDs from different tables. But if you reach the kind of scale where you have 2 billion of anything, UUIDs are probably not the best choice either.
There are plenty of ways to deal with that. You can shard by some other identifier (though I then question your table design), you can assign ranges to each shard, etc.
Because then you run into an issue when your 'n' changes. Plus, where are you incrementing it? That will require a single fault-tolerant ticker (some do that, btw).
Once you encode the shard number into the ID, you get:
- instantly* know which shard to query
- each shard has its own ticker
* programmatically, maybe visually as well depending on implementation
I had IDs that encoded: entity type (IIRC 4 bits?), timestamp, shard, and sequence per shard. We even had an admin page where you could paste an ID and it would decode it.
id % n is fine for a cache, because you can just throw the whole thing away and repopulate, or when 'n' never changes... but it usually does.
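A sketch of that style of ID in Python; the field widths here are made up (4-bit entity type, 40-bit timestamp offset, 10-bit shard, 10-bit sequence), not the ones from that system:

    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (an assumption)

    def pack_id(entity_type: int, ts_ms: int, shard: int, seq: int) -> int:
        # 4 + 40 + 10 + 10 = 64 bits total
        return (
            ((entity_type & 0xF) << 60)
            | (((ts_ms - EPOCH_MS) & ((1 << 40) - 1)) << 20)
            | ((shard & 0x3FF) << 10)
            | (seq & 0x3FF)
        )

    def unpack_id(packed: int) -> dict:
        # The admin-page trick: paste an ID, get its fields back.
        return {
            "entity_type": (packed >> 60) & 0xF,
            "ts_ms": ((packed >> 20) & ((1 << 40) - 1)) + EPOCH_MS,
            "shard": (packed >> 10) & 0x3FF,
            "seq": packed & 0x3FF,
        }

unpack_id(pack_id(3, 1_700_000_000_000, 42, 7)) round-trips the fields, and (packed >> 10) & 0x3FF immediately tells you which shard to query, with no ticker shared across shards.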
Yes, but if you do need to, it's much simpler if you've been using UUIDs since the beginning. I'm personally not convinced that the tradeoffs that come with a more traditional key are worth the headache that could come in a scenario where you do need to shard. I started a company last year, and the DB has grown wildly beyond our expectations. I did not expect this, and it continues to grow (good problem to have). It happens!