A highly cited reason for using mongo is that people would rather not figure out a schema. (N=3/3 for “serious” orgs I know using mongo).
That sort of inclination to push off doing the right thing now to save yourself a headache down the line probably overlaps with “let’s just make the db publicly exposed” instead of doing the work of setting up an internal network to save yourself a headache down the line.
> A highly cited reason for using mongo is that people would rather not figure out a schema.
Which is such a cop-out, because there is always a schema. The only questions are whether it is designed, whether it is documented, and where it is implemented. Mongo still requires some very explicit schema decisions; otherwise performance degrades quickly.
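One way to see that the schema exists even when nobody declared it is to scan the stored documents and list which types each field actually holds. Below is a minimal Python sketch, assuming documents arrive as plain dicts (as a driver would return them). `infer_schema` is a hypothetical helper, not a Mongo feature, and it only looks at top-level fields.

```python
from typing import Any, Iterable

def infer_schema(docs: Iterable[dict[str, Any]]) -> dict[str, set[str]]:
    """Recover the implicit schema: for each top-level field, the set of
    Python type names it holds across all documents, plus "missing" when
    some documents lack the field entirely."""
    docs = list(docs)
    fields = set().union(*(d.keys() for d in docs))
    return {
        field: {type(d[field]).__name__ if field in d else "missing" for d in docs}
        for field in sorted(fields)
    }

# Hypothetical "users" documents written by different code paths over time.
users = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": "85"},  # someone stored age as a string
    {"name": "Linus"},               # someone never stored age at all
]
print(infer_schema(users))
```

Three documents give three different answers for `age`; that spread is the undesigned, undocumented schema the rest of the code has to cope with.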
Fowler describes it as Implicit vs Explicit schema, which feels right.
Kleppmann uses "schema-on-read" vs "schema-on-write" for the same concept. I find that harder to grasp mentally, but it does describe when schema validation has to occur.
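To make the two terms concrete, here is a toy Python sketch of two in-memory stores that differ only in *when* `validate` runs. The store classes, `validate`, and the `name`/`age` field set are all hypothetical, and real databases do far more than this.

```python
from typing import Any

# Hypothetical document shape used for illustration: every user needs
# a string "name" and an integer "age".
REQUIRED_FIELDS: dict[str, type] = {"name": str, "age": int}

def validate(doc: dict[str, Any]) -> dict[str, Any]:
    """Return doc unchanged, or raise ValueError if it doesn't match REQUIRED_FIELDS."""
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(doc.get(field), expected):
            raise ValueError(f"{field!r} is missing or not a {expected.__name__}")
    return doc

class SchemaOnWriteStore:
    """Explicit schema: bad documents are rejected when they are stored."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []

    def insert(self, doc: dict[str, Any]) -> None:
        self.docs.append(validate(doc))  # validation happens on write

    def read_all(self) -> list[dict[str, Any]]:
        return list(self.docs)  # readers can trust the shape

class SchemaOnReadStore:
    """Implicit schema: anything is stored; each reader interprets it."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []

    def insert(self, doc: dict[str, Any]) -> None:
        self.docs.append(doc)  # no check at all on write

    def read_all(self) -> list[dict[str, Any]]:
        return [validate(d) for d in self.docs]  # validation happens on every read
```

Same check, different moment: with schema-on-write a bad document fails at insert time, close to the code that produced it; with schema-on-read it is accepted and only blows up (or gets silently mishandled) when some reader eventually looks at it.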
There is a surprising amount of important data in Mongo instances around the world, particularly within high finance, where multi-TB setups sprout up here and there.
I suspect this is partly due to historical inertia and exposure to SecDB designs.[0] Financial instruments can be hideously complex, and they are certainly ever-evolving, so I can imagine that a fixed schema for a constantly shifting universe of time series would be challenging. When financial institutions began to adopt the SecDB model, MongoDB was available as a high-volume, "schemaless" KV store with a reasonably good scaling story.
Combine that with the relatively incestuous nature of finance (firms tend to poach and hire from within their own ranks) and an average engineer tenure at any one organisation of less than four years, and you have an osmotic process that spreads "this at least works in this type of environment" knowledge. Add the naturally risk-averse nature of finance[ß] and you can see how one successful early adoption would quickly proliferate across the industry.
ß: For an industry that loves to take financial risks - with other people's money of course, they're not stupid - the players in high finance are remarkably risk-averse when it comes to technology choices. Experimentation with something new and unknown carries a potentially unbounded downside with limited, slowly emerging upside.
I'd argue that there's a schema; it's just defined dynamically by the queries themselves. Given how much of the industry seems fine with dynamic typing in languages, it's always been weird to me how diehard people seem to be about this with databases. There have been plenty of legitimate reasons to be skeptical of mongodb over the years (especially in the early days), but this one really isn't any more of a big deal than using Python or JavaScript.
Yes, there's a schema, but it's hard to maintain. You end up with 200 separate code locations rechecking that the data is in the expected shape. I've had to fix too many such messes at work after a project ground to a halt. Ironically, some people will go schemaless but use a statically typed language for their regular backend code, which doesn't buy you much. I'd totally go dynamic there. But a DB schema is so little effort for the strong foundation it sets for your code.
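For comparison, this is roughly what "so little effort" looks like in a relational database. A minimal sketch using Python's built-in `sqlite3` module; the `orders` table, its columns, and the helper functions are invented for illustration. The rules are declared once, and any insert that violates them fails with `sqlite3.IntegrityError`, so reading code does not need to recheck the shape. The `typeof(...)` check is there because ordinary (non-STRICT) SQLite tables otherwise accept values of any type in a column.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical "orders" table: the shape rules live in one place, the schema.
    conn.execute(
        """
        CREATE TABLE orders (
            id       INTEGER PRIMARY KEY,
            customer TEXT    NOT NULL,
            quantity INTEGER NOT NULL
                     CHECK (typeof(quantity) = 'integer' AND quantity > 0),
            status   TEXT    NOT NULL CHECK (status IN ('open', 'shipped'))
        )
        """
    )
    return conn

def add_order(conn: sqlite3.Connection, customer: str, quantity: int, status: str) -> None:
    # Any constraint violation raises sqlite3.IntegrityError here, at write time.
    conn.execute(
        "INSERT INTO orders (customer, quantity, status) VALUES (?, ?, ?)",
        (customer, quantity, status),
    )

def total_quantity(conn: sqlite3.Connection) -> int:
    # No defensive checks: every stored row already satisfies the constraints.
    return conn.execute("SELECT COALESCE(SUM(quantity), 0) FROM orders").fetchone()[0]
```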
Sometimes it comes from a misconception that your schema should never have to change as features are added, and so you need to cover all cases with 1-2 omni tables. Often named "node" and "edge."
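To show why the omni-table design hurts, here is a rough Python/`sqlite3` sketch of a generic `node`/`edge` layout. The table names follow the comment above; the columns, the `placed` label, and the helper functions are hypothetical. Because every entity's fields are packed into an opaque JSON `props` column, the database cannot enforce anything about them, and even "which orders did customer X place?" needs a double join plus filtering in application code.

```python
import json
import sqlite3
from typing import Any

def make_graph_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Two "omni" tables: every entity is a node, every relationship an edge.
    conn.executescript(
        """
        CREATE TABLE node (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, props TEXT NOT NULL);
        CREATE TABLE edge (src INTEGER NOT NULL, dst INTEGER NOT NULL, label TEXT NOT NULL);
        """
    )
    return conn

def add_node(conn: sqlite3.Connection, kind: str, props: dict[str, Any]) -> int:
    # The real fields are serialized to JSON, so the database sees only a string.
    cur = conn.execute(
        "INSERT INTO node (kind, props) VALUES (?, ?)", (kind, json.dumps(props))
    )
    return cur.lastrowid

def add_edge(conn: sqlite3.Connection, src: int, dst: int, label: str) -> None:
    conn.execute("INSERT INTO edge (src, dst, label) VALUES (?, ?, ?)", (src, dst, label))

def orders_for(conn: sqlite3.Connection, customer_name: str) -> list[dict[str, Any]]:
    rows = conn.execute(
        """
        SELECT c.props, o.props
        FROM node AS c
        JOIN edge AS e ON e.src = c.id AND e.label = 'placed'
        JOIN node AS o ON o.id = e.dst AND o.kind = 'order'
        WHERE c.kind = 'customer'
        ORDER BY o.id
        """
    ).fetchall()
    # The customer's name is buried in JSON, so the filter happens in Python
    # and silently skips customers whose props lack a "name" key.
    return [
        json.loads(order_props)
        for customer_props, order_props in rows
        if json.loads(customer_props).get("name") == customer_name
    ]
```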
> Ironically some people will do schemaless but use a statically typed lang for regular backend code, which doesn't buy you much. I'd totally do dynamic there.
Honestly, I feel the opposite, at least if you're the only consumer of the data. I'd never go out of my way to use a dynamically typed language, so I already have to do something to get the data into my own language's types, and at that point it doesn't make much difference to me what format it used to be in. When a variety of clients are using the data, though, this logic might not apply.
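A small sketch of the "get the data into my own language's types" step, in Python with type hints standing in for a statically typed language. The `User` type, its fields, `from_doc`, and `greeting` are hypothetical; the idea is that the shape is checked once at the boundary, and everything downstream takes a `User` instead of a raw document.

```python
from dataclasses import dataclass
from typing import Any, Mapping

@dataclass(frozen=True)
class User:
    name: str
    age: int

    @classmethod
    def from_doc(cls, doc: Mapping[str, Any]) -> "User":
        """Parse a raw document once; extra keys (e.g. "_id") are ignored."""
        name = doc.get("name")
        age = doc.get("age")
        if not isinstance(name, str):
            raise ValueError("name must be a string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            raise ValueError("age must be an integer")
        return cls(name=name, age=age)

def greeting(user: User) -> str:
    # Downstream code works with User and never re-inspects the raw document.
    return f"Hello, {user.name} ({user.age})"
```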
If you're only consuming, yes. It might as well be a totally separate service. If it's your database that you read/write on, it's closely tied to your code.
We just sit a data persistence service in front of mongo, so we can enforce controls for everything there if we need them, but quite often we don’t.
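A rough sketch of that arrangement, assuming a hypothetical `PersistenceService` class with an in-memory dict standing in for the Mongo collections (a real service would call a driver instead). Controls are opt-in per collection, matching "if we need them, but quite often we don't": collections without a registered validator accept anything.

```python
from typing import Any, Callable

# A validator inspects a document and raises (e.g. ValueError) if it is unacceptable.
Validator = Callable[[dict[str, Any]], None]

class PersistenceService:
    """Single gateway in front of the database; an in-memory dict stands in for Mongo."""

    def __init__(self) -> None:
        self._collections: dict[str, list[dict[str, Any]]] = {}
        self._validators: dict[str, Validator] = {}

    def register_validator(self, collection: str, check: Validator) -> None:
        self._validators[collection] = check

    def insert(self, collection: str, doc: dict[str, Any]) -> None:
        check = self._validators.get(collection)
        if check is not None:
            check(doc)  # controls run only where a validator was registered
        self._collections.setdefault(collection, []).append(dict(doc))

    def find(self, collection: str) -> list[dict[str, Any]]:
        # Return copies so callers can't mutate stored documents in place.
        return [dict(d) for d in self._collections.get(collection, [])]
```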
It’s probably better to check what you’re working on than to blindly assume that this thing you’ve gotten from somewhere is the right shape anyway.
This "DAO" approach is usually how it goes, and it tends to become bloated. Best case, you're reimplementing what the schema would've done for you anyway.
The adage I always tell people is that in any successful system, the data will far outlive the code. People throw away front ends and middle layers all the time. This becomes so much harder to do if the schema is defined across a sprawling middle layer like you describe.
As someone who has done a lot of Ruby coding, I would say a statically typed database is almost a must when using a dynamically typed language. The database enforced the data model, and the Ruby code was mostly just glue on top of it.
That's fair; I could see an argument for "either the database or the language needs to enforce the schema". It's not obvious to me, though, that one of the two "only one of them does" models deserves so much more criticism than the other.
It's possible you didn't intend it, but your parent comment definitely came off as snarky, so I don't think you should be surprised that people responded in kind. You're honestly doing it again with the "let's stop feeling attacked" bit; whether you mean it or not, your phrasing comes across as pretty patronizing, and overall combined with the apparent dislike of people disagreeing with you after the snark it comes across as passive-aggressive. In general it's not going to go over well if you dish out criticism but can't take it.
In any case, you quite literally said there was a "lack of schemas", and I disagreed with that characterization. I certainly didn't feel attacked by it; I just didn't think it was the most accurate way to view things from a technical perspective.