Things on the JVM are a PITA to deploy? That's a bit of a silly statement. Some things are easy to deploy, some aren't. It's not an intrinsic property of the JVM, but rather the consequence of a series of choices made by whoever wrote the tool in question.
You can do TLS in Java without using keystores, even if keystores do have some advantages, and megacorps seem to like them for all the wrong reasons. Using them shifts some of the complexity of dealing with certificates from the developer to the system administrator. It's an implementation choice.
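To make that concrete: here's a minimal sketch of wiring up TLS trust from a plain PEM certificate file, with only an in-memory `KeyStore` used as an API vehicle and nothing keystore-related ever written to disk. The class and method names are mine, not from any particular project.

```java
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class PemTrust {
    // Build an SSLContext that trusts a single PEM-encoded certificate,
    // without ever touching a keystore file on disk.
    public static SSLContext fromPem(InputStream pem) throws Exception {
        // CertificateFactory reads PEM (BEGIN/END CERTIFICATE) directly.
        X509Certificate cert = (X509Certificate)
                CertificateFactory.getInstance("X.509").generateCertificate(pem);

        // The KeyStore here is purely in-memory; load(null, null) creates an empty one.
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);
        ks.setCertificateEntry("server", cert);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

The sysadmin hands you a `.pem` file, same as they would for nginx, and the developer absorbs the (one-time) complexity of the code above.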
The only other complaint of yours I could find ITT was about setting min/max heap size... not only is it extremely convenient to be able to do that, it also hasn't ever technically been required, and the defaults have been Good Enough for most uses since Java 5 came out in 2004. The JVM will figure it out for you, and if it's too conservative or too aggressive for you, you can tune it.
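If you're curious what the JVM decided for you, it's trivially observable; a quick probe (class name is mine):

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // With no -Xms/-Xmx flags, the JVM picks defaults from the machine's RAM
        // (recent HotSpot releases default max heap to roughly 1/4 of physical memory).
        System.out.printf("max heap:   %d MiB%n", rt.maxMemory() / (1024 * 1024));
        System.out.printf("total heap: %d MiB%n", rt.totalMemory() / (1024 * 1024));
        // To override the defaults, just add flags: java -Xms512m -Xmx2g HeapInfo
    }
}
```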
If Kafka is a PITA to deploy, fine, maybe that's fair criticism. Not all things on the JVM are pain to deploy.
I don't understand Java apologists. I'm not saying Java is bad. Quite the opposite: it's one of the most optimized systems to date. But if you cannot admit that Java apps are a pain to deploy, or at least harder to deploy than a static binary, I cannot take this argument seriously in good faith.
Seriously? When was the last time you deployed something as a standalone, static binary? Hardly anything with any amount of complexity is shipped like that, especially not cross-platform. Just about everything requires some kind of runtime, some kind of dependencies or some kind of environment. I can go "dnf install firefox" just as easily as I can go "dnf install ant". Both of these come with a boatload of dependencies. One of them depends on a JVM, the other doesn't. Both are packaged pretty well and both can be installed in a single step.
I indeed won't admit that "Java apps" are a pain to deploy, because it's a meaningless and overly generalistic statement. Some apps are a pain to deploy, but it's almost always down to the way they're packaged/distributed, and rarely down to the underlying technology.
I'm not sure what you mean. Deploying static binaries was the main way to go before, and is now the standard approach with Go. Pretty much every service developed in Go is deployed as a standalone, statically compiled binary, and I would assume it's the same for code developed in Rust.
Except you need to make sure you built against the correct version of libc because you cannot statically link that.
I remember when docker first came out and people were losing their minds about how they will be able to deploy their applications with all their dependencies in a single shippable bundle, kind of like an uber jar.
Rust programs can be statically linked against musl libc, producing binaries that will run on any Linux distribution. You can run ldd on the binaries and confirm they have zero dependencies on shared libraries.
> Hardly anything with any amount of complexity is shipped like that
The entire HashiCorp stack and CockroachDB come to mind. I’m sure there are things that are much more complex in the world, but they do some fairly heavy lifting.
> especially not cross-platform
I find it drastically simpler to download a single binary for each platform I need to deploy to (macOS, Linux, FreeBSD) than it is to get a consistent Java environment set up on those same platforms.
You're kinda omitting the insane amount of configuration you can give to pretty much every hashicorp product and cockroachdb
I think the big difference is that Java projects TEND to be more 'up front' about their configuration options, with shipped default files with a lot of the options already set to some default value, while projects like the ones you mentioned require you to look up every property yourself and set it to something if you want to change it from the default.
I did not mention the configs, because those are plain text and do not really add dependencies I don’t already have.
Are you saying that Java projects tend to have more sane defaults and come bundled with more out of the box?
The last Java project I had to deploy was ElasticSearch/Kibana, and there was a LOT of configuration needed that required consulting a lot of disparate documentation.
Nearly everything written in Golang is deployed as a static binary with minimal dependencies, which is partly why it has grown in popularity.
It’s a nearly universal experience that different language runtimes have very different experiences with dependencies. Someone who ships a Node, Python or Ruby-based tool for example has a tough time, to the point of needing to write a wrapper installer that vendors all dependencies including the runtime itself, just to be sure.
The JVM with Maven and fat JARs probably isn’t as bad as these cases; mostly it just requires a compatible JRE. This is why Spring Boot apps, for example, are so popular for deployment - no more app server insanity!
Jlink makes this more like Golang, but isn’t used enough.
That said Golang dependency management for the developer / builder is a history of horrors. Maven has had its issues but has gotten past most of them.
You can bundle all of your dependencies with your code (uber jar), so no classpath to set up. And now with jlink, you can even ship a runtime stripped down to only the parts of the JVM that your code uses, to optimize size.
From the perspective of a java developer, this makes total sense.
From the perspective of a sysadmin who just wants to deploy an app that ships as a jar and doesn't come pre-packaged with any convenience features like you mentioned, it's an awful lot to work out and learn.
The problem I have with your statement is that you do not elaborate on why it is such a PITA. Maybe you had some issues with a Spring web app? JavaBeans? I don’t know. The problem is that you’re making a broad generalization about all apps on the JVM, while to me it seems like this has more to do with decisions of the specific apps you were deploying.
To go back to the original argument, why is Kafka ported from JVM to Go suddenly much easier to deploy? Is it really due to the JVM, or is it perhaps related to design decisions made while porting the application to Go?
They are really simple if done correctly. You have a single fat jar so can do java -jar myapp.jar which is almost as easy as your static binary but prefixed with two extra words.
It’s also trivially easy to package as a docker image.
What makes things like ZooKeeper and Kafka hard to deploy is that they are complex applications. You could write them in any other language and the deployment would still remain non-trivial.
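To illustrate how small the gap is when an app is packaged well: a sketch of the entire deployment story for a hypothetical fat-jar app (names like `myapp.jar` are made up).

```java
// Main.java - the whole app, for illustration purposes.
public class Main {
    public static void main(String[] args) {
        System.out.println("service up");
    }
}
// In a real project a build tool (e.g. the Maven Shade plugin) bundles all
// dependencies into one jar. By hand, the equivalent is just:
//
//   javac Main.java
//   jar --create --file myapp.jar --main-class Main Main.class
//
// Deployment is then: copy one file, run one command:
//
//   java -jar myapp.jar
```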
I use Java for a living. Whether you think Java is very easy or a pain in the ass depends on what you're doing.
Consider the fact Ubuntu 18.04's default repos come with a pretty ancient version of Java. And if you want a newer version, you have to use some third-party source. And the fact there are both OpenJDK and Oracle JDKs to choose from. Will that be JDK or JRE? Headless? Depending on the combination of JDK and software, maybe the fonts will go all funny. Got multiple JDKs? Some programs will use your default Java, some will invite you to select the JDK to use, some will bundle their own complete JDK. Oh, a program uses JavaFX? No, of course that isn't installed just because you installed Java.
These problems aren't inherent to Java, of course - it's 98% due to Oracle's attempts to inconvenience people into paying for licenses.
I don't know which JVMs are or aren't in Ubuntu. But Ubuntu 18 is over 2 years old. Almost every major linux distribution has stable, LTS versions of Java 8 and 11 -- which are the currently supported LTS versions in general.
A package can easily add a dependency on a specific Java version. And unlike many other tools, if you have multiple versions of Java, all you usually need to do is add the right one to the $PATH. Sometimes you may have to set $JRE_HOME. And that's it. You're done. If that's too much of a pain to deal with, I hope you never have to install a tool that requires a specific PHP, Python or NodeJS version.
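A quick sanity probe for "which JVM am I actually running?" after adjusting $PATH - the class name is mine, the system properties are standard:

```java
public class WhichJava {
    public static void main(String[] args) {
        // These properties are set by the running JVM itself, so this always
        // reports the runtime that actually resolved from $PATH.
        System.out.println("version: " + System.getProperty("java.version"));
        System.out.println("home:    " + System.getProperty("java.home"));
        System.out.println("vendor:  " + System.getProperty("java.vendor"));
    }
}
```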
I don't know of any serious tools written in Java that don't explicitly say which version(s) are supported. Maybe some tools are poorly packaged. But again, that's not a problem with Java.
And now you have a separate intermediate runtime layer between your code and the OS, with its own set of versions, patches, regression, bugs, backward compatibilities, etc. that may or may not have an impact on the code that you intend to deploy (or redeploy).
Statically compiled binaries are undoubtedly a plus regarding deployment.
You do realize that golang statically compiled binaries contain the go runtime right? So, all the concerns that you mentioned are applicable to go as well, e.g. a golang app may be buggy when built with version X, while another golang app may have GC issue when built with version Y, and yet another golang app would be vulnerable unless rebuilt with version Z. [1][2][3]
Don't drink the Kool-Aid, golang is just a more opinionated and much less capable cousin of Java/JVM. And for many companies, that could be the right trade-off.
Except that the developer controls it, since it is embedded. As such it follows the exact same lifecycle as the code itself (versioning, testing, etc.).
This is misinformed. jlink has existed since JDK 9 and allows the developer to produce a custom JVM to ship with their application. jpackage also now exists to create installers for whatever platform you're on. It is you, the developer, who is responsible for packaging your application correctly.
You can ship the JVM with your app such that it's self-contained and system independent. Apps like IntelliJ already do this. And now with jlink, you can strip it down to make deployment sizes even smaller by only shipping the parts of the JVM that your app uses.
That’s the reasonable choice for consumer facing applications, yeah. For server applications it makes more sense to use something like Docker or a pre-built AMI.
For what I do “how do I get Java on the server” is much less difficult than making deployments quicker and more efficient, which is more a problem for our CI/CD harness and integration tests. Neither of which are Java specific per se.
For server apps you don’t even need a Docker image or AMI, a fat JAR will usually do 90% of what you need.
That said, if you want to keep your JDK up to date via Docker or AMI, of course that’s fine. A jlink runtime image does the same thing, but I can see the desire to decouple runtime upgrades from code upgrades.
I guess you mean that Java 11 is not available on 18.04 by default, right? Java 11 was released in September 2018, so Ubuntu 18.04, which came out in April 2018, never had a chance to include it. Therefore you have to take an extra step to set up a PPA with software that didn't exist at the moment of release.
Otherwise, at least with Ubuntu 19.04 which I have, it's simple `sudo apt install openjdk-11-jdk-headless`
Yes, it’s a few extra steps, but it’s not exactly Sisyphean now is it? Given the complexity of modern CI/CD pipelines, and the trend towards continuous deployment, making sure that a JVM is deployed alongside the uberjar is a pretty small ask. Would a single binary be easier? Sure. Would I change languages just for that? No, other factors are more important IMHO.
This skips the steps of choosing and installing a JVM, which is a bit daunting. My distro gives like 4 different versions. And worse, some applications only run on certain versions. By default, there is no 'java' command.
That's not really a good faith argument. That's the equivalent of saying that a major problem with NodeJS is you have to pick which version to install before you deploy.
I deploy single-package JVM files most days and haven't had to consider the deploy environment for about 4 years.
What is hard is not choosing the Java version (1.8, 19, ...), but the JVM implementation: HotSpot, OpenJDK, Oracle. Even finding the download link without being forced to log in on oracle.com was difficult (in fact I couldn't find it at all the last time).
Why is it hard? Since Go gives you no choice, the equivalent would be to pick literally any, and make do with its defaults. Same thing for tuning. You do not have to choose optimally, that's an extra constraint.
Having a different version of libc is exactly like worrying that the system has a different version of Java. If you know about these 'gotchas', they're no big deal. If you don't, then yeah, they can be a bit intimidating at first.
I really don't think running a go app vs. a java app is ANY different.
> That's the equivalent of saying that a major problem with NodeJS is you have to pick which version to install before you deploy.
Are you saying this isn't a major problem with Node? In my experience it's a problem with every language that requires a separate runtime.
The JVM is the worst of two worlds because I have the build complexity of an AOT compiler to make my deployment artifact and the deployment complexity of an interpreted language to get the correct runtime pre-installed.
This is a question you have to answer once for your project. Similar to choosing which OS, database, library, or framework to use. There's no avoiding having knowledge or learning when choosing dependencies.
Just about any of the implementations will work fine. You only have to concern yourself with avoiding use of the Oracle runtime in production without a license. If in doubt, install the latest AdoptOpenJDK[0] HotSpot version that's compatible with your app.
It depends upon what you’re looking at in a deployment. Static binaries are the gold standard of “simple” in theory for ops folks but are not architecture neutral by definition. The JVM JAR spec is pretty well known as well and is not a surprise with plenty of documentation or blog posts to help out. Quirks do show up from crossing boundaries into native land like DNS and sockets behaviors, but a basic JAR does give some advantages over a random native binary in a tarball when it comes to dependency management which is tougher with a static binary. It’s not obvious which version of a library is compiled into a static binary at first glance compared to unzipping and checksumming the JARs in a package.
There’s definitely some laborious parts of the Java packaging setup (the whole resources, meta-inf / manifest setup reeks of YAGNI problems) but similar to Go’s lack of expressiveness being a feature, the Java ecosystem is fundamentally designed for organizations that separate developers from the systems where the software is run, and this is either helpful or hurtful entirely depending upon the organization’s needs.
I’ve never really had a problem deploying anything based upon the packaging - it’s a minor part compared to various obscure configuration files, ConfigMaps, or environment variable injections that bother me more, and that has nothing to do with a package or even language in itself.
Wow, this does not match my experience at all. The JVM itself is harder to install than any native-binary service I've ever used. And that's before you get around to running something like Kafka or Zookeeper on top of it. Maybe "not all things" are a pain to deploy, but everything I've ever run on it or looked into running has been.
I'm particularly curious about how you deal with TLS in the JVM without keystores? I'd love to hear more about that.
Also, I have to laugh about the heap settings not "technically" being required. Sure, but how many JVM-based services have you actually run in production without setting them? If it's more than zero, then I congratulate you on your luck.
How are you installing these troublesome JVMs? Either you use your OS' package manager and get updates as a bonus (for a looooong time on CentOS/RHEL). Or you can literally download an OpenJDK zipfile, unzip it and you're done -- will require manual updating. Or, if you're feeling enterpricey (not a typo), you can install an Oracle JVM and get paid updates for an eternity.
Kafka and ZK are a pain to deploy. Not the JVM's fault. Entirely down to the choices made by their respective community. Last I checked it was still virtually impossible to secure a ZK-ensemble. Maybe they'd welcome patches, but when I tried to submit a minor bugfix years ago, I found them to be an unwelcoming community.
It's known that golang optimizes for latency at the expense of throughput, and it doesn't give you the option to change this if your requirements change. This is the power of the JVM. In any case, the JVM now ships with ZGC, a low latency GC, and it may be worth running the benchmark with it. There's also another low latency GC in the works called Shenandoah.
Thanks for the info. It makes sense in that case. Though the JVM can be very performant when written properly, and especially now with ZGC, Valhalla, Panama, etc.
Ah yeah, no doubt. I've read the recent GC code in the JVM tree. It's beautiful. We just had many years of experience with C++ and it was a good fit for seastar.io - this all came from me messing around with https://github.com/smfrpc/smf and seeing what one could do with DMA (no kernel page cache)
Not really, after all most native code compilers have endless amount of configuration options as well, while Go still falls behind many use cases.
Also I started to see a trend in books and blog posts regarding how to write Go code towards better performance, so it isn't a given that it excels at performance out of the box.
All of which comes back to the original point that many times isn't the language, rather how it is written and what tools one makes use of.
If Go were so much better than Java, Google would have already replaced Java with Go on Android (battery life and such). Instead they went with a mix of AOT/JIT with PGO and introduced Kotlin, while the gomobile efforts were never given any serious consideration, not even for the NDK.
Go’s mission statement isn’t to be faster than Java or to replace Java. It’s to (1) compile faster, (2) compile into single binaries with no dynamic links, and (3) natively support concurrency and parallelism with M:N routines:threads.
Go will never replace Java for Android because (1) it doesn’t use a VM, so it would need to compile for every arch that android runs on (2) it would require a bug-compatible port of Android. Because Kotlin runs on the JVM and can run on top of Java codebases, it doesn’t have to leap over these hurdles.
There are other reasons you can come up with I’m sure, but those are the biggest that stick out to me. Notably it has nothing to do with “which language is better”.
They optimize for different use cases, and that’s OK. That being said, if you were writing Android from scratch and were only targeting ARM (an equally silly comparison meant to highlight the differences in the languages), you’d be hard pressed to justify Java over Go.
I honestly have yet to see any evidence that Fuchsia is anything but an experiment in this respect, like Dart was (which was also going to "replace android").
I entirely admit the possibility, but I see no plan. Hopes and aspirations are not plans.
> That being said, if you were writing Android from scratch and were only targeting ARM (an equally silly comparison meant to highlight the differences in the languages), you’d be hard pressed to justify Java over Go.
Instead they went with Rust, C++ and Dart.
Now what have all those languages in common that Go lacks?
This isn't really true for non-trivial code bases. And even if it is, it's not a big difference in practice, especially with incremental compilation where I find it actually much quicker to change a couple of files and re-run unit tests compared to golang which has to spit out a multi-dozen MB binary each time, taking 5-6+ seconds.
> (2) compile into single binaries with no dynamic links
Java is getting AOT compilation which will do the same.
> (3) natively support concurrency and parallelism with M:N routines:threads.
The fact that a tool works out of the box and offers you an extensive array of ways to get more out of it is not in any way worse than a tool that just works out of the box. You can use it in exactly the same way with no more mental effort - stick to the defaults.
Which one is easier? Adding a few lines of GC configuration or rewriting a service in C++ using Seastar and binding processes to cores?
What sort of performance measurement uses default configurations? What is even the point of not tuning the GC to your application workload?
I have spent the last 15 years running Java apps in production, and optimizing for p99 latency is really not that hard. Optimizing p99.99999 is a whole different subject though. I don't understand the point of comparing a default GC setting meant for hello-world applications to software that is optimized every way possible. Apples to oranges. It is a great marketing gimmick though. Look ma, no performance! Look here, so much faster. We live in a single-dimension world, yay!
Easier for who? At some point, the architecture needs to change to overcome a fundamental limit. Developers take on that work to make the performance and operations easier for their users.
Seastar is the foundation of Scylla, which shows that rewriting in C++ can deliver orders of magnitude more performance than is possible by just tuning Cassandra on the JVM. In fact, DataStax has now copied the Scylla approach in Cassandra but still lags drastically behind in performance.
While there is no significant latency difference in the lower percentile tiers, where Scylla really shines is TCO. Some companies trade SRE time for license cost; other companies tune GC.
Yes, you get 10x the throughput while maintaining far better tail latency with Scylla over Cassandra.
How is 10ms vs 475 not a major improvement? How is 4 nodes vs 40 not a major improvement? If you're an SRE, then how is managing 4 servers with far less tuning and maintenance not a major improvement? Also, the 99.9th percentile still matters. They're testing with 300k ops/sec, which means 300 ops/sec are facing extreme latency spikes that can be enough to fail and/or cause cascading issues throughout the application.
There's no metric where Cassandra is better here and you can't tune your way to the same performance in the first place which is the whole point of Scylla. What even is your claim here? Spend more to get less?
2-3x maybe, 10x not likely, unless you compare it to untuned cassandra.
What makes it possible to run cassandra/scylla on nodes with TBs of data density is the TWCS compaction strategy from Jeff Jirsa. He was just a cassandra power user at the time, and I like to think that the invention was possible because of Java.
So, next time you read an ad piece from scylla about replacing 40 mid size boxes running CMS with 4 big boxes, don't forget about TWCS.
It's 4 servers doing the same as 40. That is 10x throughput, and with lower tail latency.
Scylla is far more than a compaction strategy. If it were that simple, then Cassandra would already be able to do it.
It's an objectively faster database in every metric. Datastax's enterprise distribution has more functionality but core Cassandra is now entirely outclassed by Scylla in speed and features.
Again, that's 40 mid-size boxes with questionable GC (32G to 48G heap size is the no man's land, a G1GC target pause time of 500ms of course will result in 500ms P999 latency, etc.), versus 4 big boxes which have more than 4x higher specs. So 10 divided by 4 - that's the 2-3x that I mentioned.
TWCS is just a good example that raw performance is not everything. Performance also comes from things like compaction strategy, data modeling, and access patterns, while users and stakeholders also care about things like ease of modification, a friendly license, and steady stewardship.
How does Scylla relate to Kafka and ZooKeeper here? I know a bunch of ad-tech companies that built their entire stack around the JVM and Scala. Those companies need to perform a bid (a whole bunch of logic) within 100 ms (including networking), otherwise they incur a financial penalty.
Please stop acting like a kid who posts their opinion wherever they see a JVM product. Kafka and Pulsar are two successful projects no matter what you think.
Kafka relies on a somewhat outdated architecture but has the most vibrant ecosystem. Those are major pros and cons - not the language or the VM.
Redpanda uses the Seastar framework, which was created by the ScyllaDB project. Scylla is a high-performance C++ reimplementation of Cassandra, and Redpanda seems to be chasing the same thing as an alternative to Kafka/JVM.
As a 12 year adtech veteran who has built ad networks from scratch 3 times, low-latency and high-throughput are critical to ad serving infrastructure and that's why Scylla is such a better alternative to Cassandra. The only other database that gets close is Aerospike, and possibly Redis Enterprise with Flash persistence. It's entirely valid to want similar improvements for event streams as well, and as long as they keep the same external API then you don't lose any of the ecosystem advantages either.
Ironically, Gil Tene uses the talk to sell you the Azul C4 pauseless GC, which would easily invalidate your 10x lower p99 latency claim. Of course we also have ZGC and Shenandoah these days.
Seastar is a fundamentally different way of programming from what you mentioned above. Let me give you an example. Seastar takes all the memory up front and never gives it back to the operating system (you can control how much via -m2G, etc.). This gives you deterministic allocation latency - allocation is just incrementing a couple of pointers. Memory is split evenly across the number of cores, and the way you communicate between cores is message passing - you explicitly say which thread is allowed to read which inbox (similar to actors) - I wrote about it here in 2017: https://www.alexgallego.org/concurrency/smf/2017/12/16/futur...
The point of Seastar is to not tune the GC for each application workload, so bringing that up means you missed the whole point of Seastar. Instead the programmer explicitly reserves memory for each subsystem - say 30% for the RPC, 20% for the app-specific page cache (since it's all DMA, no kernel page cache), 20% for write-behinds, etc. (obviously in practice most of this is dynamic). It is not one-dimensional as suggested, and not apples to oranges. It is apples to apples. You have a service, you connect your clients - unchanged - and one has better latency. That simple.
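Seastar itself is C++, but the shard-per-core idea - each core exclusively owns its state, and cross-core communication is explicit message passing into an inbox - can be sketched even in Java. This is my own illustrative analogue, not Seastar's API, and it doesn't capture thread pinning or the up-front memory reservation:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A crude analogue of the shard-per-core model: each "shard" is one thread
// that exclusively owns its slice of state; other shards never touch it
// directly - they submit messages to its executor (its "inbox").
public class Shards {
    private final ExecutorService[] shards;
    private final long[] counters; // counters[i] is ONLY ever touched by shard i

    public Shards(int n) {
        shards = new ExecutorService[n];
        counters = new long[n];
        for (int i = 0; i < n; i++) {
            shards[i] = Executors.newSingleThreadExecutor(); // one thread per shard
        }
    }

    // Message passing: ask shard i to bump its counter and report the new value.
    // No locks needed, because only shard i's thread ever reads/writes counters[i].
    public CompletableFuture<Long> increment(int i) {
        return CompletableFuture.supplyAsync(() -> ++counters[i], shards[i]);
    }

    public void shutdown() {
        for (ExecutorService s : shards) s.shutdown();
    }
}
```

Because state is thread-confined, there is no shared-memory contention at all; Seastar goes further by pinning each thread to a physical core and pre-partitioning memory per shard, which the JVM does not expose directly.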
It may be your experience that when you download, say, Kafka 2.4.1, you change the GC settings, but in a multi-tenant environment that's a moving target. Most enterprises I have talked to just use the default startup script's GC/memory settings for Kafka (they may change some writer settings, caching, etc.).
At the end of the day there is no substitute for testing in your own app with your own firewall settings w/ your own hardware. The result should still give you 10x lower latency.
I am familiar with Seastar too. It is one component that is pretty useless by itself. What is relevant in this topic is what is around it - the functionality that you provide. This is why Scylla is copying Cassandra. You can come up with a nice way of programming whatever you want, but at the end of the day the business functionality is what matters, and there are different tradeoffs involved, still.
What do you mean "copying" Cassandra? Obviously they're offering the same API. Many people like the Cassandra data model and multiregional capabilities and that's why it was chosen.
What Scylla is doing is unlocking new performance potential with a C++ rewrite and an entirely different process-per-core architecture that gets around the fundamental limitations of Cassandra and makes it easier to run. This performance and stability has also led to the team making existing C* features like LWT, secondary indexes, and read-repair even faster and better than the original implementations.
correct. we use one protocol, just raft - both for metadata and for data.
Raft is really easy to parallelize and dispatch to multiple followers asynchronously. I measured recently on 3 i3.8xlarge instances, which give you 1.2GB/s, and I got around 1.18GB/s sustained - https://twitter.com/emaxerrno/status/1260415381321084929
Also, what's nice about using Raft is that if there is a bug, we know it's with our implementation and not with the protocol, so it gives users sound reasoning.
Because the JVM is fast. And it's not resource hungry - your program might be resource hungry. Bad desktop Java programs misled a whole generation of programmers about the JVM.
Look at stuff like LMAX. Java can be lightning fast.
The JVM is fast (though I'd say the best thing about the JVM is the amazingly large test suite it maintains). I think he was alluding to the fact that it is much easier to develop predictable systems in a language that forces you to deal with the constraints up front - i.e. writing your own memory allocator for custom pools, knowing exactly the latency, throughput, and assembly generated for it.
Not to mention that a lot of libraries that are immediately available for C++ (see io_uring) take a while to get ported to Java. The cost of JNI for a libaio wrapper is also expensive. Last I benchmarked (a few years back), the JNI switch alone for a crc32 call was 30 microseconds - an eternity - before doing any actual work.
In any case, we use Seastar (seastar.io), which I'm not sure can actually be ported to Java. The pinned thread-per-core model makes a lot of sense for minimizing latency, cache pollution, etc. Externally, the feeling that apps in Java are slow is real, less because the JVM is slow per se, but because writing low-latency apps in Java is not the idiomatic way, and those that do seek to extract every ounce of performance from the hardware often look elsewhere, since the work is just about the same.
> i.e.: writing your own memory allocator for custom pools knowing exactly the latency, throughput, assembly generated for it.
If you want to do this, you can gain a lot of performance with having custom allocators and pools on the JVM as well. E.g. frameworks like Netty have pooling strategies for ByteBuffers. If you go that route, you can also gain a lot of performance on the JVM, might really be competitive.
Unfortunately the JVM still enforces too many heap allocations since value-types are not a thing yet, but it still performs well.
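A toy sketch of the pooling idea behind allocators like Netty's PooledByteBufAllocator, using only the JDK (the class is mine, not Netty's API): recycle direct buffers instead of letting every allocation become garbage for the collector to chase.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

// A minimal direct-buffer pool: payload bytes live off-heap, and buffers are
// reused, so steady-state traffic generates (almost) no garbage.
public class BufferPool {
    private final ArrayBlockingQueue<ByteBuffer> free;
    private final int bufSize;

    public BufferPool(int buffers, int bufSize) {
        this.free = new ArrayBlockingQueue<>(buffers);
        this.bufSize = bufSize;
        for (int i = 0; i < buffers; i++) {
            free.add(ByteBuffer.allocateDirect(bufSize)); // off-heap payload
        }
    }

    public ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        // Fall back to a fresh allocation if the pool is exhausted.
        return b != null ? b : ByteBuffer.allocateDirect(bufSize);
    }

    public void release(ByteBuffer b) {
        b.clear();
        free.offer(b); // silently dropped if the pool is already full
    }
}
```

Real allocators add size classes, thread-local caches, and leak detection on top, but the core trade - predictable reuse instead of GC pressure - is the same.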
One of the interesting things I discovered about C++/Rust vs C#/Java is that the former language family wants you to do a lot of optimization up front, which typically results in good performance. But sometimes you also spend too much time optimizing something that won't matter in practice.
Whereas the managed languages are a bit easier to work with by default, but will thereby only yield mediocre performance. However you still have the chance to look into the bottlenecks and improve them by large margins using the right approaches. In C# now even more so than in Java thanks to tools like Span and value types.
True. I think the rejection of operating systems managing resources the way it is today (dynamic resource binding) is coming. Pinned threads make a ton of sense in HPC.
You might be correct - it may be only my ignorant experience interacting with Apache Big Data technologies of present day and wondering what the hell they are doing under the hood to take them so long to process simple stuff in bulk.
Perhaps open source engineers are not "paid" enough to spend countless hours carefully optimizing their programs on JVM. Properly paid engineers working on proprietary technology can surely bend and twist JVM and make it perform. It might be possible, but costly due to complexity and unpredictable nature of it.
It reminds me of the world of SQL, where you have to run your query with many small modifications and hope that the optimizer will generate a sensible query plan while the query is still readable. That's the cost of building on top of unpredictable systems. You wonder why you can't simply get access to the lower level (physical plans) and program on that level directly, since you know what your desired outcome on that level is anyway.
I still wonder why new projects for performant data processing are not written for example in Rust since for this task it has all the upsides and none of the downsides of JVM.
Depends. Java's GC can be tuned ad infinitum. It's not a simple task and requires knowledge, but that's the beauty of it: if you need low-ish latencies, you can tune the GC to target that instead of throughput. For example, we're using relatively large heaps (partly due to inefficiencies in our code), but we still want to stay under 500 ms/request or so. So we told the G1 GC to target 150 ms per collection, and then adjusted our heap size / code accordingly. It works well.
If you need really hard limits on collection then that's a tricky problem, but that's also tricky when you're managing memory yourself.
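For reference, a G1 setup along the lines described above (a pause-time target plus explicit heap bounds) might look like the following. The values are illustrative, not a recommendation, and the jar name is a placeholder:

```shell
# Ask G1 to aim for ~150 ms pauses on a fixed 8 GB heap, and log GC
# activity so the tuning can actually be verified afterwards (Java 9+).
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=150 \
     -Xms8g -Xmx8g \
     -Xlog:gc*:file=gc.log \
     -jar app.jar
```

Note that MaxGCPauseMillis is a goal, not a guarantee, which is exactly why the hard-real-time case in the next comment stays tricky.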
Once you start talking hundreds to thousands of requests/second, 500 ms is an incredibly long time and you're well past simple tweaks to the GC. Tuning GC to a high degree is non-deterministic black magic, which is not what you're looking for at that point.
Simple tweaks can go a long way for a lot of developers, but GC performance has been a problem at the last 3 organizations I've been at - and I'm not in the valley or at a FAANG - so it isn't exactly an uncommon scenario for developers.
Thanks! Yeah, I think the same, especially with the rise of core counts. I think an i3.metal is at 96 vCPUs. I estimate a machine like that should hold around 400K partitions just fine for us. I should measure this weekend.
Because most people using the JVM in production understand that your claim is without merit, that performance in JVM land can be tuned to the workload, and that choosing a language/runtime is never a single-dimension decision. The JVM has many additional properties that make it an excellent choice for big data use cases.
All you're doing is exchanging one runtime, the JVM, that happens to be well understood, highly optimized, has a decades-long track record, and is supported by every major company on Earth... for Go's runtime, one that isn't well understood, does not have a track record, and is a product of a company known to just abandon popular services for no apparent reason.
You also seem to have misunderstood what causes some apps to be a pain to deploy: the apps, themselves, suck, and would suck no matter what language they had been written in. Apache tends to become the home for a lot of projects that exhibit that particular issue, and Zookeeper does not seem to be an exception.
Kafka likely also has a case of the Apacheisms going on, but also apparently has a non-trivial amount of Scala in it. Complaining about Kafka fits your requirements far more than Zookeeper alone does.
Complaining about Google sunsetting consumer products as a means to disparage their technical contributions to open source software is incredibly trite and a very tired meme.
So, so much of Google depends on Go. It is extremely well understood, has an excellent track record and has zero chance of being "abandoned"
What about the JVM is well understood? The most common trope is that it's supposed to be the fastest VM ever, yet Java devs always manage to build slow apps on it. So much for understanding it well.
The claim that Go does not have a track record is ridiculous. Its whole premise is that it was made to power essential parts of the largest IT operation in the world.
And there's nothing for Google to cancel: it's open source, and they even made the effort of translating the toolchain itself to Go so it's super easy to work on.
And I'm not fanboying over Go, haven't written a Go service in years, but denying Go is an excellent platform to build these kinds of apps on is ridiculous.
I keep wanting the same for the JVM. It's probably nice that there are so many ways to adjust the JVM to best suit a very wide range of workloads, but from a practical perspective I rarely have spare time to go diving through the guts to experiment. Let me point something at a one-box, and it can tweak parameters to its heart's content, so I can export the result and apply it to the rest of the fleet.
But to know whether you want that, you need to understand the application. Elasticsearch, for example, recommends using at most 50% of physical memory for heap.
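For context, that guidance boils down to the standard heap bounds in Elasticsearch's jvm.options file. On, say, a 16 GB box (sizes here are just an example) it works out to something like:

```shell
# jvm.options excerpt: heap capped at ~50% of a 16 GB machine,
# min and max set equal so the heap never resizes at runtime.
# The other half of RAM is deliberately left to the OS page cache,
# which Lucene leans on heavily for index reads.
-Xms8g
-Xmx8g
```

The point being: the "right" heap size here follows from how the application uses the filesystem cache, not from any JVM-generic rule.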
That is a terrible idea. The JVM basically never returns RAM to the OS, and in my experience, if you run a java process long enough, the max heap size ends up being the used heap size.
It's the default because Java has a history: it was set when everyone was deploying WAR files to machines with 512 MB of total memory, and it worked for 95% of people for the past 25 years. In fact, it's still good enough for the chocolate factory that has brainwashed you into drinking the static binary milkshake.
Of course there are ways to use all the available ram when you deploy your kubernetes pods, you just have to consult the documentation.
Honestly this criticism is ridiculous I can’t believe I even bothered to reply.
Normally you just run your one JVM app on your VM. The real balance is basically between heap size and page cache size. You might want to tweak that, as well as the GC algorithm and other memory details, assuming you understand them.
If you don't, you just start the JVM with defaults, and it configures itself for optimal performance, using its idea of optimal.
They rewrote to get away from garbage collection full-stop, so I don't think this is quite the argument for the JVM that you're making it out to be.
Not to mention: they resolved their Go GC problem, and rewrote the service in Rust as part of a wave of moving services to Rust because they just liked Rust.
I'm an active Go user and I've been working almost exclusively in Go since 2011. I have a feeling that over time, the number of knobs on Go will increase much like it has in Java. I'm trying to think back to my first experiences with Java in 1995. It was so simple compared to the overhead of getting things working in C/C++. So Java's got like 25+ years on Go and I would speculate that the number of knobs on Go in 25 years will be significantly higher. On the one hand, I don't like seeing all the crazy configurable stuff, e.g. CGO_....=1 until I actually need it.
yep so not 100% the same, but my point with linking to it is tunables / levers / control is useful. In other scenarios maybe you could get by without having to do a rewrite in Rust, maybe you could keep it in golang and tune the GC a bit.
Whilst I like the memory ballast as a clever solution, it does smell like a workaround to a solvable problem if a parameter existed to tune it directly.
It's totally a workaround and I'm jumping on that last comment when I shouldn't be; it's just a reaction to the hyperbolic "you might have to rewrite" conclusion, because you won't have to rewrite, but indeed you might be annoyed by the lack of tunables.
I couldn't be more excited about this! Having finished a Kafka project about a year ago, we had so many Zookeeper production and test environment issues that it was the running joke to check Zookeeper first if anything went wrong.
Honestly Zookeeper in theory is a great idea: Having a centralized service for maintaining config info saves a lot of heartache when dealing with an open source distributed systems project. But in practice, I've never had a smooth experience getting Zookeeper to run consistently, especially with Kafka.
For sure, part of it is that we treat it like oxygen, in that if it's gone for a few seconds everything just dies. But having dealt with similar systems both proprietary and open source, my opinion is that Zookeeper just hasn't risen to the challenges of its users in the past 5 years. If the next generation of software architects want to use open source streaming or distributed systems, Zookeeper needs to be rewritten or removed.
Also shout out to the confluent.io team: I never paid for your enterprise license, but without your blog posts, docker images, or slack room, I never would have been able to get Kafka working. Thanks again!
I am curious to know why people expect a Raft library to be more reliable embedded inside the Kafka controller versus running inside a service like etcd.
In the end the broker will do RPCs to a service (Kafka controller / etcd), and this service will use Raft to replicate the state.
It should be exactly the same. And if anything, knowing which nodes are running the Raft algorithm helps you be more careful with rolling restarts and upgrades.
It might sound kind of trite, but it has less to do with the replication algorithm and more to do with the fact that ZooKeeper's source is quite complicated versus, say, etcd's, so there are more opportunities for subtle bugs to appear. I encourage you to look at the bug section of ZooKeeper's changelog, and also the feature list of ZK vs etcd.
Also, etcd powers many critical open source projects, so there are many institutional eyes that actively contribute to its improvement. IME if we ever encountered an issue at work with ZK, we found it impossible to trace it down to a bug that we could fix and upstream. Etcd’s been easier in this regard.
With regards to Kafka, it's probably easier and more robust to add their own consensus layer rather than switching to etcd - Kafka is already a distributed system built by a team of distributed systems engineers. It makes sense for them to build their own consensus, deeply integrated with the replication mechanism, rather than relying on an external database.
Unfortunately, this is not correct. BookKeeper stores a lot of information in ZooKeeper. By extension, Pulsar (which is based on BookKeeper) also stores a lot of metadata there as well.
An application first creates a ledger before writing to bookies through a local BookKeeper client instance. Upon creating a ledger, a BookKeeper client writes metadata about the ledger to ZooKeeper. Each ledger currently has a single writer. This writer has to execute a close ledger operation before any other client can read from it. If the writer of a ledger does not close a ledger properly because, for example, it has crashed before having the opportunity of closing the ledger, then the next client that tries to open a ledger executes a procedure to recover it. As closing a ledger consists essentially of writing the last entry written to a ledger to ZooKeeper, the recovery procedure simply finds the last entry written correctly and writes it to ZooKeeper.
That is correct: Pulsar uses ZooKeeper relatively lightly and only for things strictly necessary for the cluster to run. Everything else (message data, topic info, cursors, schemas) is stored in BookKeeper.
Would still be great to see ZooKeeper made superfluous, though.
I never know where I sit on stuff like this. On the one hand if you’re confluent I think it makes total sense to own this part of your infrastructure. Especially if it lets you improve your operability story.
On the other hand I feel like projects should try and use open source “building-blocks”, like etcd and zookeeper, when building their distributed systems. Not only does this help iron out correctness bugs, but it also means that more people are familiar with the quirks, limitations, requirements etc.... of these tools. For example, I think I would be frustrated to hear that K8s were implementing their own raft.
Counterpoint, if I need to deploy technology T to solve problem P, I don't want to have to also deploy a flotilla of support technologies because modularity or whatever. ZooKeeper was always an implementation detail of Kafka, the fact that you had to manage it separately was an abstraction leak that I'm happy to see fixed.
I feel like this just argues that ZooKeeper should be a library that can be embedded in the servers of other projects (even if it is conceptually a separate server in the same memory space listening on its own ports; but like, it is otherwise entirely hidden inside of the other program and is entirely managed and configured by it also).
100% agree.
We need a good (etcd/ZooKeeper) as a library.
I wrote something similar, but it provides only lock/lease, using Paxos.
The problem with the Raft (etcd) and ZooKeeper replicated-state-machine design is that:
- machines leaving and joining the ensemble dynamically are hard to handle correctly
- the optimal quorum size is no more than 5; you can set up other nodes as observers, but it's hard to decide which of 1000 nodes should be in the quorum...
This is the difference between a library and a framework/project.
For example, there's a great Raft library for Go [1] that any project can use to implement distributed consensus without a separate running program. I find this to be a better approach with the same collective development and community testing advantages but without requiring more operational overhead.
The difference is that Kubernetes isn't a data store: we'd have to implement the full functionality of etcd. Kafka is a full-featured data store that doesn't need 80% of what ZooKeeper does.
This was actually an early complaint leveled against Kubernetes for things like proxying at the node level or implementing DNS. “Don’t reinvent the wheel!” Sometimes better administrative experiences exist only after a component absorbs some function previous systems expose.
Sometimes it’s better to own the parts of the problem that make your system simpler.
I generally think it makes sense to start with readily available tools, but as a project grows and becomes more specialized, it often turns out that the tool doesn't quite fit, doesn't quite have the right performance characteristics, or otherwise isn't a good fit anymore. And then it's time for something better. Or it works out, and you don't switch it out (e.g. etcd in k8s).
Finally! Having to properly set ZooKeeper up on multiple Mesos clusters was so painful, and recovering from outages via S3 backups so excruciating... I feel somewhat ashamed that in two out of five cases we simply went with AWS's managed Kafka because the team didn't feel confident enough to maintain the cluster.
Honestly this is probably the right thing to do. My cynical conspiracy theory is that the ZK dep was engineered by cloud vendors to keep customers coming :)
It’s been at least a year and a half since we’ve had severe data loss on our homebrew Kafka cluster, but after the first couple you never look at Kafka+Zk the same way... both iirc were due to leadership election bugs that had been reported many months ago with no progress on a solution.
I have no idea how AWS internally puts up with this. I wouldn’t be surprised if they’d replaced the ZK dependency internally years ago.
A gossip protocol is useful for propagating node failures or other events that are fine with being eventually consistent.
It cannot be used for leader election, as those events are time-sensitive and need to be consistent across the cluster within a short duration; that is the reason we have Raft and Paxos.
https://github.com/travisjeffery/jocko
I've always found things built on the JVM are a PITA to deploy (especially when using SSL), so the single Golang binary is a welcome advancement.
It’s all the good things about Kafka (concept, API, and wire protocol) without all the crap (zookeeper dep, JVM foundation)