First impressions of Redpanda
By Daniel Rossos, Software Engineer Intern, Mehryar Maalem, Software Engineer at IBM, and Shahir A. Daya, IBM Distinguished Engineer and Chief Architect
Daniel, Mehryar, and I work in the field as part of IBM Consulting. We have been working with several large organizations on initiatives that include an element of Event Driven Architecture. The Apache Kafka APIs have become a de facto standard for Event Driven Architecture and Stream Processing. We have had the opportunity to work on a few large Kafka implementations and have found that Kafka is complex to implement and operate, especially when you have a large cluster with lots of data and stateful streaming applications.
While we were looking into what we needed to put in place to reliably operate a large implementation, we happened to see a blip on ThoughtWorks' April 2021 Technology Radar titled “Kafka API without Kafka”. That is how we ended up at Redpanda. In this article we want to share our first impressions of Redpanda, along with the code we have developed to deploy Redpanda to IBM Cloud.
What is Redpanda and why was the technology interesting
Redpanda is a streaming data platform that is compatible with the Kafka APIs. It is written in C++, so there is no JVM and no garbage collection. There is no ZooKeeper; instead, it is based on the Raft consensus algorithm. It uses a thread-per-core model, among several other techniques that make it fast. We highly recommend watching the talk by Alex Gallego (Founder and CEO of Redpanda Data) on Co-designing Raft + thread-per-core execution model for the Kafka-API, where he covers many of the techniques in detail. Three things made this very interesting for us: the compatibility with the Kafka APIs, the fact that it was built from scratch to take advantage of the current state of hardware, and operational simplicity as a design goal.
Deploying Redpanda on IBM Cloud
To ease installation, both provisioning the IBM Cloud infrastructure and configuring Redpanda can be automated. We use Terraform for infrastructure provisioning and Ansible for the Redpanda deployment. We have contributed IBM Cloud support to the [deployment-automation] repository for Redpanda. To get started:
- Clone the repository containing the Terraform scripts and Ansible playbooks from [here].
- Follow the instructions in this [README] to set up your IBM Cloud account and provision the infrastructure using Terraform.
- Once your infrastructure is successfully set up, follow the instructions [here] to deploy Redpanda with Ansible.
At this point, a three-node Redpanda cluster will be running across the newly provisioned VSIs, along with a Grafana dashboard. The terminal should display the addresses of all the brokers as well as the Grafana address, which you can use to connect to the cluster and view the dashboard.
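Before pointing any applications at the new cluster, it can be handy to confirm that each broker address printed by the deployment actually accepts connections. The following is a minimal sketch using only the Python standard library. The function names are our own for illustration, and the default port of 9092 assumes the Kafka API is exposed on its usual port; adjust it if your deployment differs. It checks TCP reachability only, not cluster health, and it does not handle IPv6 literals.

```python
import socket


def parse_broker_addresses(raw: str, default_port: int = 9092) -> list[tuple[str, int]]:
    """Split a comma-separated broker list into (host, port) pairs.

    Entries without an explicit port fall back to default_port
    (9092 is an assumption about how the Kafka API is exposed).
    """
    brokers = []
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if sep:
            brokers.append((host, int(port)))
        else:
            brokers.append((entry, default_port))
    return brokers


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, pass the broker list from the terminal output to `parse_broker_addresses` and call `is_reachable` on each resulting pair; any broker that returns `False` is worth checking against your security group and network rules.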
Compatibility testing
Coming from Kafka, our first goal was to check that our existing stack of Kafka Streams applications, based on Spring Boot and Avro schemas, worked seamlessly with Redpanda as advertised. This was to simulate a scenario where we swap a Kafka cluster for a Redpanda cluster. To do so, we took a demo application that uses the same package versions and setup that we run in production at our clients, and simply replaced our Kafka bootstrap servers and Confluent Schema Registry with the Redpanda services.
We were pleasantly surprised that this worked out of the box without any tweaking. You can find our demo application and try it for yourself [here].
In addition, we moved all of our sink connectors and had no issues there either. We managed to substitute Redpanda for Kafka across essentially our entire deployment. The only changes were the new bootstrap server and schema registry URLs, and the switch was fully transparent to our applications.
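To make the scope of that swap concrete, here is a small sketch of what changes in a client configuration. It is illustrative only: the hostnames and application id are hypothetical, the helper function is our own, and the configuration is modelled as a plain dictionary rather than taken from our Spring Boot setup. The property names (`bootstrap.servers`, `schema.registry.url`, `application.id`, `default.value.serde`) are the standard Kafka and Confluent client keys.

```python
def retarget_to_redpanda(
    config: dict[str, str],
    bootstrap_servers: str,
    schema_registry_url: str,
) -> dict[str, str]:
    """Return a copy of a client config that points at a different cluster.

    Only the two connection settings are replaced; every other
    property (serdes, application id, and so on) is left untouched.
    """
    updated = dict(config)
    updated["bootstrap.servers"] = bootstrap_servers
    updated["schema.registry.url"] = schema_registry_url
    return updated


# Hypothetical existing configuration for a Kafka Streams application.
kafka_config = {
    "application.id": "demo-streams-app",
    "bootstrap.servers": "kafka-1.example.internal:9092",
    "schema.registry.url": "http://schema-registry.example.internal:8081",
    "default.value.serde": "io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde",
}

# Hypothetical Redpanda endpoints; substitute the addresses of your own cluster.
redpanda_config = retarget_to_redpanda(
    kafka_config,
    bootstrap_servers="redpanda-1.example.internal:9092",
    schema_registry_url="http://redpanda-1.example.internal:8081",
)

# List the properties that differ between the two configurations.
changed = sorted(k for k in kafka_config if kafka_config[k] != redpanda_config[k])
print(changed)  # ['bootstrap.servers', 'schema.registry.url']
```

Everything else in the configuration, including the Avro serde, stays exactly as it was, which matches what we saw in practice.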
First impressions
- Simplicity of installation and management: The main takeaway from setting up Redpanda was its simplicity. Redpanda comes as a single binary. There is no ZooKeeper, and the Schema Registry is built in. Given that our day job with Kafka involves maintaining three separate distributed applications (brokers, ZooKeeper, and schema registry), this was a welcome change. It drastically reduces the complexity of chaining together deployments, managing multiple configuration files, and debugging issues.
- Redpanda CLI is fantastic: The CLI is intuitive to use while still being powerful. It provides an easy way to check the status of your cluster and topics without requiring external tooling or additional GUIs. As operators, we really welcomed the developer friendliness of this tool.
- Kafka client applications were fully compatible: Redpanda was fully compatible with our existing Kafka Streams applications and Kafka connectors. Initially, we were worried that we would need to reconfigure our applications or change their dependencies, but in our compatibility testing it was as simple as pointing each application to the new brokers and schema registry, and our applications functioned as intended.
Conclusion
We are continuing to work with Redpanda, and it is a technology we have become comfortable proposing on our client engagements. If you are using Redpanda and have lessons you’ve learned, please feel free to share them in the comments. We are always learning and would appreciate hearing what others’ experiences have been.
We also want to thank Alex Gallego, Patrick Thompson, and Patrick Angeles for spending time with us and supporting us as we got nerdy with Redpanda.
References
[1] “Technology Radar”, ThoughtWorks, 2022. [Online]. Available: https://www.thoughtworks.com/en-us/radar. [Accessed: 22-Mar-2022].
[2] “Apache Kafka”, Apache Kafka, 2022. [Online]. Available: https://kafka.apache.org. [Accessed: 22-Mar-2022].
[3] “Redpanda”, Redpanda.com, 2022. [Online]. Available: https://redpanda.com. [Accessed: 22-Mar-2022].
[4] “deployment-automation/ibm”, GitHub, 2022. [Online]. Available: https://github.com/redpanda-data/deployment-automation/tree/main/ibm. [Accessed: 22-Mar-2022].
[5] “IBM Cloud”, Ibm.com, 2022. [Online]. Available: https://www.ibm.com/cloud. [Accessed: 22-Mar-2022].