
My story with Knative

In this longer-than-I-expected blog post, I wanted to review my personal story with the Knative project. Everything happens for a reason, and my relationship with Knative didn't start because I was forced to look into it. Knative was announced and designed to solve a set of problems that I was already trying to solve, so very early on it just made sense. Unfortunately, Knative is one of those projects that is quite hard to get unless you are working 24/7 on a Cloud-Native transformation and start hitting its challenges. On the other hand, if you are faced with these challenges, Knative makes complete sense, to the point where you cannot go back.

TL;DR: It is my firm belief that Knative should be installed in every Kubernetes Cluster out there, not only because it is mature enough and is already being used in Production-grade environments, but because it just makes your life easier when designing and running Cloud Native applications. With the 1.0 GA release, several platforms bundling Knative such as VMware Tanzu and Red Hat Openshift, and Google Cloud Run as a managed service, the Knative community is thriving. With the prospect of Knative going to the CNCF, I think it is time to start thinking about what Knative 2.0 might look like. So I am taking a well-deserved Xmas break to push forward the Continuous Delivery for Kubernetes book and brainstorm about what challenges I will take on in 2022.

Early Days

When I left Red Hat / JBoss at the end of 2016, I was working as part of the Drools & jBPM team, two Java Frameworks around Business Process Management and (Decision) Rules Management systems. I was lucky enough to get very early exposure to Docker and Kubernetes, as Openshift (3) Origin was in the works. The first reference that I can find publicly about this work dates back to Oct 2015, but I am sure that my work started in early 2015. Docker and Kubernetes weren't easy in those days; working with Java and containers was rough. As a Java Developer, I needed to get into these projects without fully understanding the future potential or even the architectural changes to how we designed our tools and frameworks. In the world of Kubernetes, a Java Framework is just something that you add to your containers, and the scalability mechanisms previously provided by Application Servers were quite different from the ones proposed by Kubernetes. In 2016, we went all-in on Openshift/Kubernetes and got some stateless services deployed in a high availability setup, but there was still a long way to go, as stateless scenarios were just the beginning. From these two years of working with Docker and Kubernetes inside Red Hat, and with my Java background, something was clear: Developers would need to learn a lot of new tools, and the way in which we designed Java applications was changing drastically (for the better, I want to believe). This realization pushed me to look into frameworks like JHipster and Spring Cloud; unfortunately, working at JBoss my exposure to Spring was almost zero, as we always worked with "standards" like Java EE and CDI. I remember playing around with Fabric8.io, which in my opinion was the most advanced team inside Red Hat working on tooling for Kubernetes; surprisingly enough, they were pushing for Spring and Spring Boot really hard. The Fabric8 team (led by James Strachan, Rob Davies and James Rawlings) created one of the first Kubernetes API Java Clients, which was heavily adopted until the official SDK came out and is still used in some Red Hat projects.

Making a case for Kubernetes and choosing the right abstractions

I joined Alfresco right after leaving Red Hat, in January of 2017, as the Technical Lead for the Activiti project. My mission was to use my previous experience to rearchitect, once again, a Java Framework to a more Cloud Native approach, but this time I was free to take advantage of the Spring, Spring Cloud and JHipster communities. That's when reality hit me: even though Red Hat was heavily investing in technologies like Kubernetes and Docker, Kubernetes was not widely adopted and customers were not actually asking for it. Making a case for supporting Kubernetes in 2017 was hard; making a case for going full-on Kubernetes was almost impossible. Remember that at this time Amazon, for example, didn't have EKS (kops fun times) and AKS from Microsoft was in its infancy. But if you have worked with me or have followed some of my blog posts, you know that I like to show by doing, and that's why Activiti Cloud was born. Activiti Cloud was heavily influenced by the Spring Cloud community, as they were setting the right tone for Cloud Native development. The same year (2017), in Barcelona, Spain, I presented at the JBCNConf conference on what I believe was the start of my journey into building Platforms on top of Kubernetes. Right after JBCNConf, I published a blog post titled Activiti Cloud meets Kubernetes, which highlighted the importance of the new architecture and why the ultimate place for running all these services was Kubernetes.

It was clear to me that if software companies were moving their projects/products to the cloud and to a more SaaS-like approach, they needed to rearchitect for Kubernetes and make sure that their tools adapted to this new ecosystem. At the end of the day, tools like Drools, jBPM and Activiti are all about software integration, hence being a first-class citizen in the ecosystem where they operate is key to their success.

Early in 2018, while making significant progress on the Activiti Cloud project, not everything was perfect and, to be honest, things got very complex quite fast. We knew that we were on the right track, but the complexity was becoming a problem. I remember that there were two topics that kept me up at night. The first one was dealing with the impedance mismatch between Spring Cloud and Kubernetes (very few people were evaluating this space and publishing articles about it at that time; Christian Posta, then at Red Hat and now at Solo.io, was a reference). The second one was around software development practices and how to deal with a number of software components that kept growing (we were creating tons of GitHub repositories); serious automation was clearly needed.

Spring Cloud and Kubernetes

Here is where some history notes are important, because the Spring Cloud framework came from the experiences around Cloud Foundry (the pre-Kubernetes era) plus the first incarnation of the Netflix OSS Stack, and hence there was a significant overlap with some of the core Kubernetes components and with how the integrations were designed. Some design decisions around integrations required us to make hard choices between the Spring Cloud world and the Kubernetes world; check my blog post about creating a unified integration mechanism between Activiti Cloud and external systems using messaging. Once again, Spring Cloud was a big part of this design, because how messages were exchanged between systems shouldn't be defined by the project; customers should be able to choose their preferred implementation of a messaging system. The idea here was to abstract the integration mechanism away from a messaging implementation such as ActiveMQ or RabbitMQ (implementations that were widely used by our existing customers). Exchanging messages also enabled a more reactive and event-based approach, meaning that we could capture these messages for auditing or for cross-functional integrations. Interestingly enough, we were designing for Kubernetes, but the exchange of these messages and how services were wired together was handled by a framework inside our containers. This required every integration to include a certain set of libraries (the Spring Cloud abstractions), which can be a big deal-breaker because in some way you are giving up the polyglot approach promoted by the use of containers.

For the kind of distributed system that we were designing, we needed to provide a solution for templating and enabling system integrators to create projects without understanding how all the pieces fit together. There was a clear alignment here with the JHipster community, as they were using Spring Cloud and the Netflix OSS stack to scaffold microservices and to solve shared concerns such as how to implement Identity Management for all the services in our applications using Keycloak.

Here is when, once again, my previous experience with Kubernetes and my newfound (and well-founded) passion for the Spring Cloud community led me to Spring Cloud Kubernetes. Everything was aligning: the Spring Cloud community was catching up with Kubernetes, and the idea of writing Kubernetes-native components (in other words, extending Kubernetes) started to make sense. Funny enough, this story is full of cross-references and the same people popping up again and again, showing the way forward on how to solve the key challenges of Cloud Native development. Guess who donated Spring Cloud Kubernetes to the Spring Cloud community? Yup, the Fabric8 team from Red Hat. This was not only a huge validation of the decisions that we were making, it was also a huge call to push forward with platform design in contrast with a more traditional product design. We were not building end-user software; we were building a platform of software components that would run inside a Kubernetes Cluster alongside other (3rd party) components, and they all should co-exist and play nicely with each other. More on this platform thinking later on.

Automation is Key

On Activiti Cloud we were a very small team of 4 developers, and we had around 80 repositories mixing libraries, services that were built into Docker containers, repositories with Kubernetes manifests, examples, and Bill of Materials repositories. We couldn't operate and keep evolving the platform (at this point we knew this was a platform already) that we were building without having everything automated. But here again, reality hit hard. There was a cultural shift away from having a single big repository with a single build pipeline and some release scripts that were used once a year. Switching from this monolithic approach to 80+ repositories that we knew would keep growing was hard. So we shifted left: we decided to own the entire lifecycle of the components that we were building, from coding to releasing the artifacts in a continuous fashion. This was made possible only because Jenkins X had been created (sponsored by Cloudbees). Jenkins X's main mission was to solve CI/CD for Kubernetes in a very opinionated way. Jenkins X only runs in Kubernetes and pushed hard on software delivery practices such as Trunk-Based Development and one repository per service, which maps really well to one repository per Docker container. The Jenkins X project provided loads of defaults around how to build containers and how to create the YAML files required to deploy these services to Kubernetes. The Activiti Cloud team (just the 4 of us) was not in the business of defining software practices and conventions, so it made a lot of sense to just adopt what the Jenkins X project was proposing to save time and focus on delivering. By using Jenkins X, we started releasing artifacts multiple times a day, and we closely followed how the Jenkins X team was releasing Jenkins X itself. In 3 years, Jenkins X created more than 4K releases a year. Besides the big impact that the Jenkins X project had on Activiti Cloud's day-to-day operation, Jenkins X was solving some challenges that every project running on Kubernetes needs to solve sooner or later: the integration of multiple projects together and the inherent complexity of keeping up with the Kubernetes ecosystem. By looking at how the Jenkins X project glued different pieces together, how they dealt with changes in upstream projects that they didn't control, and how they controlled chaos, we learnt a lot of valuable lessons as a team.

Activiti Cloud at this point needed Jenkins X, and Jenkins X highlighted the need to start thinking about multiple cloud providers. Remember that this was early 2018; designing for multi-cloud with a team of 4 developers was considered crazy. Because we were heavily relying on Jenkins X, there are a couple of blog posts and a conference workshop that I did with the Jenkins X team about our experience using these tools. I clearly remember that before the workshop in Barcelona we were all working 100% on Google Cloud; we had dropped docker-compose, and local development was a thing of the past.

Two more important "internal" and historical details about the Jenkins X project. First, they were full-on GoLang, building glue for Kubernetes tools such as CLIs, Kubernetes Controllers and Services, libraries to abstract Git providers, etc. This highlighted to me that the Kubernetes community was moving much faster than other communities (for example, the Java community) and that they were paving the way on how to extend Kubernetes with a multi-cloud approach in mind. They started with their own scripts to deploy to multiple cloud providers and later on moved to using Terraform to automate all these processes. Secondly, if you do a bit of research on Jenkins X you will notice that, once again, the creators were the same team behind the Fabric8 project (James Strachan and James Rawlings); they had moved from Red Hat to Cloudbees to work on Jenkins X and build a huge community of adopters. It was really interesting to see how Jenkins X, the Open Source project, was massively adopted, and how the company behind the project faced challenges finding the right business model to commercialize a project like this. Once again, something that every software company going to SaaS and leveraging the Open Source model faces sooner or later.

So... Knative

After this very long introduction: in July of 2018 Knative was announced at Google Next, and it just made sense. We were building a platform on top of Kubernetes, but Kubernetes' built-in abstractions weren't enough. This pretty ugly (in my opinion, I mean no offence to anyone) diagram showing how Knative works on top of Kubernetes and the personas involved while developing for Kubernetes made it clear that the right abstractions were being built to solve the challenges that we were facing.

Knative aimed to make developers' lives easier by providing higher-level abstractions that let them go faster.

Note: The previous diagram reflects the image used in Google's announcement; very early on, Istio was removed as a mandatory dependency for Knative. Nowadays you can choose different networking implementations depending on your needs.

The first public reference that I can find of me talking about Knative is from Jan 2019 (5 months after it was announced). Imagine: if it was hard to make a case for Kubernetes, making a case for Knative plus the initial Istio dependency was once again considered way too much by any sane person/architect. But it was clear at this point that Activiti Cloud's core components, and how they would connect to other systems, needed to be adapted to follow some principles introduced by Knative in the Kubernetes world. Some of our early decisions adopting Spring Cloud needed to be reviewed and adapted to make sure that we aligned well with the Kubernetes community. With the introduction of Knative there were new things to consider: how projects were extending Kubernetes and how we would enable users to adopt these complex tools. At this point in time, I started following the founders of the project, including Evan Anderson, Matt Moore, Ville Aikas and N3wScott.

Due to the fact that the company sponsoring Activiti Cloud was sold a couple of times and the focus was drifting, I decided to move on to my next adventure. But right before leaving, I wanted to test a couple of assumptions. I was thinking a lot about the concept of having an abstraction called Application for a set of distributed services, and how this could be implemented in the Kubernetes world. The answer was quite simple but scary: we needed to extend Kubernetes with a Custom Controller and CRDs. Instead of writing this component for Activiti Cloud, I used the JHipster community to test my assumptions. I created my first public Kubernetes Controller/Operator and presented it at the JHipster Conference in Paris.
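To make the idea of extending Kubernetes with CRDs a bit more concrete, here is a minimal, hypothetical sketch of what an Application custom resource could look like when defined with kubebuilder-style Go types. The type and field names are illustrative assumptions, not the actual operator I presented at the JHipster Conference:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ApplicationSpec describes the desired state: which services belong
// to this logical application and which version we want to expose.
type ApplicationSpec struct {
	// Services lists the names of the distributed services that compose the application.
	Services []string `json:"services,omitempty"`
	// Version is the application-level version.
	Version string `json:"version,omitempty"`
}

// ApplicationStatus reports the observed state of the application as a whole.
type ApplicationStatus struct {
	// Ready is true when every service in the application is available.
	Ready bool `json:"ready,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// Application groups a set of distributed services under a single
// Kubernetes resource, so a controller can reconcile them as one unit.
type Application struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ApplicationSpec   `json:"spec,omitempty"`
	Status ApplicationStatus `json:"status,omitempty"`
}
```

A controller watching a resource like this would then create or update the underlying Deployments, Services and routes for each entry in `spec.services`, and surface the aggregated health in `status`.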

Before starting a new gig, I joined LearnK8s as a Kubernetes instructor, and then KubeCon San Diego happened, where I managed to meet the Knative team in person at a CloudEvents WG meeting. Once again, when you meet these teams and understand the challenges that they are tackling, you are given a window into your own future problems.

More Platform Thinking and Knative

I joined Camunda early in 2019; the company had been designing a new Workflow Engine that could scale to large scenarios, but guess what... Kubernetes was not the target platform to run this new engine. Once again, this is a common phase while adopting new technologies: Kubernetes was an option but not the default one, and supporting tools like docker-compose, or even just starting each component using `java -jar`, was still an option in the docs. Making it easier for users to get started with the project on Kubernetes became my main focus.

Looking at my blog posts from that time, you can clearly see that I couldn't let go of the idea of using Knative as a simplification layer for people to work with and interact with Kubernetes. You can also see that Knative Eventing became the right abstraction layer for eventing and messaging in Kubernetes.

It started to make sense to also look into the Go programming language. Camunda started building Camunda Cloud, a managed service using Kubernetes to host and provision their Workflow Engine for their users. It became clear that users interested in this managed service would want to connect the workloads running in their own Kubernetes clusters with this new service. From a service-consumer point of view, I would need some kind of bridge to connect my Kubernetes world to a remote service that exposes custom, domain-specific APIs. It also made sense to provide an easy way to install the Workflow Engine into a Kubernetes cluster for development purposes, for cases when using the managed service didn't make sense; for that, some Helm Charts did the work.

After attending KubeCon and getting my hands on kubebuilder, I decided to build this remote bridge between the managed service and a consumer Kubernetes Cluster: basically a new Kubernetes Controller to abstract away the fact that services were provisioned outside the consumer Kubernetes Cluster.
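The heart of such a controller is its reconcile loop. The following is only a hedged sketch of how that loop could look with controller-runtime (the scaffolding kubebuilder generates); the type name, the steps in the comments and the requeue interval are assumptions for illustration, not the controller I actually shipped:

```go
package controllers

import (
	"context"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// RemoteEngineReconciler is a hypothetical reconciler that keeps a local
// custom resource in sync with an engine provisioned outside the cluster.
type RemoteEngineReconciler struct {
	client.Client
}

// Reconcile is called by controller-runtime every time the watched
// resource (or anything it owns) changes.
func (r *RemoteEngineReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the custom resource named in req.NamespacedName with r.Get(...).
	// 2. Call the managed service's API to create or update the remote engine.
	// 3. Write endpoints/credential references back into the resource's status
	//    so workloads in this cluster can discover the remote service.
	// Requeue periodically so drift on the remote side is eventually corrected.
	return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}
```

Requeueing on a timer is a common pattern for resources that live outside the cluster, since the controller cannot watch the remote API for changes the way it watches Kubernetes objects.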

While this solved the operational side of creating remotely managed resources outside of your cluster, it didn't solve the data-plane challenges: how data would be moved between our workloads and the managed service. But I knew the answer to that question already; the Knative team had shown me the way: CloudEvents and Knative Eventing. It was time to build another component to make sure that data could flow in and out of the managed service: a domain-specific CloudEvents router. This closed a full circle, allowing me to present at the Knative Meetup to talk about the challenges of dealing with CloudEvents and the advantages of using CloudEvents for system integrations. At this meetup, I showed a more advanced use case, where the controller created Knative resources dynamically based on the use case requirements. In other words, I built a domain-specific simplification relying on Knative mechanisms to build and wire complex CloudEvents orchestration patterns. I know for a fact that this is becoming more and more popular, and if you are building platforms (or SaaS offerings) on top of Kubernetes you might need to build a solution like this one.
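On the data-plane side, CloudEvents gives you a neutral envelope for those messages and Knative Eventing takes care of routing them. As a rough sketch, and assuming a default in-cluster Broker (the event type, source and Broker URL below are made-up examples, not the actual router's contract), emitting an event with the CloudEvents Go SDK could look like this:

```go
package main

import (
	"context"
	"log"

	cloudevents "github.com/cloudevents/sdk-go/v2"
)

func main() {
	// HTTP client for sending CloudEvents; in a Knative Eventing setup the
	// target is typically the Broker ingress for the namespace.
	c, err := cloudevents.NewClientHTTP()
	if err != nil {
		log.Fatalf("failed to create CloudEvents client: %v", err)
	}

	event := cloudevents.NewEvent()
	// Type and source are hypothetical; a real router would use the
	// domain-specific types agreed with the managed service.
	event.SetType("io.example.workflow.started")
	event.SetSource("example/workflow-bridge")
	_ = event.SetData(cloudevents.ApplicationJSON, map[string]string{"workflowId": "abc-123"})

	// The Broker URL is an assumption; Knative Eventing routes the event to
	// whichever Triggers subscribed to this event type.
	ctx := cloudevents.ContextWithTarget(context.Background(),
		"http://broker-ingress.knative-eventing.svc.cluster.local/default/default")

	if result := c.Send(ctx, event); cloudevents.IsUndelivered(result) {
		log.Fatalf("failed to send event: %v", result)
	}
}
```

A Knative Trigger subscribed to that event type would then deliver the event to whichever service (for example, the domain-specific router) needs to react to it.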

Finally, it was pretty evident that if you are designing platforms on top of Kubernetes and you are a software provider, you need to cater for On-Premises and Multi-Cloud deployments of your platform. This is mostly associated with highly regulated industries that can't just accept a managed service deployed to a Cloud Provider that, for example, doesn't have servers in their country or doesn't pass a very specific industry certification. To tackle these challenges, I started working with the Crossplane project, which I believe is another fundamental piece for modern platform abstractions.

Before leaving Camunda, I realized that the Operator/Controller I had built overlapped quite heavily with what Crossplane Providers do, hence I refactored the Operator to be a Crossplane Provider. By doing this, I enabled a provider-specific service to become part of a larger ecosystem.

But it was, once again, time to move to my next adventure, this time to work on the Knative OSS project 24/7 for VMware.

Knative OSS Today and in 2022

Based on my background and my passion for building platforms and making developers' lives with Kubernetes easier, joining the Knative OSS team at VMware was a no-brainer.

I've been following the project since it was first announced by Google (2018), and I've always been interested in how the project is organized, given that there are multiple large companies (Google, VMware, Red Hat, IBM, TriggerMesh, Alibaba Cloud, SAP, etc.) involved and collaborating to push the project forward. Multiple companies are bundling Knative and creating products on top of it, and it is no secret that projects and organizations have been using Knative in production for some time now. Are you using Knative? The community will appreciate it if you add your company/organization to this issue.

It really feels like I joined the project at a key historical moment, as one of the first tasks that I got involved with was the Knative Spec Conformance effort, a clear indication that multiple parties are interested in adopting the project and in being able to assure their users that all the public interfaces and behaviours are respected in their products. With the announcement of Google Cloud Run in 2019, a managed Serverless service conforming to the Knative Serving APIs, the community and the companies adopting Knative got yet another big reason to bundle Knative into their products. While there has always been tension around the fact that Google still owns the trademark of the project, all the companies involved keep pushing forward. While there is a real need for the project to be managed by an independent foundation, I am quite happy to have experienced first-hand how the individuals in the Knative community work, and the long-term commitment of everyone involved to making sure that the project keeps evolving and maturing.

It was really clear to me that Knative would become an important piece in the Kubernetes puzzle, and for that reason I started writing the Continuous Delivery for Kubernetes book for Manning at the beginning of this year (2021). While Knative is mostly associated with Serverless, I believe that at the end of the day, when you adopt tools like Knative, you are closer to being able to continuously deliver software in a more efficient way, with less friction and with the right tools for the different personas involved in the process. The reception from the community to the book's early access program has been mind-blowing; I can't thank everyone enough! There is still a lot of work to do, but because of all the comments that I've received, this book is already way better than anything I could produce on my own.

I had the pleasure of participating in the 1.0 release process, and I can assert that there is a shared goal, no matter which company you work for, to make this project successful. It is a very satisfying experience to work with world-class engineers solving problems that nobody else is solving at this scale. While going through the release process, I was grateful for the lessons learnt with Jenkins X and Activiti Cloud, as the crazy number of repositories, dependencies and issues that comes with having tons of projects depending on each other can easily overwhelm you if you haven't experienced something similar in the past.

Having worked with some tools to improve the developer experience in the past and now getting involved with func and its integration with Tekton, I can't ask for more interesting projects to work on. It is also very rewarding to see the work of the RabbitMQ team to fully support the RabbitMQ Knative Eventing implementation. Having this kind of support and initiatives happening in the open makes the adoption of Knative a much more straightforward process.

With the 1.1 release scheduled for the 14th of December 2021, and with the year coming to a close, I think that the whole community is getting an early Xmas gift. This Pull Request, created today in the CNCF TOC repository, can make Knative even bigger. While there are no guarantees of the project being accepted into the CNCF incubation process, this is a very welcome initiative from Google towards a more open and collaborative future for the project.

Finally, as you can probably guess from this blog post, I am quite passionate about these topics, so if you are interested in learning more, getting involved with the community, or just having a chat, feel free to get in touch via a comment here or via a DM on Twitter: @salaboy, as my DMs are open.