Covering Scientific & Technical AI | Thursday, December 5, 2024

We’ll Be Enslaved to Proprietary Clouds Unless We Collaborate 


Cross-project integration is not exactly prevalent in today’s open source ecosystem, and that is a problem. Open source projects that enable large-scale collaboration and are built on a layered, modular architecture, such as Linux, have proven their success time and again. But the Linux ideology stands in stark contrast to the general state of much of today’s open source community.

Case in point: big data ecosystems, where numerous overlapping implementations rarely share components or use common APIs and layers. They also tend to lack standard wire protocols, and each processing framework (think Spark, Presto and Flink) has its own data source API.

This lack of collaboration is causing angst. Without it, projects are not interchangeable. That hurts customers by essentially locking them in, and it slows the evolution of projects, because each one has to start from scratch and reinvent the wheel.

Changing Times

Several years ago, the focus was on creating modular architectures using standard wire protocols, such as NFS and RPC, and standard API layers, such as BSD, POSIX and others. Buyers who purchased products from different vendors found that they worked well together and were interchangeable. There were always open source implementations of each standard, while vendors also built commercial variations to extend functionality or durability.

That philosophy has changed.

At last December’s re:Invent show, Amazon announced a bevy of new AWS products. But rather than targeting infrastructure companies, these simple, easy-to-use solutions not only go head-to-head with established database vendors but also compete with the open source big data and container ecosystems, security software, and even developer and APM tools.

While open source is a key ingredient, Amazon (and other public cloud vendors) is proving that usability and integration matter more to many customers than access to an endless variety of overlapping open source projects. Moreover, customers see clearly that proprietary cloud solutions may lock them in, but for many of them, simplicity and time to market outweigh long-term freedom.

It’s interesting to see that while Amazon seemingly attacks the open source ecosystem and highlights the advantages of its own tools, it is at the same time taking open source projects and turning them into packaged, revenue-generating products.

What needs to be done?

The tech industry is in a period of monumental flux, driven by a digital transformation that is changing the infrastructure and software stack dramatically. The old guard is in survival mode, and we seem to be missing responsible tech leadership that will define and build a modular stack for the new age. We must think ahead and work together with a focus on integration, not code. Hopefully, all cloud providers will join in, so that we can have interchangeable, pluggable cloud services.

The solution lies in moving toward greater collaboration and full-stack integration rather than having hundreds of overlapping open source projects. But who gets to decide which baselines are best, and how?

Right now, the Linux Foundation and its members, like iguazio, are trying to bring order to this mess. The Cloud Native Computing Foundation (CNCF) and its projects, such as Kubernetes, have special interest groups (SIGs) that define APIs and implementations, open communication channels on Slack, and agreed-upon release schedules. There’s hope for greater collaboration across container frameworks as well.

In fact, hundreds of participating companies are now making contributions. Some are also profiting from parallel products or management tools that make life simpler for users, but that’s certainly acceptable, since it helps them finance open source work. What matters most is the baseline architecture and the APIs between layers, so that customers have the freedom to use different components.

We must seek synergies between cloud and big data open source initiatives. It doesn’t make sense for each to run its own independent scheduling, security and data management. After all, most big data solutions are deployed in the cloud!

Now it is time for the Apache big data ecosystem and ODPi to follow suit in defining layered APIs. The recent common file abstraction, the Hadoop Compatible File System (HCFS), is a beginning, but more work is needed to eliminate project overlap and API sprawl. Taking this crucial first step will also likely help ODPi attract more active participants.
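To make the idea of a shared layer more concrete, here is a minimal Python sketch of a common file abstraction in the spirit of HCFS. It is not HCFS itself (Hadoop's actual contract is the Java class org.apache.hadoop.fs.FileSystem); the names FileSystem, LocalFileSystem, InMemoryFileSystem and word_count are hypothetical, chosen only for illustration. The point it shows is that a processing job which depends solely on the shared interface can run unchanged on any backend that implements it.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class FileSystem(ABC):
    """Hypothetical common storage API that every backend must implement."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalFileSystem(FileSystem):
    """Backend that stores files under a root directory on local disk."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def write(self, path: str, data: bytes) -> None:
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)  # create parent dirs
        target.write_bytes(data)

    def read(self, path: str) -> bytes:
        return (self.root / path).read_bytes()


class InMemoryFileSystem(FileSystem):
    """Backend that keeps files in a dict, standing in for a cloud object store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]


def word_count(fs: FileSystem, path: str) -> dict[str, int]:
    """A toy 'processing job' that depends only on the FileSystem interface."""
    counts: dict[str, int] = {}
    for word in fs.read(path).decode("utf-8").split():
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # The same job runs against two interchangeable backends.
        for fs in (LocalFileSystem(tmp), InMemoryFileSystem()):
            fs.write("logs/day1.txt", b"spark flink spark")
            print(type(fs).__name__, word_count(fs, "logs/day1.txt"))
```

Under these assumptions, moving the job to a different storage service would mean writing one new FileSystem subclass, with no change to word_count. That is the kind of swappable layering a shared API is meant to enable.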

The fact is, collaboration will vastly improve the user experience. With these changes, users could easily build, secure and operate integrated stacks from independent components, with the ability to swap parts when needed, all without being locked into project-specific or cloud-provider-specific APIs.

The time for collaboration is now. If we do not move in this direction, the industry and its users both lose, and we will become technically enslaved by proprietary clouds.

Yaron Haviv is founder and CTO at iguaz.io

AIwire