This repository was archived by the owner on Nov 22, 2024. It is now read-only.

Started introducing cloudflow-contrib thoughts into cloudflow docs #1046

Merged: debasishg merged 5 commits into master from docs-with-contrib on May 24, 2021

@debasishg debasishg commented May 7, 2021

Contributor

What changes were proposed in this pull request?

Introducing cloudflow-contrib thoughts into cloudflow documentation for the version where both options co-exist

Why are the changes needed?

The changes are required so that the users now have the option to use the externalized integration of Flink and Spark as implemented in cloudflow-contrib

Does this PR introduce any user-facing change?

Yes, cloudflow-contrib is also an option now for users to use Flink and Spark

How was this patch tested?

No testing, just documentation changes

In the current version, Cloudflow includes backend `Streamlet` implementations for Akka, Apache Spark - Structured Streaming, and Apache Flink.
Using these implementations you can write business logic in the native API of the backend.
Additionally, Cloudflow can be extended with new streaming backends.
Native support for Flink and Spark streamlets are supported as _legacy_ versions and will be discontinued in future. The current version introduces Flink and Spark integrations with more controls in the hands of the users. Users now have more control on deployment and management of Flink and Spark streamlets while still using the same Cloudflow streamlet API for developing their business logic. However, Akka will continue to be supported natively as in earlier versions.

Contributor

I think that we should not yet announce the current integration as "legacy"

Contributor

I agree. Also, "native support for" and "Flink native" seem to have been mixed up. We're not going to discontinue the native integration (through the CLI's native Kubernetes features). We have to come up with a better name for the current integration style.

Contributor

Maybe "supported by the Cloudflow operator" vs. "via the CLI"?

Contributor Author

OK, I will make it sound like both strategies are available.

Using these implementations you can write business logic in the native API of the backend.
Additionally, Cloudflow can be extended with new streaming backends.
Native support for Flink and Spark streamlets are supported as _legacy_ versions and will be discontinued in future. The current version introduces Flink and Spark integrations with more controls in the hands of the users. Users now have more control on deployment and management of Flink and Spark streamlets while still using the same Cloudflow streamlet API for developing their business logic. However, Akka will continue to be supported natively as in earlier versions.
Integration support for Flink and Spark, thus being externalized, makes Cloudflow easily extensible with new streaming backends.

Contributor

The new integration should be incentivized but marked "experimental" for now

Contributor Author

Mentioned that.


This method is not only dev-friendly, but is also compatible with the typical CI/CD deployments.
This allows you to take the application from dev to production in a controlled way.
The deployment procedure will be a bit different with the _cloudflow contrib_ approach where Flink and Spark applications are supported through external plugins. Akka applications will be fully depoyed using `kubectl cloudflow` as above. However for Spark and Flink applications, you need to use an extra plugin and carry out a few extra steps to make them known to the cloudflow engine. For details please have a look at xref:develop:cloudflow-contrib-change-me.adoc[Cloudflow Contrib] documentation.

Contributor

Suggested change
The deployment procedure will be a bit different with the _cloudflow contrib_ approach where Flink and Spark applications are supported through external plugins. Akka applications will be fully depoyed using `kubectl cloudflow` as above. However for Spark and Flink applications, you need to use an extra plugin and carry out a few extra steps to make them known to the cloudflow engine. For details please have a look at xref:develop:cloudflow-contrib-change-me.adoc[Cloudflow Contrib] documentation.
The deployment procedure will be a bit different with the _cloudflow contrib_ approach where Flink and Spark applications are supported through external plugins. Akka applications will be fully deployed using `kubectl cloudflow` as above. However for Spark and Flink applications, you need to use an extra plugin and carry out a few extra steps to make them known to the Cloudflow engine. For details please have a look at xref:develop:cloudflow-contrib-change-me.adoc[Cloudflow Contrib] documentation.

Contributor Author

Done.

@debasishg debasishg marked this pull request as ready for review May 11, 2021 05:23
@debasishg

Contributor Author

@andreaTP, @RayRoestenburg Do we need to make any more changes for the version of Cloudflow where we offer both options: the current operator-based management for Spark and Flink, and the new implementation through cloudflow-contrib?

@andreaTP andreaTP left a comment

Contributor

A few more suggestions.

In the current version, Cloudflow includes backend `Streamlet` implementations for Akka, Apache Spark - Structured Streaming, and Apache Flink.
Using these implementations you can write business logic in the native API of the backend.
Additionally, Cloudflow can be extended with new streaming backends.
Along with the currently supported built-in integration of Flink and Spark streamlets via the Cloudflow operator, Cloudflow also supports external integrations for these streaming platforms through additional plugins. The current version introduces Flink and Spark integrations with more controls in the hands of the users. Users now have more control on deployment and management of Flink and Spark streamlets while still using the same Cloudflow streamlet API for developing their business logic. However, Akka will continue to be supported natively as in earlier versions.

Contributor

You can remove:

The current version introduces Flink and Spark integrations with more controls in the hands of the users.

And start the following sentence with something like:

Using this new integration ...

Contributor Author

Done.

Additionally, Cloudflow can be extended with new streaming backends.
Along with the currently supported built-in integration of Flink and Spark streamlets via the Cloudflow operator, Cloudflow also supports external integrations for these streaming platforms through additional plugins. The current version introduces Flink and Spark integrations with more controls in the hands of the users. Users now have more control on deployment and management of Flink and Spark streamlets while still using the same Cloudflow streamlet API for developing their business logic. However, Akka will continue to be supported natively as in earlier versions.
Integration support for Flink and Spark, thus being externalized, makes Cloudflow easily extensible with new streaming backends. The new externalized integration has been marked _Experimental_ in the current version.
For more details on externalized Flink and Spark integrations, please have a look at https://lightbend.github.io/cloudflow-contrib/docs/0.0.4/index.html[Cloudflow Contrib] documentation.

Contributor

Use this as the link:

https://lightbend.github.io/cloudflow-contrib/

The redirect to the current docs version is performed there.

Contributor Author

Done.


This method is not only dev-friendly, but is also compatible with the typical CI/CD deployments.
This allows you to take the application from dev to production in a controlled way.
The deployment procedure will be a bit different with the _cloudflow contrib_ approach where Flink and Spark applications are supported through external plugins. Akka applications will be fully deployed using `kubectl cloudflow` as above. However for Spark and Flink applications, you need to use an extra plugin and carry out a few extra steps to make them known to the Cloudflow engine. For details please have a look at https://lightbend.github.io/cloudflow-contrib/docs/0.0.4/index.html[Cloudflow Contrib] documentation.

Contributor

Suggested change
The deployment procedure will be a bit different with the _cloudflow contrib_ approach where Flink and Spark applications are supported through external plugins. Akka applications will be fully deployed using `kubectl cloudflow` as above. However for Spark and Flink applications, you need to use an extra plugin and carry out a few extra steps to make them known to the Cloudflow engine. For details please have a look at https://lightbend.github.io/cloudflow-contrib/docs/0.0.4/index.html[Cloudflow Contrib] documentation.
The deployment procedure will be different with the _cloudflow contrib_ approach where Flink and Spark applications are supported through external plugins. Akka applications will be fully deployed using `kubectl cloudflow` as above. However, for Spark and Flink applications, you need to use an extra plugin and carry out a few extra steps to make them known to the Cloudflow engine. For details please have a look at https://lightbend.github.io/cloudflow-contrib[Cloudflow Contrib] documentation.

Contributor Author

Done.

- The Flink Job Manager then requests task manager resources from Kubernetes to deploy the distributed processing.
- Finally, if and when resources are available, the Flink-bound task managers start as Kubernetes pods. The task managers are the components tasked with the actual data processing, while the Job Manager serves as coordinator of the (stream) data process.
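
As a rough illustration of the two steps quoted above, the request-then-start sequence can be modeled as a toy simulation. This is a sketch only, not Flink's, Kubernetes', or Cloudflow's actual API: `ToyCluster`, `request_pod`, and `deploy_task_managers` are hypothetical names, and the cluster is reduced to a counter of free pod slots.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for Kubernetes: tracks free pod slots only."""
    free_slots: int
    pods: list = field(default_factory=list)

    def request_pod(self, name: str) -> bool:
        # Grant the pod only if capacity is available right now.
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.pods.append(name)
        return True


def deploy_task_managers(cluster: ToyCluster, wanted: int) -> list:
    """Model the Job Manager asking the cluster for `wanted` task managers.

    Returns the names of the task manager pods that actually started.
    """
    started = []
    for i in range(wanted):
        name = f"taskmanager-{i}"
        if cluster.request_pod(name):
            started.append(name)
    return started


cluster = ToyCluster(free_slots=2)
started = deploy_task_managers(cluster, wanted=3)
print(started)
```

With two free slots and three requested task managers, only the first two start; in this simplified model the third request is simply refused, whereas a real cluster would typically leave the pod pending until resources appear.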

In case you are using the cloudflow-contrib model of integration, you need to go through some additional steps to complete the deployment of your Flink streamlets. This https://lightbend.github.io/cloudflow-contrib/docs/0.0.4/get-started/flink-native.html[section] on cloudflow-contrib has more details.

Contributor

Do we need this at all?

Contributor Author

Just thought of adding this here since there is no mention of cloudflow-contrib in this section on Flink streamlets.

In this architecture, the Spark driver runs the Cloudflow-specific logic that connects the streamlet to our managed data streams, at which point the streamlet starts consuming from inlets.
The streamlet advances through the data streams that are provided on inlets and writes data to outlets.
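
The inlet-to-outlet flow described in the excerpt above can be sketched with plain Python queues. This is an assumption-laden toy, not the Spark or Cloudflow streamlet API: `run_streamlet`, the `route` callback, and the outlet names `small` and `large` are hypothetical, and in-memory collections stand in for the managed data streams.

```python
from collections import deque


def run_streamlet(inlet: deque, outlets: dict, route) -> int:
    """Drain the inlet in order, writing each record to the outlet picked by `route`.

    Returns the number of records processed.
    """
    processed = 0
    while inlet:
        record = inlet.popleft()               # advance through the inlet stream
        outlets[route(record)].append(record)  # write the record to an outlet
        processed += 1
    return processed


inlet = deque([3, 8, 5, 12])
outlets = {"small": [], "large": []}
count = run_streamlet(inlet, outlets, lambda r: "large" if r >= 8 else "small")
print(count, outlets)
```

The sketch processes a finite queue and then stops; a real streamlet would keep consuming as new data arrives on its inlets.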

In case you are using the cloudflow-contrib model of integration, you need to go through some additional steps to complete the deployment of your Spark streamlets. This https://lightbend.github.io/cloudflow-contrib/docs/0.0.4/get-started/spark-native.html[section] on cloudflow-contrib has more details.

Contributor

Do we need this comment?

Contributor Author

Just thought of adding this here since there is no mention of cloudflow-contrib in this section on Spark streamlets.

@debasishg debasishg merged commit 3968724 into master May 24, 2021
@debasishg debasishg deleted the docs-with-contrib branch May 24, 2021 04:49

3 participants