Introduce ADR for CodeFlare operator redesign #9

Merged
merged 4 commits into from
Sep 7, 2023

Conversation

astefanutti
Contributor

No description provided.

Contributor

@KPostOffice KPostOffice left a comment

Thanks for the ADR. It looks good. Will there be an option in the configuration to have Instascale disabled?

PCF-ADR-0007-operator-redesign.md (review thread: outdated, resolved)
@astefanutti
Contributor Author

> Thanks for the ADR. It looks good. Will there be an option in the configuration to have Instascale disabled?

Good catch, thanks! I completely forgot I wanted to detail that. Added with 4fea0e6. PTAL.

@sutaakar
Contributor

sutaakar commented Sep 5, 2023

/lgtm

@dimakis dimakis left a comment

In its current guise and for its purpose, this LGTM.

I do have a question though: how much complexity do you think would be involved in abstracting out the queuing/quota/scaling components and having a more modular design?

I think this would make the versioning of the subcomponents less of a hassle, but it may also mean that InstaScale could potentially be swapped out for the kube autoscaler -- with any cloud provider flavour if the stack were to be deployed in a vanilla kube env.

It may also mean that queuing etc. could potentially be swapped out for a component of the user's choice.

I feel like this would be extremely expensive from an engineering perspective, but seeing as @asm582 has plans to make MCAD more modular and to introduce features such as the renewal energy feature, it may be an idea to have this abstraction at the CFO as opposed to MCAD?

@asm582
Member

asm582 commented Sep 6, 2023

/lgtm

@asm582
Member

asm582 commented Sep 6, 2023

/approve

@astefanutti
Contributor Author

> In its current guise and for its purpose, this LGTM.
>
> I do have a question though: how much complexity do you think would be involved in abstracting out the queuing/quota/scaling components and having a more modular design?
>
> I think this would make the versioning of the subcomponents less of a hassle, but it may also mean that InstaScale could potentially be swapped out for the kube autoscaler -- with any cloud provider flavour if the stack were to be deployed in a vanilla kube env.
>
> It may also mean that queuing etc. could potentially be swapped out for a component of the user's choice.
>
> I feel like this would be extremely expensive from an engineering perspective, but seeing as @asm582 has plans to make MCAD more modular and to introduce features such as the renewal energy feature, it may be an idea to have this abstraction at the CFO as opposed to MCAD?

To answer these questions strictly within the scope of this ADR: one of its goals is to delegate the installation concern to the underlying platform, such as OpenDataHub. As a result, any modularity and polymorphism (the ability to swap one component for another) that intersects with installation will have to be implemented at the level of that platform. While this ADR proposal won't preclude some form of modularity and polymorphism, it lowers the threshold on module boundaries beyond which a change touches the installation, and hence has to be implemented by the platform rather than by the CodeFlare operator. Concretely, it'll still be possible to modularise controllers like InstaScale, quota manager, or finer-grained modules, like MCAD, backoff strategies, but swapping entire components, like Kueue instead of MCAD, or Kubernetes cluster autoscaler instead of InstaScale, will likely be best achieved at the platform level.
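
As a rough illustration of the in-operator modularity described above (and of the option to disable InstaScale raised earlier in this thread), here is a minimal Go sketch. All names in it (`Controller`, `Config`, `InstaScaleEnabled`, `EnabledControllers`) are hypothetical and are not taken from the CodeFlare operator or from the ADR; it only shows how a single operator binary might gate an embedded controller behind a configuration flag, while a swap of whole components would remain a platform-level concern.

```go
package main

import "fmt"

// Controller is a hypothetical abstraction for a module embedded in the operator.
type Controller interface {
	Name() string
}

// mcad and instaScale are stand-ins for the embedded controllers (illustrative only).
type mcad struct{}

func (mcad) Name() string { return "MCAD" }

type instaScale struct{}

func (instaScale) Name() string { return "InstaScale" }

// Config is a hypothetical operator configuration; the field name is an assumption.
type Config struct {
	InstaScaleEnabled bool
}

// EnabledControllers returns the controllers the operator would start for cfg.
// MCAD is always included in this sketch; InstaScale is optional.
func EnabledControllers(cfg Config) []Controller {
	controllers := []Controller{mcad{}}
	if cfg.InstaScaleEnabled {
		controllers = append(controllers, instaScale{})
	}
	return controllers
}

func main() {
	// With InstaScale disabled, only the MCAD stand-in is started.
	for _, c := range EnabledControllers(Config{InstaScaleEnabled: false}) {
		fmt.Println("starting controller:", c.Name())
	}
}
```

In this sketch, turning a module on or off stays inside the operator, whereas replacing MCAD with Kueue would change what gets installed and so falls outside it.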

To answer these questions beyond the scope of this ADR, I think this raises the fundamental question of what the value-add of the CodeFlare stack is. Modularity and polymorphism are technicalities that software engineers are prone to introduce, while end-users are generally more interested in getting their job done, out of the box, than in figuring out which job scheduler or autoscaler they should use, or even be aware of. From the latter, end-user standpoint, I'd expect CodeFlare to provide an opinionated stack of best-in-class components that enables users to get their job done as easily as possible, out of the box. So if the only value behind supporting multiple components, and the ability to swap them, is to mitigate gaps in existing components, it may be a better option to make these components best-in-class, or to pick the ones that already are.

@dimakis

dimakis commented Sep 7, 2023

Thanks very much for your detailed reply @astefanutti

> To answer these questions strictly within the scope of this ADR: one of its goals is to delegate the installation concern to the underlying platform, such as OpenDataHub. As a result, any modularity and polymorphism (the ability to swap one component for another) that intersects with installation will have to be implemented at the level of that platform. While this ADR proposal won't preclude some form of modularity and polymorphism, it lowers the threshold on module boundaries beyond which a change touches the installation, and hence has to be implemented by the platform rather than by the CodeFlare operator. Concretely, it'll still be possible to modularise controllers like InstaScale, quota manager, or finer-grained modules, like MCAD, backoff strategies, but swapping entire components, like Kueue instead of MCAD, or Kubernetes cluster autoscaler instead of InstaScale, will likely be best achieved at the platform level.

This is likely the right call.

> To answer these questions beyond the scope of this ADR, I think this raises the fundamental question of what the value-add of the CodeFlare stack is. Modularity and polymorphism are technicalities that software engineers are prone to introduce, while end-users are generally more interested in getting their job done, out of the box, than in figuring out which job scheduler or autoscaler they should use, or even be aware of. From the latter, end-user standpoint, I'd expect CodeFlare to provide an opinionated stack of best-in-class components that enables users to get their job done as easily as possible, out of the box. So if the only value behind supporting multiple components, and the ability to swap them, is to mitigate gaps in existing components, it may be a better option to make these components best-in-class, or to pick the ones that already are.

I wasn't necessarily thinking of the end user here, since to my mind the installation of the stack and its components is done by a devops/platform team or similar, not by the end user. Offering them the most customisable setup may help adoption, as we could target a larger number of different environments. That, at least, was my train of thought. Nonetheless, I see that this may be better done elsewhere. I just wanted to explore all options and get your insight.

I'm happy with the direction this takes the CodeFlare stack.

@dimakis

dimakis commented Sep 7, 2023

/approve

6 participants