Allow dynamic quota creation and removal #287
QuanMPhm wants to merge 1 commit into nerc-project:main from
| defaults={"value": json.dumps(new_quota_dict)}, | ||
| ) | ||
|
|
||
| # TODO (Quan): Dict update allows migration of existing quotas. This is fine? |
There was a problem hiding this comment.
@knikolla @jtriley This is a pre-existing feature, so I assume the answer is yes. Just to make sure.
There was a problem hiding this comment.
I don't think I fully understand this comment. Can you elaborate?
There was a problem hiding this comment.
We currently allow migrating a quota's cluster label (e.g., "limits.cpu" for OpenShift CPUs) by changing the hardcoded values in the QUOTA_KEY_MAPPING of the appropriate allocator. This migration feature is demonstrated in the functional test that I linked.
Below my TODO comment:

    if not created:
        available_quotas_dict = json.loads(available_quotas_attr.value)
        available_quotas_dict.update(new_quota_dict)
        QuotaSpecs.model_validate(available_quotas_dict)  # Validate uniqueness
        available_quotas_attr.value = json.dumps(available_quotas_dict)
        available_quotas_attr.save()

I wanted to show that this migration feature will still be available: if you add the same quota to the same resource, available_quotas_dict.update(new_quota_dict) means you can update/migrate everything about the quota, including its cluster label (with the exception of the display name, which you've mentioned and I responded here).
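A minimal sketch of the update semantics described above, using plain dictionaries instead of the plugin's models. The display name "Example CPU Quota", the `quota_label` field name, and the "requests.cpu" label are illustrative assumptions, and the uniqueness check is only a stand-in for `QuotaSpecs.model_validate`, not its actual implementation.

```python
import json


def merge_quota(stored_value: str, new_quota: dict) -> str:
    """Merge a new quota spec into the JSON-encoded dict of available quotas."""
    available = json.loads(stored_value)
    # dict.update overwrites an existing entry that has the same key,
    # which is what makes "migrating" an existing quota possible.
    available.update(new_quota)
    # Stand-in uniqueness check (assumption): no two quotas share a label.
    labels = [spec["quota_label"] for spec in available.values()]
    if len(labels) != len(set(labels)):
        raise ValueError("two quotas share the same quota_label")
    return json.dumps(available)


# Hypothetical stored attribute value with one quota.
stored = json.dumps({"Example CPU Quota": {"quota_label": "limits.cpu"}})

# Adding the same quota again replaces its spec, including the cluster label.
migrated = json.loads(
    merge_quota(stored, {"Example CPU Quota": {"quota_label": "requests.cpu"}})
)
```

Because the dictionary is keyed by display name here, re-adding a quota can change every field except the key itself, which matches the display-name exception mentioned above.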
| "OpenStack Storage", | ||
| openstack_nese_storage_rate, | ||
| ) | ||
| # TODO (Quan): An illustration of how billing could be simplified. Should I follow with this? |
There was a problem hiding this comment.
@knikolla I couldn't do the same refactoring for the Openshift allocations because different storages have their own rates. I could have refactored the code further to circumvent that issue, but I didn't want the PR to be too long.
| }, | ||
| ) | ||
|
|
||
| # TODO (Quan): What happens when a quota is removed? Should the attribute be removed from Coldfront? |
There was a problem hiding this comment.
@knikolla @jtriley @joachimweyl This also has implications for billing storage. This test case is failing here since I would like people's consensus on desired behavior.
There was a problem hiding this comment.
My hunch is no, but I want to wait for @knikolla's input.
There was a problem hiding this comment.
For now just have the quota be removed from the Resource Attribute but untouched in the allocations.
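The decision above can be sketched with plain data structures. This is only an illustration: the quota names and values are hypothetical, and the real command works on ColdFront resource attributes rather than on bare dictionaries.

```python
import json


def remove_quota(resource_quotas_json: str, display_name: str) -> str:
    """Drop one quota from the resource-level JSON value; nothing else is touched."""
    quotas = json.loads(resource_quotas_json)
    quotas.pop(display_name, None)  # removing an unknown quota is a no-op
    return json.dumps(quotas)


# Hypothetical resource attribute value (JSON) and allocation attributes.
resource_attr = json.dumps(
    {
        "Example Storage Quota (GiB)": {"resource_type": "storage"},
        "Example CPU Quota": {"resource_type": "compute"},
    }
)
allocation_attrs = {"Example Storage Quota (GiB)": 20, "Example CPU Quota": 2}

# Only the resource-level value changes; the allocation keeps its attribute.
resource_attr = remove_quota(resource_attr, "Example Storage Quota (GiB)")
```

After the call, existing allocations still carry the removed quota's attribute, so any billing or validation code has to decide how to treat attributes that no longer appear on the resource.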
b3c58d8 to 35273aa
@knikolla I addressed all your suggestions on Slack except one:
May I ask that I implement this feature in a subsequent PR, to prevent this PR from bloating even more? If not, I will implement it after I receive answers to my questions above.

What is the impact of this omission?

@joachimweyl The impact will be that to change the display names of attributes (the names that users will see in the Coldfront UI, i.e

Makes sense to me.
| def _get_network_quota(self, quotas, project_id): | ||
| network_quota = self.network.show_quota(project_id)["quota"] | ||
| for k in self.QUOTA_KEY_MAPPING["network"]["keys"].values(): | ||
| for cf_k in self.SERVICE_QUOTA_MAPPING["network"]: |
There was a problem hiding this comment.
You could have used the resource_type field of the QuotaSpec here. This will result in an error if not all quotaspecs are defined for OpenStack resources.
There was a problem hiding this comment.
My original use for the resource_type field was to identify quotas that are processed by the storage billing script. If you believe I should also have a field that identifies a resource's Openstack resource type, is it fine if I have two fields then? Like:
resource_type: str # Which Openstack service (i.e compute, object) does a quota belong to?
is_for_storage_billing: bool # Is the quota checked by the storage billing script?
There was a problem hiding this comment.
@QuanMPhm Wouldn't resource_type == "storage" be the same thing?
There was a problem hiding this comment.
Ah, I see what you mean. In OpenStack it would be volume, so there isn't a one-to-one mapping. I really don't want to introduce a new is_for_storage_billing parameter, so perhaps you could use the quota label in a specific way for OpenStack. For example, volume.volumes or compute.vcpu, with the part before the . signifying the service and the part after it the type.
There was a problem hiding this comment.
So that I know moving forward, what is your reasoning against a new is_for_storage_billing parameter compared to the other option? It seems that in either case there's some "new information" that the developer has to be aware of when maintaining the code (a new quota field, or how the OpenStack quota label is parsed), which makes them seem equally burdensome to me.
There was a problem hiding this comment.
This is one of those situations where the burden becomes clear as the project evolves and new requirements are added.
If tomorrow we need to treat network quotas and some other type of quotas differently, it is easier to check resource_type == "network" than to add a new is_network_quota or is_gpu_quota. You'd need N attributes for N different resource types as opposed to one attribute with a flexible string.
The developer will have to be aware of this regardless.
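The label convention suggested in this thread could look roughly like the sketch below. The helper names are hypothetical and not part of the PR; only the `service.type` shape and the example labels `volume.volumes` and `compute.vcpu` come from the discussion.

```python
def split_quota_label(label: str) -> tuple[str, str]:
    """Split 'service.type' into (service, type), e.g. 'volume.volumes'."""
    service, sep, quota_type = label.partition(".")
    if not sep or not service or not quota_type:
        raise ValueError(f"quota label {label!r} is not of the form 'service.type'")
    return service, quota_type


def labels_for_service(labels: list[str], service: str) -> list[str]:
    """Return the labels whose service prefix matches, preserving order."""
    return [label for label in labels if split_quota_label(label)[0] == service]


labels = ["volume.volumes", "compute.vcpu"]
compute_labels = labels_for_service(labels, "compute")
```

One string field then answers both questions raised here (which service a quota belongs to, and which quota type it is) without adding a boolean per resource type.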
src/coldfront_plugin_cloud/management/commands/add_quota_to_resource.py
| defaults={"value": json.dumps(new_quota_dict)}, | ||
| ) | ||
|
|
||
| # TODO (Quan): Dict update allows migration of existing quotas. This is fine? |
There was a problem hiding this comment.
I don't think I fully understand this comment. Can you elaborate?
There was a problem hiding this comment.
add_openstack_resource and add_openshift_resource currently don't provide ALL the required QuotaSpecs with the EXACT same multiplier and static values that are in use now, so you are changing the behavior.
YOU NEED TO provide a separate Python command or shell file that registers ALL the values as they are now.
Otherwise, as it stands, you haven't provided a smooth transition from the current system to the new Dynamic Quota system, and an admin would need to type out a lot of error-prone commands manually to make this upgrade work.
| class Command(BaseCommand): | ||
| def add_arguments(self, parser): | ||
| parser.add_argument( | ||
| "--display_name", |
There was a problem hiding this comment.
you have an underscore instead of a dash.
--display_name -> --display-name
| help="The default quota value for the storage attribute. In GB", | ||
| ) | ||
| parser.add_argument( | ||
| "--resource_name", |
There was a problem hiding this comment.
--resource_name -> --resource-name
| type=str, | ||
| default="", | ||
| help="Name of quota as it appears on invoice. Required if --is-storage-type is set.", | ||
| ) |
There was a problem hiding this comment.
how come you didn't specify dest= for some of these arguments?
There was a problem hiding this comment.
I normally wouldn't include dest=, and I didn't review the code that Copilot generated for me closely enough. I've removed the dest=. Apologies.
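For context on the two naming comments above: Django management commands receive an `argparse` parser, and `argparse` derives the destination name from a long option by stripping the leading dashes and turning inner dashes into underscores. A standalone sketch follows; the argument values are made up, and a real command would read them from the `options` dict in `handle()` instead.

```python
import argparse

parser = argparse.ArgumentParser()
# Dashed flags on the command line; no explicit dest= is needed.
parser.add_argument("--display-name", type=str, required=True)
parser.add_argument("--resource-name", type=str, required=True)

args = parser.parse_args(
    ["--display-name", "Example Quota", "--resource-name", "Example Resource"]
)

# argparse stores the values under underscore names.
options = vars(args)
```

So switching `--display_name` to `--display-name` changes only what the admin types; code that reads `options["display_name"]` keeps working.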
| def handle(self, *args, **options): | ||
| if options["resource_type"] == "storage" and not options["invoice_name"]: | ||
| logger.error( | ||
| "--invoice-name must be provided when storage type is `storage`." |
There was a problem hiding this comment.
"when resource type is storage."
There was a problem hiding this comment.
My idea is any quota that is relevant for storage billing should have the resource type storage, such as:
QUOTA_REQUESTS_IBM_STORAGE = "OpenShift Request on IBM Storage Quota (GiB)"
QUOTA_REQUESTS_NESE_STORAGE = "OpenShift Request on NESE Storage Quota (GiB)"
There was a problem hiding this comment.
sure, I am just pointing out that the error message says "storage type is storage" instead of "resource type is storage". You are checking options["resource_type"], there's no storage_type
| "--invoice-name", | ||
| type=str, | ||
| default="", | ||
| help="Name of quota as it appears on invoice. Required if --is-storage-type is set.", |
There was a problem hiding this comment.
where's --is-storage-type? Did you mean --resource-type is set to storage?
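The check under discussion, with the wording the reviewer asked for, might read like the sketch below. The function name `validate_options` is hypothetical; the option keys `resource_type` and `invoice_name` come from the diff above, and the exact message text is an assumption.

```python
from typing import Optional


def validate_options(options: dict) -> Optional[str]:
    """Return an error message if the options are inconsistent, else None."""
    if options["resource_type"] == "storage" and not options["invoice_name"]:
        # Name the option that is actually checked: resource type, not storage type.
        return "--invoice-name must be provided when resource type is `storage`."
    return None


missing = validate_options({"resource_type": "storage", "invoice_name": ""})
valid = validate_options({"resource_type": "storage", "invoice_name": "Example Invoice Name"})
```

The help text for `--invoice-name` would need the same correction, so that it refers to the resource type being `storage` rather than to an `--is-storage-type` flag that does not exist.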
| else options["name"], | ||
| ) | ||
|
|
||
| # Add common Openshift resources (cpu, memory, etc) |
There was a problem hiding this comment.
Remind me, how were these resources created before this?
There was a problem hiding this comment.
Currently, the information for these quotas is spread across multiple places in the repo. The display names are in attributes.py, the multiplier and static quantities are in tasks.py, and other info is in other places. The allocation attributes for these quotas were loaded by register_cloud_attributes.py, which consumes the attributes defined in attributes.py.
A by-product of this PR is that now all that info is created and stored in one place.
35273aa to 01021dd
|
|
||
| def add_arguments(self, parser): | ||
| parser.add_argument( | ||
| "--resource_name", |
There was a problem hiding this comment.
--resource_name -> --resource-name
| help="Name of the Resource to modify.", | ||
| ) | ||
| parser.add_argument( | ||
| "--display_name", |
There was a problem hiding this comment.
--display_name -> --display-name
01021dd to 0cf04ea
| # Add common Openshift resources (cpu, memory, etc) | ||
| call_command( | ||
| "add_quota_to_resource", | ||
| display_name=attributes.QUOTA_LIMITS_CPU, |
There was a problem hiding this comment.
Is it still necessary to keep these values hardcoded in the attributes module?
There was a problem hiding this comment.
I didn't remove the hardcoded values from attributes since a few files still need to reference the quota strings, namely the test cases, openstack.py, storage billing, and count_gpu_usage.py. I thought it didn't make sense to move the hardcoded strings to the test files or elsewhere, since that just felt like moving the problem somewhere else, and would lead to a lot of cleanup.
With how the test cases are, I can't see how the hardcoded strings can be entirely removed.
There was a problem hiding this comment.
I would like to see a follow-up PR at some point that moves all the quota attributes from attributes.py to a module within the tests to send a strong signal that the hardcoded attributes should only ever be used for the test cases.
There was a problem hiding this comment.
openstack.py and storage billing shouldn't need to use the hardcoded attributes. Why do they need to?
There was a problem hiding this comment.
Given all your suggestions so far, I'll remove the hardcoded attributes from openstack.py. For storage billing, I just need to ask one last thing:
To bill for storage, the storage script needs two things:
- Name of the allocation attribute to check
- Name of the storage's SU charge in nerc-rates
The first piece of info can be in resource_type, as we discussed before. I would like your thumbs up on adding a second field for the nerc-rates key. Something like nerc_rates_key?
There was a problem hiding this comment.
- Name of allocation attribute to check = [display name of quotas within a given allocation that, as per the QuotaSpec of that Resource, are of resource_type == storage]
- I don't like the idea of introducing a key that is specific to our instance of deployment of ColdFront by naming it nerc_rates_key. I would like all NERC-specific business logic to be restricted to the management command files whenever possible. It should be possible to fetch a Storage rate from the NERC rates file by having the name of the attribute conform to a specific form. Right now we only have rates for NESE, so whenever we need to introduce a new rate we can have it conform to a specific convention.
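The first point could be sketched as follows, assuming the resource's quota specs are available as a dictionary keyed by display name with a `resource_type` field. That data shape, the function name, and "Example CPU Quota" are assumptions; the NESE display name is the one quoted earlier in this conversation.

```python
def storage_attribute_names(quota_specs: dict, allocation_attrs: dict) -> list:
    """Names of allocation attributes whose quota spec has resource_type 'storage'."""
    return [
        name
        for name in allocation_attrs
        if quota_specs.get(name, {}).get("resource_type") == "storage"
    ]


# Hypothetical specs for one resource and attribute values for one allocation.
quota_specs = {
    "OpenShift Request on NESE Storage Quota (GiB)": {"resource_type": "storage"},
    "Example CPU Quota": {"resource_type": "compute"},
}
allocation_attrs = {
    "OpenShift Request on NESE Storage Quota (GiB)": 20,
    "Example CPU Quota": 2,
}

billable = storage_attribute_names(quota_specs, allocation_attrs)
```

How the matching rate is then looked up from the attribute name is left open here, since the naming convention has not been settled in this thread.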
0cf04ea to 0c68d91
Closes nerc-project/operations#1391. This is how I would suggest reviewing this PR.
Two CLI commands have been added, add_quota_to_resource.py and remove_quota_from_resource.py. I would suggest understanding those two commands first. These commands allow us to dynamically add/remove quotas instead of having them hard-coded as they currently are. These commands don't impact the quota objects in the clusters, nor the quota attributes in allocations. Their full impact is illustrated when used within the typical user workflow, or in tandem with validate_allocations.py.
I would now suggest checking the changes to functional/openshift/test_allocations.py to see the full implications of this PR. The other functional test cases only contain minor changes.
Afterwards, tasks.py, validate_allocations.py, and the allocator base and subclasses should be reviewed. They are the main consumers of quota information. All other changes are relatively minor.
This is a draft for now since I have some questions, and the tests are failing. I just wanted people to know my general direction with this feature.
I will wait for people's feedback before continuing work on this PR, since I assume substantial feedback will be given.