fix(perf): use resourceVersion=0 for faster argocd-server informer initialization #25799
base: master
…itialization

Set resourceVersion=0 in informer ListOptions to use the API server's watch cache instead of etcd for initial list operations. This speeds up informer initialization when there are large numbers of Application resources.

Signed-off-by: jmc-9304 <fdop104@gmail.com>
```go
// resourceVersion=0 means use the API server's watch cache (faster, but may be slightly stale).
// This speeds up the initial informer sync, especially with large numbers of Applications.
if options.ResourceVersion == "" {
	options.ResourceVersion = "0"
}
```
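For reviewers who want to poke at the tweak in isolation, here is a standalone sketch of it. `ListOptions` below is a hypothetical stand-in for `metav1.ListOptions` (k8s.io/apimachinery) reduced to the single field the change touches, so the sketch runs without Kubernetes dependencies. In the real code the function takes `*metav1.ListOptions`; client-go-style generated informer factories usually accept such a function through a `WithTweakListOptions` option, but exactly how Argo CD wires it is an assumption here, not taken from this diff.

```go
package main

import "fmt"

// ListOptions is a hypothetical stand-in for metav1.ListOptions,
// reduced to the one field this change touches.
type ListOptions struct {
	ResourceVersion string
}

// tweakListOptions defaults ResourceVersion to "0" so the initial LIST can be
// served from the API server's watch cache. A ResourceVersion that a caller
// has already set is left untouched.
func tweakListOptions(options *ListOptions) {
	if options.ResourceVersion == "" {
		options.ResourceVersion = "0"
	}
}

func main() {
	// The PR applies the same tweak to both the AppProject and the
	// Application informer factories; here it is simply called once per kind.
	for _, kind := range []string{"AppProject", "Application"} {
		opts := ListOptions{}
		tweakListOptions(&opts)
		fmt.Printf("%s: resourceVersion=%q\n", kind, opts.ResourceVersion)
	}

	// An explicit resourceVersion is preserved.
	explicit := ListOptions{ResourceVersion: "12345"}
	tweakListOptions(&explicit)
	fmt.Printf("explicit: resourceVersion=%q\n", explicit.ResourceVersion)
}
```

The `== ""` guard is what keeps the tweak conservative: it only fills in a default and never overrides a resourceVersion supplied by whoever builds the request.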
Does the API server respect pagination when setting resourceVersion=0?
Thank you for your comment. 👍
I don't see any code in argocd-server that issues paginated List requests to the API server.
Why do we need to consider paginated lists here?
To add to this, regarding KEP-4988 (Snapshottable API Server Cache): based on its implementation history, KEP-4988 appears to be designed to create snapshots for list operations at specific resourceVersions, not for resourceVersion=0, which represents the latest available version.
Our current implementation uses resourceVersion=0 for faster initial informer sync and does not use pagination. Additionally, unlike the controller, argocd-server is the backend for the web UI, so prioritizing the latest state to improve web response speed is more efficient.
I have the same concerns as those outlined in argoproj/gitops-engine#617 (comment).
Thank you for the feedback! :)
After reviewing the linked comment, I understand that using resourceVersion=0 without pagination could increase API server memory usage, especially if a large number of Applications must be returned in a single list operation.
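To make the pagination concern concrete, here is a toy model of a LIST that honors `limit`/`continue`, compared with one that returns everything at once. `listPage` and `listAll` are hypothetical in-memory helpers, not Kubernetes APIs, and the sketch deliberately does not model whether kube-apiserver honors `limit` when resourceVersion=0; that is exactly the open question in this thread. It only shows why the size of the largest single response is what the concern is about.

```go
package main

import (
	"fmt"
	"strconv"
)

// listPage is a hypothetical in-memory stand-in for a LIST call that honors
// limit/continue. It returns at most limit items starting at the offset encoded
// in cont, plus the continue token for the next page ("" when done).
// limit <= 0 means no pagination: all remaining items are returned at once.
func listPage(items []string, limit int, cont string) ([]string, string) {
	start := 0
	if cont != "" {
		// Tokens are produced by listPage itself, so parse errors are not expected.
		start, _ = strconv.Atoi(cont)
	}
	if limit <= 0 || start+limit >= len(items) {
		return items[start:], ""
	}
	end := start + limit
	return items[start:end], strconv.Itoa(end)
}

// listAll drains every page and reports the total item count and the largest
// single response, which roughly bounds per-request memory on the serving side.
func listAll(items []string, limit int) (total, largestPage int) {
	cont := ""
	for {
		page, next := listPage(items, limit, cont)
		total += len(page)
		if len(page) > largestPage {
			largestPage = len(page)
		}
		if next == "" {
			return total, largestPage
		}
		cont = next
	}
}

func main() {
	items := make([]string, 10_000)
	for i := range items {
		items[i] = fmt.Sprintf("app-%05d", i)
	}
	total, largest := listAll(items, 500)
	fmt.Printf("limit=500: %d items, largest response %d\n", total, largest)
	total, largest = listAll(items, 0)
	fmt.Printf("no limit:  %d items, largest response %d\n", total, largest)
}
```

Both runs return the same 10,000 items; the difference is that the paginated run never materializes more than 500 of them in one response.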
However, I'd like to point out an important difference between argocd-application-controller and argocd-server:
- Controller behavior: The argocd-application-controller performs periodic resync operations (configured via the --app-resync flag, default 120 seconds) that trigger list operations at regular intervals. This means the API server would receive list requests repeatedly over time.
- Server behavior: The argocd-server informers are created with resyncPeriod=0 (see server/server.go:321-322), which means they only perform the initial list operation during startup. After that, they rely entirely on watch events for updates. There are no periodic resync operations in argocd-server.
(Sorry, I misunderstood and thought that the list operation was performed during the resync period.)
Therefore, while the initial list operation might be larger without pagination, it only happens once during server startup, not repeatedly like in the controller. This should significantly reduce the overall impact compared to the controller's behavior.
Additionally, as shown below, a rough estimation suggests that performing an Applications LIST operation on the API server without pagination may not result in significant memory pressure.
⚠️ To estimate the API server LIST memory overhead, assume there are 10,000 Application resources with an average size of 10 KB each.

Average-based memory estimation
- Measured average Application size: about 10 KB (in my case, 9.870 KB on average)
- Total payload for 10,000 Applications:
  - 10 KB × 10,000 = 100,000 KB ≈ 100,000,000 bytes ≈ 100 MB (about 95 MiB)
- Empirically, the transient memory peak is about 2–5× the payload, resulting in roughly 200–500 MB of additional memory
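The arithmetic above can be reproduced with a few lines of Go. This is only a sketch of the back-of-the-envelope calculation: `estimateListMemory` is a hypothetical helper, 10 KB is taken as 10,000 bytes, and the 2× to 5× multipliers are the rough empirical range quoted above, not a measured property of kube-apiserver.

```go
package main

import "fmt"

// estimateListMemory returns the raw payload of a single unpaginated LIST in
// bytes, plus a low/high transient peak obtained by multiplying that payload by
// the given factors.
func estimateListMemory(numApps, avgAppBytes, lowFactor, highFactor int64) (payload, peakLow, peakHigh int64) {
	payload = numApps * avgAppBytes
	return payload, payload * lowFactor, payload * highFactor
}

func main() {
	const mib = 1 << 20
	// 10,000 Applications at ~10 KB (10,000 bytes) each, with a 2x-5x peak factor.
	payload, low, high := estimateListMemory(10_000, 10_000, 2, 5)
	fmt.Printf("payload: %d bytes (%.1f MiB)\n", payload, float64(payload)/mib)
	fmt.Printf("transient peak: %.1f MiB - %.1f MiB\n", float64(low)/mib, float64(high)/mib)
}
```

Running it prints a payload of 100000000 bytes (95.4 MiB) and a transient peak of 190.7 MiB to 476.8 MiB, i.e. the 200–500 MB range above expressed in binary units.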
Therefore, the memory overhead of the LIST operation does not appear to be significant, and since it is performed only once during initialization rather than repeatedly, it seems reasonable for argocd-server to use resourceVersion=0.
That said, if the reviewer still has concerns about potential API server OOM risks associated with a non-paginated LIST operation, I am also fine with closing this PR. 🙂
Problem Statement
When argocd-server starts up, it initializes informers for Application and AppProject resources to maintain an in-memory cache. During this initialization, each informer performs a list operation to populate its cache. In environments with a large number of Application resources (e.g., 3000+ applications), this initial list operation can be slow because it reads directly from etcd, taking approximately 15 seconds to complete. This delay impacts the server's startup time and readiness.
Solution
This PR optimizes informer initialization by setting resourceVersion=0 in the ListOptions for both the Application and AppProject informers. When resourceVersion=0 is specified, the Kubernetes API server returns data from its watch cache instead of reading directly from etcd. This approach:
Performance Impact
Based on testing in an environment with approximately 3000 Application resources:
Technical Details
The change adds a tweakListOptions function that sets resourceVersion=0 when the ResourceVersion field is empty. This function is applied to both:
- the AppProject informer factory
- the Application informer factory

This optimization is safe because:
Testing