New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
return 503 for aggregated APIs when the APIServiceRegistrationController hasn't finished installing all known APIServices #104748
Conversation
cmd/kube-apiserver/app/server.go
Outdated
@@ -22,6 +22,7 @@ package app | |||
import ( | |||
"crypto/tls" | |||
"fmt" | |||
"k8s.io/apiserver/pkg/util/notfoundhandler" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
order
func (s emptyDelegate) UnprotectedHandler() http.Handler { | ||
return nil | ||
return s.handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remind me why we only need this for the unprotected handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the one we use at the end of the chain, here https://github.com/kubernetes/kubernetes/blob/master/cmd/kube-apiserver/app/server.go#L190
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please help me understand, by spelling out the rest of the answer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API server is composed of three servers. The AggregatorServer
is first in the chain and delegates to the KubeAPIServer
which in turn delegates to the ExtentionAPIServer
.
Delegation happens when the server doesn't have a path registered to handle a request.
Only the first server in the chain is protected in a sense that it will apply all the registered filters, like AuthZ/AuthnN.
Unprotected in this case means give me a handler that won't apply the filters. In this specific case it make sense since the first server have already done the hard work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sttts : I am not sure I understand your question. A DelegationTarget has only one method that returns a Handler (this PR does not change this fact).
"k8s.io/apiserver/pkg/util/signal" | ||
) | ||
|
||
// New returns an HTTP handler that is meant to be executed at the end of the delegation chain and check if the server has been fully prepared before replying to the client. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention that this is long before readiness and is meant to block request before the muxes and handlers are completely initialized such that requests are properly routed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that preparedToServe must not depend on other components in the cluster other than etcd
|
||
func (h *Handler) ServeHTTP(rw http.ResponseWriter, req *http.Request) { | ||
for _, preparedToServeSignal := range h.preparedToServeSignals { | ||
if !preparedToServeSignal.IsSignaled() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continue early
@@ -113,10 +114,26 @@ type preparedAPIAggregator struct { | |||
runnable runnable | |||
} | |||
|
|||
// preparedToServeSignals holds signals/events that indicate this server has been fully initialized and is ready to serve requests | |||
type preparedToServeSignals struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a wrapper around another type? Just use signal.Interface
directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to expose only the read-only interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the current text exposes the full interface.
It looks to me like the reason for a struct here is to prepare for more signals in the future. For example, one about RBAC being initialized.
I will remove the signal pkg and use raw types instead. I will also add a signal from the extension API server. |
case <-e.ch: | ||
// already closed, don't close again. | ||
default: | ||
close(e.ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if Signal
is used concurrently, i think we need to use a sync.Once
to protect it.
} | ||
} | ||
|
||
type namedChannelWrapper struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are converting this to a shared utility, i would recommend unit tests for it.
/triage accepted |
/cc @MikeSpreitzer |
func (s emptyDelegate) UnprotectedHandler() http.Handler { | ||
return nil | ||
return s.handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please help me understand, by spelling out the rest of the answer?
@@ -271,14 +271,19 @@ func (s *GenericAPIServer) NextDelegate() DelegationTarget { | |||
} | |||
|
|||
type emptyDelegate struct { | |||
handler http.Handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did "empty delegate" mean, exactly? Is it still empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty delegate is an HTTP handler that doesn't have any paths registered and won't pass the requests through to the next delegate.
It will still be empty as it doesn't registers any paths nor the next delegate. It allows us to register a special NotFound handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a go comment describing what handler is and when it is called
@@ -113,10 +114,26 @@ type preparedAPIAggregator struct { | |||
runnable runnable | |||
} | |||
|
|||
// preparedToServeSignals holds signals/events that indicate this server has been fully initialized and is ready to serve requests | |||
type preparedToServeSignals struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the current text exposes the full interface.
It looks to me like the reason for a struct here is to prepare for more signals in the future. For example, one about RBAC being initialized.
… as a MuxComplete signal apiServiceRegistrationControllerInitiated is closed when APIServiceRegistrationController has finished "installing" all known APIServices. At this point we know that the proxy handler knows about APIServices and can handle client requests. Before it might have resulted in a 404 response which could have serious consequences for some controllers like GC and NS
d3fdf02
to
c54463d
Compare
/retest |
7cea69a
to
2e2d22a
Compare
} | ||
} | ||
s.lifecycleSignals.MuxAndDiscoveryComplete.Signal() | ||
klog.V(1).Infof("%s completed, all registered MuxAndDiscoveryComplete signals (%d) have finished", s.lifecycleSignals.MuxAndDiscoveryComplete.Name(), s.GenericAPIServer.MuxAndDiscoveryCompleteSignals()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be less technical. Better: %s has all endpoints registered and discovery information is complete
// | ||
// Note that we don't want to add additional checks to the readyz path as it might prevent fixing bricked clusters. | ||
// This specific handler is meant to "protect" requests that arrive before the paths and handlers are fully initialized. | ||
func New(serializer runtime.NegotiatedSerializer, hasMuxAndDiscoveryIncompleteKeyFn func(ctx context.Context) bool) *Handler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the negation: isMuxAndDiscoveryComplete
2e2d22a
to
9e2bdfe
Compare
Lgtm and approved. Will give @MikeSpreitzer @caesarxuchao and others a chance to take a final look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about https://docs.google.com/document/d/1JQNT0eu6Gpn9UZ4rcfIUDkJf1wKc3f_57KKC4c-_s-U ? My recollection of the SIG api-machinery discussion is that a client opt-in was favored, out of concern that a non-optional solution would ruining some startup sequences that involve multiple processes.
type emptyDelegate struct { | ||
// handler is called at the end of the delegation chain | ||
// when a request has been made against an unregistered HTTP path the individual servers will simply pass it through until it reaches the handler. | ||
handler http.Handler | ||
} | ||
|
||
func NewEmptyDelegate() DelegationTarget { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
projects using the generic api server libary might still use this method.
|
||
func (h *Handler) ServeHTTP(rw http.ResponseWriter, req *http.Request) { | ||
if !h.isMuxAndDiscoveryCompleteFn(req.Context()) { | ||
errMsg := fmt.Sprintf("the request has been made before all known HTTP paths have been installed, please try again") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is fmt.Sprintf
doing for us here?
// Before it might have resulted in a 404 response which could have serious consequences for some controllers like GC and NS | ||
// | ||
// Note that the APIServiceRegistrationController waits for APIServiceInformer to synced before doing its work. | ||
apiServiceRegistrationControllerInitiated := make(chan struct{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both this channel and the handlerSyncedCh
channel in the apiservice-registration-controller post start hook? Could we use just one channel for both purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, we can have just one channel.
// | ||
// The presence of the key is checked in the NotFoundHandler (staging/src/k8s.io/apiserver/pkg/util/notfoundhandler/not_found_handler.go) | ||
// | ||
// The race may happen when a request reaches the NotFoundHandler because not all paths have been registered in the mux |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referring to "the race" here is appropriate if the reader already knows there is a race involved.
However, I think the reader does not know that.
"A race" would be more appropriate for introducing the fact that there is a race.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the gdoc.
@MikeSpreitzer in that discussion we didn't know that we had already a solution in place for CRDs, i.e. something that has become part of the API semantics expected by clients and that we cannot easily change. Whether we want anything beyond this, we can discuss. |
@sttts @MikeSpreitzer many thanks for reviewing the PR. I applied the last batch of comments. I think we are ready to tag. |
/test pull-kubernetes-unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks basically OK to me.
I have just a few minor comments.
func (s emptyDelegate) UnprotectedHandler() http.Handler { | ||
return nil | ||
return s.handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sttts : I am not sure I understand your question. A DelegationTarget has only one method that returns a Handler (this PR does not change this fact).
staging/src/k8s.io/apiserver/pkg/server/genericapiserver_graceful_termination_test.go
Outdated
Show resolved
Hide resolved
staging/src/k8s.io/apiserver/pkg/server/genericapiserver_test.go
Outdated
Show resolved
Hide resolved
728a40d
to
b6c2c4b
Compare
…ectly to apiserviceRegistration controller
b6c2c4b
to
5116a50
Compare
It looks like there is still room for improvement. It looks like we should protect requests to the discovery endpoint This could be fixed as proposed in the KEP by rejecting requests with a 503 response when the server hasn't become ready ( |
or we could open a new PR that would simply reject (503) requests made by the GC controller to the discovery API when it hasn't observed all GVK |
#104748 (comment) will follow for the remaining bit. For now this clearly improves the situation. /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: p0lyn0mial, sttts The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Ensures that requests to aggregated APIs will get 503, not 404 while the APIServiceRegistrationController hasn't finished installing all known APIServices. It is especially important for controllers like GC and NS since they act on 404
Note that this is an already existing behavior (returning 503 when the CRD informer hasn't been sync).
We already have it for the apiextentionserver, see #105653 for more details.
Which issue(s) this PR fixes:
Fixes #104342
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: