kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight #104942
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: SataQiu. The full list of commands accepted by this bot can be found here. The pull request process is described here.
LGTM
It makes sense.
/cc neolit123
/assign fabriziopandini
@SataQiu thanks for this PR!
Q: as per kubernetes/kubeadm#2564 (comment), should we also make these checks consistent between init and join?
cmd/kubeadm/app/preflight/checks.go (Outdated)
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeAPIServer, manifestsDir)},
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeControllerManager, manifestsDir)},
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeScheduler, manifestsDir)},
FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.Etcd, manifestsDir)},
these checks are control-plane checks, so it makes more sense to have them under cfg.ControlPlane != nil below.
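for illustration, that reshuffle could look roughly like this (a sketch only, reusing the names from the snippet above; the surrounding RunJoinNodeChecks body is an assumption, not the actual diff):

// sketch: only run the static Pod file checks when joining a control-plane node
if cfg.ControlPlane != nil {
	checks = append(checks,
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeAPIServer, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeControllerManager, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.KubeScheduler, manifestsDir)},
		FileAvailableCheck{Path: kubeadmconstants.GetStaticPodFilepath(kubeadmconstants.Etcd, manifestsDir)},
	)
}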
but something else to note here... looking at phases/join/preflight.go, RunInitNodeChecks is already called for control-plane nodes:
kubernetes/cmd/kubeadm/app/cmd/phases/join/preflight.go, lines 104 to 123 in 718f1b6:
if j.Cfg().ControlPlane != nil {
	// Checks if the cluster configuration supports
	// joining a new control plane instance and if all the necessary certificates are provided
	hasCertificateKey := len(j.CertificateKey()) > 0
	if err := checkIfReadyForAdditionalControlPlane(&initCfg.ClusterConfiguration, hasCertificateKey); err != nil {
		// outputs the not ready for hosting a new control plane instance message
		ctx := map[string]string{
			"Error": err.Error(),
		}

		var msg bytes.Buffer
		notReadyToJoinControlPlaneTemp.Execute(&msg, ctx)
		return errors.New(msg.String())
	}

	// run kubeadm init preflight checks for checking all the prerequisites
	fmt.Println("[preflight] Running pre-flight checks before initializing the new control plane instance")

	if err := preflight.RunInitNodeChecks(utilsexec.New(), initCfg, j.IgnorePreflightErrors(), true, hasCertificateKey); err != nil {
		return err
i.e. secondary control-plane nodes should already have the same checks as the first control-plane node (mostly). one difference is the isSecondaryControlPlane flag, which is true on join.
so i think the correct patch is to just remove this check:
DirAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.ManifestsSubDirName)},
on joining worker nodes.
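a rough sketch of the resulting worker check list (assuming the current shape of RunJoinNodeChecks; not the exact diff):

checks := []Checker{
	FileAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.KubeletKubeConfigFileName)},
	FileAvailableCheck{Path: filepath.Join(kubeadmconstants.KubernetesDir, kubeadmconstants.KubeletBootstrapKubeConfigFileName)},
	// the DirAvailableCheck for /etc/kubernetes/manifests is dropped here: workers
	// never write static Pod manifests, and control-plane joins already get the
	// equivalent coverage via RunInitNodeChecks (see phases/join/preflight.go above)
}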
Thanks for the review, @neolit123! I agree with your summary.
@neolit123 @dlipovetsky Thanks for your review.
But consider a scenario like this:
The user first successfully joins a node as a secondary control-plane node:
kubeadm join xxx --token xxx \
    --discovery-token-ca-cert-hash xxx \
    --control-plane --certificate-key xxx
The user then realizes that they intended to join the machine as a worker node instead of a control-plane node.
So they delete the node from the cluster and run the join command again (forgetting to run the reset command):
kubectl delete node xxx
kubeadm join xxx --token xxx --discovery-token-ca-cert-hash xxx
Not surprisingly, this command will output errors similar to the following (without the DirAvailableCheck):
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
Based on these error messages, the user manually cleans up the relevant files:
rm -rf /etc/kubernetes/kubelet.conf
rm -rf /etc/kubernetes/pki/ca.crt
systemctl stop kubelet
Then the user can run the join command again, and it will execute successfully.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
At this point, however, the static Pods that should only run on a control-plane node are running on the worker node by mistake. This is not the desired state of a healthy cluster.
# kubectl get node
NAME STATUS ROLES AGE VERSION
izhp3acgtbq11k6ldsw593z NotReady control-plane,master 32m v1.22.1
izhp3acgtbq11k6ldsw594z NotReady <none> 114s v1.22.1
# kubectl get pod -A
NAMESPACE     NAME                                              READY   STATUS    RESTARTS        AGE
kube-system coredns-7f6cbbb7b8-2l5j9 0/1 Pending 0 32m
kube-system coredns-7f6cbbb7b8-sglnx 0/1 Pending 0 32m
kube-system etcd-izhp3acgtbq11k6ldsw593z 1/1 Running 2 33m
kube-system etcd-izhp3acgtbq11k6ldsw594z 1/1 Running 1 2m9s
kube-system kube-apiserver-izhp3acgtbq11k6ldsw593z 1/1 Running 2 33m
kube-system kube-apiserver-izhp3acgtbq11k6ldsw594z 1/1 Running 1 2m8s
kube-system kube-controller-manager-izhp3acgtbq11k6ldsw593z 1/1 Running 2 (3m18s ago) 33m
kube-system kube-controller-manager-izhp3acgtbq11k6ldsw594z 1/1 Running 1 2m8s
kube-system kube-proxy-wxxpw 1/1 Running 1 29m
kube-system kube-proxy-zc2zj 1/1 Running 0 32m
kube-system kube-scheduler-izhp3acgtbq11k6ldsw593z 1/1 Running 4 (3m18s ago) 33m
kube-system kube-scheduler-izhp3acgtbq11k6ldsw594z 1/1 Running 1 2m8s
I guess not calling kubeadm reset after the initial control-plane node join would mean that the control plane containers would continue running even after the kubectl delete of the node. Reset is sort of mandatory in all cases, and if we are still seeing users that do not know about it, perhaps it's time to start documenting it in the CLI output?
I think I agree with your assessment that worker join should error on the presence of kubeadm-managed manifests. One potential weird use case that this would break is if someone is joining an etcd node to the cluster, i.e. a worker with an etcd static pod.
Frankly, I would have designed this slightly differently:
- error out on the cp manifests only on cp nodes
- depending on the kubeadm command (init / join / join cp), print a list of what additional files are in the manifests directory, in the hope that the user can adjust if something unwanted ended up there (see the sketch below).
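a hypothetical helper for the second bullet could look like this (sketch only; listExtraManifests does not exist in kubeadm, and the caller would decide what counts as "reserved"):

import "os"

// sketch: list files in the manifests directory that kubeadm does not own,
// so the caller can print them as a warning instead of hard-failing
func listExtraManifests(manifestsDir string, reserved map[string]bool) ([]string, error) {
	entries, err := os.ReadDir(manifestsDir) // Go 1.16+
	if err != nil {
		if os.IsNotExist(err) {
			return nil, nil // an absent directory is fine
		}
		return nil, err
	}
	var extra []string
	for _, e := range entries {
		if !reserved[e.Name()] {
			extra = append(extra, e.Name())
		}
	}
	return extra, nil
}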
Q: If we remove DirAvailableCheck, would users that were skipping it previously now get an error that there is no such preflight check? Trying to refresh my memory on whether we had that.
EDIT: ok, it looks like kubeadm init --ignore-preflight-errors=foo does not error out that there is no such check called foo. That is good in a way, since users will not break every time we delete a check, but at the same time it's not ideal, because if the user has a typo in a preflight check name they will not get any indication of that...
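if we ever wanted to tighten that, a hypothetical cross-check against the registered checks could look like this (sketch only; warnUnknownIgnores does not exist, and it assumes the Checker interface's Name() method plus k8s.io/apimachinery's sets and the strings package):

// sketch: report ignore-preflight-errors entries that match no known check name
// (assumes the ignore list is already lowercased, matching the check names)
func warnUnknownIgnores(ignores sets.String, checks []Checker) []string {
	known := sets.NewString("all")
	for _, c := range checks {
		known.Insert(strings.ToLower(c.Name()))
	}
	var unknown []string
	for _, name := range ignores.List() {
		if !known.Has(name) {
			unknown = append(unknown, name)
		}
	}
	return unknown
}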
I think I agree with your assessment that worker join should error on the presence of kubeadm-managed manifests.
+1
One potential weird use case that this would break is if someone is joining an etcd node to the cluster, i.e. a worker with an etcd static pod.
In hindsight, kubeadm could have avoided conflict with user-defined manifests by using a prefix, e.g. kubeadm-etcd.yaml. But even in the absence of such a prefix, I think it's reasonable for kubeadm to treat the manifests it writes, including etcd.yaml, as "reserved."
I can follow up with a docs PR to make it clear that kubeadm-managed manifests are "reserved."
@dlipovetsky are you happy with the current state of the diff?
it just removes the directory check, and one can still start a pod called etcd.yaml on a worker without any error or warning.
from your comment above, i think i'm reading that worker joins should complain about reserved control-plane pod names as well.
are you happy with the current state of the diff?
Yes.
from your comment above, i think i'm reading that worker joins should complain about reserved control-plane pod names as well.
I was thinking out loud. I do think it's a good idea for kubeadm to treat filenames it writes as "reserved." That does more than address this bug, though, and can be a separate PR.
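For reference, a minimal sketch of what such a "reserved filenames" check could look like (hypothetical; ReservedManifestCheck is not an existing kubeadm type, though the Check()/Name() signatures follow kubeadm's Checker interface, and it assumes the fmt, os, and path/filepath imports):

// hypothetical check: fail a worker join if a kubeadm-reserved manifest
// filename already exists under the manifests directory
type ReservedManifestCheck struct {
	ManifestsDir string
}

func (c ReservedManifestCheck) Name() string { return "ReservedManifests" }

func (c ReservedManifestCheck) Check() (warnings, errorList []error) {
	for _, name := range []string{"kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd"} {
		p := filepath.Join(c.ManifestsDir, name+".yaml")
		if _, err := os.Stat(p); err == nil {
			errorList = append(errorList, fmt.Errorf("%s is reserved for kubeadm but already exists", p))
		}
	}
	return nil, errorList
}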
ok, understood.
we should change the release note depending on #104942 (comment), e.g.
Yes, the checks in kubernetes/cmd/kubeadm/app/preflight/checks.go, lines 902 to 930 in 047a6b9.
Force-pushed from 271fbb6 to d57e442: kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight. Signed-off-by: SataQiu <shidaqiu2018@gmail.com>
Thanks @SataQiu for thinking about the details and fixing the issue!
/lgtm
/triage accepted
What type of PR is this?
/kind feature
/kind bug
What this PR does / why we need it:
kubeadm: do not check if the /etc/kubernetes/manifests folder is empty on joining worker nodes during preflight
Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#2564
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: