
CredentialProvider fails to read environment variable passed into the CredentialProviderConfig file #102750

Closed
jicowan opened this issue Jun 9, 2021 · 48 comments · Fixed by #103231
Labels: area/kubelet · kind/bug (Categorizes issue or PR as related to a bug.) · sig/node (Categorizes an issue or PR as relevant to SIG Node.) · triage/accepted (Indicates an issue or PR is ready to be actively worked on.)

Comments

@jicowan

jicowan commented Jun 9, 2021

What happened:

I am attempting to pull an image from ECR using the new kubelet image credential provider feature, which is an alpha feature in Kubernetes v1.20. My CredentialProviderConfig file sets the environment variable AWS_PROFILE, which references a profile in the default (~/.aws) directory. The profile references a credential_process, which allows me to execute a binary that returns a set of temporary AWS credentials. When I try running a pod with a container from ECR, I see the following errors in the kubelet logs:

Jun 09 21:47:05 ecr-kubelet-worker kubelet[225990]: E0609 21:47:05.229849  225990 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 09 21:47:05 ecr-kubelet-worker kubelet[225990]: E0609 21:47:05.597552  225990 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized

However, when I type the following into the command line on the kubelet:

echo '{"kind": "CredentialProviderRequest", "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1", "image": "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2"}' | AWS_PROFILE=default /usr/local/kubelet/bin/ecr-credential-provider get-credentials

I get the following response:

{"kind":"CredentialProviderResponse","apiVersion":"credentialprovider.kubelet.k8s.io/v1alpha1","cacheKeyType":"Registry","cacheDuration":"21.6µs","auth":{"820537372947.dkr.ecr.us-east-1.amazonaws.com":{"username":"AWS","password":"**redacted**"}}}

It doesn't appear that the environment variables in my config are being passed to the ecr-credential-provider binary when the kubelet executes it.

Below is my config file:

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: ecr-credential-provider
    matchImages:
    - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
     - get-credentials
    env:
    - name: "AWS_PROFILE"
      value: "default"
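For completeness, the AWS_PROFILE value above selects a profile from the AWS shared config; a credential_process profile typically looks like the following (the binary path here is illustrative, not taken from my setup):

```ini
# ~/.aws/config
[default]
credential_process = /usr/local/bin/cognito
```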

What you expected to happen:

I expect the kubelet to pass the environment variables from the config file to the ecr-credential-provider binary when it executes it.

How to reproduce it (as minimally and precisely as possible):

I am using KinD to test this feature. Here is my config:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
runtimeConfig:
  "api/alpha": "true"
name: ecr-kubelet
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer: 
      extraArgs:
        feature-gates: 'KubeletCredentialProviders=true'
  image: kindest/node:v1.20.7@sha256:e645428988191fc824529fd0bb5c94244c12401cf5f5ea3bd875eb0a787f0fe9
- role: worker
  image: kindest/node:v1.20.7@sha256:e645428988191fc824529fd0bb5c94244c12401cf5f5ea3bd875eb0a787f0fe9
  extraMounts:
  - hostPath: /Users/jicowan/go/bin
    containerPath: /usr/local/kubelet/bin
  - hostPath: /Users/jicowan/conf
    containerPath: /usr/local/kubelet/config
  kubeadmConfigPatches:
  - |
    kind: JoinConfiguration
    nodeRegistration: 
      kubeletExtraArgs: 
        feature-gates: 'KubeletCredentialProviders=true'
        image-credential-provider-config: '/usr/local/kubelet/config/ecr.conf'
        image-credential-provider-bin-dir: '/usr/local/kubelet/bin/'

Anything else we need to know?:

No

Environment:

  • Kubernetes version (use kubectl version): 1.20
  • Cloud provider or hardware configuration: None
  • OS (e.g: cat /etc/os-release): Ubuntu 20.10
  • Kernel (e.g. uname -a): 5.10
  • Install tools: KinD
  • Network plugin and version (if this is a network-related bug):
  • Others:

cc @ayberk @andrewsykim

@jicowan jicowan added the kind/bug Categorizes issue or PR as related to a bug. label Jun 9, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 9, 2021
@jicowan
Author

jicowan commented Jun 9, 2021

/sig Node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 9, 2021
@n4j
Member

n4j commented Jun 11, 2021

/assign

@adisky
Contributor

adisky commented Jun 11, 2021

@jicowan From the code it appears that env variables are passed to the plugin:

cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", envVar.Name, envVar.Value))

From the kubelet logs, the plugin is only returning exit status 1, without any details about the error.

@jicowan
Author

jicowan commented Jun 11, 2021

Thanks @adisky. Yes, I saw that, but cmd.Env doesn't work. I wrote the following code to test this:

package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
)
var (
	body = `{
	"kind": "CredentialProviderRequest", 
	"apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1", 
	"image": "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2"
	}`
)
func main() {
	os.Setenv("AWS_PROFILE", "developer")
	cmd := exec.Command("ecr-credential-provider", "get-credentials")
	cmd.Args = []string{"get-credentials"}
	//cmd.Env = os.Environ()
	cmd.Env = []string{"AWS_PROFILE=developer"}
	stdin, err := cmd.StdinPipe()
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		defer stdin.Close()
		io.WriteString(stdin, body)
	}()
	out, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", out)
}

If I uncomment cmd.Env = os.Environ(), which imports the environment variables from the OS, it works.

@n4j
Member

n4j commented Jun 12, 2021

Hello @jicowan, I need some clarification.

Is the intent of the above code to convey that cmd.Env isn't passing env vars correctly to the spawned process?

The code below works as expected for me:

package main

import (
	"fmt"
	"log"
	"os/exec"
)


func main() {
	cmd := exec.Command("env")
	cmd.Env = []string{"AWS_PROFILE=developer", "FOO=BAR"}
	out, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", out)
}

output

➜  ~ go run main.go
AWS_PROFILE=developer
FOO=BAR

➜  ~ go version
go version go1.16.5 linux/amd64
➜  ~ uname -a
Linux turing 5.12.8-051208-generic #202105281232 SMP Fri May 28 12:35:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Also, could you tell me whether ecr-credential-provider is a proprietary binary or open-source code similar to amazon-ecr-credential-helper? I want to use it while troubleshooting this issue.

@jicowan
Author

jicowan commented Jun 12, 2021

Yes @n4j, it is not passing the Envs to the spawned process. The code for ecr-credential-provider is in the AWS cloud controller repository, https://github.com/kubernetes/cloud-provider-aws/blob/master/cmd/ecr-credential-provider/main.go.

@n4j
Member

n4j commented Jun 13, 2021

@jicowan Quick update: I set up two test beds to reproduce the above issue, but both of them gave the expected results and the issue was not reproducible.

Below are my test setups:

ECR Repository 7604xxxx39.dkr.ecr.ap-south-1.amazonaws.com/whalesay:v1

Test Bed 1: Execute ecr-credential-provider from code and pass AWS_PROFILE as an environment variable

Code

package main

import (
	"fmt"
	"io"
	"log"
	"os/exec"
)

var (
	body = `{
	"kind": "CredentialProviderRequest", 
	"apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1", 
	"image": "7604xxxx39.dkr.ecr.ap-south-1.amazonaws.com/whalesay:v1"
	}`
)

func main() {
	cmd := exec.Command("./ecr-credential-provider", "get-credentials")
	cmd.Env = []string{"AWS_PROFILE=dev"}
	cmd.Args = []string{"get-credentials"}
	stdin, err := cmd.StdinPipe()
	if err != nil {
		log.Fatal(err)
	}
	go func() {
		defer stdin.Close()
		io.WriteString(stdin, body)
	}()
	out, err := cmd.CombinedOutput()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", out)
}

Observed Result - Below credentials were generated

{
    "kind": "CredentialProviderResponse",
    "apiVersion": "credentialprovider.kubelet.k8s.io/v1alpha1",
    "cacheKeyType": "Registry",
    "cacheDuration": "21.6µs",
    "auth":
    {
        "7604xxxx39.dkr.ecr.ap-south-1.amazonaws.com":
        {
            "username": "AWS",
            "password": "**redacted**"
        }
    }
}

Expected Result - Credentials should be generated

Test Bed 2: Use ecr-credential-provider as the credential provider within the kubelet; I set up a local test cluster

Provider Config

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: ecr-credential-provider
    matchImages:
    - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
    - get-credentials
    env:
    - name: AWS_PROFILE
      value: dev

Expected result - Pods should be created based off custom whalesay:v2 image

Observed result - Pods were created based off custom whalesay:v2 image

Though my test bed runs Kubernetes version 1.22, I can see that nothing significant has changed in plugin.go since version 1.20.

Could you share any other minor but important detail you might have missed earlier that could help me replicate the issue? Or could you double-check that the default profile exists in your ~/.aws/credentials file?

@jicowan
Author

jicowan commented Jun 13, 2021

@n4j I am using credential_process (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-sourcing-external.html) in my profile. I am also using KinD for testing, but I don't think that should matter. I don't have a credentials file; I only have a config file.

@jicowan
Author

jicowan commented Jun 13, 2021

This is the binary I'm calling with credential_process: https://github.com/jicowan/cognito.

@adisky
Contributor

adisky commented Jun 14, 2021

@n4j Can you check the kubelet logs for Failed getting credential from external registry credential provider? It might be falling back to the in-tree AWS credential provider.

@n4j
Member

n4j commented Jun 14, 2021

@adisky on my local setup credentials are being fetched.

I'll setup another test bed with the Cognito provider which @jicowan is using.

@adisky
Contributor

adisky commented Jun 14, 2021

@n4j It is fetching credentials because it is falling back to the other options available; it might be fetching from another credential provider after failing with the external one. To confirm this, you have to check the kubelet logs.

@n4j
Member

n4j commented Jun 14, 2021

@adisky In my env I just have one provider and there are no fallbacks.

Though I was keeping an eye on the kubelet logs while performing this activity, I saw no such logs. The creds were fetched in one shot via the profile which I had specified.

And as I had mentioned earlier I don't see any issues in passing env vars to credential process.

Also, I tried negative cases as well by passing invalid profile names; those did fail, and images weren't pulled from ECR.

@adisky
Contributor

adisky commented Jun 14, 2021

@n4j The kubelet registers in-tree credential providers at startup; you don't need to add any explicitly:
https://github.com/kubernetes/kubernetes/blob/master/cmd/kubelet/app/plugins_providers.go#L23-L25

And I recently tried the GCP credential provider; it does fall back to the in-tree GCP credential provider after getting a failed response from the one I set up.

@n4j
Member

n4j commented Jun 14, 2021

@adisky Sure, that makes sense. Though, as I mentioned earlier, I was keeping an eye on the kubelet logs and I didn't see the logs you mentioned.

@jicowan
Author

jicowan commented Jun 14, 2021

@adisky I see Error running credential provider plugin: NoCredentialProviders: no valid providers in chain. Deprecated. in the logs when the kubelet starts, before attempting to run a pod that pulls an image from ECR.

@n4j
Member

n4j commented Jun 14, 2021

@jicowan I had a light-bulb moment and was able to reproduce this issue; it might not be applicable in your case, but it could be worth trying out.

In my local setup I ran the cluster as root, and ecr-credential-provider failed with this error:

E0614 18:19:27.692505 42719 main.go:165] Error running credential provider plugin: NoCredentialProviders: no valid providers in chain. Deprecated.

E0614 18:19:27.693579 40572 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 7604xxxx139.dkr.ecr.ap-south-1.amazonaws.com/whalesay: exit status 1

I have my AWS credentials configured under my user (n4j) at ~/.aws/credentials, and since there is no similar configuration for root, the process failed.

Since I have access to the ecr-credential-provider code locally, I added a couple of log statements to print out the current env vars, and I could see that the env var AWS_PROFILE was set to dev :)

Could you check which user the kubelet is running as, and whether anything about that user's environment could prevent it from successfully retrieving the credentials?

@jicowan
Author

jicowan commented Jun 14, 2021

@n4j I am getting the error you mentioned, NoCredentialProviders: no valid providers in chain. Deprecated. As far as I can tell, the kubelet is running as root and the config file is in the /root/.aws/ directory. Again, I'm using KinD for testing, and I'm not sure how KinD runs the kubelet. Below is the output from systemctl cat kubelet.service:

root@ecr-kubelet-worker:/# systemctl cat kubelet.service
# /etc/systemd/system/kubelet.service
# slightly modified from:
# https://github.com/kubernetes/kubernetes/blob/ba8fcafaf8c502a454acd86b728c857932555315/build/debs/kubelet.service
[Unit]
Description=kubelet: The Kubernetes Node Agent
Documentation=http://kubernetes.io/docs/
# NOTE: kind deviates from upstream here to avoid crashlooping
# This does *not* support altering the kubelet config path though.
# We intend to upstream this change but first need to solve the upstream
# Packaging problem (all kubernetes versions use the same files out of tree).
ConditionPathExists=/var/lib/kubelet/config.yaml

[Service]
ExecStart=/usr/bin/kubelet
Restart=always
StartLimitInterval=0
# NOTE: kind deviates from upstream here with a lower RestartSec
RestartSec=1s

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
# https://github.com/kubernetes/kubernetes/blob/ba8fcafaf8c502a454acd86b728c857932555315/build/debs/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably,
# the user should use the .NodeRegistration.KubeletExtraArgs object in the configuration files instead.
# KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
# On cgroup v1, the /kubelet cgroup is created in the entrypoint script before running systemd.
# On cgroup v2, the /kubelet cgroup is created here. (See the comments in the entrypoint script for the reason.)
ExecStartPre=/bin/sh -euc "if [ -f /sys/fs/cgroup/cgroup.controllers ]; then create-kubelet-cgroup-v2; fi"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS --cgroup-root=/kubelet

The Go code I wrote to test this fails when using cmd.Env; it only works when I import the OS environment variables. This may be because I am using credential_process in my shared config file. credential_process is a mechanism to call another binary, e.g. Cognito, to get a set of temporary credentials.

@n4j
Member

n4j commented Jun 14, 2021

@jicowan In order to simplify the sprawl and pinpoint the offending component, could you remove credential_process and Cognito and just try fetching creds via a plain AWS credentials profile?

@jicowan
Author

jicowan commented Jun 14, 2021

@n4j, sure; however, supporting credential_process is critical for my use case. I'm using Cognito as a gating mechanism to ECR. Only licensed users will be allowed to pull images from ECR. These users will not have IAM credentials (access key/secret key); they will authenticate to AWS using Cognito.

@jicowan
Author

jicowan commented Jun 14, 2021

The following configuration works when I specify access key/secret key:

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: ecr-credential-provider
    matchImages:
    - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
     - get-credentials
    env:
    - name: "AWS_SDK_LOAD_CONFIG"
      value: "1"
    - name: "AWS_PROFILE"
      value: "default"
    - name: "AWS_CONFIG_FILE"
      value: "/root/.aws/config"

@jicowan
Author

jicowan commented Jun 14, 2021

When I switch to a profile with credential_process, I get the following error:

Jun 14 18:33:10 ecr-kubelet-worker kubelet[390834]: I0614 18:33:10.432086  390834 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/6dbfb203-81a2-4801-a80b-5e5905c75abd-default-token-fmsqk") pod "cats" (UID: "6dbfb203-81a2-4801-a80b-5e5905c75abd")
Jun 14 18:33:10 ecr-kubelet-worker kubelet[390834]: E0614 18:33:10.982200  390834 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 14 18:33:11 ecr-kubelet-worker kubelet[390834]: E0614 18:33:11.247141  390834 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized

@adisky
Contributor

adisky commented Jun 15, 2021

@jicowan Isn't this an issue with ecr-credential-provider? It is ecr-credential-provider's job to fetch credentials and return the result to the kubelet in the expected format.

@jicowan
Author

jicowan commented Jun 15, 2021

@adisky why does it work when I run it from the command line? Why does it work when I import envs from the OS?

@n4j
Member

n4j commented Jun 15, 2021

@jicowan I am not familiar with the AWS credential process. The kubelet talks to ecr-credential-provider via an exec-based plugin mechanism; is ecr-credential-provider calling the credential process you are talking about?

@adisky Here's the execution flow: kubelet -> ecr-credential-provider -> the AWS plugin looks for the profile configured via env var -> it sees a binary needs to be executed (written by Jim) -> it generates credentials by calling the Cognito API.

@jicowan Does the above paint a fairly accurate picture? If not, feel free to correct it :)

@jicowan
Author

jicowan commented Jun 15, 2021

@jicowan I have already validated that env vars are being passed to ecr-credential-provider and there is no issue there. If that weren't the case, then generating AWS credentials via the profile config wouldn't work for you either.

I debugged the kubelet code myself and validated it, and I also added some logging in ecr-credential-provider to print out the env vars; it logged the passed vars.

IMHO, what remains to be seen is why it fails with Cognito. I am going to create a test bed tomorrow replicating your exact setup. In the meantime, if possible, could you create a Linux VM and replicate your setup inside it?

My intention is to see whether the Cognito-based credential process goes through in a clean-slate environment; if it does, then something in your local setup is interfering with the passed env vars.

@adisky Thoughts?

I included my KinD configuration at the beginning of this thread. Can you use that, @n4j? The cognito binary has to be built from source, https://github.com/jicowan/cognito (or I can send it to you along with a temp username/password). There is an accompanying blog post for the binary at https://jicowan.medium.com/pulling-docker-images-from-ecr-private-registries-with-a-cognito-user-pool-identity-95f4c233fc4a.

The flow, @adisky, is as @n4j describes above. The cognito binary returns a JSON blob to the calling application that looks like this:

{
    "Version": 1,
    "AccessKeyId": "an AWS access key",
    "SecretAccessKey": "your AWS secret access key",
    "SessionToken": "the AWS session token for temporary credentials", 
    "Expiration": "ISO8601 timestamp for when the credentials expire"
}

During my testing (without the kubelet) I am able to get the proper response for the kubelet when referencing the AWS profile using os.Getenv or by setting cmd.Env to os.Environ(). However, when I try to pass the AWS profile in using cmd.Env = []string{"AWS_PROFILE=default"}, I get exit status 1 and no further information. I have a feeling that exec doesn't properly chain or pass information back to the calling application.

@n4j
Member

n4j commented Jun 16, 2021

@jicowan Let me replicate your setup in my local env and see if I can reproduce this issue.

Also, could you tell me what your AWS config file looks like? Is it something like this:

# I believe the config file is stored at path: /root/.aws/config

[profile developer]
credential_process = /opt/bin/cognito --username stevie --password n1ck5

You can use placeholder values for sensitive information, but correct the binary path if it's not /opt/bin.

@jicowan
Author

jicowan commented Jun 16, 2021

@n4j yes, that looks right. Ideally the cognito binary is located in a directory in the shell's PATH but it also works when you specify the full path in the config file like you've done.

Are you planning to set up Cognito in AWS? You'll need a Cognito user pool and to configure federation.

@n4j
Member

n4j commented Jun 16, 2021

@jicowan Yes, I am planning to set up a Cognito identity pool in my AWS account.

@n4j
Member

n4j commented Jun 17, 2021

@jicowan So I replicated your setup on my local machine with a slight change: instead of using AWS Cognito to generate credentials via credential_process, I am calling a binary written by me which logs its env vars to a specific path and prints out credentials in the format prescribed by AWS.

Below was my initial config file:

[default]
region = us-east-1

[profile cognito]
credential_process = /usr/bin/static-cred-provider

After I ran a local cluster with this setup, I noticed a couple of log lines like the following:

I0617 20:53:09.755229   14141 plugin.go:362] Error execing credential provider plugin, stderr: E0617 20:53:09.754846   15906 main.go:156] Error running credential provider plugin: ProcessProviderExecutionError: error in credential_process
caused by: exec: "sh": executable file not found in $PATH

I did some basic validation of perms, and I even moved the binary to a couple of well-known locations, but nothing worked.

Finally, I modified the env vars section of my provider config as below, and it worked:

    # Env defines additional environment variables to expose to the process. These
    # are unioned with the host's environment, as well as variables client-go uses
    # to pass argument to the plugin.
    # +optional
    env:
    - name: AWS_PROFILE
      value: "cognito"
    - name: AWS_CONFIG_FILE
      value: "/root/.aws/config"
    - name: AWS_SDK_LOAD_CONFIG
      value: "1"
    - name: PATH
      value: "$PATH:/usr/local/bin:/usr/bin"

The logs which I had put in my custom credential provider printed the following env vars, validating that env vars are being passed:

AWS_CONFIG_FILE=/root/.aws/config
AWS_SDK_LOAD_CONFIG=1
PATH=$PATH:/usr/local/bin:/usr/bin
AWS_PROFILE=cognito
PWD=/mnt/87c32d9c-5d4e-4bb8-8bc6-8d9239585657/go/src/k8s.io/kubernetes

Do you see a log line like the above in your logs? If not, could you paste the full contents of kubelet.log from startup until you create a Pod?

@jicowan
Author

jicowan commented Jun 17, 2021

@n4j Thanks for doing this. I never tried adding PATH to the list of environment variables; I honestly didn't think it was necessary because I was referencing the full path to the binary in credential_process. I also never saw an error about not being able to find the executable in $PATH. I'll re-test after adding PATH as an env var and let you know the results.

@jicowan
Author

jicowan commented Jun 17, 2021

@n4j Adding the PATH did not work for me.

Jun 17 17:42:02 ecr-kubelet-worker systemd[1]: Starting kubelet: The Kubernetes Node Agent...
Jun 17 17:42:02 ecr-kubelet-worker systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --provider-id has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --provider-id has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --fail-swap-on has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: Flag --cgroup-root has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.572914  442992 server.go:416] Version: v1.20.7
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.574994  442992 server.go:837] Client rotation is on, will bootstrap in background
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.587696  442992 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Jun 17 17:42:02 ecr-kubelet-worker kubelet[442992]: I0617 17:42:02.589504  442992 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: W0617 17:42:07.603815  442992 sysinfo.go:203] Nodes topology is not available, providing CPU topology
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: W0617 17:42:07.604901  442992 sysfs.go:348] unable to read /sys/devices/system/cpu/cpu0/online: open /sys/devices/system/cpu/cpu0/online: no such file or directory
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609584  442992 container_manager_linux.go:274] container manager verified user specified cgroup-root exists: [kubelet]
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609656  442992 container_manager_linux.go:279] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/kubelet CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalTopologyManagerScope:container ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609720  442992 topology_manager.go:120] [topologymanager] Creating topology manager with none policy per container scope
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609746  442992 container_manager_linux.go:310] [topologymanager] Initializing Topology Manager with none policy and container-level scope
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609770  442992 container_manager_linux.go:315] Creating device plugin manager: true
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609909  442992 remote_runtime.go:62] parsed scheme: ""
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.609946  442992 remote_runtime.go:62] scheme "" not registered, fallback to default scheme
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610010  442992 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610045  442992 clientconn.go:948] ClientConn switching balancer to "pick_first"
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610348  442992 remote_image.go:50] parsed scheme: ""
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610397  442992 remote_image.go:50] scheme "" not registered, fallback to default scheme
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610458  442992 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610488  442992 clientconn.go:948] ClientConn switching balancer to "pick_first"
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610600  442992 kubelet.go:394] Attempting to sync node with API server
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610649  442992 kubelet.go:262] Adding pod path: /etc/kubernetes/manifests
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610734  442992 kubelet.go:273] Adding apiserver pod source
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.610783  442992 apiserver.go:43] Waiting for node sync before watching apiserver pods
Jun 17 17:42:07 ecr-kubelet-worker kubelet[442992]: I0617 17:42:07.615457  442992 kuberuntime_manager.go:216] Container runtime containerd initialized, version: v1.5.1, apiVersion: v1alpha2
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: E0617 17:42:08.348527  442992 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]:         For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.350418  442992 server.go:1176] Started kubelet
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.352360  442992 server.go:148] Starting to listen on 0.0.0.0:10250
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.357062  442992 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.362365  442992 volume_manager.go:271] Starting Kubelet Volume Manager
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.363289  442992 desired_state_of_world_populator.go:142] Desired state populator starts to run
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.365796  442992 server.go:410] Adding debug handlers to kubelet server.
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367698  442992 client.go:86] parsed scheme: "unix"
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367745  442992 client.go:86] scheme "unix" not registered, fallback to default scheme
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367863  442992 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.367906  442992 clientconn.go:948] ClientConn switching balancer to "pick_first"
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.463987  442992 kuberuntime_manager.go:1006] updating runtime config through cri with podcidr 10.244.1.0/24
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.466961  442992 kubelet_node_status.go:71] Attempting to register node ecr-kubelet-worker
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.467682  442992 kubelet_network.go:77] Setting Pod CIDR:  -> 10.244.1.0/24
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.488026  442992 kubelet_node_status.go:109] Node ecr-kubelet-worker was previously registered
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.488308  442992 kubelet_node_status.go:74] Successfully registered node ecr-kubelet-worker
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.494247  442992 setters.go:577] Node became not ready: {Type:Ready Status:False LastHeartbeatTime:2021-06-17 17:42:08.4940374 +0000 UTC m=+6.019140201 LastTransitionTime:2021-06-17 17:42:08.4940374 +0000 UTC m=+6.019140201 Reason:KubeletNotReady Message:[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]}
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.597806  442992 cpu_manager.go:193] [cpumanager] starting with none policy
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.597861  442992 cpu_manager.go:194] [cpumanager] reconciling every 10s
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.597895  442992 state_mem.go:36] [cpumanager] initializing new in-memory state store
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.598034  442992 state_mem.go:88] [cpumanager] updated default cpuset: ""
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.598054  442992 state_mem.go:96] [cpumanager] updated cpuset assignments: "map[]"
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.598086  442992 policy_none.go:43] [cpumanager] none policy: Start
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: W0617 17:42:08.600866  442992 manager.go:594] Failed to retrieve checkpoint for "kubelet_internal_checkpoint": checkpoint is not found
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.603698  442992 plugin_manager.go:114] Starting Kubelet Plugin Manager
Jun 17 17:42:08 ecr-kubelet-worker kubelet[442992]: I0617 17:42:08.612594  442992 apiserver.go:53] Watching apiserver
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.712753  442992 kubelet_network_linux.go:56] Initialized IPv4 iptables rules.
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.712858  442992 status_manager.go:158] Starting to sync pod status with apiserver
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.712921  442992 kubelet.go:1828] Starting kubelet main sync loop.
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: E0617 17:42:10.713055  442992 kubelet.go:1852] skipping pod synchronization - PLEG is not healthy: pleg has yet to be successful
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.813844  442992 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.818993  442992 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: W0617 17:42:10.823658  442992 pod_container_deletor.go:79] Container "7b37db22207fcc990241bf967e38cba672dff9cae490079fe0c3f8cafd4a04c5" not found in pod's containers
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.840626  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/8e759947-b33a-4e80-9a1e-6c5e39f078b3-lib-modules") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.841468  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kindnet-token-qk58g" (UniqueName: "kubernetes.io/secret/8e759947-b33a-4e80-9a1e-6c5e39f078b3-kindnet-token-qk58g") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.841859  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy" (UniqueName: "kubernetes.io/configmap/04093824-c6f7-42be-a9a8-b905cd2a6830-kube-proxy") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.842412  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/04093824-c6f7-42be-a9a8-b905cd2a6830-xtables-lock") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.842892  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "lib-modules" (UniqueName: "kubernetes.io/host-path/04093824-c6f7-42be-a9a8-b905cd2a6830-lib-modules") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.843316  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "kube-proxy-token-4ht2v" (UniqueName: "kubernetes.io/secret/04093824-c6f7-42be-a9a8-b905cd2a6830-kube-proxy-token-4ht2v") pod "kube-proxy-42z6n" (UID: "04093824-c6f7-42be-a9a8-b905cd2a6830")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.843627  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "cni-cfg" (UniqueName: "kubernetes.io/host-path/8e759947-b33a-4e80-9a1e-6c5e39f078b3-cni-cfg") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.845269  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "xtables-lock" (UniqueName: "kubernetes.io/host-path/8e759947-b33a-4e80-9a1e-6c5e39f078b3-xtables-lock") pod "kindnet-lbzh9" (UID: "8e759947-b33a-4e80-9a1e-6c5e39f078b3")
Jun 17 17:42:10 ecr-kubelet-worker kubelet[442992]: I0617 17:42:10.845298  442992 reconciler.go:157] Reconciler: start to sync state
Jun 17 17:42:21 ecr-kubelet-worker kubelet[442992]: I0617 17:42:21.685653  442992 topology_manager.go:187] [topologymanager] Topology Admit Handler
Jun 17 17:42:21 ecr-kubelet-worker kubelet[442992]: I0617 17:42:21.728246  442992 reconciler.go:224] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk") pod "cats" (UID: "bb5278c8-fa3a-4b8c-a8af-e5aa57916008")
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.151045  442992 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.500764  442992 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.500940  442992 kuberuntime_image.go:51] Pull image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.501419  442992 kuberuntime_manager.go:829] container &Container{Name:cats,Image:820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:default-token-fmsqk,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:Always,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008): ErrImagePull: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.501573  442992 pod_workers.go:191] Error syncing pod bb5278c8-fa3a-4b8c-a8af-e5aa57916008 ("cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008)"), skipping: failed to "StartContainer" for "cats" with ErrImagePull: "rpc error: code = Unknown desc = failed to pull and unpack image \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": failed to resolve reference \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": unexpected status code [manifests v2]: 401 Unauthorized"
Jun 17 17:42:22 ecr-kubelet-worker kubelet[442992]: E0617 17:42:22.780791  442992 pod_workers.go:191] Error syncing pod bb5278c8-fa3a-4b8c-a8af-e5aa57916008 ("cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008)"), skipping: failed to "StartContainer" for "cats" with ImagePullBackOff: "Back-off pulling image \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\""
Jun 17 17:42:34 ecr-kubelet-worker kubelet[442992]: E0617 17:42:34.757420  442992 plugin.go:198] Failed getting credential from external registry credential provider: error execing credential provider plugin ecr-credential-provider for image 820537372947.dkr.ecr.us-east-1.amazonaws.com/cats: exit status 1
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.058457  442992 remote_image.go:113] PullImage "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" from image service failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.058819  442992 kuberuntime_image.go:51] Pull image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2" failed: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.059946  442992 kuberuntime_manager.go:829] container &Container{Name:cats,Image:820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2,Command:[],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:default-token-fmsqk,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:Always,SecurityContext:nil,Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,} start failed in pod cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008): ErrImagePull: rpc error: code = Unknown desc = failed to pull and unpack image "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": failed to resolve reference "820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2": unexpected status code [manifests v2]: 401 Unauthorized
Jun 17 17:42:35 ecr-kubelet-worker kubelet[442992]: E0617 17:42:35.060284  442992 pod_workers.go:191] Error syncing pod bb5278c8-fa3a-4b8c-a8af-e5aa57916008 ("cats_default(bb5278c8-fa3a-4b8c-a8af-e5aa57916008)"), skipping: failed to "StartContainer" for "cats" with ErrImagePull: "rpc error: code = Unknown desc = failed to pull and unpack image \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": failed to resolve reference \"820537372947.dkr.ecr.us-east-1.amazonaws.com/cats:v2\": unexpected status code [manifests v2]: 401 Unauthorized"
Jun 17 17:42:39 ecr-kubelet-worker kubelet[442992]: I0617 17:42:39.442993  442992 reconciler.go:196] operationExecutor.UnmountVolume started for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk") pod "bb5278c8-fa3a-4b8c-a8af-e5aa57916008" (UID: "bb5278c8-fa3a-4b8c-a8af-e5aa57916008")
Jun 17 17:42:39 ecr-kubelet-worker kubelet[442992]: I0617 17:42:39.445313  442992 operation_generator.go:797] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk" (OuterVolumeSpecName: "default-token-fmsqk") pod "bb5278c8-fa3a-4b8c-a8af-e5aa57916008" (UID: "bb5278c8-fa3a-4b8c-a8af-e5aa57916008"). InnerVolumeSpecName "default-token-fmsqk". PluginName "kubernetes.io/secret", VolumeGidValue ""
Jun 17 17:42:39 ecr-kubelet-worker kubelet[442992]: I0617 17:42:39.544298  442992 reconciler.go:319] Volume detached for volume "default-token-fmsqk" (UniqueName: "kubernetes.io/secret/bb5278c8-fa3a-4b8c-a8af-e5aa57916008-default-token-fmsqk") on node "ecr-kubelet-worker" DevicePath ""

I'm happy to send you the cognito binary with a username/password that you can use for testing. I have only tested this with KinD, but I don't think that is the issue here since it fails with an exit status when I try using exec.Env().

jicowan (Author) commented Jun 17, 2021

kind: CredentialProviderConfig
apiVersion: kubelet.config.k8s.io/v1alpha1
providers:
  - name: ecr-credential-provider
    matchImages:
    - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
    args:
     - get-credentials
    env:
    - name: AWS_SDK_LOAD_CONFIG
      value: "1"
    - name: AWS_PROFILE
      value: "default"
    - name: AWS_CONFIG_FILE
      value: "/root/.aws/config"
    - name: PATH
      value: "$PATH:/usr/local/kubelet/bin"

^^ is my credential provider config

n4j (Member) commented Jun 17, 2021

@jicowan Thanks for the logs. I have cloned your code in my repo and I'll substitute my Cognito creds. If it doesn't work out then I'll reach out to you for creds 🙂

n4j (Member) commented Jun 20, 2021

@jicowan After several rounds of troubleshooting I was able to get ecr-credential-provider working in my KinD environment.

My credential loading chain is ecr-credential-provider <-> cognito <-> AWS Cognito, and below is my credential_process configuration:

[profile cognito]
credential_process = /usr/bin/cognito -u n4j-REDACTED -p __4C1751_REDACTED_523be3993

Initially, I got the same error as yours. I then added a couple of custom log lines to ecr-credential-provider to print its env vars to a file in /tmp, and I could verify that the env vars specified in the credential spec were indeed being passed.

What I noticed in my KinD setup is that the AWS SDK was logging an error because it couldn't find any "providers", even though I could verify in the kind-worker container that the /root/.aws/config and /root/.aws/credentials files were indeed present.

I altered the env vars section of my CredentialProviderConfig as below and it worked for me:

    env:
    - name: AWS_PROFILE
      value: "cognito"
    - name: AWS_SDK_LOAD_CONFIG
      value: "1"
    - name: HOME # this is what did the trick for me
      value: "/root"
    - name: PATH
      value: "$PATH:/usr/local/bin:/usr/bin:/usr/local/kubelet"

Apparently, ecr-credential-provider is being passed ONLY the env vars that are present in the spec; everything else is missing! At least, that is the behaviour in both my local cluster setup and my KinD cluster.

Can you please add HOME and any other env vars necessary for the correct execution of the AWS SDK and see if it works for you?

If this doesn't work, I'll send you a modified version of ecr-credential-provider that writes some additional logs to troubleshoot this issue further.

jicowan (Author) commented Jun 21, 2021

@n4j That did it! Thanks for helping me troubleshoot this. I wonder if this is unique to KinD or whether we would see this behavior in a "normal" cluster (I'll test it later). I will ask the developers who wrote ecr-credential-provider to include an argument that increases the verbosity or writes logs to a file, so this is easier to troubleshoot in the future.

@jicowan jicowan closed this as completed Jun 21, 2021
n4j (Member) commented Jun 21, 2021

@jicowan What I have observed is that ecr-credential-provider is only passed the env vars that are in the credential spec, while the spec clearly says that the vars in the spec are appended to the existing env vars. This looks like an issue that needs deeper investigation.

Could you please re-open this issue?

jicowan (Author) commented Jun 21, 2021

@n4j sure thing.

@jicowan jicowan reopened this Jun 21, 2021
k8s-ci-robot (Contributor) commented:

@n4j: The label triage/accepted cannot be applied. Only GitHub organization members can add the label.

In response to this:

> /triage accepted

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

n4j (Member) commented Jun 21, 2021

> @jicowan What I have observed is that ecr-credential-provider is only passed the env vars that are in the credential spec, while the spec clearly says that the vars in the spec are appended to the existing env vars. This looks like an issue that needs deeper investigation.
>
> Could you please re-open this issue?

@fromanirh @liggitt To me this looks like an issue with the kubelet, and I have observed this behaviour with KinD as well as with my local cluster.

What are your thoughts on this?

If this looks like a bug, can you accept the triage?

I would like to fix this.

adisky (Contributor) commented Jun 21, 2021

@n4j Thanks for looking into the issue; you are right, the spec does say that. Feel free to fix it:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/types.go#L550-#L554
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 21, 2021
n4j (Member) commented Jun 22, 2021

@adisky Thanks!!!

swatisehgal (Contributor) commented:

/area kubelet

jyotibhanot commented:

@n4j: I am facing a similar issue on my bare-metal cluster.

This is my config

apiVersion: kubelet.config.k8s.io/v1beta1
kind: CredentialProviderConfig
providers:
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1beta1
    env:
      - name: AWS_PROFILE
        value: "default"
      - name: HOME
        value: "/root"
      - name: AWS_CONFIG_FILE
        value: "/root/.aws/config"     
      - name: PATH
        value: "$PATH:/usr/local/kubelet/bin" 

Can you help? It works when I execute the credentials binary locally, but it doesn't work in the pod. Can you please guide me?
