flower road: December 2024

Friday, December 13, 2024

Integrating Loki with kube-prometheus-stack (Helm)

Installing kube-prometheus-stack and loki-stack through Helm is not enough to connect them; the additional steps below are needed.

1) Install loki-stack

helm install loki grafana/loki-stack --namespace monitoring --create-namespace

Create the file that connects Loki to Grafana, values.yaml (this can also be done with a ConfigMap, but this time I use the file below)

grafana:
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
        - name: Loki
          type: loki
          url: http://loki.monitoring.svc.cluster.local:3100
          access: proxy
          isDefault: false

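2) Run the following command to apply the change to kube-prometheus-stack

helm upgrade kube-prometheus-stack-xxxxxx prometheus-community/kube-prometheus-stack \
--namespace monitoring \
-f values.yaml

Note: replace kube-prometheus-stack-xxxxxx with the release name of your kube-prometheus-stack installation.

The data source does not need to be configured by hand in Grafana; it is registered automatically.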
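
The data source URL in values.yaml follows the usual in-cluster DNS pattern, http://<service>.<namespace>.svc.cluster.local:<port>. As a minimal sketch, assuming loki-stack created a Service named loki on port 3100 in the monitoring namespace (which is what the URL above implies), the hypothetical helper below rebuilds that URL so it can be compared with the Service that actually exists:

```shell
# Hypothetical helper: build the in-cluster URL of a Service.
# Arguments: service name, namespace, port.
service_url() {
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$1" "$2" "$3"
}

# Example (the first command needs cluster access, so both are commented out):
# kubectl get svc -n monitoring loki
# service_url loki monitoring 3100
```

If the Service has a different name or namespace (for example because the release was not called loki), the url field in values.yaml has to change accordingly.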

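Changing the Raspberry Pi (k8s) network

I decided to switch the three Raspberry Pis from Wi-Fi to Ethernet, because the Wi-Fi connection was somewhat unstable.

1) On each of the three Raspberry Pis, change the netplan file from <Before> to <After> as shown below (writing under /etc/netplan requires root), then run
sudo netplan apply

<Before>
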
cat > /etc/netplan/99-manual.yaml <<EOF
network:
  version: 2
  renderer: networkd
  wifis:
    wlan0:
      dhcp4: false
      addresses:
        - $IP_ADDRESS/24
      routes:
        - to: default
          via: 192.168.10.1
      nameservers:
        addresses:
          - 8.8.8.8
      access-points:
          xxxxxxxx5g:
              password: xxxxxxxxxxx
EOF

<After>

cat > /etc/netplan/99-manual.yaml <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      dhcp4: false
      addresses:
        - $IP_ADDRESS/24
      routes:
        - to: default
          via: 192.168.10.1
      nameservers:
        addresses:
          - 8.8.8.8

EOF

After the network change, the following error flooded the k8s events,
so I troubleshot it as follows.

[error message]
Nameserver limits were exceeded, some nameservers have been omitted, the applied nameserver line is: 8.8.8.8 2409:11:9aa0:500:8222:a7ff:fe20:14f4 192.168.10.1
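
As the message itself says, only three nameservers are applied and the rest are dropped. A rough way to see how many entries a node hands out is to count the nameserver lines in its resolv.conf. This is only a sketch: the function names are hypothetical, and which file kubelet actually reads depends on the node setup (on systemd-resolved hosts it is often /run/systemd/resolve/resolv.conf rather than /etc/resolv.conf).

```shell
# Count the "nameserver" entries in a resolv.conf-style file.
count_nameservers() {
  grep -c '^nameserver[[:space:]]' "$1"
}

# Report whether the file stays within the three-nameserver limit.
check_nameserver_limit() {
  n=$(count_nameservers "$1")
  if [ "$n" -gt 3 ]; then
    echo "$1: $n nameservers (more than 3, extras are ignored)"
  else
    echo "$1: $n nameservers (within the limit)"
  fi
}

# Example: check_nameserver_limit /etc/resolv.conf
```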

2) First, edit the CoreDNS ConfigMap and apply it

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        hosts {
            192.168.10.11 prometheus.local
            192.168.10.11 prometheus.local.default.svc.cluster.local
            fallthrough
        }
        forward . 8.8.8.8 8.8.4.4
        cache 300
        reload
        loadbalance
        bind 0.0.0.0
    }

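3) Next, restart CoreDNS
kubectl -n kube-system rollout restart deployment coredns

On the Raspberry Pi side:

4) To disable IPv6, reload the sysctl settings (with the IPv6-disabling entries already added to /etc/sysctl.conf)
sudo sysctl -p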
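
The post does not show which entries were added before running sysctl -p. A common choice, assumed here rather than taken from the original setup, is to set the two disable_ipv6 kernel parameters. The hypothetical helper below only prints them, so nothing is changed until its output is appended to /etc/sysctl.conf:

```shell
# Print sysctl entries that disable IPv6 on all current and future interfaces.
# Assumption: the original post does not list the keys that were actually set.
render_ipv6_disable() {
  cat <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
EOF
}

# Example (changes the node, so it is commented out):
# render_ipv6_disable | sudo tee -a /etc/sysctl.conf
# sudo sysctl -p
```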

5) Run the following command and confirm that nothing is displayed, which means IPv6 is disabled
ip -6 addr show

Tuesday, December 10, 2024

Helm (adding PVCs)

1) Create the following directories on the cluster nodes

/mnt/data

/mnt/data2

2) Create the manifest for the StorageClass and PersistentVolumes (for Grafana/Prometheus)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

---
# PersistentVolume for Grafana
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/data # local path on the node (for Grafana)
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - rasp-node1 # node name
                - rasp-node2

---
# PersistentVolume for Prometheus
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/data2 # local path on the node (for Prometheus)
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - rasp-node1 # node name
                - rasp-node2

3) Apply the manifest

kubectl apply -f /Volumes/Store/Technology/k8s/PVC/garafana-prometheus-pvc.yaml

4) Create the values file to pass to Helm (volumes.yaml)

# Prometheus storage settings
prometheus:
  prometheusSpec:
    tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists" # matches the taint whatever its value (kubeadm sets none)
        effect: "NoSchedule"
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi # storage size for Prometheus
          storageClassName: local-storage # name of the StorageClass created above

# Grafana storage settings
grafana:
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"
      operator: "Exists" # matches the taint whatever its value (kubeadm sets none)
      effect: "NoSchedule"
  persistence:
    enabled: true
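    size: 10Gi # storage size for Grafana
    storageClassName: local-storage # name of the StorageClass created above

5) Apply the values with the following command (helm upgrade --install also installs the release if it does not exist yet)

helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -n monitoring -f volumes.yaml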

Prometheus/Grafana (installing with Helm)

Installing Prometheus and Grafana by hand from YAML has too many pitfalls, so I decided to simply install them with Helm
(what prompted this was that the query for container CPU usage never worked properly).

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack

The chart includes the following:

Prometheus Operator
Prometheus Server, Alertmanager, and the related Custom Resources
Node Exporter

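Monday, December 9, 2024

PVC (2024 edition)

I added a PVC to Grafana, so the steps are recorded below.

1) Create the following mount point on the k8s cluster node
sudo mkdir -p /mnt/data
sudo chmod 777 /mnt/data

2) Add the PVC-related parts (the volumeMounts and volumes entries, plus the PV and PVC) as follows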

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:9.0.5
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "512Mi" # minimum memory the container needs
              cpu: "250m" # minimum CPU the container needs
            limits:
              memory: "1Gi" # maximum memory the container may use
              cpu: "500m" # maximum CPU the container may use
          volumeMounts:
            - name: grafana-storage
              mountPath: /var/lib/grafana # where Grafana stores its data
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc # name of the PVC
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv # PersistentVolumes are cluster-scoped, so no namespace is set
spec:
  capacity:
    storage: 5Gi # must match the capacity requested by the PVC
  accessModes:
    - ReadWriteOnce # must match the access mode set in the PVC
  persistentVolumeReclaimPolicy: Retain # what happens after the PVC is deleted
  storageClassName: standard # same storage class as the PVC
  hostPath:
    path: "/mnt/data" # actual path on the node; set a suitable path

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard # storage class to use
  resources:
    requests:
      storage: 5Gi # requested storage capacity

Saturday, December 7, 2024

k8s (metrics server)

1) Download the YAML file
curl -LO https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

2) Add the following line to the metrics-server container args

    spec:
      containers:
        - args:
            - --cert-dir=/tmp
            - --secure-port=10250
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
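            - --kubelet-insecure-tls # add this line

3) After applying the edited file (kubectl apply -f components.yaml), the following kubectl commands show resource usage
(Grafana shows this too, but I installed metrics-server anyway)

kubectl top nodes
kubectl top pods -A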
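
The manual edit in step 2 can also be scripted. The sketch below assumes the downloaded components.yaml contains a "- --metric-resolution=..." line, as in the snippet above, and inserts the new flag right after it with the same indentation. The function name is hypothetical, and the result is written to stdout rather than changing the file in place.

```shell
# Print the manifest in file "$1" with --kubelet-insecure-tls added to the
# metrics-server args. A file that already has the flag is printed unchanged.
add_insecure_tls() {
  if grep -q -e '--kubelet-insecure-tls' "$1"; then
    cat "$1"
    return 0
  fi
  awk '
    { print }
    /- --metric-resolution=/ {
      indent = $0
      sub(/-.*/, "", indent)   # keep only the leading spaces
      print indent "- --kubelet-insecure-tls"
    }
  ' "$1"
}

# Example (needs cluster access for the second command):
# add_insecure_tls components.yaml > components-patched.yaml
# kubectl apply -f components-patched.yaml
```

Keep in mind that this flag turns off verification of the kubelet serving certificates, which is usually acceptable on a home lab cluster but not something to carry over to production without thought.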

cilium (CNI)

I tried switching from the already-installed Calico to Cilium.

Official guide:

https://docs.cilium.io/en/latest/gettingstarted/k8s-install-default/

1) Run the following on the control plane (arm64 version)

------------------------

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)

CLI_ARCH=amd64

if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi

curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum

sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin

rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

--------------------------

2) Run on the control plane

cilium install

3) Check the status

The Cilium pods appear to have been created.
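
The status check in the original post was a screenshot. One way to do the same check from the command line is sketched below, using the cilium CLI installed in step 1 and the k8s-app=cilium label that the Cilium agent pods carry. The CILIUM and KUBECTL variables are a convention of this sketch only: set them to echo to print the commands instead of running them.

```shell
: "${CILIUM:=cilium}"     # set CILIUM=echo for a dry run
: "${KUBECTL:=kubectl}"   # set KUBECTL=echo for a dry run

# Wait until Cilium reports ready, then list the agent pods.
check_cilium() {
  "$CILIUM" status --wait &&
    "$KUBECTL" get pods -n kube-system -l k8s-app=cilium
}

# Example: check_cilium
```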

Friday, December 6, 2024

CoreDNS (troubleshooting)

The health checks (HTTP port, the default) kept failing, so I changed the probes to TCP.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "5"
  creationTimestamp: "2024-08-17T05:57:02Z"
  generation: 11
  labels:
    k8s-app: kube-dns
  name: coredns
  namespace: kube-system
  resourceVersion: "186676"
  uid: 37448d7a-0b91-42b5-9a55-bf2dfe2981d1
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2024-12-06T10:48:09+09:00"
      creationTimestamp: null
      labels:
        k8s-app: kube-dns
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: k8s-app
                      operator: In
                      values:
                        - kube-dns
                topologyKey: kubernetes.io/hostname
              weight: 100
      containers:
        - args:
            - -conf
            - /etc/coredns/Corefile
          image: registry.k8s.io/coredns/coredns:v1.11.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            tcpSocket:
              port: 53
            initialDelaySeconds: 60
            timeoutSeconds: 5
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 5
          name: coredns
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            - containerPort: 9153
              name: metrics
              protocol: TCP
          readinessProbe:
            tcpSocket:
              port: 53
            initialDelaySeconds: 0
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          resources:
            limits:
              memory: 170Mi
            requests:
              cpu: 100m
              memory: 70Mi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              add:
                - NET_BIND_SERVICE
              drop:
                - ALL
            readOnlyRootFilesystem: true
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/coredns
              name: config-volume
              readOnly: true
      dnsPolicy: Default
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
        - effect: NoSchedule
          key: node-role.kubernetes.io/control-plane
      volumes:
        - configMap:
            defaultMode: 420
            items:
              - key: Corefile
                path: Corefile
            name: coredns
          name: config-volume

Run the following command to recreate the CoreDNS pods
kubectl rollout restart deployment coredns -n kube-system

After that, the logs showed a permission error
kubectl logs -n kube-system -l k8s-app=kube-dns

-------log---------
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:serviceaccount:kube-system:default" cannot list resource "services" in API group "" at the cluster scope
------------------------------

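Add the RBAC below. The log shows the pods running as the kube-system default ServiceAccount, so the binding targets that account.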

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: coredns-role
rules:
  - apiGroups: [""]
    resources: ["services", "pods", "namespaces"]
    verbs: ["list", "watch"]
  - apiGroups: ["discovery.k8s.io"]
    resources: ["endpointslices"]
    verbs: ["list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: coredns-role-binding
subjects:
  - kind: ServiceAccount
    name: default
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: coredns-role
  apiGroup: rbac.authorization.k8s.io
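
After applying the ClusterRole and ClusterRoleBinding, kubectl auth can-i can confirm the grant by impersonating the ServiceAccount named in the error log. This is only a sketch: the helper name is hypothetical, your own kubectl user needs permission to impersonate (cluster-admin has it), and KUBECTL can be set to echo to print the command instead of running it.

```shell
: "${KUBECTL:=kubectl}"   # set KUBECTL=echo for a dry run

# Ask the API server whether a ServiceAccount may list a resource cluster-wide.
# Arguments: resource, ServiceAccount namespace, ServiceAccount name.
sa_can_list() {
  "$KUBECTL" auth can-i list "$1" --all-namespaces \
    --as="system:serviceaccount:$2:$3"
}

# Example (prints yes or no):
# sa_can_list services kube-system default
```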

For testing, add a sidecar container to CoreDNS and run nslookup from it
-> name resolution worked.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      containers:
        - name: coredns
          image: registry.k8s.io/coredns/coredns:v1.11.1
          args:
            - -conf
            - /etc/coredns/Corefile
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
            - containerPort: 9153
              name: metrics
              protocol: TCP
          volumeMounts:
            - mountPath: /etc/coredns
              name: config-volume
              readOnly: true
        - name: busybox
          image: busybox
          command: ["sleep", "3600"] # keeps the container running for an hour
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
              - key: Corefile
                path: Corefile

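Thursday, December 5, 2024

MetalLB initial setup (2024)

I tried to expose the already-installed Prometheus through a LoadBalancer Service, but it failed. The reason: an on-premises (Raspberry Pi) cluster needs MetalLB installed separately.

1) Install it as described in the official documentation
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.10/config/manifests/metallb-native.yaml

2) An L2Advertisement is also required, so define it as follows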

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prometheus-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - prometheus-pool

Also remember to define the IP range assigned to the Prometheus LB

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prometheus-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.10-192.168.10.20

Tuesday, December 3, 2024

Grafana (troubleshooting)

Grafana could not fetch metrics from Prometheus, so I looked at the Prometheus pod logs.

The logs show permission errors.

[Checking the error log]

Errors shown in the log:

nodes is forbidden: listing node resources was denied.
pods is forbidden: listing pod resources was denied.

Cause: system:serviceaccount:monitoring:default has no permission to list these resources.

<Prometheus log>

---------------------------------------------

msg="pkg/mod/k8s.io/client-go@v0.24.0/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:default\" cannot list resource \"pods\" in API group \"\" at the cluster scope" ts=2024-12-03T05:46:53.394Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.24.0/tools/cache/reflector.go:167: failed to list *v1.Node: nodes is forbidden: User

---------------------------------------------

Add the following RBAC and apply it

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-lr-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argocd-lr-role
subjects:
  - kind: ServiceAccount
    name: argocd-application-controller
    namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes"]
    verbs: ["list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: monitoring-role
subjects:
  - kind: ServiceAccount
    name: default
    namespace: monitoring

After applying it, I checked Grafana again...

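Loki setup (2024 edition)

1) Install Loki through Helm

2) Select Loki from "Add data source"; the settings are as follows

3) How the logs are aggregated

In Explore, the target logs can be collected with the settings marked by the red frame in the screenshot

Grafana setup (2024 edition)

The initial installation is done, so I moved on to the detailed settings.

The first login was ID: admin / Pass: admin (you are then asked to set a new password).

1) Select Prometheus (which must already be installed) from "Add data source"

2) Fill in the detailed Prometheus-related parameters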

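ArgoCD (installation), 2024 edition

I rebuilt the k8s environment, so ArgoCD had to be installed from scratch; the steps are recorded here.

Deploying the ArgoCD manifests I had previously written in YAML no longer worked for deploying from the prometheus and grafana repositories on GitHub (I hit one error after another), so this time I installed ArgoCD through Helm.

I used Lens, a k8s management tool, to do the Helm installation.

① Select argo-cd (marked with a red frame in the screenshot)

② Choose the version to install and the namespace

③ Run the installation

Next, log in to ArgoCD and verify that the target middleware can actually be deployed.

1) Log in to ArgoCD (login steps omitted)

2) Select Settings > Repository

3) Select CONNECT REPO

4) Enter the following settings

① Select VIA HTTPS

② git

③ default

④ Enter the URL of the target repository

⑤ Enter your GitHub ID

⑥ Enter the credential created on GitHub (how to create one is widely documented, so it is omitted here)

⑦ Press CONNECT

5) Go from fetching the GitHub repository to deploying it in ArgoCD

① From the repository list created above, select the target (prometheus this time)

② Select Create Application

6) Fill in the detailed settings

① Enter the application name

② default

③ If there is nothing specific to set, a "." (dot) is fine

④ Just pick from the choices

⑤ Choose the target namespace

⑥ Press CREATE

As far as I remember, this alone does not deploy anything; pressing the SYNC button should complete the deployment.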

Force-deleting Custom Resources

An ArgoCD Custom Resource could no longer be deleted, so here is how to delete one forcibly.

1. Check the finalizers

An ArgoCD Application that has a finalizer set can be blocked on deletion. Removing the finalizer by hand can let the deletion proceed.

Check the details of the application:

kubectl get applications.argoproj.io prometheus --namespace=default -o yaml

Look for the finalizers section and check its contents.

To remove the finalizers, run the following command:

kubectl patch applications.argoproj.io prometheus --namespace=default -p '{"metadata":{"finalizers":null}}' --type=merge

This command clears the finalizers; after that, run the deletion.
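
To see only the finalizers, before or after the patch, a jsonpath query is shorter than reading the full YAML. A minimal sketch with a hypothetical helper name; KUBECTL can be set to echo to print the command instead of running it:

```shell
: "${KUBECTL:=kubectl}"   # set KUBECTL=echo for a dry run

# Print the finalizers of an Argo CD Application.
# Arguments: application name, namespace.
show_finalizers() {
  "$KUBECTL" get applications.argoproj.io "$1" --namespace="$2" \
    -o jsonpath='{.metadata.finalizers}'
}

# Example: show_finalizers prometheus default
```

Empty output after the patch means no finalizer is left to block the deletion.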

2. Force the deletion

Add the --force option to kubectl delete to delete forcibly.

kubectl delete applications.argoproj.io prometheus --namespace=default --force --grace-period=0

Note that a forced deletion skips resource cleanup, so data inconsistencies or dependency problems may result.

3. Check for resource locks

ArgoCD may be holding the resource internally. Check the argocd-application-controller logs for errors or warnings about the deletion.

kubectl logs -n argocd <argocd-application-controller-pod-name>

If the log contains errors or warnings, plan the next step based on them.

4. Check the state of the API server

Confirm that the cluster's API server and kubectl are working normally. API server problems can apparently delay resource deletion.

kubectl get pods --all-namespaces

Confirm that all pods are running normally, and deal with any that are not.

5. Check resource dependencies

If resources related to the application (for example ConfigMaps or Secrets) remain, they can keep the deletion from proceeding. Check these related resources too and delete them by hand.

kubectl get configmaps,secret --namespace=default

Delete them manually as needed.

kubectl delete configmap <name> --namespace=default
kubectl delete secret <name> --namespace=default

Try these measures in order to delete the application.

Monday, December 2, 2024

k8s (labeling)

To give a node the worker role label, edit it with the following command.

kubectl label nodes [ノード名] node-role.kubernetes.io/worker=
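
With several worker nodes, the same command can be wrapped in a loop. A minimal sketch: the helper name is hypothetical, the node names in the example are the ones used in the PersistentVolume manifests earlier on this page, and KUBECTL can be set to echo for a dry run.

```shell
: "${KUBECTL:=kubectl}"   # set KUBECTL=echo for a dry run

# Give every node passed as an argument the worker role label.
label_workers() {
  for node in "$@"; do
    "$KUBECTL" label nodes "$node" node-role.kubernetes.io/worker= --overwrite
  done
}

# Example:
# label_workers rasp-node1 rasp-node2
# kubectl get nodes    # the ROLES column should now show "worker"
```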
