Procdump -c does not work in k8s #240

Open
ximi522 opened this issue Apr 1, 2024 · 4 comments

ximi522 commented Apr 1, 2024

Expected behavior

In a Kubernetes environment, when using procdump with the command 'procdump -c 10 -s 1 -w XXX', it doesn't generate a dump file when the CPU usage of the pod exceeds 10%. This might be because procdump monitors the CPU usage of the host machine instead of the pod itself. Could you consider adding monitoring for the pod's CPU and memory usage in future versions? It would greatly assist in troubleshooting .NET applications in Kubernetes.
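
To make the suspicion concrete, here is a small arithmetic sketch. It assumes (this is only the hypothesis above, not confirmed procdump behaviour) that the reading is normalised by the host's total CPU capacity. The 16-core node and the 1-CPU pod limit are hypothetical numbers, and `percent_of` is an illustrative helper, not part of procdump.

```shell
#!/bin/bash
# percent_of BUSY_MILLICORES CAPACITY_MILLICORES
# Prints busy CPU as a whole-number percentage of the given capacity.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

busy=1000          # hypothetical: the pod is burning one full core
pod_limit=1000     # hypothetical: the pod is limited to 1 CPU
host_cores=16000   # hypothetical: 16-core node, in millicores

echo "relative to pod limit: $(percent_of "$busy" "$pod_limit")%"
echo "relative to host:      $(percent_of "$busy" "$host_cores")%"
```

With these numbers, a pod that is saturated against its own limit would still sit below a 10% threshold when measured against the whole node.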

System information (e.g., distro, kernel version, etc.)

pod docker image based on mcr.microsoft.com/dotnet/aspnet:7.0-bullseye-slim-amd64.

@MarioHewardt
Collaborator

Hi - thanks for the feedback. I wrote a post on this a while back. Let me know if that helps answer your question and if not, please don't hesitate to reach back out.

https://medium.com/@marioh_78322/sysinternals-procdump-for-linux-and-cloud-native-applications-404d0351f1ea

@ximi522
Author

ximi522 commented Apr 2, 2024

I deployed my pod following the instructions in this post (https://medium.com/@marioh_78322/sysinternals-procdump-for-linux-and-cloud-native-applications-404d0351f1ea) and monitored my process using `procdump -c 10 -m 200 -s 1 -w GMTools /dump-data`. However, when I pushed the CPU load above 10%, procdump did not generate the expected dump. The article does not cover monitoring a CPU threshold with -c inside a pod, so I suspect there may be an issue with obtaining the correct CPU load in a Docker environment.
`RUN apt-get update && \
    apt-get install -y wget
RUN wget -q https://packages.microsoft.com/config/ubuntu/22.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
RUN dpkg -i packages-microsoft-prod.deb
RUN apt-get update && \
    apt-get install -y procdump && \
    apt-get clean

WORKDIR /app

ENTRYPOINT ["./start.sh"]`

start.sh:
#!/bin/bash
procdump -c 1 -m 200 -s 1 -w GMTools /dump-data

@MarioHewardt
Collaborator

Thanks for the detailed information. Could you add the -log switch to the procdump command line? This will send extended logging to syslog. Please share the procdump related log entries (there can be quite a few).

@ximi522
Author

ximi522 commented Apr 19, 2024

I discovered while reading the code that the CPU usage is obtained and calculated from /proc/[pid]/stat. However, in a docker environment, the CPU usage obtained here is relative to the CPU of the actual host machine, which is not very meaningful for program monitoring. We would rather obtain the CPU usage relative to this docker container. I found a method to obtain the CPU usage in a docker container by reading this article [https://chengdol.github.io/2021/09/19/k8s-container-mem-cpu/], and I have written a shell script based on it for reference.

#!/bin/bash
while true; do
    # get dotnet process id
    pid=$DOTNET_PID
    # get dotnet process cgroup path
    cgroup_path=/proc/$pid/root/sys/fs/cgroup
    # check if cgroup path exists
    if [ ! -d $cgroup_path ]; then
        sleep 1
        continue
    fi
    # cpu and cpuacct are symlinks to the combined cpu,cpuacct controller (cgroup v1)
    # cpuacct.stat:
    # Reports the total CPU time, in USER_HZ ticks,
    # spent in user and system mode by all tasks in the cgroup.
    utime_start=$(awk '/^user/ {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    stime_start=$(awk '/^system/ {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    sleep 1
    utime_end=$(awk '/^user/ {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    stime_end=$(awk '/^system/ {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    # getconf CLK_TCK aka sysconf(_SC_CLK_TCK) returns USER_HZ,
    # which is 100 on most Linux systems regardless of the
    # kernel's internal HZ setting.
    HZ=$(getconf CLK_TCK)
    
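    # get the CPU limit from the CFS quota; a quota of -1 means no limit
    cfs_quota_us=$(cat "$cgroup_path/cpu/cpu.cfs_quota_us")
    cfs_period_us=$(cat "$cgroup_path/cpu/cpu.cfs_period_us")
    if [ "$cfs_quota_us" -le 0 ]; then
        # no limit set: fall back to the CPU count visible to this script
        cfs_quota_us=$(( $(nproc) * cfs_period_us ))
    fi

    # get container cpu usage as a percentage of its limit over the 1 s sample;
    # multiply before dividing so fractional limits (e.g. 500m) do not
    # truncate to zero cores and cause a division by zero
    cpu_percent=$(( (utime_end+stime_end-utime_start-stime_start)*100*cfs_period_us/(HZ*cfs_quota_us) ))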
    
    # memory in Mib: used - inactive(cache)
    used=$(cat $cgroup_path/memory/memory.usage_in_bytes)
    inactive=$(awk '$1 == "inactive_file" {print $2}' "$cgroup_path/memory/memory.stat")
    # numfmt: readable format

    mem_usage=$(cat $cgroup_path/memory/memory.usage_in_bytes)
    total_mem=$(cat $cgroup_path/memory/memory.limit_in_bytes)
    # local memory info
    local_mem_usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
    local_total_mem=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
    mem_percent=$(echo "scale=2; ($mem_usage + $local_mem_usage) * 100 / ($total_mem + $local_total_mem)" | bc)
    
    if (( $(echo "$cpu_percent > $CPU_THRESHOLD" | bc -l) )) || (( $(echo "$mem_percent > $MEM_THRESHOLD" | bc -l) )); then
        if [ ! -f "/app/create_dump.lock" ];then
            echo $cpu_percent $mem_percent
            echo $(($used)) | numfmt --to=iec
            echo $(($total_mem)) | numfmt --to=iec
            ./procdump -pgid $pid /app/dump
            touch /app/create_dump.lock
        fi
    fi

done
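
The script above reads cgroup v1 files (`cpu,cpuacct/cpuacct.stat`, `cpu.cfs_quota_us`). On nodes using cgroup v2 those files do not exist; the equivalent data is in `cpu.stat` (the `usage_usec` field) and `cpu.max` (quota and period on one line, with `max` meaning unlimited). A minimal sketch of the same CPU calculation for that layout follows. The function names and the `/sys/fs/cgroup` path in the example are assumptions for illustration, not something procdump provides.

```shell
#!/bin/bash
# Sketch for cgroup v2 (unified hierarchy); file names differ from cgroup v1.

# cpu_limit_percent DELTA_USAGE_USEC INTERVAL_USEC QUOTA PERIOD
# Prints CPU usage as a whole-number percentage of the cgroup's CPU limit.
# QUOTA and PERIOD are the two fields of cpu.max. When QUOTA is "max"
# (no limit), usage is reported relative to one full CPU instead.
cpu_limit_percent() {
    local delta=$1 interval=$2 quota=$3 period=$4
    if [ "$quota" = "max" ]; then
        echo $(( delta * 100 / interval ))
    else
        # the limit in CPUs is quota/period; multiply first to avoid truncation
        echo $(( delta * 100 * period / (interval * quota) ))
    fi
}

# read_usage_usec CGROUP_DIR
# Prints the total CPU time consumed by the cgroup, in microseconds.
read_usage_usec() {
    awk '$1 == "usage_usec" {print $2}' "$1/cpu.stat"
}

# sample_cpu CGROUP_DIR
# Samples usage over one second and prints the percentage of the limit.
sample_cpu() {
    local dir=$1 start end quota period
    start=$(read_usage_usec "$dir")
    sleep 1
    end=$(read_usage_usec "$dir")
    read -r quota period < "$dir/cpu.max"
    cpu_limit_percent $(( end - start )) 1000000 "$quota" "$period"
}

# Example (the path is an assumption; adjust to where cgroup2 is mounted):
# sample_cpu /sys/fs/cgroup
```

The result of `sample_cpu` could replace `cpu_percent` in the loop above; the memory part would similarly need `memory.current` and `memory.max` instead of the v1 file names.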
