Synchronization Issue between gcsfuse and Kubernetes Pod, When Application running on pods writes to GCS Bucket. #320

Open
raviprakash007 opened this issue Aug 2, 2024 · 3 comments
Labels
question Further information is requested

Comments

raviprakash007 commented Aug 2, 2024

I have a pod running a Python application with a uWSGI service. The uWSGI service writes its logs to the pod's /tmp/logs folder.

I have mounted the GCS bucket at /tmp/logs so that every log file goes to the GCS bucket.

Everything works as expected, except that the log files are not visible on the GCS bucket page in the console. However, if I create something manually (like touch a.txt), it becomes visible instantly on the bucket's web page.

I entered the pod and updated the log file by writing something to it manually, and it was reflected on the GCS bucket page with the logs up to that time. But newer lines in the log are again not visible on the bucket page, even after refreshing the page.

Can someone assist?

The configs are as follows:
PV

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-data-pv
  namespace: mynamespace
spec:
  storageClassName: cloud-data
  claimRef:
    namespace: mynamespace
    name: gcs-data-claim
  mountOptions:
    - implicit-dirs
    - dir-mode=777
    - file-mode=777
    - only-dir=my_subfolder_for_log_storage   # mounted this one with gcs-fuse
  capacity:
    storage: 5Gi 
  accessModes:
    - ReadWriteMany
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: app-application-logs
    volumeAttributes:
      gcsfuseLoggingSeverity: warning

PVC

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-data-claim
  namespace: mynamespace
spec:
  volumeName: gcs-data-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
  storageClassName: cloud-data

Deployment Snippet

---
volumeMounts:
        - mountPath: /api_service_mount
          name: uwsgi-service-claim
        - mountPath: /tmp/logs/   # uwsgi application writes here
          name: logger
          subPath: logs
--
--
volumes:
      - name: uwsgi-service-claim
        persistentVolumeClaim:
          claimName:  gcs-data-another-bucket-claim
      - name: logger
        persistentVolumeClaim:
          claimName: gcs-logger-claim  # <--

Expected result: log files created by the application should be reflected in the GCS bucket automatically, the same way manually created files are reflected on the bucket's web page.

ankitaluthra1 commented Aug 6, 2024

It appears that the application is not flushing the log file: the file stays open while logs are being appended to it. Data in the log file is uploaded to the GCS bucket only when FlushFile or SyncFile is called by the kernel or by the application writing to the file. A flush/sync occurs when the file is closed or when the application explicitly triggers one, for example by calling fsync (os.fsync() in Python).
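As a rough illustration of an explicit sync from a Python application, here is a minimal sketch. The class name SyncedLog and its methods are hypothetical, not part of gcsfuse or uWSGI; the only real calls are the standard file flush() and os.fsync(). It assumes the application itself controls the log file handle, which may not hold for logs written by uWSGI directly.

```python
import os


class SyncedLog:
    """Hypothetical helper: keeps a log file open and syncs after every line."""

    def __init__(self, path: str) -> None:
        # The file stays open between writes, as a long-running logger would.
        self._file = open(path, "a", encoding="utf-8")

    def write_line(self, line: str) -> None:
        self._file.write(line + "\n")
        # Move Python's userspace buffer into the kernel.
        self._file.flush()
        # Ask the kernel to sync the file; on a FUSE mount this reaches the
        # filesystem as a sync request instead of waiting for close().
        os.fsync(self._file.fileno())

    def close(self) -> None:
        self._file.close()
```

Syncing after every line is likely to be costly on a bucket-backed mount, so batching several lines per sync may be a better trade-off; that is an assumption worth measuring rather than a documented figure.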

Whether a flush has reached gcsfuse can be verified by analyzing the gcsfuse logs. Follow these two steps to obtain and analyze them:

  • Turn on trace logs in the gcsfuse config (see reference).
  • Look for a FlushFile call in the logs, similar to the one shown below. Until this call is issued to gcsfuse, data is not written to GCS; it remains in a temporary location on the machine.
{"timestamp":{"seconds":1722921973,"nanos":83035525},"severity":"TRACE","message":"fuse_debug: Op 0x00000292        connection.go:513] -> OK ()"}
{"timestamp":{"seconds":1722921973,"nanos":239823811},"severity":"TRACE","message":"fuse_debug: Op 0x00000294        connection.go:420] <- FlushFile (inode 8, PID 208049)"}
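To avoid scanning trace output by eye, a small script can filter for these entries. This is only a sketch: it assumes the logs are JSON lines shaped like the two samples above, and the function name find_flush_ops is hypothetical. It matches on the operation names FlushFile and SyncFile appearing in the message field.

```python
import json

# Operation names to look for in the "message" field of each trace entry.
FLUSH_OPS = ("FlushFile", "SyncFile")


def find_flush_ops(lines):
    """Return (seconds, message) for each JSON log line mentioning a flush/sync op."""
    hits = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue
        message = entry.get("message", "")
        if any(op in message for op in FLUSH_OPS):
            seconds = entry.get("timestamp", {}).get("seconds")
            hits.append((seconds, message))
    return hits


# The two sample lines from this comment, used as input.
SAMPLE = [
    '{"timestamp":{"seconds":1722921973,"nanos":83035525},"severity":"TRACE",'
    '"message":"fuse_debug: Op 0x00000292        connection.go:513] -> OK ()"}',
    '{"timestamp":{"seconds":1722921973,"nanos":239823811},"severity":"TRACE",'
    '"message":"fuse_debug: Op 0x00000294        connection.go:420] <- FlushFile (inode 8, PID 208049)"}',
]

if __name__ == "__main__":
    for seconds, message in find_flush_ops(SAMPLE):
        print(seconds, message)
```

If the function returns nothing for the period in which the application was writing, that would be consistent with no flush having been issued yet.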

This is the intended behaviour: it is up to the application to trigger a flush/sync, and GCSFuse cannot sync without an external trigger. One option is to add log rotation to the application with a shorter threshold. Log rotation frameworks automatically close old log files and create new ones after reaching the threshold, thereby triggering the flush call to GCSFuse for the closed files.
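For logs produced through Python's logging module, the rotation idea could look roughly like the sketch below. It uses the standard library's RotatingFileHandler; the function name build_logger, the logger name, and the size threshold are illustrative assumptions, and uWSGI's own log files are configured separately and are not covered here.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(log_path: str, name: str = "app",
                 max_bytes: int = 64 * 1024, backups: int = 5) -> logging.Logger:
    """Hypothetical helper: a logger that rotates its file at a small size threshold."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # On rollover the handler closes the current file, renames it to
    # "<log_path>.1", and opens a fresh file. Closing the old file is the
    # step that would trigger the flush on a gcsfuse mount.
    handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Usage would be along the lines of build_logger("/tmp/logs/app.log").info("started"), with the path taken from the mount described in the issue. A smaller max_bytes means closed files appear in the bucket sooner, at the price of more objects and more renames.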

@raviprakash007
Author

I am checking the logs after enabling debug and will report back on this thread.

@ashmeenkaur

@raviprakash007 Just wanted to check in on this issue. I know you mentioned you were looking into the logs after enabling debug. Any updates on that? Are you still running into the same problem?

Let me know if you need any help!

songjiaxun added the question (Further information is requested) label on Oct 4, 2024