Worker can't find iHD_drv_video.so #223
Is that with the worker having the …?
Yes, it is. Here's the relevant ConfigMap:
This issue is stale because it has been open for 30 days with no activity.
I'm having the same issue. Logging into the container, it looks like Plex isn't "fully installed"; there should be a cache with the extensions in those folders. See this Reddit discussion, as it's the same error: https://www.reddit.com/r/PleX/comments/12ikwup/plex_docker_hardware_transcoding_issue/
What's odd to me is that local transcode works; it's only the remote workers that fail.
@kenlasko @pabloromeo OK, I got it working. The clue was the fact that Plex didn't have its config directory set up in the worker nodes. Plex needs its configuration; otherwise it's going to fail, because Plex basically isn't set up. Here's how I fixed it:
Here's what my two files look like, though yours will look different depending on storage.

clusterplex-worker:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clusterplex-worker
  labels:
    app.kubernetes.io/name: clusterplex-worker
    app.kubernetes.io/part-of: clusterplex
spec:
  serviceName: clusterplex-worker-service
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: clusterplex-worker
      app.kubernetes.io/part-of: clusterplex
  template:
    metadata:
      labels:
        app.kubernetes.io/name: clusterplex-worker
        app.kubernetes.io/part-of: clusterplex
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  name: clusterplex-worker
              topologyKey: kubernetes.io/hostname
            weight: 100
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  name: clusterplex-pms
              topologyKey: kubernetes.io/hostname
            weight: 50
      containers:
      - name: plex-worker
        image: lscr.io/linuxserver/plex:latest
        startupProbe:
          httpGet:
            path: /health
            port: 3501
          failureThreshold: 40
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 3501
          initialDelaySeconds: 60
          timeoutSeconds: 5
        livenessProbe:
          httpGet:
            path: /health
            port: 3501
          initialDelaySeconds: 10
          timeoutSeconds: 10
        ports:
        - name: worker
          containerPort: 3501
        envFrom:
        - configMapRef:
            name: clusterplex-worker-config
        volumeMounts:
        - name: data
          mountPath: /data
        - name: codecs
          mountPath: /codecs
        - name: data
          mountPath: /transcode
        - name: config
          mountPath: /config
        resources: # adapt requests and limits to your needs
          requests:
            cpu: 500m
            memory: 200Mi
          limits:
            gpu.intel.com/i915: 1
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: "plex-media"
      - name: config
        persistentVolumeClaim:
          claimName: "clusterplex-config-pvc"
      # - name: transcode
      #   persistentVolumeClaim:
      #     claimName: "plex-media"
  volumeClaimTemplates:
  - metadata:
      name: codecs
      labels:
        app.kubernetes.io/name: clusterplex-codecs-pvc
        app.kubernetes.io/part-of: clusterplex
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
      # specify your storage class
      storageClassName: longhorn
```

clusterplex-config-pvc:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clusterplex-config-pvc
  labels:
    app.kubernetes.io/name: clusterplex-config-pvc
    app.kubernetes.io/part-of: clusterplex
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: "10Gi"
  # specify your storage class
  storageClassName: longhorn
```
I see! Yeah, the fact that Plex is not set up in the Workers is actually intentional. It shouldn't really be necessary, since the intention is to only use the Plex transcoder (their fork of FFmpeg) without actually interacting with the local Plex files. We use their base image to avoid redistributing their transcoder ourselves, but Plex doesn't really run on the worker. The reason we don't recommend sharing Plex's config that way, using shares, is that Plex uses SQLite as a database, which does not play well with network shares; and Longhorn's RWX is implemented with NFS behind the scenes. So you might end up corrupting the database or seeing odd issues. I'll see if I can set up a physical environment similar to yours, to see if there's a way around that. Maybe driver paths must be rewritten or something like that. I know others are running it with Intel drivers on k8s, but I'm not aware whether they had to apply this same workaround or not.
@pabloromeo excellent, I've been thinking about potential issues with my setup and what you've said makes sense. I'll try to see if I can do just the cache.
I mounted the Plex config in a different directory, then exec'd into the container and copied just the cache. No go, it throws errors.
After that, I copied everything from the temp folder, and hardware transcoding works fine. We might actually be running into something to do with Plex having to be on a premium plan and have a claim token to run HW transcoding. Another variation I tried was adding the Plex config as read-only; unfortunately the workers can't start, because they can't run the fix-permissions scripts that run on startup.
I am doing a Helm chart deployment and ran into this issue. I had already customized the charts to use env in the config for the HW transcoding variable for workers, so I customized them to include the config as well, and it no longer errors. I'm not too knowledgeable about editing Helm charts or Plex, but what if we mounted the directory or files with the SQLite DBs as read-only?
Hello, I just started using this and came across this issue while verifying settings for HW transcode on my NUC cluster. Thanks for finding this issue before I experienced it :) @todaywasawesome, I noticed the iHD_drv_video.so you referenced wasn't actually in Plex Media Server/Cache, but linked to it from Plex Media Server/Drivers/imd-74-linux-x86_64/dri/iHD_drv_video.so. To share both the Cache and Drivers folders with the workers as read-only, while excluding the rest of the config so as not to disturb the DB, I have:
- Additional Cache and Driver PVCs
- Worker: (PMS is the same, excluding the readOnly: true on spec.volumes)
- Folders mounted inside the Worker; touch test to verify read-only
- Remote VAAPI transcode success
Hope this helps
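A minimal sketch of that read-only mount layout (the PVC names here are hypothetical, and the sub-paths assume the linuxserver.io layout under /config):

```yaml
# Illustrative worker fragment: share only Cache and Drivers read-only.
# PVC names (plex-cache-pvc, plex-drivers-pvc) are assumptions; they would
# be RWX claims also mounted read-write by the PMS pod.
volumeMounts:
- name: plex-cache
  mountPath: /config/Library/Application Support/Plex Media Server/Cache
  readOnly: true
- name: plex-drivers
  mountPath: /config/Library/Application Support/Plex Media Server/Drivers
  readOnly: true
volumes:
- name: plex-cache
  persistentVolumeClaim:
    claimName: plex-cache-pvc
    readOnly: true
- name: plex-drivers
  persistentVolumeClaim:
    claimName: plex-drivers-pvc
    readOnly: true
```

Marking readOnly on both the mount and the claim keeps the workers from touching the shared folders at all, which is what avoids disturbing the SQLite DB.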
@audiophonicz that's an extremely clever approach, love it! :) Now, I've finally set up a similar environment to test this, and have also been seeing the same issue, as well as trying to identify a few workarounds. There may be one other alternative approach that doesn't depend on sharing Drivers and the Cache between PMS and the workers: initialize PMS on the workers at startup and then kill it once the local config has been created (I believe the linuxserver image does something along those lines too), so that the drivers for that instance's hardware are downloaded. If this actually works, I may add an optional parameter to force a PMS initialization on the Workers, but the default will be not to do it, to avoid breaking working installations like the ones mentioned above. Now, a question for you @audiophonicz and @todaywasawesome: when HW transcoding on the worker with your working setups, does Plex show that it's being transcoded by HW, or is it oblivious to it? In my initial test it just says "Transcode", not "Transcode (hw)".
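That initialize-then-kill idea could be sketched as an init container. This is purely hypothetical, not a tested procedure; the PMS binary path matches the linuxserver.io image, but the readiness check and timeout are assumptions:

```yaml
# Hypothetical sketch: run PMS once on the worker just long enough to
# populate /config (drivers, cache), then stop it before the real
# worker container starts.
initContainers:
- name: init-pms-config
  image: lscr.io/linuxserver/plex:latest
  command:
  - /bin/sh
  - -c
  - |
    "/usr/lib/plexmediaserver/Plex Media Server" &
    PMS_PID=$!
    # Wait (up to ~2 minutes, an assumed timeout) for the Drivers dir to appear
    for i in $(seq 1 24); do
      [ -d "/config/Library/Application Support/Plex Media Server/Drivers" ] && break
      sleep 5
    done
    kill "$PMS_PID"
  volumeMounts:
  - name: config
    mountPath: /config
```

Each worker would keep its own local /config this way, so nothing SQLite-backed is ever shared over NFS.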
@pabloromeo It's been "Transcode (hw)" for me, making sure to mount the needed hardware of course. I do have a concern that it might be limiting based on license. HW is a Plex Pass feature, so if Plex doesn't initialize as Plex Pass, it wouldn't enable HW transcoding. Might be able to use a claim key.
So, weird update: my method works, but ONLY if the worker container is on the same physical node as the PMS container. There's no difference in the logs until it actually connects and starts to stream; then the remote workers simply kill the child process. I can even see the tile flash up in the PMS dashboard for half a second, then it disappears and tries another worker. When it finally gets to the worker on the same physical node, the logs pick up from "segment:chunk-00000" and it starts playing.
@todaywasawesome can you confirm you can transcode on a worker container on a different physical node than PMS when sharing the entire config? I'm thinking you're right about the Plex Pass thing, and mine is matching the IP or something and only allowing it on the same node. @pabloromeo
Can you check the logs on the workers? That might shed some light on what's going on. Regarding Plex Pass, it's hard to say how they validate it.
I'll share my logs soon. My cluster is down due to ISP issues at the moment.
TL;DR / long version: Anyway, I made some progress on my remote HW transcodes. Providing just the drivers for HW transcode doesn't seem to be enough, as it would only work on the same machine as my PMS pod. Seeing that it seemed to work for todaywasawesome by sharing the whole config dir, which happens to contain a token file and the Preferences.xml with the machine-ID UUID, I tried his method, and was riddled with "SQLite db slow; waiting" or similar logs. So, I flipped my original method and created a single additional PVC just for the databases in the /Plugin Support/ folder, essentially carving them out of the main /config folder, and it seems to have worked. I am currently running 7 plays simultaneously across 3 workers. I apparently have a bunch of devices that can't HW decode HEVC10 and it really pushes my little i3-6100U nodes, so they take a good 30-45 seconds to start playing, but it does work. Every now and then one play will freeze or fail and need to retry (pretty sure it's HEVC10 playing havoc), but for the most part auto-play-next and seeks are working as well. 99% of my stuff is H264; I only found one file each with HEVC8 and HEVC10, so I should be good with this setup. I do still want to try to separate out the PID file so the workers aren't constantly deleting and overwriting each other's PID file. It doesn't seem to hurt right now, but it's not optimal.
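The carve-out described above might look roughly like this (the PVC name is hypothetical, and the mount path assumes Plex's standard "Plug-in Support/Databases" layout under /config):

```yaml
# Illustrative fragment: share /config between PMS and workers, but give
# the SQLite databases their own claim so they never live on the shared
# RWX volume. The claim name is an assumption.
volumeMounts:
- name: config
  mountPath: /config
- name: worker-databases
  mountPath: /config/Library/Application Support/Plex Media Server/Plug-in Support/Databases
volumes:
- name: worker-databases
  persistentVolumeClaim:
    claimName: worker-databases-pvc
```

Because the inner mount shadows that subdirectory of /config, SQLite writes land on the dedicated volume while the token file and Preferences.xml are still shared.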
The workaround I tried is copying the folder over manually from a temp config directory to the config directory. That way the worker can do whatever it wants with the local DB; it's trashed anyway. Still not great.
OK guys, I need some insight here. I still, for the life of me, can't get a worker to play if it's not on the same node as PMS. It's driving me mad. The weird thing is: if both PMS and Worker-0 are on NODE1, direct plays will direct play and transcodes will transcode, HW or SW; life is good. If I simply move PMS to NODE2 while Worker-0 is on NODE1, all plays break: direct plays try to transcode, and all transcodes fail. It's not the /config dir. It's not the /transcode or /codecs RWX speeds. It's purely whether they're on the same host or not, and I can't figure out what it's using. My only idea left is that the transcode job is using https://127.0.0.1 for the video transcode sessions and it's not translating across pods/nodes:
Plex definitely uses a loopback network for transcodes. On my FreeBSD Plex jail, if I don't give it a loopback address, direct plays are fine but transcodes fail (regardless of whether it needs to transcode audio or video). The address I give it is not 127.0.0.1, but it finds it okay. If direct plays aren't working for you, I'm not sure this is the same problem, but it very well might be. Also, maybe the direct play you tested was transcoding audio?
Honestly, I think this is probably a different issue, perhaps Plex network configuration. This issue is just about hardware transcoding failing; if you're not getting workers to transcode at all, that's a more fundamental problem.
Thank you for your reply, but my question is specifically about HW transcoding across physically separate Kubernetes nodes, and I'm not sure how a FreeBSD jail pertains. I do not see transcode network settings anywhere in this chart, so I am not sure what this "it" is that you are giving a loopback address. Still looking for someone who has HW transcoding working across two physically separate nodes, and what your Plex network settings are for subnets and URL.
Sorry for the confusion. The long and short of it is yes, that's where Plex communicates with the transcoder. The transcoder stub here remaps that to a different container, and the nginx proxy passes it back in. If direct play and SW transcoding are also failing, your issue isn't really about HW transcoding; it's something else broken in the orchestration of the transcoder requests.
Same issue here (Dockermod on an unprivileged LXC on Proxmox). Mounting … Thanks!
Remapping just Drivers and Cache as RWX across PMS and the workers fixed this issue for me.
Here to report a different setup that suffers from the same issue:
- NAS host with transcode and media shares exposed over NFS
- Separate host running a docker-compose stack of one PMS instance, one worker, and one orchestrator (no swarm)
- Transcode and media directories mounted over NFS as instructed (read and write)
Worker HW transcode fails (Intel iGPU), while "local" HW transcode succeeds (same physical Intel iGPU).
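For context, that kind of stack corresponds roughly to a compose file like the following. This is a sketch, not the poster's actual file: the image names, host paths, and FFMPEG_HWACCEL value are all assumptions:

```yaml
# Hypothetical docker-compose sketch of a single-host PMS + orchestrator +
# worker stack with NFS-mounted media, mirroring the setup described above.
services:
  pms:
    image: ghcr.io/pabloromeo/clusterplex_pms:latest  # image name assumed
    environment:
      ORCHESTRATOR_URL: http://orchestrator:3500
    volumes:
      - /mnt/nfs/media:/data          # NFS mount from the NAS (path assumed)
      - /mnt/nfs/transcode:/transcode
    devices:
      - /dev/dri:/dev/dri             # Intel iGPU passthrough
  orchestrator:
    image: ghcr.io/pabloromeo/clusterplex_orchestrator:latest
    ports:
      - "3500:3500"
  worker:
    image: ghcr.io/pabloromeo/clusterplex_worker:latest
    environment:
      ORCHESTRATOR_URL: http://orchestrator:3500
      FFMPEG_HWACCEL: vaapi           # value assumed for Intel iGPU
    volumes:
      - /mnt/nfs/media:/data
      - /mnt/nfs/transcode:/transcode
    devices:
      - /dev/dri:/dev/dri
```

Note that even with both containers on the same physical iGPU, the worker still needs the Plex drivers in its own /config (the subject of this issue); the device passthrough alone isn't sufficient.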
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I'm leaving this issue open, as I'm hoping I'll eventually have time to make this easier to set up. I get the feeling that it might help to have the Plex install within the Workers initialize during startup, just so that it downloads the appropriate drivers for that particular instance. Has anybody gotten HW transcoding to work simultaneously on, for example, an NVIDIA GPU on one worker while having an Intel iGPU on another?
Similar situation here: PMS (with QSV hardware transcoding) & Orchestrator on one host. What I noticed was that even though I set FFMPEG_HWACCEL to false, in the worker logs I still see this coming in as the request:
By disabling FFMPEG_HWACCEL, my expectation was that the worker would do a CPU transcode. Is there something wrong in my config? The PMS is initially expecting to use the iGPU (and would normally show "transcode (hw)"). Does the fact that PMS is configured for HW transcoding mean that when the job is passed to the orchestrator, it can't go to a software-only worker?
Edit: never mind, everything is fine after I did what the two posters above did: shared and mounted the Cache and Drivers folders to the workers as well.
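For reference, the setting discussed above lives in the worker's env config. A minimal sketch as a ConfigMap fragment (the ConfigMap name matches the StatefulSet earlier in this thread; the other key and its value are assumptions):

```yaml
# Hypothetical worker ConfigMap fragment. FFMPEG_HWACCEL is the variable
# discussed above; ORCHESTRATOR_URL and its value are illustrative.
apiVersion: v1
kind: ConfigMap
metadata:
  name: clusterplex-worker-config
data:
  FFMPEG_HWACCEL: "false"   # e.g. "vaapi" to request Intel iGPU transcoding
  ORCHESTRATOR_URL: "http://clusterplex-orchestrator:3500"
```

As the poster found, toggling this alone doesn't decide SW vs. HW: the transcode arguments come from PMS, so the worker still needs the shared Cache and Drivers folders for a HW job to succeed.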
I have been unable so far to get this working in a mixed environment. Server 1 runs the Plex + Orchestrator dockers and has Intel iGPU capability. Nodes 1 & 2 work without issue. Node 3 looks like it's working fine based on the Plex / Orchestrator / Worker logs, but on the client side no video is delivered (and eventually I have to stop the video). The end of the log below happens before I even hit 'stop' on the client. Here are the relevant excerpts from the clusterplex_worker log:
Bringing this back up, as I'm suffering similar issues. Is there a suggested workaround for the remote workers to get the cache/drivers folder for the GPU drivers?
Edit: Looks like @audiophonicz's solution worked for me as well.
Describe the bug
When trying to play a transcoded video via a worker, the video fails to play. Worker logs indicate it cannot find iHD_drv_video.so. When I disable ClusterPlex and just use my "normal" PMS pod, HW transcoding works fine.
Intel GPU drivers are installed via Intel device plugins Helm chart: https://intel.github.io/helm-charts/
The same issue happens when using either the standard Plex image with DOCKER_MOD or the ClusterPlex image.
Relevant log file for worker:
The /config/Library/Application Support/ folder is empty, which explains why it can't find the driver. I tried placing the driver that I pulled off the Plex server into the codecs PV, but it made no difference.
Environment
K3S v1.26.5+k3s1
Nodes are Beelink U59's with Intel N5105 processor