Seldon pipeline inspect fails with error: Failed to resolve 'seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092' #5176

Open
stephaniegaspar opened this issue Dec 20, 2023 · 1 comment

@stephaniegaspar

We have installed Seldon Core v2 (version 2.6.0) on a Kubernetes cluster. Following the Kafka installation instructions, we used the recommended Strimzi Operator with all default values from the strimzi-kafka-operator Helm chart (version 0.35.1) and all default values from Seldon's seldon-core-v2-kafka Helm chart (version 0.1.0). All control-plane and data-plane operations use the PLAINTEXT security protocol.

The goal of this issue is to find the best way to inspect the output data of each model within a Pipeline.

Describe the bug

We built the seldon CLI image on our machine and created a custom configuration in /home/.config/seldon/cli with the following contents:

{
    "kafka": {
        "bootstrap": "localhost:9092",
        "namespace": "seldon-mesh",
        "protocol": "PLAINTEXT",
        "sasl": {
            "username": "seldon",
            "password": ""
        }
    },
    "dataplane": {
        "inferHost": "localhost:9000"
    },
    "controlplane": {
        "schedulerHost": "localhost:9004"
    }
}

We are port-forwarding the seldon-mesh, seldon-scheduler and seldon-kafka-bootstrap services to the ports configured above (localhost:9000, localhost:9004 and localhost:9092 respectively).

We followed the example described here.

We can run inference through the seldon CLI, but when we run the command seldon pipeline inspect tfsimples, the following error is printed to the console:

%3|1703095841.652|FAIL|rdkafka#consumer-1| [thrd:seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092/0]: seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092/0: Failed to resolve 'seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092': nodename nor servname provided, or not known (after 2ms in state CONNECT)
%3|1703095841.652|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092: Failed to resolve 'seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092': nodename nor servname provided, or not known (after 2ms in state CONNECT)
%4|1703095841.652|OFFSET|rdkafka#consumer-1| [thrd:main]: seldon.seldon-mesh.model.tfsimple1.inputs [0]: offset reset (at offset TAIL(1) (leader epoch -1), broker 0) to offset END (leader epoch -1): failed to query logical offset: Local: Host resolution failure
Error: seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092/0: Failed to resolve 'seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092': nodename nor servname provided, or not known (after 2ms in state CONNECT)
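The OFFSET line above shows the topic the CLI was consuming: seldon.seldon-mesh.model.tfsimple1.inputs, i.e. a seldon.<namespace>.model.<model>.inputs pattern whose namespace matches the kafka.namespace value in the config. Since the goal is to read each model's outputs, here is a minimal sketch that builds topic names from that observed pattern. The "outputs" suffix is an assumption by analogy (it does not appear in this log), and model_topic is a hypothetical helper, not part of the seldon CLI.

```python
# Topic pattern observed in the rdkafka log of this issue:
#   seldon.seldon-mesh.model.tfsimple1.inputs
# Assumption: output topics follow the same pattern with an "outputs" suffix.
TOPIC_PREFIX = "seldon"


def model_topic(namespace: str, model: str, direction: str) -> str:
    """Hypothetical helper: build a model topic name from the observed pattern."""
    if direction not in ("inputs", "outputs"):
        raise ValueError(f"direction must be 'inputs' or 'outputs', got {direction!r}")
    return f"{TOPIC_PREFIX}.{namespace}.model.{model}.{direction}"


if __name__ == "__main__":
    # Topics one would expect for tfsimple1 under the assumption above.
    for direction in ("inputs", "outputs"):
        print(model_topic("seldon-mesh", "tfsimple1", direction))
```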

We debugged the seldon CLI code and found that the error is returned on this line.

Related bug encountered with the same error: #4776.

Expected behaviour

We expected the command to return the pipeline's Kafka topics together with the data in those topics' partitions, formatted as JSON.

Environment

  • OS: macOS Ventura 13.6.3
  • Architecture: ARM64
  • Cloud Provider: Azure
  • Kubernetes Cluster Version: v5.0.1
  • Deployed Seldon System Images: seldonv2-controller:2.6.0
@lc525
Member

lc525 commented Mar 8, 2024

Just a note here: the Kafka bootstrap server you are pointing to returns a list of hostnames for the actual Kafka brokers (in your log, seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092). Because those hostnames are not resolvable or reachable from where you are running the seldon CLI, the CLI cannot connect to them, and the error you described appears.

In other words, it is not sufficient to expose only the Kafka bootstrap server via a port-forward.
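The point above can be checked from the client side. A Kafka client uses the bootstrap address only for its first metadata request; it then connects to the broker addresses advertised in that metadata. Below is a minimal sketch, standard library only, that tests whether advertised broker addresses resolve on the machine running the CLI. The helper names are hypothetical, and the advertised address is copied from the error log in this issue rather than fetched from a live cluster.

```python
import socket
from typing import Callable, Iterable, List, Tuple


def split_host_port(address: str) -> Tuple[str, int]:
    """Split 'host:port' (hostname or IPv4; bracketed IPv6 is not handled)."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def resolves_locally(host: str, port: int) -> bool:
    """Return True if this machine's resolver can turn host into an address."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False


def unresolvable_brokers(
    advertised: Iterable[str],
    resolves: Callable[[str, int], bool] = resolves_locally,
) -> List[str]:
    """Return the advertised broker addresses the client cannot resolve.

    Port-forwarding the bootstrap service only covers the first metadata
    request; any address returned here would fail on the follow-up connection.
    """
    return [addr for addr in advertised if not resolves(*split_host_port(addr))]


if __name__ == "__main__":
    # Broker address advertised by the cluster, copied from the log in this issue.
    advertised = ["seldon-kafka-0.seldon-kafka-brokers.seldon-mesh.svc:9092"]
    print(unresolvable_brokers(advertised))
```

The resolver is passed in as a parameter so the check can be exercised without cluster DNS; on a workstation outside the cluster, the default resolver would be expected to report the .svc address as unresolvable, consistent with the "Failed to resolve" error above.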

I don't think this is a bug per se, but we'll treat it as an improvement request: the seldon pipeline inspect command should work better with a Kafka installation running inside a Kubernetes cluster.
