Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scans are slow every 5 minutes, with a single retry #1156

Open
desultory opened this issue Apr 11, 2024 · 13 comments
Open

Scans are slow every 5 minutes, with a single retry #1156

desultory opened this issue Apr 11, 2024 · 13 comments

Comments

@desultory
Copy link

desultory commented Apr 11, 2024

snmp_exporter version: output of snmp_exporter -version

snmp_exporter | ts=2024-04-11T15:09:38.160Z caller=main.go:194 level=info msg="Starting snmp_exporter" version="(version=0.25.0, branch=HEAD, revision=9c42d6c874d479314e612bca69558c81f8e26287)" concurrency=1
snmp_exporter | ts=2024-04-11T15:09:38.160Z caller=main.go:195 level=info build_context="(go=go1.21.5, platform=linux/amd64, user=root@880115266f70, date=20231210-10:05:18, tags=netgo)"

What device/snmpwalk OID are you using?

IF-MIB:

oid=1.3.6.1.2.1.2

If this is a new device, please link to the MIB(s).

It's a Netgear GS324TP S350

What did you do that produced an error?

Nothing different than what doesn't produce an error.

What did you expect to see?

snmp_exporter  | ts=2024-04-11T15:11:24.535Z caller=collector.go:460 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Starting scrape"
snmp_exporter  | ts=2024-04-11T15:11:24.535Z caller=collector.go:214 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walking subtree" oid=1.3.6.1.2.1.2
snmp_exporter  | ts=2024-04-11T15:11:24.731Z caller=collector.go:228 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walk of subtree completed" oid=1.3.6.1.2.1.2 duration_seconds=195.774683ms
snmp_exporter  | ts=2024-04-11T15:11:24.731Z caller=collector.go:214 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walking subtree" oid=1.3.6.1.2.1.31.1.1
snmp_exporter  | ts=2024-04-11T15:11:24.917Z caller=collector.go:228 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walk of subtree completed" oid=1.3.6.1.2.1.31.1.1 duration_seconds=186.542915ms
snmp_exporter  | ts=2024-04-11T15:11:24.935Z caller=collector.go:464 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Finished scrape" duration_seconds=0.400332611

What did you see instead?

# HELP snmp_scrape_duration_seconds Total SNMP time scrape took (walk and processing).
# TYPE snmp_scrape_duration_seconds gauge
snmp_scrape_duration_seconds{module="if_mib"} 5.410868688
# HELP snmp_scrape_packets_retried Packets retried for get, bulkget, and walk.
# TYPE snmp_scrape_packets_retried gauge
snmp_scrape_packets_retried{module="if_mib"} 1
# HELP snmp_scrape_packets_sent Packets sent for get, bulkget, and walk; including retries.
# TYPE snmp_scrape_packets_sent gauge
snmp_scrape_packets_sent{module="if_mib"} 58
# HELP snmp_scrape_pdus_returned PDUs returned from get, bulkget, and walk.
# TYPE snmp_scrape_pdus_returned gauge
snmp_scrape_pdus_returned{module="if_mib"} 1366
# HELP snmp_scrape_walk_duration_seconds Time SNMP walk/bulkwalk took.
# TYPE snmp_scrape_walk_duration_seconds gauge
snmp_scrape_walk_duration_seconds{module="if_mib"} 5.392635394

This result was taken from a different response, so the times differ a bit, but each query with a spike says one retried packet.

snmp_exporter  | ts=2024-04-11T15:14:24.534Z caller=collector.go:460 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Starting scrape"
snmp_exporter  | ts=2024-04-11T15:14:24.534Z caller=collector.go:214 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walking subtree" oid=1.3.6.1.2.1.2
snmp_exporter  | ts=2024-04-11T15:10:29.763Z caller=collector.go:228 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walk of subtree completed" oid=1.3.6.1.2.1.2 duration_seconds=5.228701568s
snmp_exporter  | ts=2024-04-11T15:10:29.763Z caller=collector.go:214 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walking subtree" oid=1.3.6.1.2.1.31.1.1
snmp_exporter  | ts=2024-04-11T15:10:29.962Z caller=collector.go:228 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Walk of subtree completed" oid=1.3.6.1.2.1.31.1.1 duration_seconds=199.234426ms
snmp_exporter  | ts=2024-04-11T15:10:29.980Z caller=collector.go:464 level=debug auth=netgear_switch target=172.16.99.2 module=if_mib msg="Finished scrape" duration_seconds=5.446228759

When I check the response, I can see that it says it retied a single packet, but it's not clear what was retried. The logs don't make any mention of what failed. Similar behavior is being observed with a Lenovo Xclarity BMC, also happening every 5 minutes with a single retry.

@desultory
Copy link
Author

image
it could be something related to the device? maybe it does something on 5m intervals that slow things down. I've been watching the network and I don't see anything unusual happening at that time, and mostly see the exporters talking to the device. There's a small firmware revision available for that device, but it has no mention of SNMP changes or anything that could affect this.

@SuperQ
Copy link
Member

SuperQ commented Apr 11, 2024

Yes, that very much seems like something about the device.

Best you can do is maybe tcpdump/wireshark the scrape to compare send and receive times to look for a response latency problem with the device.

@desultory
Copy link
Author

Yes, that very much seems like something about the device.

Best you can do is maybe tcpdump/wireshark the scrape to compare send and receive times to look for a response latency problem with the device.

I agree, but debug info about which oid was retried would be great info here.

@desultory
Copy link
Author

I'm using a cut down version of the IF-MIB config, not sure if that is impacting this somehow:

  if_mib:
    walk:
    - 1.3.6.1.2.1.2
    - 1.3.6.1.2.1.31.1.1
    metrics:
    - name: ifNumber
      oid: 1.3.6.1.2.1.2.1
      type: gauge
      help: The number of network interfaces (regardless of their current state) present
        on this system. - 1.3.6.1.2.1.2.1
    - name: ifIndex
      oid: 1.3.6.1.2.1.2.2.1.1
      type: gauge
      help: A unique value, greater than zero, for each interface - 1.3.6.1.2.1.2.2.1.1
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifType
      oid: 1.3.6.1.2.1.2.2.1.3
      type: EnumAsInfo
      help: The type of interface - 1.3.6.1.2.1.2.2.1.3
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
      enum_values:
        1: other
        2: regular1822
        3: hdh1822
        4: ddnX25
        5: rfc877x25
        6: ethernetCsmacd
        7: iso88023Csmacd
        8: iso88024TokenBus
        9: iso88025TokenRing
        10: iso88026Man
        11: starLan
        12: proteon10Mbit
        13: proteon80Mbit
        14: hyperchannel
        15: fddi
        16: lapb
        17: sdlc
        18: ds1
        19: e1
        20: basicISDN
        21: primaryISDN
        22: propPointToPointSerial
        23: ppp
        24: softwareLoopback
        25: eon
        26: ethernet3Mbit
        27: nsip
        28: slip
        29: ultra
        30: ds3
        31: sip
        32: frameRelay
        33: rs232
        34: para
        35: arcnet
        36: arcnetPlus
        37: atm
        38: miox25
        39: sonet
        40: x25ple
        41: iso88022llc
        42: localTalk
        43: smdsDxi
        44: frameRelayService
        45: v35
        46: hssi
        47: hippi
        48: modem
        49: aal5
        50: sonetPath
        51: sonetVT
        52: smdsIcip
        53: propVirtual
        54: propMultiplexor
        55: ieee80212
        56: fibreChannel
        57: hippiInterface
        58: frameRelayInterconnect
        59: aflane8023
        60: aflane8025
        61: cctEmul
        62: fastEther
        63: isdn
        64: v11
        65: v36
        66: g703at64k
        67: g703at2mb
        68: qllc
        69: fastEtherFX
        70: channel
        71: ieee80211
        72: ibm370parChan
        73: escon
        74: dlsw
        75: isdns
        76: isdnu
        77: lapd
        78: ipSwitch
        79: rsrb
        80: atmLogical
        81: ds0
        82: ds0Bundle
        83: bsc
        84: async
        85: cnr
        86: iso88025Dtr
        87: eplrs
        88: arap
        89: propCnls
        90: hostPad
        91: termPad
        92: frameRelayMPI
        93: x213
        94: adsl
        95: radsl
        96: sdsl
        97: vdsl
        98: iso88025CRFPInt
        99: myrinet
        100: voiceEM
        101: voiceFXO
        102: voiceFXS
        103: voiceEncap
        104: voiceOverIp
        105: atmDxi
        106: atmFuni
        107: atmIma
        108: pppMultilinkBundle
        109: ipOverCdlc
        110: ipOverClaw
        111: stackToStack
        112: virtualIpAddress
        113: mpc
        114: ipOverAtm
        115: iso88025Fiber
        116: tdlc
        117: gigabitEthernet
        118: hdlc
        119: lapf
        120: v37
        121: x25mlp
        122: x25huntGroup
        123: transpHdlc
        124: interleave
        125: fast
        126: ip
        127: docsCableMaclayer
        128: docsCableDownstream
        129: docsCableUpstream
        130: a12MppSwitch
        131: tunnel
        132: coffee
        133: ces
        134: atmSubInterface
        135: l2vlan
        136: l3ipvlan
        137: l3ipxvlan
        138: digitalPowerline
        139: mediaMailOverIp
        140: dtm
        141: dcn
        142: ipForward
        143: msdsl
        144: ieee1394
        145: if-gsn
        146: dvbRccMacLayer
        147: dvbRccDownstream
        148: dvbRccUpstream
        149: atmVirtual
        150: mplsTunnel
        151: srp
        152: voiceOverAtm
        153: voiceOverFrameRelay
        154: idsl
        155: compositeLink
        156: ss7SigLink
        157: propWirelessP2P
        158: frForward
        159: rfc1483
        160: usb
        161: ieee8023adLag
        162: bgppolicyaccounting
        163: frf16MfrBundle
        164: h323Gatekeeper
        165: h323Proxy
        166: mpls
        167: mfSigLink
        168: hdsl2
        169: shdsl
        170: ds1FDL
        171: pos
        172: dvbAsiIn
        173: dvbAsiOut
        174: plc
        175: nfas
        176: tr008
        177: gr303RDT
        178: gr303IDT
        179: isup
        180: propDocsWirelessMaclayer
        181: propDocsWirelessDownstream
        182: propDocsWirelessUpstream
        183: hiperlan2
        184: propBWAp2Mp
        185: sonetOverheadChannel
        186: digitalWrapperOverheadChannel
        187: aal2
        188: radioMAC
        189: atmRadio
        190: imt
        191: mvl
        192: reachDSL
        193: frDlciEndPt
        194: atmVciEndPt
        195: opticalChannel
        196: opticalTransport
        197: propAtm
        198: voiceOverCable
        199: infiniband
        200: teLink
        201: q2931
        202: virtualTg
        203: sipTg
        204: sipSig
        205: docsCableUpstreamChannel
        206: econet
        207: pon155
        208: pon622
        209: bridge
        210: linegroup
        211: voiceEMFGD
        212: voiceFGDEANA
        213: voiceDID
        214: mpegTransport
        215: sixToFour
        216: gtp
        217: pdnEtherLoop1
        218: pdnEtherLoop2
        219: opticalChannelGroup
        220: homepna
        221: gfp
        222: ciscoISLvlan
        223: actelisMetaLOOP
        224: fcipLink
        225: rpr
        226: qam
        227: lmp
        228: cblVectaStar
        229: docsCableMCmtsDownstream
        230: adsl2
        231: macSecControlledIF
        232: macSecUncontrolledIF
        233: aviciOpticalEther
        234: atmbond
        235: voiceFGDOS
        236: mocaVersion1
        237: ieee80216WMAN
        238: adsl2plus
        239: dvbRcsMacLayer
        240: dvbTdm
        241: dvbRcsTdma
        242: x86Laps
        243: wwanPP
        244: wwanPP2
        245: voiceEBS
        246: ifPwType
        247: ilan
        248: pip
        249: aluELP
        250: gpon
        251: vdsl2
        252: capwapDot11Profile
        253: capwapDot11Bss
        254: capwapWtpVirtualRadio
        255: bits
        256: docsCableUpstreamRfPort
        257: cableDownstreamRfPort
        258: vmwareVirtualNic
        259: ieee802154
        260: otnOdu
        261: otnOtu
        262: ifVfiType
        263: g9981
        264: g9982
        265: g9983
        266: aluEpon
        267: aluEponOnu
        268: aluEponPhysicalUni
        269: aluEponLogicalLink
        270: aluGponOnu
        271: aluGponPhysicalUni
        272: vmwareNicTeam
        277: docsOfdmDownstream
        278: docsOfdmaUpstream
        279: gfast
        280: sdci
        281: xboxWireless
        282: fastdsl
        283: docsCableScte55d1FwdOob
        284: docsCableScte55d1RetOob
        285: docsCableScte55d2DsOob
        286: docsCableScte55d2UsOob
        287: docsCableNdf
        288: docsCableNdr
        289: ptm
        290: ghn
        291: otnOtsi
        292: otnOtuc
        293: otnOduc
        294: otnOtsig
        295: microwaveCarrierTermination
        296: microwaveRadioLinkTerminal
        297: ieee8021axDrni
        298: ax25
        299: ieee19061nanocom
        300: cpri
        301: omni
        302: roe
        303: p2pOverLan
    - name: ifPhysAddress
      oid: 1.3.6.1.2.1.2.2.1.6
      type: PhysAddress48
      help: The interface's address at its protocol sub-layer - 1.3.6.1.2.1.2.2.1.6
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifAdminStatus
      oid: 1.3.6.1.2.1.2.2.1.7
      type: EnumAsStateSet
      help: The desired state of the interface - 1.3.6.1.2.1.2.2.1.7
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
      enum_values:
        1: up
        2: down
        3: testing
    - name: ifOperStatus
      oid: 1.3.6.1.2.1.2.2.1.8
      type: EnumAsStateSet
      help: The current operational state of the interface - 1.3.6.1.2.1.2.2.1.8
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
      enum_values:
        1: up
        2: down
        3: testing
        4: unknown
        5: dormant
        6: notPresent
        7: lowerLayerDown
    - name: ifLastChange
      oid: 1.3.6.1.2.1.2.2.1.9
      type: gauge
      help: The value of sysUpTime at the time the interface entered its current operational
        state - 1.3.6.1.2.1.2.2.1.9
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifInOctets
      oid: 1.3.6.1.2.1.2.2.1.10
      type: counter
      help: The total number of octets received on the interface, including framing
        characters - 1.3.6.1.2.1.2.2.1.10
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifInDiscards
      oid: 1.3.6.1.2.1.2.2.1.13
      type: counter
      help: The number of inbound packets which were chosen to be discarded even though
        no errors had been detected to prevent their being deliverable to a higher-layer
        protocol - 1.3.6.1.2.1.2.2.1.13
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifInErrors
      oid: 1.3.6.1.2.1.2.2.1.14
      type: counter
      help: For packet-oriented interfaces, the number of inbound packets that contained
        errors preventing them from being deliverable to a higher-layer protocol -
        1.3.6.1.2.1.2.2.1.14
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifOutOctets
      oid: 1.3.6.1.2.1.2.2.1.16
      type: counter
      help: The total number of octets transmitted out of the interface, including
        framing characters - 1.3.6.1.2.1.2.2.1.16
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifOutDiscards
      oid: 1.3.6.1.2.1.2.2.1.19
      type: counter
      help: The number of outbound packets which were chosen to be discarded even
        though no errors had been detected to prevent their being transmitted - 1.3.6.1.2.1.2.2.1.19
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifOutErrors
      oid: 1.3.6.1.2.1.2.2.1.20
      type: counter
      help: For packet-oriented interfaces, the number of outbound packets that could
        not be transmitted because of errors - 1.3.6.1.2.1.2.2.1.20
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifOutQLen
      oid: 1.3.6.1.2.1.2.2.1.21
      type: gauge
      help: The length of the output packet queue (in packets). - 1.3.6.1.2.1.2.2.1.21
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
    - name: ifConnectorPresent
      oid: 1.3.6.1.2.1.31.1.1.1.17
      type: EnumAsStateSet
      help: This object has the value 'true(1)' if the interface sublayer has a physical
        connector and the value 'false(2)' otherwise. - 1.3.6.1.2.1.31.1.1.1.17
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString
      enum_values:
        1: "true"
        2: "false"
    - name: ifCounterDiscontinuityTime
      oid: 1.3.6.1.2.1.31.1.1.1.19
      type: gauge
      help: The value of sysUpTime on the most recent occasion at which any one or
        more of this interface's counters suffered a discontinuity - 1.3.6.1.2.1.31.1.1.1.19
      indexes:
      - labelname: ifIndex
        type: gauge
      lookups:
      - labels:
        - ifIndex
        labelname: ifAlias
        oid: 1.3.6.1.2.1.31.1.1.1.18
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
      - labels:
        - ifIndex
        labelname: ifName
        oid: 1.3.6.1.2.1.31.1.1.1.1
        type: DisplayString

@SuperQ
Copy link
Member

SuperQ commented Apr 11, 2024

We would need to have retry callback hooks in the SNMP library in order to get more per walk debugging. It may be possible, I would have to look at the library code.

@desultory
Copy link
Author

We would need to have retry callback hooks in the SNMP library in order to get more per walk debugging. It may be possible, I would have to look at the library code.

if this is possible, it would be great, I'm not sure how else I could easily determine which oid was failing. I may try to decode some of the snmp3 packets to try to see what was requested twice.

@SuperQ
Copy link
Member

SuperQ commented Apr 12, 2024

Can you give this PR a try? #1157

You can find the output build artifacts in CircleCI.

@desultory
Copy link
Author

desultory commented Apr 12, 2024

Can you give this PR a try? #1157

You can find the output build artifacts in CircleCI.

Thank you very much, I was able to build that branch with "go build" and can run the output with "./snmp_exporter --snmp.debug-packets --log.level=debug".

I've been running this build on my desktop for a while, which is accessing that problem device over a VPN, and in 10 minutes I haven't been able to record a duration greater than 600ms in the last 10 minutes. I have curls running with "watch -n 60" so it does one query a minute.

Do you know what I could filter the output by to only see that particular new debug message? That may make it easier to catch this issue with this improved debugging.

@SuperQ
Copy link
Member

SuperQ commented Apr 12, 2024

I would look for Retry number and request timeout in the logs.

@desultory
Copy link
Author

For whatever reason I'm not able to reproduce this on my desktop:

> grep "Retry number" log.txt
> grep "request timeout" log.txt
> grep "Finished scrape" log.txt
ts=2024-04-12T16:22:10.522Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.647985982
ts=2024-04-12T16:22:26.162Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.634859661
ts=2024-04-12T16:22:41.775Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.607646046
ts=2024-04-12T16:22:57.411Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.630984308
ts=2024-04-12T16:23:13.042Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.626529731
ts=2024-04-12T16:23:28.661Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.613124682
ts=2024-04-12T16:23:44.301Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.634122488
ts=2024-04-12T16:23:59.934Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.627908045
ts=2024-04-12T16:24:15.549Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.610465316
ts=2024-04-12T16:24:31.166Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.611277663
ts=2024-04-12T16:24:46.799Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.628145301
ts=2024-04-12T16:25:02.422Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.61721881
ts=2024-04-12T16:25:18.060Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.632497305
ts=2024-04-12T16:25:33.698Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.633533768
ts=2024-04-12T16:25:49.326Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.622159064
ts=2024-04-12T16:26:04.950Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.618739644
ts=2024-04-12T16:26:20.579Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.623703638
ts=2024-04-12T16:26:36.203Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.618068765
ts=2024-04-12T16:26:51.821Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.613324038
ts=2024-04-12T16:27:07.453Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.625984028
ts=2024-04-12T16:27:23.092Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.633752763
ts=2024-04-12T16:27:38.735Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.638352054
ts=2024-04-12T16:27:54.368Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.626199045
ts=2024-04-12T16:28:10.009Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.636024974
ts=2024-04-12T16:28:25.630Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.615045888
ts=2024-04-12T16:28:41.265Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.630309173
ts=2024-04-12T16:28:56.892Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.621721412
ts=2024-04-12T16:29:12.546Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.64898922
ts=2024-04-12T16:29:28.179Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.627764007
ts=2024-04-12T16:29:43.820Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.635411245
ts=2024-04-12T16:29:59.454Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.627675049
ts=2024-04-12T16:30:15.075Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.616801067
ts=2024-04-12T16:30:30.718Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.63589801
ts=2024-04-12T16:30:46.359Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.635852961
ts=2024-04-12T16:31:02.014Z caller=collector.go:448 level=debug auth=netgear_switch target=172.16.99.2 worker=0 module=if_mib msg="Finished scrape" duration_seconds=0.649401456

@desultory
Copy link
Author

image
I'm still observing that behavior on the existing docker container. I'll try to run this patched code directly on that server next.

@SuperQ
Copy link
Member

SuperQ commented Apr 12, 2024

Yea, I typically don't recommend running SNMP over VPN/WAN links. SNMP, being UDP-based, is very sensitive to packet loss and latency.

My standard recommendation is to run the snmp_exporter as close to the target devices as possible.

@desultory
Copy link
Author

desultory commented Apr 12, 2024

Yea, I typically don't recommend running SNMP over VPN/WAN links. SNMP, being UDP-based, is very sensitive to packet loss and latency.

My standard recommendation is to run the snmp_exporter as close to the target devices as possible.

The server with the issues is as close as it can be, it's under a router and then that router is attached to this switch.

What is strange is that I'm not experiencing the issue over a VPN.

The topology where the problem is observed: Docker container -> core router -> switch (snmp)
The docker host is not NATing or anything, the core router is aware of the docker subnets on that host.

The topology where no issue is observed: My desktop -> vpn -> management router -> switch (snmp)

A different router is being used, but I've checked the core router and it doesn't report any dropped/error packets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants