WriteAPI - WriteOptions #145

Closed
csballa opened this issue Aug 25, 2020 · 9 comments

@csballa

csballa commented Aug 25, 2020

Hello,

I was trying to migrate a simple logger from the old Java client to this one. I expected WriteOptions to behave like the previous client's batching mechanism.

Using the WriteApi with the default WriteOptions doesn't seem to actually keep a batch, or retry the write if an attempt fails.
It writes points to the given DB/retention policy, but shutting down the InfluxDB server or deleting the DB/RP results in various errors that are logged but swallowed (NotFoundException, InfluxException, ConnectException...). (I was trying to simulate connection loss/wrong configuration this way.)
After starting the InfluxDB server again and recreating the DB/RP, writes continue to work; however, the points written while InfluxDB was down are lost.
As the errors are handled inside the client, I don't see an easy way to implement my own batching for points.

Is some extra config needed for the write options to take effect? As far as I can see, the default options are applied implicitly, but if I have to activate them somehow, I completely missed where or how:
image

(Not related, and I don't know where else to ask: how can I use this client to check whether a DB and/or RP exists?)

Thanks in advance for any fix/info!

Specifications:

  • Client Version: tested with: 1.10, 1.11
  • InfluxDB Version: 1.8
  • Platform: Windows
@bednar
Contributor

bednar commented Aug 26, 2020

Hi @csballa,

thanks for using our client.

You can configure batching with WriteOptions; try something like:

WriteOptions writeOptions = WriteOptions.builder()
        .batchSize(5000)
        .flushInterval(1000)
        .bufferLimit(10000)
        .jitterInterval(1000)
        .retryInterval(5000)
        .build();

WriteApi writeApi = client.getWriteApi(writeOptions);

(Not related, and I don't know where else to ask: how can I use this client to check whether a DB and/or RP exists?)

We currently don't support the DB/RP API (https://v2.docs.influxdata.com/v2.0/api/#tag/DBRPs), but it is a good suggestion for improvement, so we will implement it.

Regards

@csballa
Author

csballa commented Aug 26, 2020

Hi @bednar,

thanks for the quick response.

As far as I can see, the default write options are similar. Having looked into the source code, I figured out that the issue seems to be that losing the connection or not having the configured DB/RP is not considered a retryable error.

Consequently, the points in the batch are lost and the write operation is not retried. I would suggest:

  1. An error response or unsuccessful write operation should never cause data loss. (The older Influx client's batch kept retrying; once the batch started to overflow, it called the error handling function that was passed in, so in case of a write failure it was easy to store the points that would otherwise be lost.)
  2. The WriteSuccessEvent contains the line protocol written; however, I couldn't find similar, easily accessible data about the attempted writes in the error events, so handling failed writes by other means is also tricky. Getting back the Point/data in case of errors would be a nice improvement for error handling.

What do you think about point 1? Would it be possible to change the implementation so that all unsuccessful writes are retried later and the batch is kept until it grows over its limit?
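
The behaviour requested in point 1 can be sketched roughly as follows. This is not the client's implementation: `RetryingWriterSketch`, `BatchSink` and `onGiveUp` are hypothetical names, and the wait between attempts is left out to keep the example short. The idea is only that a batch which still fails after all retries is handed back to the caller instead of being dropped.

```java
import java.util.List;
import java.util.function.Consumer;

final class RetryingWriterSketch {

    /** Hypothetical stand-in for whatever performs the actual write. */
    interface BatchSink {
        void write(List<String> batch) throws Exception;
    }

    /**
     * Tries the write once plus up to maxRetries more times.
     * If every attempt fails, the untouched batch is passed to onGiveUp
     * so the caller can store it elsewhere (file, other DB, ...).
     */
    static boolean writeWithRetries(List<String> batch, BatchSink sink,
                                    int maxRetries, Consumer<List<String>> onGiveUp) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                sink.write(batch);
                return true;
            } catch (Exception e) {
                // A real client would wait here before the next attempt.
            }
        }
        onGiveUp.accept(batch);
        return false;
    }
}
```

With this shape the caller decides what happens to points that could not be written, which is roughly what the error handler of the older client allowed.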

@bednar
Contributor

bednar commented Sep 24, 2020

Hi @csballa,

We are working on improving the retry strategy for the client. We will introduce new configuration options to make it more user-friendly:

| Property | Description | Default Value |
| --- | --- | --- |
| max_retries | the number of max retries when write fails | 5 |
| max_retry_delay | maximum delay when retrying write in milliseconds | 180000 |
| exponential_base | the base for the exponential retry delay, the next delay is computed as `retry_interval * exponential_base^(attempts - 1) + random(jitter_interval)` | 5 |
| batch_abort_on_exception | the batching worker will be aborted after failed retry strategy | false |
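
To make the delay formula above concrete, here is a minimal sketch of the computation. The class and method names are made up for illustration, and capping the result at max_retry_delay inside the base computation is an assumption: the table gives the cap and the formula but does not say how they are combined.

```java
import java.util.concurrent.ThreadLocalRandom;

final class RetryDelaySketch {

    // retry_interval * exponential_base^(attempts - 1), capped at max_retry_delay.
    // Where exactly the cap is applied is an assumption of this sketch.
    static long baseDelayMillis(long retryInterval, int exponentialBase,
                                int attempts, long maxRetryDelay) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        double raw = retryInterval * Math.pow(exponentialBase, attempts - 1);
        return (long) Math.min(raw, (double) maxRetryDelay);
    }

    // Adds random(jitter_interval): a uniform value in [0, jitterInterval).
    static long delayMillis(long retryInterval, int exponentialBase, int attempts,
                            long maxRetryDelay, long jitterInterval) {
        long jitter = jitterInterval > 0
                ? ThreadLocalRandom.current().nextLong(jitterInterval)
                : 0L;
        return baseDelayMillis(retryInterval, exponentialBase, attempts, maxRetryDelay) + jitter;
    }
}
```

Taking retry_interval = 5000 (the value from the WriteOptions example earlier in the thread) with the defaults from the table, the jitter-free delays for attempts 1 to 4 are 5000, 25000, 125000 and 180000 ms; the fourth would be 625000 ms without the cap.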

Regards

@csballa
Author

csballa commented Sep 24, 2020

Hello @bednar,

Thank you for the improvement!
Just to clarify: can we expect that all import errors, not just retryable ones, will be retried? (For example, a wrong configuration or a lost connection.)

Thanks again!

@bednar
Contributor

bednar commented Sep 24, 2020

We will retry all HTTP connection errors + HTTP errors >= 429.
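
Read literally, that rule could be sketched like this. `RetryRuleSketch` and `HttpStatusException` are hypothetical and not part of the client; treating every `IOException` as a connection error is also an assumption of the sketch.

```java
import java.io.IOException;

final class RetryRuleSketch {

    /** Hypothetical exception carrying the HTTP status of a failed write. */
    static final class HttpStatusException extends Exception {
        final int status;

        HttpStatusException(int status) {
            super("HTTP " + status);
            this.status = status;
        }
    }

    static boolean isRetryable(Throwable error) {
        if (error instanceof HttpStatusException) {
            // HTTP errors are retried only from status 429 upwards.
            return ((HttpStatusException) error).status >= 429;
        }
        // Assumption: any IOException (ConnectException, SocketTimeoutException, ...)
        // counts as a connection error.
        return error instanceof IOException;
    }
}
```

Under this reading a refused connection is retried, while a response such as 404 falls below the threshold and is not.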

@csballa
Author

csballa commented Sep 25, 2020

Will you consider improving the error handling to provide the failed points in the error events (especially in WriteErrorEvent)?
A similar option for handling points that overflow the batch would also be nice.
Without these options, a workaround is required to ensure no data gets lost. If we had access to the failed points, it would also be much easier to write them to another target DB/file.
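
As an illustration of the overflow handling asked for here, a bounded buffer could pass evicted records to a callback instead of silently discarding them. Everything in this sketch (`OverflowBufferSketch`, `onOverflow`, evicting the oldest record first) is hypothetical and only shows the shape of such an option, not how the client's buffer works.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

final class OverflowBufferSketch {

    private final Deque<String> buffer = new ArrayDeque<>();
    private final int bufferLimit;
    private final Consumer<String> onOverflow;

    OverflowBufferSketch(int bufferLimit, Consumer<String> onOverflow) {
        if (bufferLimit < 1) {
            throw new IllegalArgumentException("bufferLimit must be >= 1");
        }
        this.bufferLimit = bufferLimit;
        this.onOverflow = onOverflow;
    }

    // Adds a line-protocol record; when the buffer is full, the oldest
    // record is evicted and handed to the callback rather than lost.
    void add(String record) {
        if (buffer.size() == bufferLimit) {
            onOverflow.accept(buffer.removeFirst());
        }
        buffer.addLast(record);
    }

    // Returns everything currently buffered and empties the buffer.
    List<String> drain() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```

The callback could append the evicted records to a file or forward them to another database.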

@bednar
Contributor

bednar commented Sep 29, 2020

@csballa yes, we could do that in the next PR, after we improve our retry strategy.

@csballa
Author

csballa commented Jan 5, 2021

Hello @bednar,

I was finally able to try out the changes:

  1. Getting the points in case of an error is still sorely missed, but the new config options are really helpful.

  2. Maybe worth another ticket, but I discovered a new issue during testing (client version 1.14, InfluxDB 1.8):

Scenario:
I tested the retries by starting InfluxDB only after some write attempts, so I could see how my writes would be retried.

Result:

  • Points written before the retries begin are written after DB startup. (Expected behaviour.)
  • Points written after the first retry throw an Interrupted exception (after DB startup). These points are lost; points written before the retries are saved correctly, as are points written after the error. I would expect points written during the retries to be added to the batch as well and be part of the same retry attempts, or be queued up with the other attempts.
    This behaviour causes the points written during retries to get lost, which makes the retries unusable.
    InterruptedStackTrace.txt
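
The expectation described in the second bullet (points arriving while the server is unreachable join the pending data and are written together once it is back) can be sketched as below. This is not how the client is implemented; `PendingQueueSketch` and `BatchSink` are hypothetical names, and threading and timing are left out on purpose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class PendingQueueSketch {

    /** Hypothetical stand-in for whatever performs the actual write. */
    interface BatchSink {
        void write(List<String> batch) throws Exception;
    }

    private final Deque<String> pending = new ArrayDeque<>();
    private final BatchSink sink;

    PendingQueueSketch(BatchSink sink) {
        this.sink = sink;
    }

    // New records are always queued, even while earlier writes keep failing.
    void add(String record) {
        pending.addLast(record);
    }

    // Tries to write everything pending in arrival order.
    // On failure all records stay queued for the next attempt.
    boolean flush() {
        if (pending.isEmpty()) {
            return true;
        }
        List<String> batch = new ArrayList<>(pending);
        try {
            sink.write(batch);
            pending.clear();
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```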

Regards

@bednar
Contributor

bednar commented Jul 18, 2022

The interrupted exception should be fixed by #358.

@bednar bednar closed this as completed Jul 18, 2022
@bednar bednar added this to the 6.4.0 milestone Jul 18, 2022
@bednar bednar added the duplicate This issue or pull request already exists label Jul 18, 2022