Connections not being returned to pool on transaction failure #445

Open

AlexLaviolette opened this issue Nov 22, 2018 · 10 comments

@AlexLaviolette
AlexLaviolette commented Nov 22, 2018

Currently on Play 2.6 with play-slick 3.0.1, I've been running into an issue during load testing where the app consumes a large number of db connections from the pool and does not return them. I have narrowed it down to the use of transactionally, but have still not been able to pinpoint the exact cause.

As proof of this, when I increase the db connection pool to 340 (our max_connections is 350), I can hit up to 28k rpm on one server. But with a pool of only 100, it almost instantly crashes under the same load, with requests timing out while waiting for a connection from the pool. I can confirm that the database transactions themselves are completing quickly. When I enable Hikari connection leak detection, I can see that some connections take > 60s to be returned, yet the database shows nothing running for more than 5 seconds. From everything I've read, we should be able to configure this to run with a much smaller pool. When I remove the use of transactionally, the behaviour of the pool returns to normal.

Here are some logs:
with pool of 100 and transactionally enabled

com.zaxxer.hikari.pool.HikariPool - db - Timeout failure stats (total=100, active=100, idle=0, waiting=33)

with pool of 340 and transactionally enabled

com.zaxxer.hikari.pool.HikariPool - db - After cleanup stats (total=340, active=244, idle=96, waiting=0)

Some errors with pool of 100 and transactionally enabled:

java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 10000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
at slick.jdbc.hikaricp.HikariCPJdbcDataSource.createConnection(HikariCPJdbcDataSource.scala:14)
at slick.jdbc.JdbcBackend$BaseSession.<init>(JdbcBackend.scala:438)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:46)
at slick.jdbc.JdbcBackend$DatabaseDef.createSession(JdbcBackend.scala:37)
at slick.basic.BasicBackend$DatabaseDef.acquireSession(BasicBackend.scala:216)
at slick.basic.BasicBackend$DatabaseDef.acquireSession$(BasicBackend.scala:215)

NewRelic is also showing these requests spending the majority of their time waiting on com.zaxxer.hikari.HikariDataSource.getConnection().

Here is an example of the code that is causing the issue:

def _insertIfNotExists(partner: Partner, accountId: String): DBIO[Account] = {
  accounts
    .filter(_.account_id === accountId)
    .filter(_.partner === partner)
    .result.headOption.flatMap {
      case Some(account) => DBIO.successful(account) // no-op, the account already exists
      case _ =>
        accounts
          .returning(accounts.map(_.id))
          .into((account, id) => account.copy(id = Some(id))) += Account(None, partner, accountId)
    }
}

def createWithDeviceIfNotExists(partner: Partner, accountId: String, deviceId: String): Future[Device] = {
  db.run(
    _insertIfNotExists(partner, accountId).flatMap { account =>
      devices._insertIfNotExists(deviceId, account.id.get) // this looks exactly like the _insertIfNotExists above
    }.transactionally
  )
}

I am unable to reproduce this at low load, or locally by manually causing the queries to fail. But something about using transactionally here is causing connections to not be returned to the pool for a very long time. Hikari does eventually log Previously reported leaked connection com.mysql.jdbc.JDBC4Connection@795acb1e was returned to the pool (unleaked), but often only after several minutes.
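
For comparison, the variant that behaves normally is literally the same composition with the .transactionally wrapper dropped; as noted above, the pool then returns to normal under the same load, at the cost of the two inserts no longer being atomic:

def createWithDeviceIfNotExists(partner: Partner, accountId: String, deviceId: String): Future[Device] = {
  db.run(
    // Same composition as above, but without the wrapping transaction: each
    // statement runs with auto-commit, so a failure between the two inserts
    // can leave an account without a device.
    _insertIfNotExists(partner, accountId).flatMap { account =>
      devices._insertIfNotExists(deviceId, account.id.get)
    }
  )
}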

@octonato
Contributor

@AlexLaviolette can you share your slick configuration?

The number of threads used by Slick must be equal to the number of connections in the pool.

@AlexLaviolette
Author

I was just using the default configuration for Slick. But after reading through slick/slick#1614 and some other discussions, and trying several of the suggestions in there (including max_conns = num_threads), I am still running into this issue.

My configuration before was:

slick.dbs {
  default {
    profile="slick.jdbc.MySQLProfile$"
    db {
      driver="com.mysql.jdbc.Driver"
      url=${PLAY_DB_URL}
      user=${PLAY_DB_USER}
      password=${PLAY_DB_PASSWORD}
      leakDetectionThreshold=60000
    }
  }
}

I have since upgraded play-slick from 3.0.1 -> 3.0.3, which I believe is the latest version that supports Play 2.6.x (I am using 2.6.12).

I also upgraded Slick from the version bundled with play-slick (3.2.1) to 3.2.3. Unfortunately, it seems the relevant bug fixes were already present in 3.2.1, but I gave it a shot regardless. This did produce a nice warning message which basically echoes what is said in the discussion:

[warn] slick.util.AsyncExecutor - Having maxConnection > maxThreads can result in deadlocks if transactions or database locks are used.

I configured slick as follows, which silenced the warning message:

slick.dbs {
  default {
    profile="slick.jdbc.MySQLProfile$"
    db {
      driver="com.mysql.jdbc.Driver"
      url=${PLAY_DB_URL}
      user=${PLAY_DB_USER}
      password=${PLAY_DB_PASSWORD}
      leakDetectionThreshold=60000
      numThreads=20
      maxConnections=20
    }
  }
}

However, I am still encountering the error. From the discussion it seemed that the fix only applied to Sequence and AndThen, while I was using flatMap. I am unsure whether flatMap wasn't covered because it didn't need fixing or because it simply wasn't fixed.

So as a test, I switched to using an AndThen query. Unfortunately, I still encountered the issue. However, it did eventually surface a new exception I had not seen before, although I'm not sure how relevant it is:

Exception in thread "db-10" java.lang.IllegalArgumentException: requirement failed: count cannot be decreased
at scala.Predef$.require(Predef.scala:277)
at slick.util.ManagedArrayBlockingQueue.$anonfun$decreaseInUseCount$1(ManagedArrayBlockingQueue.scala:54)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at slick.util.ManagedArrayBlockingQueue.locked(ManagedArrayBlockingQueue.scala:201)
at slick.util.ManagedArrayBlockingQueue.decreaseInUseCount(ManagedArrayBlockingQueue.scala:53)
at slick.util.AsyncExecutor$$anon$2$$anon$1.afterExecute(AsyncExecutor.scala:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1157)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
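
For reference, an AndThen composition of the two inserts looks roughly like the sketch below. This is illustrative only: andThen discards the first action's result, so the device insert has to resolve the account id itself via a hypothetical helper.

// Sketch only - insertDeviceIfNotExistsFor is a hypothetical helper that
// looks the account up by (partner, accountId) instead of taking its id.
val action = (
  _insertIfNotExists(partner, accountId) andThen
    devices.insertDeviceIfNotExistsFor(partner, accountId, deviceId)
).transactionally

db.run(action)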

Please let me know if anything sticks out to you, and thanks for responding!

@ignasi35
Member

@AlexLaviolette thanks for the super-thorough explanations and tests. If you could provide a reproducer it'd be a great help to narrow down the issue.


@AlexLaviolette
Author

AlexLaviolette commented Nov 28, 2018

I've put together a demo application that replicates the issue: https://github.com/AlexLaviolette/slick-demo.
It's basically a stripped-down version of what we are actually running, containing only an example of the problematic code.

I have that deployed to a single machine (2GB RAM VM), connected to a single mysql db (1GB RAM VM).

The script I run to reproduce it is in the repo under /test/api_load_test.py. I am able to reproduce it by running that from a single VPS (8GB RAM, 4 CPUs).

Hopefully this helps, let me know if you need anything else.

@AlexLaviolette
Author

I forked this and added the change referenced here to version 3.0.3. Unfortunately, this did not seem to fix my issue.

@dspasojevic

We are experiencing this same issue. Dropping transactionally resolves the issue for us as well.

@AlexLaviolette did you find a resolution for this in the end?

@dspasojevic

For future searchers, we got to the bottom of this issue. Ours was actually caused by an error on our side.

We'd flatmapped DBIOs and Futures using DBIO.from. The implementation of the method that was returning the Future was changed to use the database, meaning that it had its own transaction.

So it looked something like:

val dbio = for {
  a <- getAFromDatabase      // a DBIO
  b <- getBFromDatabase      // a DBIO
  c <- DBIO.from(calculateC) // a Future, but needing to run a DBIO to perform the calculation
  d <- doIt(a, b, c)         // a DBIO
} yield d

x.run(dbio.transactionally)

Because the outer transaction (where the DBIO and Future were combined) was still active while the Future was running, in some cases the inner transaction, run to produce the Future result, was not able to acquire a connection.

Under load, this quickly led to a deadlock - all connections were consumed by the outer transactions and none of the inner transactions (used by calculateC) were able to acquire a connection from the pool.
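
A minimal sketch of one way out, assuming calculateC does not actually depend on a or b and an implicit ExecutionContext is in scope: resolve the Future first, outside any transaction, so the wrapped action is purely DBIO and never waits on a nested db.run while holding a connection.

calculateC.flatMap { c =>      // the Future's own db.run completes here, outside the outer transaction
  val dbio = for {
    a <- getAFromDatabase      // a DBIO
    b <- getBFromDatabase      // a DBIO
    d <- doIt(a, b, c)         // a DBIO
  } yield d
  x.run(dbio.transactionally)  // the transaction now contains only DBIOs
}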

This is unlikely to be related to the problem faced by @AlexLaviolette but did have the same symptoms.

@petercoppensdatylon

Because the outer transaction (where the DBIO and Future were combined) was still active while the Future was running, in some cases the inner transaction, run to produce the Future result, was not able to acquire a connection.

Under load, this quickly led to a deadlock - all connections were consumed by the outer transactions and none of the inner transactions (used by calculateC) were able to acquire a connection from the pool.

Thanks a lot! I had been chasing a hang for about 8 hours and was slowly starting to understand that it was related to nested transactions. What you describe is exactly what I have, and after your description working around it was trivial. Thanks mate!
Peter.

@OlegYch

OlegYch commented Jul 8, 2021

In my case this was caused by the NewRelic javaagent
(using Play 2.8.7, Slick 3.3.3 (numThreads = 5), HikariCP 4.0.3, MySQL Connector 8.0.22).
with newrelic: [screenshot]
without newrelic: [screenshot]
