Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There seems to be a bug after training a model that is flushed to redis. #1288

Open
srfication opened this issue Apr 2, 2024 · 0 comments
Open

Comments

@srfication
Copy link

Versions used

  • metarank: 0.7.8
  • redis: 7.2.4
  • java: 17

Steps to reproduce:

  • import
    import --config config.yml --data data.json
  • train
    train --config config.yml --data data.json

Here are the error logs after running the train command

14:50:07.954 INFO  ai.metarank.main.Main$ - Metarank vunknown is starting.
14:50:08.429 INFO  ai.metarank.config.Config$ - api conf block is not defined: using default ApiConfig(Hostname(0.0.0.0),Port(8080),5 minutes)
14:50:08.431 INFO  ai.metarank.config.Config$ - train conf block is not defined: using default RedisTrainConfig(Hostname(localhost),Port(6379),2,CacheConfig(4096,1 hour,true),PipelineConfig(128,1 second,true),BinaryStoreFormat,None,None,RedisTimeouts(1 second,1 second,1 second))
14:50:08.631 INFO  ai.metarank.config.Config$ - Loaded config file, state=redis://localhost:6379, features=[transaction_count], models=[xgboost]
14:50:08.699 INFO  ai.metarank.main.Main$ - usage analytics enabled: release=false analytics=true errors=true
14:50:09.040 INFO  a.m.fstore.redis.client.RedisClient$ - auth is not enabled
14:50:09.040 INFO  a.m.fstore.redis.client.RedisClient$ - TLS disabled
14:50:09.715 INFO  a.m.fstore.redis.client.RedisClient$ - Enabled redis pipelining
14:50:09.721 INFO  a.m.fstore.redis.client.RedisClient$ - opened read+write connection redis://localhost:6379, db=0 (pipeline size: 128 period: 1 second)
14:50:09.722 INFO  a.m.fstore.redis.client.RedisClient$ - started flush timer every 1 second
14:50:09.724 INFO  a.m.fstore.redis.client.RedisClient$ - auth is not enabled
14:50:09.724 INFO  a.m.fstore.redis.client.RedisClient$ - TLS disabled
14:50:09.742 INFO  a.m.fstore.redis.client.RedisClient$ - opened read+write connection redis://localhost:6379, db=3 (pipeline size: 128 period: 1 second)
14:50:09.742 INFO  a.m.fstore.redis.client.RedisClient$ - periodic flushing disabled
14:50:09.743 INFO  a.m.fstore.redis.client.RedisClient$ - auth is not enabled
14:50:09.743 INFO  a.m.fstore.redis.client.RedisClient$ - TLS disabled
14:50:09.766 INFO  a.m.fstore.redis.client.RedisClient$ - Enabled redis pipelining
14:50:09.766 INFO  a.m.fstore.redis.client.RedisClient$ - opened read+write connection redis://localhost:6379, db=1 (pipeline size: 128 period: 1 second)
14:50:09.766 INFO  a.m.fstore.redis.client.RedisClient$ - started flush timer every 1 second
14:50:09.781 INFO  a.m.fstore.redis.client.RedisClient$ - auth is not enabled
14:50:09.781 INFO  a.m.fstore.redis.client.RedisClient$ - TLS disabled
14:50:09.797 INFO  a.m.fstore.redis.client.RedisClient$ - Enabled redis pipelining
14:50:09.797 INFO  a.m.fstore.redis.client.RedisClient$ - opened read+write connection redis://localhost:6379, db=2 (pipeline size: 128 period: 1 second)
14:50:09.797 INFO  a.m.fstore.redis.client.RedisClient$ - started flush timer every 1 second
14:50:09.798 INFO  ai.metarank.main.command.Train$ - Training all models: List(xgboost)
14:50:10.626 INFO  a.m.m.r.LambdaMARTRanker$LambdaMARTPredictor - loaded 12 clickthroughs
14:50:10.629 INFO  a.m.m.c.t.SplitStrategy$TimeSplit - using time split strategy, ratio=80%
14:50:10.634 INFO  a.m.m.r.LambdaMARTRanker$LambdaMARTPredictor - Train/Test split finished: 10/2 click-through events
14:50:11.195 INFO  i.g.m.ltrlib.booster.XGBoostBooster$ - [0] NDCG@10:train = 0.5659087521617943 NDCG@10:test = 0.5088912804029996
14:50:11.218 INFO  a.m.m.r.LambdaMARTRanker$LambdaMARTPredictor - NDCG@10: source=0.8154648767857287 reranked=0.5088912804029995 random=0.3379588167762147 (took 10)
14:50:11.257 INFO  a.m.fstore.redis.RedisModelStore - serialized model xgboost, size=Some(1362)
14:50:11.264 INFO  a.m.fstore.cache.CachedModelStore$ - removing model ModelName(xgboost) due to EXPLICIT
14:50:11.268 INFO  a.m.fstore.redis.RedisPersistence - redis pipeline flushed, took 6ms
14:50:11.269 INFO  ai.metarank.main.command.Train$ - model uploaded to store
14:50:11.274 INFO  a.m.fstore.redis.client.RedisClient$ - closing redis connection
14:50:11.282 INFO  a.m.fstore.redis.client.RedisClient$ - closing redis connection
14:50:11.285 INFO  a.m.fstore.redis.client.RedisClient$ - closing redis connection
14:50:11.288 INFO  a.m.fstore.redis.client.RedisClient$ - closing redis connection
ml.dmlc.xgboost4j.java.XGBoostError: [14:50:11] /Users/runner/work/xgboost/xgboost/src/c_api/c_api.cc:915: DMatrix/Booster has not been initialized or has already been disposed.
Stack trace:
  [bt] (0) 1   libxgboost4j9179524500566518608.dyl 0x0000000173f19675 dmlc::LogMessageFatal::~LogMessageFatal() + 117
  [bt] (1) 2   libxgboost4j9179524500566518608.dyl 0x0000000173f2cdbd XGBoosterGetNumFeature + 93
  [bt] (2) 3   libxgboost4j9179524500566518608.dyl 0x0000000173f1757f Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterGetNumFeature + 31
  [bt] (3) 4   ???                                 0x000000011a78953a 0x0 + 4739077434
  [bt] (4) 5   ???                                 0x000000011a785317 0x0 + 4739060503
  [bt] (5) 6   ???                                 0x000000011a78535c 0x0 + 4739060572


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.getNumFeature(Booster.java:750)
	at io.github.metarank.ltrlib.booster.XGBoostBooster.$anonfun$weights$1(XGBoostBooster.scala:51)
	at io.github.metarank.ltrlib.booster.Booster.whenNotClosed(Booster.scala:15)
	at io.github.metarank.ltrlib.booster.Booster.whenNotClosed$(Booster.scala:15)
	at io.github.metarank.ltrlib.booster.XGBoostBooster.whenNotClosed(XGBoostBooster.scala:13)
	at io.github.metarank.ltrlib.booster.XGBoostBooster.weights(XGBoostBooster.scala:50)
	at ai.metarank.ml.rank.LambdaMARTRanker$LambdaMARTModel.weights(LambdaMARTRanker.scala:392)
	at ai.metarank.main.command.Train$.$anonfun$train$5(Train.scala:83)
	at apply @ ai.metarank.util.Logging.info(Logging.scala:10)
	at map @ ai.metarank.main.command.Train$.$anonfun$train$4(Train.scala:79)
	at map @ ai.metarank.fstore.redis.client.RedisClient.$anonfun$doFlush$1(RedisClient.scala:130)
	at set @ org.http4s.ember.client.EmberConnection.cleanup(EmberConnection.scala:69)
	at flatMap @ ai.metarank.fstore.redis.client.RedisClient.doFlush(RedisClient.scala:129)
	at map @ ai.metarank.fstore.redis.RedisPersistence.$anonfun$sync$6(RedisPersistence.scala:124)
	at apply @ ai.metarank.fstore.redis.client.RedisClient.$anonfun$doFlush$2(RedisClient.scala:131)
	at fromCompletableFuture @ ai.metarank.fstore.redis.RedisPersistence$.$anonfun$create$8(RedisPersistence.scala:170)
	at void @ ai.metarank.util.analytics.AnalyticsReporter$.ping(AnalyticsReporter.scala:45)
	at void @ ai.metarank.util.analytics.AnalyticsReporter$.ping(AnalyticsReporter.scala:45)
	at map @ ai.metarank.fstore.redis.client.RedisClient.$anonfun$doFlush$1(RedisClient.scala:130)
	at set @ org.http4s.ember.client.EmberConnection.cleanup(EmberConnection.scala:69)

Process finished with exit code 1

this is the config file setup

features:
  - name: transaction_count
    type: interaction_count
    scope: item
    interaction: transaction

models:
  xgboost:
    type: lambdamart
    backend:
      type: xgboost
      iterations: 1
      ndcgCutoff: 10
      seed: 0
    weights:
      transaction: 1
    features:
      - transaction_count
    warmup:
        sampledRequests: 1
        duration: 1s

state:
  type: redis
  host: localhost
  port: 6379

imported data file
data.json

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant