Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Considerable delay to enter to WizardScene after adding '@telegraf/session/mongodb' #1914

Open
elC0mpa opened this issue Dec 19, 2023 · 25 comments

Comments

@elC0mpa
Copy link

elC0mpa commented Dec 19, 2023

Context

I used to have a telegram bot using long polling mode inside an EC2 instance and everything worked well. I decided to run this bot in webhook mode using AWS lambda service.
In this project I have a lot of WizardScenes and the first problem I had was that after getting into the first step of the WizardScene it didn't advance to the following step. I solved this problem by using '@telegraf/session/mongodb'.
After this, I can complete a Wizard Scene but the bot become very unresponsive and unreliable every time a wizard stage listen for the event to enter a WizardScene, I mean, it just doesn't answer or answer after I sent him two other actions for example.
I just need to execute my bot in webhook mode and that it works as it used to do in long polling mode, right now the only problem I had is related to the WizardScenes

  • Telegraf.js Version: 4.15.3
  • Node.js Version: 16.20.2
  • Operating System: Fedora 39

Minimal Example Code Reproducing the Issue

Bot initialization

// your code here. Please omit everything not related to the issue,
// to make it easier to pinpoint the cause.
// Please try running this code to make sure it actually reproduces the issue.
  const config = { telegram: { webhookReply: false } }
  bot = new Telegraf<BotContext>(process.env.BOT_TOKEN, config)
  const wizardsStage = new Scenes.Stage<any>([])
  const store = Mongo({
    url: process.env.DATABASE_URL,
    database: process.env.SESSION_DATABASE_NAME,
  })
  bot.use(session({ store }))
  bot.use(wizardsStage.middleware())

Lisening to events

    bot.hears(UserInteractionConstants.CURRENCIES_MENU, ctx => {
    // This works perfectly
    ctx.reply('Menú divisas', ctx.customState.isAdmin ? CurrencyActionsKeyboard : SalesmanCurrencyActionsKeyboard)
  })

  wizardsStage.hears(UserInteractionConstants.CURRENCY_QUERY_ACTION, async ctx => {
  // Every time wizardsStage hears for the event the bot becomes unresponsive
    const customState: BotState = (ctx as any).customState
    if (customState.isAdmin) {
      ctx.reply('Consultar divisas', CurrencyQueryKeyboard)
    } else if (customState.isSalesman) {
      ctx.scene.enter(SceneIDS.QUERY_CURRENCIES_SALESMAN)
    }
  })
  wizardsStage.hears(UserInteractionConstants.OTHER_CURRENCIES_QUERY, async ctx => {
    return ctx.scene.enter(SceneIDS.QUERY_OTHER_USERS_CURRENCIES)
  })

Expected Behavior

Screencast.from.2023-12-19.14-25-29.webm

Current Behavior

Screencast.from.2023-12-19.14-27-24.webm
@MKRhere
Copy link
Member

MKRhere commented Dec 19, 2023

When an update comes in, session is trying to fetch the latest session from MongoDB. If the database is located far from the bot, or is running on poor hardware, it can respond slowly, causing session to wait for it. You can investigate this by enabling debug logs.

On Linux/Unixy shells, you can do:

DEBUG=telegraf:* node bot.js

On Windows, cmd.exe:

set DEBUG=telegraf:*
node bot.js

Powershell:

$env:DEBUG = 'telegraf:*'
node bot.js

The logs should give you an idea for where it's spending most of its time waiting.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 19, 2023

Thanks for the fast answer but I run the same code right now with the long polling mode and the bot run as expected.
Probably it should be some broken promise or something but I am not sure
Let me ask you, in the bot config what is the webhookReply field for?

@MKRhere
Copy link
Member

MKRhere commented Dec 20, 2023

TLDR: webhookReply is very problematic, and we recommend leaving it off.

webhookReply is a minor but problematic optimisation.

In webhook mode, Bot API sends you an update and you can send one operation (which would normally be a separate API request you make, like sendMessage or answerCallbackQuery) back in the response body. That saves you from making one API call of your own, for example to call sendMessage. But because you're sending the request in the response body, you cannot know whether it was successful, and there will be no return value. Telegraf only uses webhookReply for a handful of API methods, particularly those that do not have a response body.

With webhookReply,

  • you cannot know if the request was successful
  • you cannot get a response body for that request
  • you may not realise which API call used webhookReply (only one per update, and then we fallback to normal API calls)

But most importantly, the first API call while handling a webhook update will cause that update to be marked as handled.

This is problematic because in webhook mode, Telegram sends one update per chat and waits for us to finish processing before sending the next update. An early webhookReply will mean Telegram considers that request handled and sends the next update. This may or may not be preferable.

So in any case, we do not recommend using webhookReply. The feature exists for completion, but it causes many an unexpected behaviour.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 20, 2023

Thanks for the detailed explanation @MKRhere. Don't you have any idea why could this be? I mean, the undesired behavior I am getting...

One question, even though I am pretty sure my problem is not related to the mongodb session because I execute the same code in long polling mode and have no problem, I couldnt set the debug as you explained me. I am using nodemon and have a dev script to launch the bot in development mode, how could I do this?

@MKRhere
Copy link
Member

MKRhere commented Dec 20, 2023

Just set it in the shell, like export DEBUG=telegraf:* before running nodemon, etc.

@MKRhere
Copy link
Member

MKRhere commented Dec 20, 2023

Are you able to run webhook mode locally? Since you mention it only happens in webhook mode.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 20, 2023

Are you able to run webhook mode locally? Since you mention it only happens in webhook mode.

No I am not, I can only run webhook inside AWS lambda. But I can execute the bot in local environment using mongodb session and I have no problem, it works as expected and I can even check the mongodb collection.

Maybe it could help if you have any example of a telegraf bot written in typescript which uses webhook and wizard scenes

@elC0mpa
Copy link
Author

elC0mpa commented Dec 20, 2023

@MKRhere I just run the bot in local mode using long polling mode with the debug enabled and as I told you everything works fine

  telegraf:session (642696417) did not find cached session +0ms
  telegraf:session (642696417) fetching from upstream store +0ms
  telegraf:session (642696417) updating cache +310ms

  telegraf:session (642696417) refcounter reached 0, removing cached +2ms
  telegraf:session (642696417) middlewares completed, checking session +1ms

  telegraf:session (642696418) did not find cached session +1m
  telegraf:session (642696418) fetching from upstream store +0ms
  telegraf:session (642696418) updating cache +282ms

I mean, definitely there is n problem when using the mongodb session, but in some way it doesn't work correctly when I combine it with webhooks inside lambda

Is there any way to enable this debug for lambda functions?
Maybe I can set the DEBUG env variable for this lambda function, this is my question

@MKRhere
Copy link
Member

MKRhere commented Dec 20, 2023

Install the debug package, then in the start of your bot:

import debug from "debug";
debug.enable("telegraf:*");

You can also set the env var in AWS Lambda's settings. That will also work.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 20, 2023

I already found the problem (but still don't know how to solve it)
Look at the logs when running the bot in webhook mode in AWS Lambda(only showing logs related to session)

 2023-12-20T14:25:29.214Z telegraf:session (642696426) did not find cached session
 2023-12-20T14:25:29.214Z telegraf:session (642696426) fetching from upstream store
 END RequestId: 5b8e1b5d-4e74-4691-83ac-d21894357d13

As you can see, comparing it to the logs received in long polling mode, it doesn't wait for it to update cache
However after some random requests it prints me the following:

2023-12-20T14:25:39.277Z telegraf:session (642696426) updating cache

Definitely there is something related to my code that is not waiting for the promise to be resolved

Take a look at my code:

  wizardsStage.hears(UserInteractionConstants.CURRENCY_QUERY_ACTION, async ctx => {
   const customState: BotState = (ctx as any).customState
   if (customState.isAdmin) {
     ctx.reply('Consultar divisas', CurrencyQueryKeyboard)
   } else if (customState.isSalesman) {
     ctx.scene.enter(SceneIDS.QUERY_CURRENCIES_SALESMAN)
   }
 })

Maybe I should return the promises or something for it to wait for them
There is no chance that it be related to the database because I have it in the same VPC I execute my lambda functions, so definitely that is not the problem

@MKRhere
Copy link
Member

MKRhere commented Dec 21, 2023

The difference of 10 seconds between those logs for the same update definitely means it is waiting 10s for the database query. There's nothing your code could influence this. For some reason the database request is just extremely slow. When you test this locally, you're connected to the same database?

@elC0mpa
Copy link
Author

elC0mpa commented Dec 21, 2023

My bad, I was not explicit enough, as you said there is a 10 seconds difference but this is because it is what I printed, I mean, I waited some time to make a New request and when I make another random request it printed that in the logs.
And yes, when running the bot in local mode I connect to the same database just to be sure it could be the problem

@MKRhere
Copy link
Member

MKRhere commented Dec 21, 2023

But they have the same update_id (in parens), so they belong to the same update.

@MKRhere
Copy link
Member

MKRhere commented Dec 21, 2023

So then, just to rule out something weird with Lambda, you should try running webhook mode elsewhere, in a VPS or preferably locally.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 21, 2023

So then, just to rule out something weird with Lambda, you should try running webhook mode elsewhere, in a VPS or preferably locally.

Ok, I will run it inside an ec2 instance and will let you know, but tomorrrow I would like to see if this could help me (#567)

@MKRhere
Copy link
Member

MKRhere commented Dec 21, 2023

I don't think it's related.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 21, 2023

I don't think it's related.

You are right, but I was reading this (#1786) right now and I am pretty sure it is related, Actually I think it is some problem related to the way lambda functions work. It is a shame that this one does not have a solution either.
I think that after the await done internally by the mongodb session package lambda thinks it is finished and the request is closed, it is pretty similar to what happens in the discussion I referenced

@MKRhere
Copy link
Member

MKRhere commented Dec 21, 2023

I just realised webhookReply is set to true by default. That's a bad default, and you should set

new Telegraf(token, { telegram: { webhookReply: false } });

It should be defaulted to false in v5. I'm not sure this will help in your case, but might as well try. What I expect with WHR is that the first Bot API call you make causes AWS Lambda to terminate the request, since it's marked as handled. Database calls should not count.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 21, 2023

Thanks but in the first comment I shared my bot initialization code and there is webhookReply set to false!

@MKRhere
Copy link
Member

MKRhere commented Dec 21, 2023

In that case, the linked issue is not relevant. There is no reason to believe AWS Lambda will terminate a request after a database query. If I stretch my imagination, I could consider that AWS Lambda will sleep the isolate and wake it back up when a response was received, which is consistent with the observation, but it is honestly a far stretch. Let us eliminate other possibilities by testing an identical bot on EC2 first.

@elC0mpa
Copy link
Author

elC0mpa commented Dec 22, 2023

I just found the problem and a solution even thought it is not a good one @MKRhere
First of all let me show you my aws lambda handler

export const message = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  try {
    const body = JSON.parse(event.body!)
    console.log(body)
    initBot()
    await webhookHandle(body)

    return { body: JSON.stringify({ message: 'Ok' }), statusCode: 200 }
  } catch (error) {
    return { body: JSON.stringify({ message: 'Error' }), statusCode: 500 }
  }
}

webhookHandle function:

const webhookHandle = (update: Update) => {
  return bot.handleUpdate(update)
}

My problem basically is that I don´t know why but the webhook handle function is not waiting for the bot to completely handle the update

I added this to my code and everything works perfectly now:

    console.log('Finished webhook handle')
    await sleep(500)

I added this above the return statement inside the handler function and everything worked now. But it is not an elegant solution and I added 500 ms of billing time to every request, so the question is:

Why do you think it is not waiting for the bot to handle the update?

Let me show you my initBot function

const initBot = () => {
  const config = { telegram: { webhookReply: false } }
  bot = new Telegraf<BotContext>(process.env.BOT_TOKEN, config)
  const wizardsStage = new Scenes.Stage<any>([])
  const store = Mongo({
    url: process.env.DATABASE_URL,
    database: process.env.SESSION_DATABASE_NAME,
  })

  bot.use(checkUserType)
  bot.use(allowedUsers)

  bot.start(ctx => {
    console.log('start bot')
    return ctx.reply('Menú Principal', ctx.customState.isSalesman ? SalesmanMainmenuKeyboard : MainmenuKeyboard)
  })
  bot.hears(UserInteractionConstants.MAIN_MENU, ctx => {
    return ctx.reply('Menú Principal', ctx.customState.isSalesman ? SalesmanMainmenuKeyboard : MainmenuKeyboard)
  })
  initBrandsRoutes(bot, wizardsStage)
  initModulesRoutes(bot, wizardsStage)
  initInventoriesRoutes(bot, wizardsStage)
  initProductsRoutes(bot, wizardsStage)
  initCurrenciesRoutes(bot, wizardsStage)
  initSalesRoutes(bot, wizardsStage)
  initUsersRoutes(bot, wizardsStage)
  initPaymentsRoutes(bot, wizardsStage)
  initDoubtsRoutes(bot, wizardsStage)
  initSettingsRoutes(bot, wizardsStage)

  bot.use(session({ store }))
  bot.use(wizardsStage.middleware())
}

Now inside initCurrenciesRoute function (this is just an example of how it is implemented all routes for each entity)

export const initCurrenciesRoutes = (bot: Telegraf<BotContext>, wizardsStage: WizardsStageType) => {
  registerScenes(wizardsStage)
  bot.hears(UserInteractionConstants.CURRENCIES_MENU, ctx => {
    ctx.reply('Menú divisas', ctx.customState.isAdmin ? CurrencyActionsKeyboard : SalesmanCurrencyActionsKeyboard)
  })
  bot.hears(UserInteractionConstants.CURRENCY_INCREASE_ACTION, ctx => {
    ctx.reply('Incrementar divisas', CurrencyIncreaseKeyboard)
  })
  bot.hears(UserInteractionConstants.CURRENCY_DECREASE_ACTION, ctx => {
    ctx.reply('Decrementar divisas', CurrencyDecreaseKeyboard)
  })
  bot.hears(UserInteractionConstants.CURRENCY_EXCHANGE_ACTION, ctx => {
    ctx.reply('Intercambiar divisas', CurrencyExchangeKeyboard)
  })
  wizardsStage.hears(UserInteractionConstants.CURRENCY_QUERY_ACTION, async ctx => {
    console.log(`Inside ${UserInteractionConstants.CURRENCY_QUERY_ACTION}`);
    const customState: BotState = (ctx as any).customState
    if (customState.isAdmin) {
      ctx.reply('Consultar divisas', CurrencyQueryKeyboard)
    } else if (customState.isSalesman) {
      ctx.scene.enter(SceneIDS.QUERY_CURRENCIES_SALESMAN)
    }
  })
  bot.command('query_currencies', ctx => {
    const customState: BotState = (ctx as any).customState
    if (customState.isAdmin) {
      ctx.reply('Consultar divisas', CurrencyQueryKeyboard)
    } else if (customState.isSalesman) {
      ctx.scene.enter(SceneIDS.QUERY_CURRENCIES_SALESMAN)
    }
  })
}

My question is, according to this code is there something I am missing that makes this to happen?

@elC0mpa
Copy link
Author

elC0mpa commented Dec 22, 2023

image
In this image we can verify how the access to the database is around 100 ms which is what it doesn't wait if I don´t add the wait

@elC0mpa
Copy link
Author

elC0mpa commented Dec 26, 2023

I just found the problem and a solution even thought it is not a good one @MKRhere
First of all let me show you my aws lambda handler

export const message = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  try {
    const body = JSON.parse(event.body!)
    console.log(body)
    initBot()
    await webhookHandle(body)

    return { body: JSON.stringify({ message: 'Ok' }), statusCode: 200 }
  } catch (error) {
    return { body: JSON.stringify({ message: 'Error' }), statusCode: 500 }
  }
}

webhookHandle function:

const webhookHandle = (update: Update) => {
  return bot.handleUpdate(update)
}

My problem basically is that I don´t know why but the webhook handle function is not waiting for the bot to completely handle the update

I added this to my code and everything works perfectly now:

    console.log('Finished webhook handle')
    await sleep(500)

I added this above the return statement inside the handler function and everything worked now. But it is not an elegant solution and I added 500 ms of billing time to every request, so the question is:

Why do you think it is not waiting for the bot to handle the update?

Let me show you my initBot function

const initBot = () => {
  const config = { telegram: { webhookReply: false } }
  bot = new Telegraf<BotContext>(process.env.BOT_TOKEN, config)
  const wizardsStage = new Scenes.Stage<any>([])
  const store = Mongo({
    url: process.env.DATABASE_URL,
    database: process.env.SESSION_DATABASE_NAME,
  })

  bot.use(checkUserType)
  bot.use(allowedUsers)

  bot.start(ctx => {
    console.log('start bot')
    return ctx.reply('Menú Principal', ctx.customState.isSalesman ? SalesmanMainmenuKeyboard : MainmenuKeyboard)
  })
  bot.hears(UserInteractionConstants.MAIN_MENU, ctx => {
    return ctx.reply('Menú Principal', ctx.customState.isSalesman ? SalesmanMainmenuKeyboard : MainmenuKeyboard)
  })
  initBrandsRoutes(bot, wizardsStage)
  initModulesRoutes(bot, wizardsStage)
  initInventoriesRoutes(bot, wizardsStage)
  initProductsRoutes(bot, wizardsStage)
  initCurrenciesRoutes(bot, wizardsStage)
  initSalesRoutes(bot, wizardsStage)
  initUsersRoutes(bot, wizardsStage)
  initPaymentsRoutes(bot, wizardsStage)
  initDoubtsRoutes(bot, wizardsStage)
  initSettingsRoutes(bot, wizardsStage)

  bot.use(session({ store }))
  bot.use(wizardsStage.middleware())
}

Now inside initCurrenciesRoute function (this is just an example of how it is implemented all routes for each entity)

export const initCurrenciesRoutes = (bot: Telegraf<BotContext>, wizardsStage: WizardsStageType) => {
  registerScenes(wizardsStage)
  bot.hears(UserInteractionConstants.CURRENCIES_MENU, ctx => {
    ctx.reply('Menú divisas', ctx.customState.isAdmin ? CurrencyActionsKeyboard : SalesmanCurrencyActionsKeyboard)
  })
  bot.hears(UserInteractionConstants.CURRENCY_INCREASE_ACTION, ctx => {
    ctx.reply('Incrementar divisas', CurrencyIncreaseKeyboard)
  })
  bot.hears(UserInteractionConstants.CURRENCY_DECREASE_ACTION, ctx => {
    ctx.reply('Decrementar divisas', CurrencyDecreaseKeyboard)
  })
  bot.hears(UserInteractionConstants.CURRENCY_EXCHANGE_ACTION, ctx => {
    ctx.reply('Intercambiar divisas', CurrencyExchangeKeyboard)
  })
  wizardsStage.hears(UserInteractionConstants.CURRENCY_QUERY_ACTION, async ctx => {
    console.log(`Inside ${UserInteractionConstants.CURRENCY_QUERY_ACTION}`);
    const customState: BotState = (ctx as any).customState
    if (customState.isAdmin) {
      ctx.reply('Consultar divisas', CurrencyQueryKeyboard)
    } else if (customState.isSalesman) {
      ctx.scene.enter(SceneIDS.QUERY_CURRENCIES_SALESMAN)
    }
  })
  bot.command('query_currencies', ctx => {
    const customState: BotState = (ctx as any).customState
    if (customState.isAdmin) {
      ctx.reply('Consultar divisas', CurrencyQueryKeyboard)
    } else if (customState.isSalesman) {
      ctx.scene.enter(SceneIDS.QUERY_CURRENCIES_SALESMAN)
    }
  })
}

My question is, according to this code is there something I am missing that makes this to happen?

Any thought about this?

@MKRhere
Copy link
Member

MKRhere commented Dec 29, 2023

Does this mean AWS Lambda behaves in some non-standard way where it doesn't realise the created promise can be awaited right afterwards and not on the same line?

image

I have a hard time believing that that could be the problem. I'll need to test.

@elC0mpa
Copy link
Author

elC0mpa commented Jan 22, 2024

telegraf/session#5
Any update about this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants