Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moderated message lost if someone clicks on "distribute" link twice in quick succession, #1886

Open
dpc22 opened this issue Sep 2, 2024 · 11 comments
Labels

Comments

@dpc22
Copy link
Contributor

dpc22 commented Sep 2, 2024

Version

6.2.72

Installation method

My own rpm, derived from "official" rpm for RHEL 9.

Expected behavior

Moderation messages contain some some text of the form:

To distribute the attached message in list ucam-language-sciences:
https://lists.cam.ac.uk/sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 

If someone clicks on that link twice in rapid succession, I would expect the first click to distribute the message, and the second click to generate an error page.

Actual behavior

The user reported the error page, but the message was not distributed.

I did find the following in /var/spool/sympa/moderation/

-rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute

I don't know what these ".distribute" files are used for, but this is the only file with that suffix in /var/spool/sympa/moderation on a busy server, and it is still there 7 hours later. So I think and hope that this might be a smoking gun for whatever went wrong.

Additional information

Here are the two incoming do_distribute requests for e9f3e269192343c654798f6681a91f37

Sep 2 09:29:36 lists-1 wwsympa[3097526]: info main::do_distribute(e9f3e269192343c654798f6681a91f37) [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]

Sep 2 09:29:40 lists-1 wwsympa[3097453]: info main::do_distribute(e9f3e269192343c654798f6681a91f37) [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]

sympa_msg generated the following error at 09:29:40

Sep 2 09:29:40 lists-1 sympa_msg[1946661]: err main::#247 > Sympa::Spindle::spin#95 > Sympa::Spindle::DoCommand::_twist#120 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::distribute::_twist#61 Unable to find message with key for list Sympa::List [email protected]

That is the only time that "Unable to find message with key" appears in my logs in the last 28 days. (I have lots of "Unable to find message with" from wwsympa, but nothing similar from sympa_msg).

This feels very much like a race condition when wwsympa and sympa_msg are trying to work on the same moderation files at the same time. I hope that the single "Unable to find message with key" error combined with the single distribute file in /var/spool/sympa/moderation/ will be enough to narrow down a cause.

I will attach the full logs as a text attachment to this ticket

sympa.txt

@dpc22 dpc22 added the bug label Sep 2, 2024
@ikedas
Copy link
Member

ikedas commented Sep 2, 2024

Hi @dpc22,

  • Please undo all customizations you made in the past and try again.
  • Please show all logs and describe all your operations, from when you clicked the link in the mail at the first time, to when you made the last operation.

@dpc22
Copy link
Contributor Author

dpc22 commented Sep 3, 2024

I think that the only significant customization is Steve Shipley's modlist plugin. I'm not in a position to disable this on our live lists servers as (literally) thousands of list owners depend on this. Other that this there are mostly fixes that you yourself have proposed to problems that I have had with 6.2.72, namely:

#1685
#1699
#1725
#1862

I do have a test server which is fair game, but I am unable to reproduce the issue at the moment either on the live server or the test server, which unfortunately is often the case with awkward race conditions.

I attached the logs to the original bug report as a text attachment "sympa.txt". Everything happens in the space of 4 seconds, and appears to be a consequence of my user clicking a link twice in the moderation message that I quoted:

To distribute the attached message in list ucam-language-sciences:
https://lists.cam.ac.uk/sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 

Here are the corresponding entries in my Apache access_log, which correspond to log entries in the sympa.txt log.

10.249.206.106 - - [02/Sep/2024:09:29:36 +0100] "GET /sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 HTTP/1.1" 200 16052 "https://shib.raven.cam.ac.uk/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"

10.249.206.106 - - [02/Sep/2024:09:29:40 +0100] "GET /sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 HTTP/1.1" 200 16691 "https://shib.raven.cam.ac.uk/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"

At the moment the obvious smoking gun is the following file in /var/spool/sympa/moderation/

-rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute

which is the missing message.

This file is still there 23 hours later, and remains the only file with a ".distribute" suffix in that directory. I don't know if these .distribute files time out after a certain number of days or weeks. The fact that there is only one of them feels significant. What we are looking at is not normal on a busy Sympa server with 8000 active lists.

@ikedas
Copy link
Member

ikedas commented Sep 3, 2024

I think that the only significant customization is Steve Shipley's modlist plugin. I'm not in a position to disable this on our live lists servers as (literally) thousands of list owners depend on this. Other that this there are mostly fixes that you yourself have proposed to problems that I have had with 6.2.72, namely:

#1685 #1699 #1725 #1862

I do have a test server which is fair game, but I am unable to reproduce the issue at the moment either on the live server or the test server, which unfortunately is often the case with awkward race conditions.

By undoing all these customizations and modifications It would be more helpful for other users to try to reproduce the problem. But that is not absolutely necessary.

In any case, you need to reproduce the problem on your end to get a clue as to how to solve it.

I attached the logs to the original bug report as a text attachment "sympa.txt". Everything happens in the space of 4 seconds, and appears to be a consequence of my user clicking a link twice in the moderation message that I quoted:

To distribute the attached message in list ucam-language-sciences:
https://lists.cam.ac.uk/sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 

If you saw this log and decided that it was a sign that "someone clicks on that link twice in rapid succession", you are wrong. When the user follows on the disribute link, they are demanded confirmation, and the process is executed only when they click on the submit button.

Visiting the link and clicking on the button will log do_distribute action twice. Thus, this is not a user accidentally clicking twice in a short period. This is expected normal operation.

Here are the corresponding entries in my Apache access_log, which correspond to log entries in the sympa.txt log.

10.249.206.106 - - [02/Sep/2024:09:29:36 +0100] "GET /sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 HTTP/1.1" 200 16052 "https://shib.raven.cam.ac.uk/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"

10.249.206.106 - - [02/Sep/2024:09:29:40 +0100] "GET /sympa/distribute/ucam-language-sciences/e9f3e269192343c654798f6681a91f37 HTTP/1.1" 200 16691 "https://shib.raven.cam.ac.uk/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15"

At the moment the obvious smoking gun is the following file in /var/spool/sympa/moderation/

-rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute

which is the missing message.

This file is still there 23 hours later, and remains the only file with a ".distribute" suffix in that directory. I don't know if these .distribute files time out after a certain number of days or weeks. The fact that there is only one of them feels significant. What we are looking at is not normal on a busy Sympa server with 8000 active lists.

Renaming u(...)7.distribute in moderation spool to u(...)7 may allow the message to be distributed again. By this, you may try to reproduce the problem.

@dpc22
Copy link
Contributor Author

dpc22 commented Sep 3, 2024

If you saw this log and decided that it was a sign that "someone clicks on that link twice in rapid succession", you are wrong. When the user follows on the disribute link, they are demanded confirmation, and the process is executed only when they click on the submit button.

I just tried this on a test list and there was no confirmation dialogue.

The following URL:

https://lists.cam.ac.uk/sympa/distribute/test-dpc22/ebe7220098759658fd75ad6d302bd450

flashed up a message "distribute: action will complete soon", and then showed me the output of:

https://lists.cam.ac.uk/sympa/modindex/test-dpc22

without any HTTP redirect. In fact I ran the same test before I submitted the original ticket to confirm that there wasn't a confirmation dialogue, before I asserted that this was the case there.

Is is possible this behaviour is influenced by a setting in sympa.conf or a scenari rule? The only custom scenari that we have are send.* rules.

We do use web_sso in /etc/sympa/auth.conf (Shibboleth), but I do see the same thing if I use a username and password to authenticate instead.

@ikedas
Copy link
Member

ikedas commented Sep 3, 2024

  • Had the message been distributed?
  • What about log output?

@dpc22
Copy link
Contributor Author

dpc22 commented Sep 3, 2024

Had the message been distributed?

I see a different message in my Web browser if the message was already distributed:

 ERROR (distribute) - Unable to access the message on list test-dpc22 with key ebe7220098759658fd75ad6d302bd450. Warning: this message may already have been sent by one of the list's moderators. 

(I think that is what you are asking here).

What about log output?

I think that you have all the log output that I have now: the sympa.txt attachment at the bottom of the original bug report, plus the two lines from the Apache access_log later on.

I'm happy to ramp up logging on my test server or add custom logging to either the test or live system, but that probably isn't going to get us very far if I can't reproduce an issue reported by a random list owners. I don't think that this can be common given that this is the first time that a list owner has reported a problem to me.

@ikedas
Copy link
Member

ikedas commented Sep 3, 2024

I think that you have all the log output that I have now: the sympa.txt attachment at the bottom of the original bug report, plus the two lines from the Apache access_log later on.

No. As long as the operation was performed at different times and under different conditions, it is impossible for the exact same log to be output. Unless objective information is provided that can reproduce the problem, it cannot be investigated for resolution.

I'm going to bed for now. Please continue to keep us informed if you like.

@ikedas
Copy link
Member

ikedas commented Sep 3, 2024

I wrote:

If you saw this log and decided that it was a sign that "someone clicks on that link twice in rapid succession", you are wrong. When the user follows on the disribute link, they are demanded confirmation, and the process is executed only when they click on the submit button.

Visiting the link and clicking on the button will log do_distribute action twice. Thus, this is not a user accidentally clicking twice in a short period. This is expected normal operation.

This was my misunderstanding. Strangely, Sympa currently confirms the distribute operation only when the message needs to be tagged. This is a bug and should be fixed separately.

Anyway, it is normal behaviour that "Unable to access the message" is reported on the second click. The anomaly is that the first click has not initiated message delivery.

More information is awaited.

@dpc22
Copy link
Contributor Author

dpc22 commented Sep 4, 2024

No. As long as the operation was performed at different times and under different conditions, it is impossible for the exact same log to be output. Unless objective information is provided that can reproduce the problem, it cannot be investigated for resolution.

I only have a single operation which led to a missing message, and I attached the logs relating to that event to this ticket.

The only part that I elided was the original message delivery which generated the moderation request, which I can include if you like. The list moderator (who was the same person to send the message) clicked on the "distribute" link in the moderation message that she received 27 seconds later, and again 4 seconds after that.

I agree that we need a reproducible test case, but as I indicated yesterday I'm not having any success in generating one.

The only obvious curiosity that I have to work with is the following in /var/spool/sympa/moderation/

-rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute

which is still there two days later. This remains the only ".distribute" file in that directory, on a busy Sympa system.

If I have some spare time later today I will trace through the code and see if I can work out what generated that specific file, which appears to have been orphaned.

@dpc22
Copy link
Contributor Author

dpc22 commented Sep 4, 2024

FWIW, a successful moderation cycle on the same list a few hours later looks like the following:

Sep  2 11:06:28 lists-1 wwsympa[3097453]: info main::do_distribute(9420992658abbf17fc12def51d885605) [robot lists.cam.ac.uk] [session 86881337951441] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
Sep  2 11:06:28 lists-1 wwsympa[3097453]: info main::do_modindex() [robot lists.cam.ac.uk] [session 86881337951441] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Request::Message::__parse() Parsing: QUIET DISTRIBUTE ucam-language-sciences 9420992658abbf17fc12def51d885605
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spindle::ProcessMessage::_twist() Processing Sympa::Request <action=distribute;[email protected]>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spool::store() Sympa::Message <[email protected]_9420992658abbf17fc12def51d885605.distribute/shelved:arc_cv=pass;dkim_sign> is stored into Sympa::Spool::Archive as <1725271590.1725271590.678475.ucam-language-sciences@lists.cam.ac.uk,1946661,9580>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spool::store() Sympa::Message <[email protected]_9420992658abbf17fc12def51d885605.distribute/shelved:arc_cv=pass;dkim_sign> is stored into Sympa::Spool::Digest <[email protected]> as <1725271590.1725271590.693644,1946661,5903>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spindle::ToList::_send_msg() No non VERP subscribers left to distribute message to list Sympa::List <[email protected]>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spindle::ToList::_send_msg() No non VERP subscribers left to distribute message to list Sympa::List <[email protected]>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spool::Outgoing::store() Message Sympa::Message <[email protected]_9420992658abbf17fc12def51d885605.distribute/shelved:arc_cv=pass;decorate;dkim_sign;dmarc_protect> is stored into bulk spool as <5.5.1725271590.1725271590.704931.ucam-language-sciences@lists.cam.ac.uk_z,1946661,7916>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spool::Outgoing::store() Message Sympa::Message <[email protected]_9420992658abbf17fc12def51d885605.distribute/shelved:arc_cv=pass;decorate;dkim_sign;dmarc_protect> is stored into bulk spool as <5.5.1725271590.1725271590.714288.ucam-language-sciences@lists.cam.ac.uk_0,1946661,6450>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: notice Sympa::Spindle::ToList::_send_msg() No non VERP subscribers left to distribute message to list Sympa::List <[email protected]>
Sep  2 11:06:30 lists-1 sympa_msg[1946661]: info Sympa::Spindle::ToList::_twist() Message Sympa::Message <[email protected]_9420992658abbf17fc12def51d885605.distribute/shelved:arc_cv=pass;dkim_sign> for Sympa::List <[email protected]> from [email protected] accepted (0.04 seconds, 14 sessions, 561 subscribers), message ID=LO4P265MB59320B6DB9284366BCD7DF3DF7922@LO4P265MB5932.GBRP265.PROD.OUTLOOK.COM, size=18119

compared to the one that failed, which has a repeated do_distribute action, and two different errrors:

Sep  2 09:29:31 lists-1 wwsympa[3096790]: info main::do_login() [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [list ucam-language-sciences]
Sep  2 09:29:36 lists-1 wwsympa[3097526]: info main::do_distribute(e9f3e269192343c654798f6681a91f37) [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
Sep  2 09:29:36 lists-1 wwsympa[3097526]: info main::do_modindex() [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
Sep  2 09:29:40 lists-1 wwsympa[3097453]: info main::do_distribute(e9f3e269192343c654798f6681a91f37) [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]
Sep  2 09:29:40 lists-1 sympa_msg[1946661]: notice Sympa::Request::Message::__parse() Parsing: QUIET DISTRIBUTE ucam-language-sciences e9f3e269192343c654798f6681a91f37
Sep  2 09:29:40 lists-1 sympa_msg[1946661]: notice Sympa::Spindle::ProcessMessage::_twist() Processing Sympa::Request <action=distribute;[email protected]>
Sep  2 09:29:40 lists-1 sympa_msg[1946661]: err main::#247 > Sympa::Spindle::spin#95 > Sympa::Spindle::DoCommand::_twist#120 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::distribute::_twist#61 Unable to find message with key <e9f3e269192343c654798f6681a91f37> for list Sympa::List <[email protected]>
Sep  2 09:29:40 lists-1 wwsympa[3097453]: err main::#1562 > main::do_distribute#7740 [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences] Unable to find message with <e9f3e269192343c654798f6681a91f37> for list Sympa::List <[email protected]>
Sep  2 09:29:40 lists-1 wwsympa[3097453]: info main::do_modindex() [robot lists.cam.ac.uk] [session 95252205934297] [client 10.249.206.106] [user [email protected]] [list ucam-language-sciences]

"Unable to find message with <..." is the normal message for repeated do_distribute actions, leading to the error screen that we were both expecting to see.

"Unable to find message with key <..." (note "with key") is the only instance of that error in the last 30 days of logs.

The key corresponds to the file in /var/spool/sympa/moderation/

-rw-r----- 1 sympa sympa 17859 Sep 2 09:29 [email protected]_e9f3e269192343c654798f6681a91f37.distribute

@dpc22
Copy link
Contributor Author

dpc22 commented Sep 4, 2024

Okay, the Sympa::Spool stuff is causing my brain to leak out of my ears,.

I do now understand that wwsympa do_distribute() renames the file in the moderation spool to add a ".distribute" suffix, which should stop repeat distribute actions on the same key. So that is where that file came from. It then injects "QUIET DISTRIBUTE ucam-language-sciences e9f3e269192343c654798f6681a91f37" into the sympa incoming mail queue to be processed by the sympa_msg daemon.

I don't understand why lib/Sympa/Request/Handler/distribute.pm

    my $spindle = Sympa::Spindle::ProcessModeration->new(
        distributed_by => $sender,
        context        => $robot,
        authkey        => $key,
        quiet          => $request->{quiet}
    );

    unless ($spindle and $spindle->spin) {    # No message.
        $log->syslog('err',
            'Unable to find message with key <%s> for list %s',
            $key, $list);

failed to find the following file in /var/spool/sympa/moderation:

-rw-r----- 1 sympa sympa 17859 2024-09-02 09:29:08.000000000 +0100 [email protected]_e9f3e269192343c654798f6681a91f37.distribute

with the following error:

Sep  2 09:29:40 lists-1 sympa_msg[1946661]: err main::#247 > Sympa::Spindle::spin#95 > Sympa::Spindle::DoCommand::_twist#120 > Sympa::Spindle::spin#95 > Sympa::Request::Handler::distribute::_twist#61 Unable to find message with key <e9f3e269192343c654798f6681a91f37> for list Sympa::List <[email protected]>

given lib/Sympa/Spool/Moderation.pm

use constant _marshal_format => '%s@%s_%s%s';
use constant _marshal_keys   => [qw(localpart domainpart AUTHKEY validated)];

localpart, domainpart, AUTHKEY all seem to match, and perldoc on that module tells me:

=item {validated}

Keeps a string representing extension, if message has been renamed using
remove() with option.

We also have the following, which seems to match:

use constant _marshal_regexp =>
    qr{\A([^\s\@]+)\@([-.\w]+)_([\da-f]+)(.distribute)?\z};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants