Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUGFIX] Fix wrong sorting of volumes in listview and toc #931

Merged

Conversation

michaelkubina
Copy link
Collaborator

This PR fixes an issue, where in certain cases volumes get sorted the wrong way in the listview and the toc. The "MODS-Anwendungsprofil" (page 42) specifies in the section for mods:part, that the order-attribute needs to be a positive integer for sorting-related purposes. This is also considered in the MetadataDefaults.php with the XPATH for volume_sorting retrieving the value from ./mods:part[@type="host"]/@order

The quierybuilder requests the value being sorted in ascending order, but since its declared as a VARCHAR in the database-table it gets sorted lexicographically. This causes the following order for cases where alphabetical sorting differs from numerical sorting:

1
10
11
2
3
4
...

This PR applies a typecast to the querybuilder, that casts the value to an UNSIGNED-integer, that is then sorted numerically in ascending order.

The (likely better) alternative would be to change the fieldtype in the database layout ext_tables.sql to an integer, but would require a DB-migration when upgrading the extension.

@sebastian-meyer sebastian-meyer self-requested a review April 5, 2023 12:15
@sebastian-meyer sebastian-meyer added the 🐛 bug A non-security related bug. label Apr 5, 2023
@sebastian-meyer sebastian-meyer changed the title Fix wrong sorting of volumes in listview and toc [BUG] Fix wrong sorting of volumes in listview and toc Apr 5, 2023
@sebastian-meyer
Copy link
Member

  1. We need to make sure that this also works for queries to the Solr index, not just for database queries. Both should behave the same way.
  2. I am not sure, if casting to UNSIGNED is the proper way of dealing with this issue. This only works if volume_sorting only holds integer values. Theoretically, it can hold string values as well, for which the change would break sorting. A better approach would be to use a specific collation for sorting, but I don't know how to do this with Doctrine. (In pure SQL it would look like ORDER BY volume_sorting COLLATE '...'.)

@sebastian-meyer sebastian-meyer changed the title [BUG] Fix wrong sorting of volumes in listview and toc [BUGFIX] Fix wrong sorting of volumes in listview and toc Aug 7, 2023
@sebastian-meyer
Copy link
Member

Is this related to #984?

Copy link
Member

@sebastian-meyer sebastian-meyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments above.

@sebastian-meyer
Copy link
Member

@michaelkubina I'd like to add this to the upcoming 5.0 release, but you haven't addressed my remarks above. Could you please tell me how you would like to continue?

@michaelkubina
Copy link
Collaborator Author

There is a certain relation to #984, where its focused on the mets_orderlabel from what i see. But this PR adresses wrong sorting of volumes in the ToC and also the sorting of volumes in the moredetails section in the Listview, when volume_sorting is actually available instead of the fallback to the mets_orderlabel.

as to 1)
Sadly SOLR is incapable of using natural sorting on a string field, which the dynamic field *_sorting becomes, when indexed - otherwise it could have simply be applied globally.
Just as a quick thought: For this to work theoretically, we would need to manually re-sort a solr result for volume_sorting - i believe this creates new issues, that need to be resolved - first thought is keeping track of the pagination of results. This is not the scope of this PR - it adresses strictly the ToC and the children in the listview.moredetails section.

as to 2)
If volume_sorting is set, the ToCController (using getTableOfContentsFromDb() from the DocumentRepository) and the ListviewController (using findAllByUids() from the DocumentRepository through the SolrSearch) will use this metadata as the primary sorting criterium. As i have described, the application profile says, that the @ORDER attribute in the mods:part must be of type integer. And the MetadataDefaults.php is actually using this attribute as the source for volume_sorting:

    'volume' => [
        'format' => [
            [
                'format_root' => 'mods',
                'xpath' => './mods:part/mods:detail/mods:number',
                'xpath_sorting' => './mods:part[@type="host"]/@order',
            ],
        ],

...

    ],

While true that it can theoretically be also a string or alphanumeric (because the database allows for it), in case of a METS file it should/must not be - or are you thinking of a custom configuration here?

The MODS-Schema from the LOC is also clear, that the @order attribute should be an integer (https://www.loc.gov/standards/mods/mods.xsd): <xs:attribute name="order" type="xs:integer"/>.

as to how to proceed)
Typecasting was the easiest solution, without altering the current database definition - alternativly i tried to use the query-result and using PHP natural sorting mechanism, which would work as well with an anonymous inline function via natural string comparison. This way we could at least get rid of typecasting and still get the wanted result - i would need to SELECT volume_sorting as well then for it to work:

$hasVolumeSorting = !empty(array_filter(array_column($allResults, 'volume_sorting')));
if ($hasVolumeSorting) {
    // Perform natural sorting only if 'volume_sorting' is present and not empty
    usort($allResults, function($a, $b) {
        return strnatcmp($a['volume_sorting'], $b['volume_sorting']);
    });
}

Would you be more comfortable with this approach, which does not involve typecasting in the sql-query?

I have experimented with COLLATIONS, and they have not worked for me resolving this issue, but also it has shown, that it is highly dependend of the database one is using and of the configuration. I have got an error like /* SQL-Fehler (1253): COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'utf8mb3' */, which indicates that i need to reconfigure my.cnf for it to work. So this could potentialy be a cause of issues for other users as well. I was not able to get it work with the Typo3 querybuilder (doctrine-dbal), but even if it could still cause the above mentioned issue. So i would rather like to not to look further into it.

@sebastian-meyer
Copy link
Member

Thank you very much for the explanation! That totally makes sense!

@sebastian-meyer sebastian-meyer merged commit a0af049 into kitodo:master Feb 28, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug A non-security related bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants