Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sπŸ”‘ πŸ”β— now returns grapheme index instead of UTF-8 index #209

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

joeskeen
Copy link
Contributor

@joeskeen joeskeen commented Nov 27, 2022

sπŸ”‘ πŸ”β— now returns grapheme index instead of UTF-8 index

Fixes #208

sπŸ”‘ πŸ”β— now returns grapheme index instead of UTF-8 index
@joeskeen joeskeen changed the title fix #208 sπŸ”‘ πŸ”β— now returns grapheme index instead of UTF-8 index Nov 27, 2022
@thbwd
Copy link
Member

thbwd commented Nov 30, 2022

Thanks for immediately opening a PR to fix this! Unfortunately, your implementation runs in O(n) in terms of memory as πŸ”ͺ allocates memory for each substring. Do you think you could maybe create an overload of 🎼 that takes a start index from which to compare? Then no substring would be needed.

@joeskeen
Copy link
Contributor Author

joeskeen commented Dec 2, 2022

OK I'll look into using that approach. Thanks for the feedback!

@joeskeen
Copy link
Contributor Author

@thbwd I've started the approach you suggested, but it looks like I'm running into the same problem that caused the issue to begin with. Once I get over to the C++ side, the sStringBeginsWith code uses a memory comparison between string->characters.get(), beginning->characters.get(). There isn't enough context here to be able to determine where in the string to start comparing since we would have to get the graphemes, measure them, and then start the memory comparison at that calculated index. And since this would still be done in a loop, this code would have to be called multiple times.

I'm wondering if there would be a better approach involving passing not only the source string, but also the source string in grapheme array form, allowing that operation to only be done once.

@joeskeen
Copy link
Contributor Author

OK @thbwd I believe this is ready to be reviewed again.

Copy link
Member

@thbwd thbwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I think this should work, we could improve this even further by exposing an iterator that works like sStringCodepoints as an API in Emojicode. This would get rid of the array allocation. Let me know what you think πŸ™‚

return false;
}
return std::memcmp(string->characters.get(), beginning->characters.get(), beginning->count) == 0;
extern "C" char sStringBeginsWithAtIndex(String *string, String *beginning, int utf8Index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be runtime::Integer instead of int here. https://www.emojicode.org/docs/guides/api.html#function-signatures

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think it would be better to check [string->characters.get() + utf8Index, string->characters.get() + utf8Index + beginning->count) is in bounds here, as this method is exposed publicly below.


πŸ” byteIndex βž• searchByteLength β—€οΈπŸ™Œ byteLength πŸ‡
🐽chars graphemeIndex❗ ➑️ grapheme
πŸ“grapheme❗ ➑️ graphemeLength
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be moved after the if statement

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect or inconsistent behavior with sπŸ”‘ πŸ”β— method
2 participants