s🔡 🔍❗ now returns grapheme index instead of UTF-8 index #209

joeskeen · 2022-11-27T05:49:34Z

s🔡 🔍❗ now returns grapheme index instead of UTF-8 index

Fixes #208

thbwd · 2022-11-30T14:42:31Z

Thanks for immediately opening a PR to fix this! Unfortunately, your implementation uses O(n) memory, because 🔪 allocates memory for each substring. Do you think you could create an overload of 🎼 that takes a start index from which to compare? Then no substring would be needed.

joeskeen · 2022-12-02T03:19:52Z

OK I'll look into using that approach. Thanks for the feedback!

joeskeen · 2022-12-16T16:19:37Z

@thbwd I've started the approach you suggested, but it looks like I'm running into the same problem that caused the issue to begin with. Once I get over to the C++ side, the sStringBeginsWith code uses a memory comparison between string->characters.get() and beginning->characters.get(). There isn't enough context there to determine where in the string to start comparing: we would have to get the graphemes, measure them, and then start the memory comparison at that calculated byte index. And since this would still be done in a loop, that code would have to be called multiple times.

I'm wondering if there would be a better approach involving passing not only the source string, but also the source string in grapheme array form, allowing that operation to only be done once.
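To make the index mismatch concrete, here is a minimal C++ sketch of a single-pass search that walks the bytes once and reports a character index rather than a byte offset, without allocating substrings or an array. It is not the code from this PR: `findCodePointIndex` and `isBoundary` are hypothetical names, and the sketch counts Unicode code points as a simplified stand-in for grapheme clusters (real grapheme segmentation needs the Unicode segmentation rules and is not attempted here).

```cpp
#include <cstddef>
#include <cstring>
#include <optional>
#include <string_view>

// True if `byte` starts a UTF-8 code point, i.e. it is not a 10xxxxxx continuation byte.
inline bool isBoundary(unsigned char byte) {
    return (byte & 0xC0) != 0x80;
}

// Hypothetical helper: returns the code point index (not the byte offset) of the
// first occurrence of `needle` in `haystack`, or std::nullopt if there is none.
// Both arguments are assumed to hold valid UTF-8.
std::optional<std::size_t> findCodePointIndex(std::string_view haystack,
                                              std::string_view needle) {
    if (needle.empty()) {
        return 0;  // the empty string matches at the very start
    }
    std::size_t codePointIndex = 0;
    for (std::size_t byteIndex = 0;
         byteIndex + needle.size() <= haystack.size(); ++byteIndex) {
        if (!isBoundary(static_cast<unsigned char>(haystack[byteIndex]))) {
            continue;  // inside a multi-byte sequence: not a valid match position
        }
        // Compare in place at the current byte offset; no substring is created.
        if (std::memcmp(haystack.data() + byteIndex, needle.data(), needle.size()) == 0) {
            return codePointIndex;
        }
        ++codePointIndex;
    }
    return std::nullopt;
}
```

For the UTF-8 string "héllo", searching for "llo" with this sketch yields 2, whereas the byte offset of the match is 3 because "é" occupies two bytes.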

joeskeen · 2022-12-16T17:29:51Z

OK @thbwd I believe this is ready to be reviewed again.

thbwd

While I think this should work, we could improve this even further by exposing an iterator that works like sStringCodepoints as an API in Emojicode. This would get rid of the array allocation. Let me know what you think 🙂
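As a rough illustration of the iterator idea, the following C++ sketch steps through a UTF-8 buffer one code point at a time and hands back byte offsets on demand, so no array of characters has to be built up front. This is only an assumption about what such an API could look like: `CodePointCursor` is a hypothetical name, it is not how sStringCodepoints is implemented, and it iterates code points rather than full grapheme clusters.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical cursor over the code points of a valid UTF-8 buffer.
class CodePointCursor {
public:
    explicit CodePointCursor(std::string_view text) : text_(text) {}

    // Returns the byte offset at which the next code point starts,
    // or std::nullopt once the end of the buffer has been reached.
    std::optional<std::size_t> next() {
        if (offset_ >= text_.size()) {
            return std::nullopt;
        }
        const std::size_t start = offset_;
        ++offset_;  // consume the lead byte
        // Skip the continuation bytes (10xxxxxx) that belong to this code point.
        while (offset_ < text_.size() &&
               (static_cast<unsigned char>(text_[offset_]) & 0xC0) == 0x80) {
            ++offset_;
        }
        return start;
    }

private:
    std::string_view text_;
    std::size_t offset_ = 0;
};
```

A search loop could call `next()` repeatedly, counting how many offsets it has seen, and run the prefix comparison at each returned offset; the count at the first match is then the character index.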

thbwd · 2023-08-07T19:22:11Z

s/String.cpp

- return false;
- }
- return std::memcmp(string->characters.get(), beginning->characters.get(), beginning->count) == 0;
+extern "C" char sStringBeginsWithAtIndex(String *string, String *beginning, int utf8Index) {


I think it should be runtime::Integer instead of int here. https://www.emojicode.org/docs/guides/api.html#function-signatures

Also, I think it would be better to check that the range [string->characters.get() + utf8Index, string->characters.get() + utf8Index + beginning->count) is in bounds here, as this method is exposed publicly below.
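A bounds check along these lines might look like the sketch below. It is self-contained, so it does not use the real runtime headers: the `String` struct and `makeString` are hypothetical stand-ins, `beginsWithAtIndex` is an illustrative name rather than the function in this PR, and `int64_t` is used on the assumption that runtime::Integer is a 64-bit signed integer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical stand-in for the runtime's String: a UTF-8 byte buffer and its length in bytes.
struct String {
    std::unique_ptr<char[]> characters;
    int64_t count;
};

// Hypothetical helper that builds a String from a NUL-terminated UTF-8 literal.
String makeString(const char *utf8) {
    String s;
    s.count = static_cast<int64_t>(std::strlen(utf8));
    s.characters = std::make_unique<char[]>(static_cast<std::size_t>(s.count));
    std::memcpy(s.characters.get(), utf8, static_cast<std::size_t>(s.count));
    return s;
}

// Returns 1 if `beginning` occurs in `string` starting at byte offset `utf8Index`, else 0.
// Every bound is validated before memcmp reads any memory.
extern "C" char beginsWithAtIndex(const String *string, const String *beginning,
                                  int64_t utf8Index) {
    assert(string != nullptr && beginning != nullptr);
    if (utf8Index < 0 || utf8Index > string->count) {
        return 0;  // start offset lies outside the buffer
    }
    // Compare against the remaining length so utf8Index + beginning->count cannot overflow.
    if (beginning->count > string->count - utf8Index) {
        return 0;  // the prefix would run past the end of the buffer
    }
    return std::memcmp(string->characters.get() + utf8Index,
                       beginning->characters.get(),
                       static_cast<std::size_t>(beginning->count)) == 0;
}
```

Rejecting out-of-range offsets up front means a caller that passes a bad index gets a plain "no match" instead of an out-of-bounds read.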

thbwd · 2023-08-07T19:29:44Z

s/🔡.🍇

+
+ 🔁 byteIndex ➕ searchByteLength ◀️🙌 byteLength 🍇
+ 🐽chars graphemeIndex❗ ➡️ grapheme
+ 📐grapheme❗ ➡️ graphemeLength


This can be moved after the if statement

fix emojicode#208

8531f07

joeskeen changed the title ~~fix #208~~ s🔡 🔍❗ now returns grapheme index instead of UTF-8 index Nov 27, 2022

🚧WIP: string index via startsWith approach

d4a2e45

joeskeen added 2 commits December 16, 2022 09:55

🔨fix/optimize string searching

c563b54

🔧 fix up string search

59e0f99

thbwd reviewed Aug 7, 2023
