PyVim Slows Down on Large Files #131
Comments
I also wanted to add that there are other Python-based text editors, such as Suplemon, which don't slow down when editing large files with syntax highlighting (Suplemon isn't written with Prompt Toolkit, though). I really want to use prompt_toolkit, because it's compatible with everything else I want, such as PtPython, offering autocomplete prompts, etc.
One file with 10,000 LOC, that's a problem in itself lol.
Totally disagree. What if you want to inspect a JavaScript library like jQuery or ajax, just because you want to analyze the code?
Anyway, the performance is crap, even with a 300 LOC file.
Hi all, The main reason performance suffers on big files is the way the editor buffer is stored in prompt_toolkit. We are using a simple Python string to represent the buffer content. But Python strings are immutable, so every modification (like typing a single character) involves copying the whole string into a new string. That doesn't work for big files. To work around this, prompt_toolkit should use a "rope" or similar data structure, but this is far from trivial. Almost all code (like regex search, etc.) operates on Python strings. Syntax highlighting is not an issue. Depending on the file type, we have synchronization points: prompt_toolkit looks for a start point close to the cursor position and starts the highlighting from there: https://github.com/prompt-toolkit/python-prompt-toolkit/blob/master/prompt_toolkit/lexers/pygments.py#L113 Now, I think it's important to note that prompt_toolkit was not designed from the ground up to become a real text editor. It was mainly a Readline replacement. It just happened to include all the capabilities needed to build a text editor on top of it, as long as the size of the text files was reasonable. Now, if somebody is willing to implement a "rope" data structure or similar (I don't know the trade-offs), ideally without 3rd party dependencies, in a clean way, unit-tested, and backward-compatible with the rest of the code, I would consider adopting it. But it's certainly not trivial, especially since Python's regex engine expects plain strings.
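To make the "synchronization point" idea in the comment above a bit more concrete, here is a minimal sketch. It is not prompt_toolkit's actual code: the function name `find_sync_start`, the pattern `SYNC_PATTERN`, and the `max_backwards` limit are all assumptions chosen for illustration. The idea is that the lexer does not have to start from the top of the file; it can scan upward from the cursor for a line where restarting is assumed to be safe (here, a top-level `def` or `class`) and highlight from there.

```python
import re

# Hypothetical sync pattern for Python source: a top-level "def" or "class"
# line is assumed to be a safe place to restart the lexer.
SYNC_PATTERN = re.compile(r"^(def|class)\s+")


def find_sync_start(lines, cursor_row, pattern=SYNC_PATTERN, max_backwards=500):
    """Return the row from which highlighting could restart.

    Scans upward from cursor_row, looking at no more than max_backwards
    lines, and returns the first row whose text matches the pattern.
    """
    lowest = max(-1, cursor_row - max_backwards)
    for row in range(cursor_row, lowest, -1):
        if pattern.match(lines[row]):
            return row
    # No sync point in the window: near the top of the file we can simply
    # lex from row 0; otherwise start at the cursor row and accept that the
    # highlighting may be slightly off (an assumption of this sketch).
    if cursor_row < max_backwards:
        return 0
    return cursor_row
```

With a scheme like this, the amount of text handed to the lexer after an edit depends on the distance to the nearest sync point rather than on the size of the whole file, as long as the file type has such points reasonably close together.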
You didn't answer my questions ..
@jonathanslenders I don't think the current bottleneck is caused by the immutability of strings; I still think it's because we're retokenizing the buffer each time an edit is made. The main reason I think syntax highlighting is the culprit is that with syntax highlighting off, it gets a lot faster. Edits only seem to lag badly when syntax highlighting is turned on. This hypothesis was supported when I used py-spy to profile pyvim as it was running - it showed pygments code taking up a lot of time. I understand the reasoning though - the time complexity of making any tiny edit to an immutable string is O(n) with respect to the length of the string, and using a rope data structure could turn that into O(length of the edited line + log(number of lines)). However, Python's string editing is very fast. And though I agree that to make this a truly fast text editor we will probably need some kind of data structure like that, perhaps using the blist module, I don't think that's the current bottleneck. For example, try running this code:
This takes a string with 10,000 lines of code, each with 1,000 characters, and makes 1,000 modifications to it in the worst case (where a character is added to the beginning of the string, which Python handles very slowly compared to adding a character to the end of a string). I'm not sure how this performance would generalize to regular expressions, but if I recall correctly, basic edits in the buffer class such as inserting or deleting characters don't make use of the regex module. These basic edits (such as inserting characters, using arrow keys and pressing backspace) are what I was testing when experiencing lag. I don't know how to fix this problem, but I'm hoping this will point us in the right direction. You did mention synchronization points though - I'm not quite sure what that means, but does that mean it doesn't need to retokenize the entire string on every modification?
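The snippet referred to in the comment above is not shown in this thread. The following is only a rough sketch of the kind of measurement it describes, not the commenter's code: the function names `build_text` and `time_prepends` are made up here, and the default sizes are taken from the description (10,000 lines of 1,000 characters, 1,000 edits at the start of the string).

```python
import time


def build_text(num_lines=10_000, line_length=1_000):
    """Build one big string standing in for the editor buffer."""
    line = "x" * line_length
    return "\n".join([line] * num_lines)


def time_prepends(text, edits=1_000):
    """Prepend one character `edits` times; return (new_text, seconds).

    Each prepend creates a brand-new string, so every iteration copies the
    whole buffer. This is the worst case for an immutable string.
    """
    start = time.perf_counter()
    for _ in range(edits):
        text = "a" + text
    elapsed = time.perf_counter() - start
    return text, elapsed
```

Calling `time_prepends(build_text())` runs the sizes described above; dividing the returned time by the number of edits gives a per-keystroke cost that can be compared against the lag seen with syntax highlighting on. The actual figure will vary by machine and Python version.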
@alexzanderr: Actually I have tried PyPy with success. I don't recall any numbers, but it was definitely a bit faster than CPython back then. @RyannDaGreat:
@jonathanslenders I was testing it with Python files, in particular a few large ones such as
I'm considering trying to build a new text editor with Prompt Toolkit, and I used PyVim to see how practical this would be. Unfortunately, although PyVim is wonderful, it chokes on files that have 10,000 lines of code or more (but only when using syntax highlighting). Native vim doesn't slow down on the same file, or on files of 100,000 lines or more (I'm talking about the time it takes for me to insert or delete a character).
I think it has to do with the way prompt_toolkit handles syntax highlighting via pygments, and that it's not caching it as well as it could (because it slows down to unbearable speeds when you give it enough code at a time, whereas vim seems to maintain a constant speed regardless of how many lines of code are in the file you're editing).
Because of this I've decided to hold off on building a text editor with prompt_toolkit until I'm assured that its time complexity for syntax highlighting is not a function of file size, or at least that it is fast enough that I don't feel like it's lagging badly.
Is there any way to get around this? (I want to edit large files in a prompt_toolkit buffer with syntax highlighting)
Thank you,
Ryan