Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create and use a separate table (parts_by_lcsc) to index FTS5 parts table by rowid to speed up. #444

Closed
wants to merge 5 commits into from

Conversation

gyohng
Copy link
Contributor

@gyohng gyohng commented Apr 21, 2024

The purpose of this PR is to increase the speed of any operation that involves list rebuilding, which is basically any operation, such as part selection, sorting, hiding/showing BOM parts, or even invoking the plugin window.

The database increase due to this is +3.3%, which I think is a reasonable trade-off until a better way is found to integrate it with FTS5 directly while maintaining the speed.

The problem is illustrated in this YouTube video (the principal part of the commit is initially commented out and then uncommented to show the difference in performance impact).

Youtube video with demonstrating the problem

The commit includes the following changes:

  1. Creation of a New Table (parts_by_lcsc): A new table is created in the database to hold mappings between part identifiers (partsId) and the LCSC Part numbers. This table is designed to optimise lookup times by indexing parts based on their LCSC Part number rather than through direct queries on the parts table, which seems to be slow.

  2. Modification in get_part_details Function:

    • The method has been updated to first attempt fetching partsId values from the new parts_by_lcsc table using the provided LCSC Part numbers.
    • If the new method finds all required partsId, it then fetches the parts details directly using these IDs, which is much faster as it leverages the row IDs for direct access instead of scanning through text fields.
    • In case of any issues (e.g., the new table not having all the entries), it falls back to the original method of fetching data directly using the LCSC Part numbers from the parts table.
  3. Database Indexing in the download Method:

    • A significant process added in the download method is the population of the parts_by_lcsc table and the removal of an old index if it exists.
    • It iterates through every row in the parts table, inserts or updates the mapping in the parts_by_lcsc table, and updates a progress indicator, which communicates the operation's progress to the user interface.
    • After filling up the new table, it creates an index on the lcsc column of the parts_by_lcsc table to speed up the lookups.
    • This indexing is expected to significantly reduce the query time by utilising a quicker, more efficient index.
  4. User Interface Updates: The code updates a gauge control that reflects the progress of the database indexing operation. On a modern M1 MacBook Pro it finishes under 5 seconds and is significantly faster than the downloading.

@chmorgan
Copy link
Collaborator

chmorgan commented Apr 21, 2024

Can we avoid rebuilding the list to reduce the impact of list rebuilding being expensive?

I get having a separate index table and considered such things. Imo it isn't worth the complexity if the cost of a few seconds is only when users alter the sort list of a component list with 100+ components. But again, I'm likely biased.

@gyohng
Copy link
Contributor Author

gyohng commented Apr 22, 2024

Can we avoid rebuilding the list to reduce the impact of list rebuilding being expensive?

I don't think it's possible if we want to keep the component list synchronised to the database. Even at a cost of a major refactor, you'll still need to do it every time the plugin starts, which is quite annoyingly slow right now.

In the view of the FTS5 change, which expanded the database four-fold (and if you download the second time, watch for the multi-gigabyte bak file), I think a db size increase of 3.3% and an extra SELECT is not such a big thing.

@whmountains
Copy link
Collaborator

whmountains commented Apr 24, 2024

I second the observation that plugin startup has been very slow since the fts5 change. The plugin also freezes for about 20 seconds whenever I select a part which sort of defeats the purpose of fts5 in the first place. Search as you type is no fun if you then have to wait for the changes to commit.

I was just in the process of doing some profiling to troubleshoot the issue but it looks like @gyohng may have found it already.

(I'm using MacOS and KiCAD 8 BTW)

@chmorgan
Copy link
Collaborator

chmorgan commented Apr 24, 2024 via email

@whmountains
Copy link
Collaborator

whmountains commented Apr 24, 2024

The slowness seems to come from the populate_footprint_list function. I added some log statements to print the execution time whenever the function was called. Here are some measurements I got in standalone mode:

Time taken in the populate_footprint_list function:

  • Startup - 15.94s
  • Click "Select part" from the part picker dialog - 13.74s
  • Sort by stock in the main window - 13.40s.

@chmorgan
Copy link
Collaborator

chmorgan commented Apr 24, 2024 via email

@whmountains
Copy link
Collaborator

whmountains commented Apr 24, 2024

I think #360 is related. The issue was actually fixed in 5eb59e2, with the comment "massive startup time improvement" but it never got closed. We lost this index with the move to FTS5, which caused startup time to once again take a lot longer.

@gyohng
Copy link
Contributor Author

gyohng commented Apr 25, 2024

I think #360 is related. The issue was actually fixed in 5eb59e2, with the comment "massive startup time improvement" but it never got closed. We lost this index with the move to FTS5, which caused startup time to once again take a lot longer.

Hi! :) I am actually a fan of your previously submitted PR with LIKE optimisation, but now FTS5 took over...

This PR is essentially a replica of 5eb59e2. The main problem with 5eb59e2 (indexing by LCSC Part) now is that there's no way to reintroduce it with FTS5 - the FTS5's own index isn't quite as efficient, and sqlite won't accept index creation for an FTS table.

I went even as far as creating an index on the internal parts_content table for the FTS structure (it didn't help). If I had to choose, I'd probably keep the FTS5 part separately just for the part search and otherwise resort to the old table with index approach for everything else, even despite the data growth.

@whmountains
Copy link
Collaborator

Hi @gyohng! I thought I already replied but I can't find the comment so I'll reply again.

I think this comes down to workflow. You and I make heavy use of parametric search, so #439 was really all we needed and FTS5 has a high bar to meet. @chmorgan and probably a lot of other people prefer a single search-box interface hence the project going in that direction.

But I think we all agree that the UI freezing for 15 seconds or more every time a part is selected is not acceptable to anyone's workflow. You may be interested in the discussion under #451, where we are trying to figure out where the slowdown is coming from. And #450 which is a totally diffferent idea for a parts database.

@Bouni
Copy link
Owner

Bouni commented Jun 7, 2024

I just tried this PR but it always stops at "DEBUG - download - Indexing parts table..."
The progress bar is not filling up so I guess something is not working as expected.
@gyohng Any ideas why?

@gyohng
Copy link
Contributor Author

gyohng commented Jun 8, 2024

I just tried this PR but it always stops at "DEBUG - download - Indexing parts table..." The progress bar is not filling up so I guess something is not working as expected. @gyohng Any ideas why?

It used to work fine, I will check with the latest pull. By the way, I noted that your logging is buffered and not flushed, so you normally won't see things until several lines (sometimes minutes) afterwards.

@gyohng
Copy link
Contributor Author

gyohng commented Jul 3, 2024

Please check again. 'IF EXISTS' clause was missing from DROP TABLE. I could never run in to it at my end, because this table was already indexed with a previous version. Now it should work fine.

@gyohng gyohng force-pushed the speedup-fix--add-index-table branch from c83db12 to b5b4299 Compare July 3, 2024 08:22
@gyohng gyohng force-pushed the speedup-fix--add-index-table branch 4 times, most recently from 93202d6 to b81f952 Compare July 3, 2024 08:39
@gyohng gyohng force-pushed the speedup-fix--add-index-table branch from b81f952 to 062be73 Compare July 3, 2024 08:40
@gyohng
Copy link
Contributor Author

gyohng commented Jul 3, 2024

I added another commit to make linter happy with this PR.

@Oscilllator
Copy link
Contributor

I didn't see this pull request prior to taking a look at the performance myself, but I was able to make get_part_details 100x faster without too much trouble in #503. From talking with my good friend ChatGPT it seems that many of the SQL queries seem to not take into account the FTS5 nature of the table.

@chmorgan
Copy link
Collaborator

@Oscilllator speeding the query without the additional table is a simpler approach.

@gyohng
Copy link
Contributor Author

gyohng commented Jul 28, 2024

@chmorgan if the other one works the same fast, let's close then?

@chmorgan
Copy link
Collaborator

@gyohng it might not be as fast but it does appear like its faster (note that I'm back using the plugin to release some boards but I haven't tested it). Your call on closing this one if you think its been solved, or keeping it open if you think this is a better approach.

@Bouni
Copy link
Owner

Bouni commented Aug 5, 2024

I'll close this as its no longer an issue

@Bouni Bouni closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants