Research Request: Refactor Recursive File Pinning to Use Multithreading #30

Open
2 tasks done
Lypsolon opened this issue Aug 28, 2024 · 0 comments
@Lypsolon
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues.

Please describe the feature you have in mind and explain what the current shortcomings are?

Issue Description:

The current implementation of the file pinning process in our codebase uses a recursive approach. While this works well for smaller structures, it can hit Python's recursion limit on larger ones, at which point the interpreter raises a RecursionError and the pinning run aborts. Recursion may also not be the most performant solution for large-scale operations.

To address these concerns, I propose researching alternative methods that can:

  • Avoid recursion limits: Shift from a recursive implementation to an iterative one to handle large structures more robustly.
  • Improve performance: Explore the use of multithreading to process multiple files simultaneously, potentially reducing the overall processing time.
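As a starting point for the first goal, here is a minimal sketch of a traversal that uses an explicit stack instead of recursion. It does not touch our actual pinning code: `walk_dependencies`, `get_children`, and the `layer_N` identifiers are hypothetical names, and the dictionary stands in for whatever lookup returns the layers a given layer depends on.

```python
from typing import Callable, Dict, Iterable, List, Set


def walk_dependencies(
    root: str, get_children: Callable[[str], Iterable[str]]
) -> List[str]:
    """Return every item reachable from root, each exactly once, without recursion."""
    visited: Set[str] = set()
    order: List[str] = []
    stack: List[str] = [root]  # explicit stack replaces the call stack

    while stack:
        current = stack.pop()
        if current in visited:
            continue  # shared dependencies are only handled once
        visited.add(current)
        order.append(current)
        # Push children reversed so they are popped in their original order.
        stack.extend(reversed(list(get_children(current))))
    return order


# Hypothetical chain that is far deeper than the default recursion limit.
DEPTH = 50_000
chain: Dict[str, List[str]] = {
    f"layer_{i}": [f"layer_{i + 1}"] for i in range(DEPTH)
}
chain[f"layer_{DEPTH}"] = []

result = walk_dependencies("layer_0", chain.__getitem__)
print(len(result))
```

Because the pending work lives in a list on the heap, the depth of the structure is bounded by available memory rather than by the interpreter's recursion limit.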

Action Points for Research:

  • Recursive to Iterative Approach: Investigate and propose an iterative method to replace the current recursive approach for file pinning. Analyze the potential challenges and benefits of this shift.

  • Multithreading Implementation: Explore the feasibility of using multithreading to handle multiple files concurrently. Consider the implications of threading, such as data integrity, thread safety, and the Global Interpreter Lock (GIL) in Python.

  • Performance Metrics: Benchmark the current recursive implementation and compare it with the proposed iterative and multithreaded approaches to quantify performance improvements.

  • Edge Cases and Limitations: Identify any edge cases or limitations that might arise from changing the current implementation to a multithreaded approach.

  • Recommendations: Based on the research, provide recommendations on the best path forward, including any necessary changes to the codebase and potential impacts on existing functionality.
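For the benchmarking point, a small harness along these lines could be a first step. It is only a sketch on synthetic data: `count_recursive`, `count_iterative`, `make_chain`, and `timed` are hypothetical helpers, and the chain of `nN` nodes stands in for real layer structures, so the timings say nothing yet about our actual pinning code.

```python
import sys
import time
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, List[str]]


def count_recursive(graph: Graph, node: str) -> int:
    """Count nodes below (and including) node using plain recursion."""
    return 1 + sum(count_recursive(graph, child) for child in graph[node])


def count_iterative(graph: Graph, root: str) -> int:
    """Count the same nodes using an explicit stack."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(graph[node])
    return count


def make_chain(depth: int) -> Graph:
    """Build n0 -> n1 -> ... -> n<depth>, i.e. depth + 1 nodes."""
    graph: Graph = {f"n{i}": [f"n{i + 1}"] for i in range(depth)}
    graph[f"n{depth}"] = []
    return graph


def timed(func: Callable[..., int], *args) -> Tuple[int, float]:
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start


# Shallow structure: both variants work and can be compared.
shallow = make_chain(100)
rec_count, rec_time = timed(count_recursive, shallow, "n0")
it_count, it_time = timed(count_iterative, shallow, "n0")
print(f"recursive: {rec_time:.6f}s, iterative: {it_time:.6f}s")

# Deep structure: the recursive variant is expected to fail.
deep = make_chain(sys.getrecursionlimit() * 2)
try:
    count_recursive(deep, "n0")
    recursive_failed = False
except RecursionError:
    recursive_failed = True
deep_count = count_iterative(deep, "n0")
print(f"recursive failed on deep chain: {recursive_failed}")
```

For real measurements we would want to run each variant several times on representative files and report the spread, since a single `perf_counter` sample is noisy.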

Additional Information:

  • Current recursion limit issues have been observed in [specific modules/functions], particularly when dealing with structures exceeding [specific size].
  • Consideration should be given to maintaining backward compatibility with existing functionality.

This research will guide us in refactoring the file pinning process to be more robust and performant, ensuring scalability as the codebase grows.

How would you imagine the implementation of the feature?

Implementation Suggestion: Multithreading with a Tree/Graph Structure

To implement the solution, we could use a tree or graph structure to represent the data. Here's how the approach would work:

  1. Tree/Graph Structure:

    • Each Sdf.Layer would represent an individual node in the tree/graph structure.
    • Nodes would be connected to their parent nodes, forming a hierarchy that mirrors the structure of the files.
    • Every parent node needs a pointer to its child nodes for easier traversal.
  2. Multithreading:

    • Once the structure is established, each node (representing a traversal element or layer) could be processed on a separate thread.
    • This approach would allow for concurrent processing of multiple nodes, improving performance and scalability.
  3. Main Thread Coordination:

    • The main thread would distribute tasks to worker threads, monitor their progress, and handle any exceptions or errors that arise during processing.
    • This approach ensures that the main thread remains available to oversee the operation and handle any issues that may occur without being bogged down by the actual file processing.
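The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not a proposal for the final design: `LayerNode`, `flatten`, `process_all`, and `pin_layer` are hypothetical names, the nodes hold plain string identifiers rather than real Sdf.Layer objects, and the file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(eq=False)
class LayerNode:
    """One node per layer, linked to both its parent and its children."""
    identifier: str
    parent: Optional["LayerNode"] = field(default=None, repr=False)
    children: List["LayerNode"] = field(default_factory=list, repr=False)

    def add_child(self, identifier: str) -> "LayerNode":
        child = LayerNode(identifier, parent=self)
        self.children.append(child)
        return child


def flatten(root: LayerNode) -> List[LayerNode]:
    """Collect all nodes iteratively so deep trees do not hit the recursion limit."""
    nodes: List[LayerNode] = []
    stack = [root]
    while stack:
        node = stack.pop()
        nodes.append(node)
        stack.extend(node.children)
    return nodes


def process_all(
    root: LayerNode, work: Callable[[LayerNode], str], max_workers: int = 4
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Main thread hands nodes to workers, then gathers results and errors."""
    results: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(work, node): node for node in flatten(root)}
        for future in as_completed(futures):
            node = futures[future]
            try:
                results[node.identifier] = future.result()
            except Exception as exc:  # a failed node must not stop the others
                errors[node.identifier] = exc
    return results, errors


def pin_layer(node: LayerNode) -> str:
    """Hypothetical stand-in for the per-layer pinning work."""
    if node.identifier == "broken.usda":
        raise ValueError("cannot pin this layer")
    return f"pinned:{node.identifier}"


root = LayerNode("shot.usda")
assets = root.add_child("assets.usda")
assets.add_child("chair.usda")
root.add_child("broken.usda")

results, errors = process_all(root, pin_layer)
print(sorted(results), sorted(errors))
```

Only the main thread writes to `results` and `errors`, which keeps the shared state simple. Because of the GIL in standard CPython builds, threads would mainly help if the per-layer work is dominated by file I/O or by calls that release the GIL; that is one of the things the research should confirm.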

This method would avoid recursion limits by eliminating deep recursion, and it would use multithreading to parallelize the workload, which could improve performance, particularly with large data structures.

Are there any labels you wish to add?

  • I have added the relevant labels to the enhancement request.

Describe alternatives you've considered:

No response

Additional context:

No response

@Lypsolon Lypsolon self-assigned this Sep 3, 2024