Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check to make sure helpers need to be killed before sending kill signal #89

Merged
merged 4 commits into from
Sep 27, 2024

Conversation

eriktaubeneck
Copy link
Member

@eriktaubeneck eriktaubeneck commented Sep 26, 2024

the failure we we're seeing was an issue with this kill signal failing with an http error. we wrap that in a try/except in #88 to make sure it always cleans up, but this change makes sure we don't send extra kill signals that don't need to be sent. I'm not entirely convinced this would fix the issue without #88, but it seems like it could be the issue.

here are the logs from Helper 1 in the crash:

2024-09-26 08:31:47 | INFO     |    Helper 1 - thread 'query_runtime' panicked at library/core/src/panicking.rs:227:5:
2024-09-26 08:31:47 | INFO     |    Helper 1 - panic in a destructor during cleanup
2024-09-26 08:31:47 | INFO     |    Helper 1 - thread caused non-unwinding panic. aborting.
2024-09-26 08:31:47 | INFO     |    Helper 1 - sending kill signals
2024-09-26 08:31:47 | INFO     |    Helper 1 - sent post request: {"message":"Query already complete","query_id":"5931fdf3-5483-49d3-8407-7e6f0d94c7bb"}
2024-09-26 08:31:52 | ERROR    |    Helper 1 - The read operation timed out
2024-09-26 08:31:52 | INFO     |    Helper 1 - sending kill signals
2024-09-26 08:31:52 | INFO     |    Helper 1 - sent post request: {"message":"Query already complete","query_id":"5931fdf3-5483-49d3-8407-7e6f0d94c7bb"}
2024-09-26 08:31:52 | INFO     |    Helper 1 - sent post request: {"message":"Query already complete","query_id":"5931fdf3-5483-49d3-8407-7e6f0d94c7bb"}
2024-09-26 08:31:52 | INFO     |    Helper 1 - sent post request: {"message":"Query already complete","query_id":"5931fdf3-5483-49d3-8407-7e6f0d94c7bb"}

associated with this error on the report collector:

httpx.ReadTimeout: The read operation timed out

Summary by CodeRabbit

  • New Features

    • Enhanced status checking process for improved clarity and efficiency.
    • Introduced a new method in the Status class to handle JSON input for status retrieval.
    • Added new methods in the Helper class for managing query statuses, including methods to construct URLs and manage query actions.
    • Added a property to return a list of helper instances that do not match the current role.
  • Bug Fixes

    • Improved handling of unknown statuses by defaulting to UNKNOWN when the status key is missing.
  • Tests

    • Added comprehensive tests for the new from_json method to ensure correct behavior with various JSON inputs.
  • Chores

    • Updated project configuration to include tomllib as part of the extra standard libraries.

Copy link

vercel bot commented Sep 26, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
draft ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 27, 2024 3:56am

Copy link
Contributor

coderabbitai bot commented Sep 26, 2024

Walkthrough

The changes involve modifications to the send_kill_signals, run, and send_finish_signals methods in sidecar/app/query/ipa.py, enhancing the logic for managing query statuses. A new from_json method is introduced in the Status class within sidecar/app/query/status.py, allowing for improved parsing of status from JSON input. Additionally, several new methods are added to the Helper class in sidecar/app/helpers.py for better query management. A new test function is also included in sidecar/tests/app/query/test_status.py to validate the behavior of the from_json method. The pyproject.toml file is updated to recognize tomllib, and a new property method is added to sidecar/app/settings.py.

Changes

Files Change Summary
sidecar/app/query/ipa.py Modified send_kill_signals, run, and send_finish_signals methods to improve status checking logic and streamline code using enums.
sidecar/app/query/status.py Added from_json class method to Status for parsing status from JSON, handling errors with a default return value.
sidecar/app/helpers.py Introduced methods for constructing query management URLs and handling query status, including kill_query and finish_query.
sidecar/tests/app/query/test_status.py Added test_status_from_json to validate the from_json method against various JSON inputs and expected outputs.
pyproject.toml Updated to include "tomllib" in the extra_standard_library entry.
sidecar/app/settings.py Added a property method other_helpers to return a list of Helper instances excluding the current instance's role.

Poem

In the code where rabbits play,
Status checks now lead the way.
With enums bright and tests so keen,
A smoother path, a clearer scene.
Hopping through JSON, swift and spry,
Our code now dances, oh so high! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1b0f4d9 and 8a9dcb2.

📒 Files selected for processing (1)
  • sidecar/app/helpers.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sidecar/app/helpers.py

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 40e7c9b and bbfa409.

📒 Files selected for processing (3)
  • sidecar/app/query/ipa.py (2 hunks)
  • sidecar/app/query/status.py (1 hunks)
  • sidecar/tests/app/query/test_status.py (1 hunks)
🔇 Additional comments (4)
sidecar/app/query/status.py (2)

23-30: Consider broader implications of the new from_json method

The addition of the from_json method to the Status enum is a good step towards standardizing status parsing across the project. This aligns with the PR objective of improving status checking logic.

While this change doesn't directly address the issue of checking if helpers need to be killed before sending a kill signal, it could be a crucial building block for implementing such checks. By providing a standardized way to parse status from JSON, it enables more robust status checking in other parts of the code where the kill signal logic is implemented.

Consider the following to maximize the impact of this change:

  1. Update any existing code that parses status from JSON to use this new method, ensuring consistency across the codebase.
  2. Review the areas of code responsible for sending kill signals, and consider how this new method can be utilized to improve the logic there.
  3. Update documentation to reflect this new way of parsing status, ensuring that other developers are aware of and use this method.

To help identify areas where this new method could be utilized, we can search for places in the code that are currently parsing status manually:

#!/bin/bash
# Search for potential manual status parsing
rg --type python 'status.*=.*get\("status"'

This will help identify areas of the code that might benefit from using the new from_json method.


23-30: 🛠️ Refactor suggestion

Improve error handling and input validation in from_json method

The new from_json method is a good addition for parsing status from JSON input. However, there are a few areas where it could be improved:

  1. Input validation: The method doesn't handle cases where the input is not a dictionary or is None.
  2. Silent error handling: Returning UNKNOWN for any invalid status might mask errors.
  3. Type checking: The method doesn't validate that the "status" value is a string before using it.

Consider refactoring the method to address these issues:

@classmethod
def from_json(cls, response: Optional[dict[str, Any]]) -> Status:
    if not isinstance(response, dict):
        return cls.UNKNOWN
    
    status_str = response.get("status")
    if not isinstance(status_str, str):
        return cls.UNKNOWN
    
    try:
        return cls[status_str.upper()]
    except KeyError:
        return cls.UNKNOWN

This refactored version:

  1. Handles cases where the input is not a dictionary.
  2. Checks if the "status" value is a string.
  3. Converts the status string to uppercase for case-insensitive matching.

Additionally, consider logging warnings when returning UNKNOWN due to invalid input, to help with debugging.

To ensure this method is used correctly throughout the codebase, we can search for its usage:

This will help verify that the method is being called with the correct input type across the project.

sidecar/tests/app/query/test_status.py (1)

118-131: New test function aligns with PR objectives

The addition of the test_status_from_json() function is a positive change that aligns with the PR objectives. It provides test coverage for the new from_json method of the Status class, which is likely part of the enhanced status checking logic mentioned in the PR summary.

The test function covers the main scenarios for parsing status from JSON input, including valid statuses, invalid inputs, and empty JSON. This will help ensure the robustness of the status checking mechanism, which is crucial for addressing the issue of unnecessary kill signals being sent.

While the current implementation is good, the suggested improvements in the previous comment would further enhance the test coverage and maintainability of the code.

sidecar/app/query/ipa.py (1)

275-279: Ensure Python 3.10 compatibility for 'match' statements

The use of match and case syntax requires Python 3.10 or higher. Verify that the runtime environment supports Python 3.10. If compatibility with earlier Python versions is necessary, refactor the match statements to use if-elif conditions.

sidecar/tests/app/query/test_status.py Outdated Show resolved Hide resolved
sidecar/app/query/ipa.py Outdated Show resolved Hide resolved
sidecar/app/query/ipa.py Outdated Show resolved Hide resolved
sidecar/app/query/ipa.py Outdated Show resolved Hide resolved
sidecar/app/query/ipa.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (4)
sidecar/app/settings.py (1)

65-67: LGTM! Consider a minor optimization.

The new other_helpers property is well-implemented and provides a useful way to access helpers with different roles. The implementation is concise and efficient.

For a slight performance improvement, especially if self._helpers is large, consider using a generator expression instead of a list comprehension:

@property
def other_helpers(self) -> list[Helper]:
    return list(helper for helper in self._helpers.values() if helper.role != self.role)

This change would create the list only when it's needed, potentially saving memory if the property is accessed but the result is not fully iterated.

sidecar/app/query/ipa.py (3)

31-33: Approve changes with a suggestion for error handling

The simplification of the send_kill_signals method is a good improvement. It reduces the complexity of the code and aligns with the PR objective of preventing unnecessary kill signals.

Consider adding error handling to catch potential exceptions from the kill_query method:

 for helper in settings.other_helpers:
-    response = helper.kill_query(self.query_id)
-    self.logger.info(response)
+    try:
+        response = helper.kill_query(self.query_id)
+        self.logger.info(response)
+    except Exception as e:
+        self.logger.error(f"Failed to kill query on helper: {e}")

This will ensure that if one helper fails, it doesn't prevent sending kill signals to the remaining helpers.


240-264: Approve changes with a suggestion for timeout mechanism

The updates to the run method in IPACoordinatorWaitForHelpersStep are excellent improvements. The use of get_current_query_status and the match statement with Status enum values makes the code more robust and readable.

Consider using a more pythonic approach for the timeout mechanism:

-        max_unknonwn_status_wait_time = 100
-        current_unknown_status_wait_time = 0
-        loop_wait_time = 1
+        max_unknown_status_wait_time = 100
+        start_time = time.time()
         while True:
             status = helper.get_current_query_status(self.query_id)
             match status:
                 case Status.IN_PROGRESS:
                     break
                 case Status.KILLED | Status.NOT_FOUND | Status.CRASHED:
                     self.success = False
                     return
                 case Status.STARTING | Status.COMPILING | Status.WAITING_TO_START:
                     # keep waiting while it's in a startup state
                     continue
                 case Status.UNKNOWN | Status.NOT_FOUND:
                     # eventually fail if the status is unknown or not found
                     # for ~100 seconds
-                    current_unknown_status_wait_time += loop_wait_time
-                    if (
-                        current_unknown_status_wait_time
-                        >= max_unknonwn_status_wait_time
-                    ):
+                    if time.time() - start_time >= max_unknown_status_wait_time:
                         self.success = False
                         return
-            time.sleep(1)
+            time.sleep(1)

This approach is more accurate and easier to understand, as it uses the actual elapsed time rather than an approximation based on loop iterations.


343-348: Approve changes with a suggestion for error handling

The renaming of send_terminate_signals to send_finish_signals and the simplification of the method are good improvements. They align with the naming convention used elsewhere and reduce the complexity of the code.

Consider adding error handling to catch potential exceptions from the finish_query method, similar to the suggestion for send_kill_signals:

 def send_finish_signals(self):
     self.logger.info("sending finish signals")
     settings = get_settings()
     for helper in settings.other_helpers:
-        resp = helper.finish_query(self.query_id)
-        self.logger.info(resp)
+        try:
+            resp = helper.finish_query(self.query_id)
+            self.logger.info(resp)
+        except Exception as e:
+            self.logger.error(f"Failed to finish query on helper: {e}")

This will ensure that if one helper fails, it doesn't prevent sending finish signals to the remaining helpers.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bbfa409 and 1b0f4d9.

📒 Files selected for processing (5)
  • pyproject.toml (1 hunks)
  • sidecar/app/helpers.py (2 hunks)
  • sidecar/app/query/ipa.py (3 hunks)
  • sidecar/app/settings.py (1 hunks)
  • sidecar/tests/app/query/test_status.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sidecar/tests/app/query/test_status.py
🔇 Additional comments (3)
sidecar/app/settings.py (1)

65-67: Good integration with existing code.

The new other_helpers property integrates well with the existing codebase:

  1. It follows the established naming conventions and coding style.
  2. It complements the existing helpers property, providing a convenient way to access a subset of helpers.
  3. The addition doesn't introduce any side effects or modify existing behavior.

To further improve the code, consider adding a docstring to explain the purpose and return value of this property. For example:

@property
def other_helpers(self) -> list[Helper]:
    """
    Returns a list of Helper instances excluding the one with the same role as the current instance.
    This can be useful for inter-helper communication or coordination.
    
    Returns:
        list[Helper]: A list of Helper instances with different roles.
    """
    return [helper for helper in self._helpers.values() if helper.role != self.role]
sidecar/app/query/ipa.py (1)

Line range hint 1-424: Overall assessment: Improvements align with PR objectives

The changes made to sidecar/app/query/ipa.py are well-aligned with the PR objectives. They simplify the code, improve error handling, and prevent unnecessary kill signals. The use of helper methods like kill_query and finish_query instead of direct HTTP requests reduces complexity and potential errors in URL construction.

The updated run method in IPACoordinatorWaitForHelpersStep now handles different statuses more robustly and includes a timeout mechanism for unknown statuses, which prevents potential infinite loops.

These improvements contribute to a more maintainable and reliable codebase. The suggested error handling additions will further enhance the robustness of the system.

sidecar/app/helpers.py (1)

1-1: Verify tomllib availability for Python versions below 3.11

The tomllib module is available in Python 3.11 and above. If your project supports Python versions earlier than 3.11, consider using the third-party tomli library for TOML parsing to maintain compatibility.

To confirm the project's Python version compatibility, run the following script:

pyproject.toml Show resolved Hide resolved
sidecar/app/helpers.py Show resolved Hide resolved
sidecar/app/helpers.py Show resolved Hide resolved
sidecar/app/helpers.py Show resolved Hide resolved
sidecar/app/helpers.py Outdated Show resolved Hide resolved
current_unknown_status_wait_time
>= max_unknonwn_status_wait_time
):
self.success = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would that inform the UI that we failed to get a "good" status from helper and this is the reason why we couldn't start a query?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when a Step finishes with self.success = False, it causes the Query to crash. The UI will show the report collector as crashed, and in the logs we'll see that the IPACoordinatorWaitForHelpersStep failed.

sidecar/app/helpers.py Outdated Show resolved Hide resolved
@eriktaubeneck eriktaubeneck merged commit 3046a27 into main Sep 27, 2024
3 checks passed
@eriktaubeneck eriktaubeneck deleted the kill-bug-fix branch September 27, 2024 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants