Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code review: Lib/queue.py #127092

Open
mengqinyuan opened this issue Nov 21, 2024 · 2 comments
Open

Code review: Lib/queue.py #127092

mengqinyuan opened this issue Nov 21, 2024 · 2 comments
Labels
pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir

Comments

@mengqinyuan
Copy link

mengqinyuan commented Nov 21, 2024

Code Review

Tips

My English is not so good, so I use ChatCPT to translate. Sorry for that.

Environment

OS : Windows 10
Python version: 3.11.9

Security and Vulnerabilities

  1. Exception Handling:

    • In the put and get methods, a ShutDown exception is raised when the queue is closed. It is suggested to add exception handling where these methods are called to prevent the program from crashing due to unhandled exceptions.
    • Example of modified code:
      try:
          queue.put(item, block=True, timeout=None)
      except ShutDown:
          print("Queue has been shut down.")
  2. Input Validation:

    • In the put and get methods, the validation for the timeout parameter can be more strict to ensure it is a non-negative number.
    • Example of modified code:
      def put(self, item, block=True, timeout=None):
          if timeout is not None and timeout < 0:
              raise ValueError("'timeout' must be a non-negative number")
          # Other code...
  3. Resource Release:

    • Ensure that locks are released correctly in all cases. Although the current code already does this, adding comments at critical points can help other developers understand the intent of the code.
    • Example of modified code:
      with self.not_full:
          # Ensure the lock is released in all cases
          try:
              if self.is_shutdown:
                  raise ShutDown
              # Other code...
          finally:
              self.not_full.release()

Performance and Maintainability

  1. Reduce Lock Hold Time:

    • In the put and get methods, minimize the time the lock is held to improve concurrency performance. For example, release the lock between checking the queue state and waiting for the condition variable.
    • Example of modified code:
      def put(self, item, block=True, timeout=None):
          with self.not_full:
              if self.is_shutdown:
                  raise ShutDown
              if self.maxsize > 0:
                  if not block:
                      if self._qsize() >= self.maxsize:
                          raise Full
                  elif timeout is None:
                      while self._qsize() >= self.maxsize:
                          self.not_full.wait()
                          if self.is_shutdown:
                              raise ShutDown
                  elif timeout < 0:
                      raise ValueError("'timeout' must be a non-negative number")
                  else:
                      endtime = time() + timeout
                      while self._qsize() >= self.maxsize:
                          remaining = endtime - time()
                          if remaining <= 0.0:
                              raise Full
                          self.not_full.wait(remaining)
                          if self.is_shutdown:
                              raise ShutDown
          with self.not_full:
              self._put(item)
              self.unfinished_tasks += 1
              self.not_empty.notify()
  2. Reduce Duplicate Code:

    • There is a lot of duplicate code in the put and get methods. Extract common logic to reduce duplication.
    • Example of modified code:
      def _wait_for_condition(self, condition, timeout=None):
          if timeout is None:
              while not condition():
                  self.not_full.wait()
                  if self.is_shutdown:
                      raise ShutDown
          elif timeout < 0:
              raise ValueError("'timeout' must be a non-negative number")
          else:
              endtime = time() + timeout
              while not condition():
                  remaining = endtime - time()
                  if remaining <= 0.0:
                      raise Full if condition == self._is_queue_full else Empty
                  self.not_full.wait(remaining)
                  if self.is_shutdown:
                      raise ShutDown
      
      def put(self, item, block=True, timeout=None):
          with self.not_full:
              if self.is_shutdown:
                  raise ShutDown
              if self.maxsize > 0:
                  if not block:
                      if self._qsize() >= self.maxsize:
                          raise Full
                  else:
                      self._wait_for_condition(lambda: self._qsize() < self.maxsize, timeout)
          with self.not_full:
              self._put(item)
              self.unfinished_tasks += 1
              self.not_empty.notify()
      
      def get(self, block=True, timeout=None):
          with self.not_empty:
              if self.is_shutdown and not self._qsize():
                  raise ShutDown
              if not block:
                  if not self._qsize():
                      raise Empty
              else:
                  self._wait_for_condition(lambda: self._qsize() > 0, timeout)
          with self.not_empty:
              item = self._get()
              self.not_full.notify()
              return item
  3. Optimize shutdown Method:

    • In the shutdown method, optimize the immediate shutdown case to reduce unnecessary loops.
    • Example of modified code:
      def shutdown(self, immediate=False):
          with self.mutex:
              self.is_shutdown = True
              if immediate:
                  self.unfinished_tasks -= self._qsize()
                  self.queue.clear()
                  self.all_tasks_done.notify_all()
              self.not_empty.notify_all()
              self.not_full.notify_all()
  4. Documentation and Comments:

    • Add more documentation and comments, especially for complex logic parts, to make it easier for other developers to understand and maintain the code.
    • Example of modified code:
      def put(self, item, block=True, timeout=None):
          '''Put an item into the queue.
      
          If optional args 'block' is true and 'timeout' is None (the default),
          block if necessary until a free slot is available. If 'timeout' is
          a non-negative number, it blocks at most 'timeout' seconds and raises
          the Full exception if no free slot was available within that time.
          Otherwise ('block' is false), put an item on the queue if a free slot
          is immediately available, else raise the Full exception ('timeout'
          is ignored in that case).
      
          Raises ShutDown if the queue has been shut down.
          '''
          with self.not_full:
              if self.is_shutdown:
                  raise ShutDown
              if self.maxsize > 0:
                  if not block:
                      if self._qsize() >= self.maxsize:
                          raise Full
                  else:
                      self._wait_for_condition(lambda: self._qsize() < self.maxsize, timeout)
          with self.not_full:
              self._put(item)
              self.unfinished_tasks += 1
              self.not_empty.notify()
@nineteendo
Copy link
Contributor

Can you give a benchmark showing how much the performance has improved? I'm not sure about the other suggestions though.

@picnixz
Copy link
Contributor

picnixz commented Nov 21, 2024

Documentation at the docstring level may or may not necessary as we have the online docs which users and developers should consult. Inline comments (#) may be useful but not that not everyone is meant to understand the code. However it's not useful if you are commenting every line and say what this or that does. with contexts are sufficient and shouldn't be burden with verbose comments.

Usage of lambda function may or may not be ideal even if we deduplicate the logic.

Releasing and acquiring the lock in a short time does not necessarily improve performances. We need bemchmarks.

Operations on closed queues are illegal and the caller is reponsible for that. In particular, the exceptions in this case are meant to be propagated, not silenced. Recall that this is a standard library not an application. The caller should handle the exception themselves in this case.

Validating the timeout parameter may be useful but documenting it should be sufficient I think. What is the current behaviour if you use a negative timeout? does it raise a TypeError? or a ValueError (which may be better in this case. If you want strict input checks, a TypeError should be raised if the timeout is not a number, while a ValueError should be raised if it's negative).

@picnixz picnixz added stdlib Python modules in the Lib dir pending The issue will be closed if no feedback is provided labels Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

3 participants