Cache wrapper #8

Open: wants to merge 3 commits into main
Conversation

@BBC-Esq BBC-Esq commented Oct 9, 2024

See the commit notes for details, but here are some caching strategies to evaluate if/when basic metrics are implemented:

| Caching Strategy | Description |
| --- | --- |
| RotatingKVCache | Rotates cache entries on a fixed schedule, discarding the oldest entries as new ones are added to maintain a constant cache size. |
| LRU (Least Recently Used) | Evicts the least recently accessed items first, so frequently accessed data stays in the cache while older, unused data is removed. |
| LFU (Least Frequently Used) | Removes the least frequently accessed items, so entries that are used more often stay in the cache longer. |
| FIFO (First In, First Out) | Discards the oldest entries in insertion order, without considering how frequently or recently they have been accessed. |
| Time-based Expiration | Invalidates cache entries after a specified period has elapsed since their creation or last access, so stale data is periodically removed. |
| Size-based Eviction | Removes entries once a predefined size threshold is reached, keeping the cache within memory constraints. |
| Custom Replacement Policies | Allows user-defined eviction rules for specific application requirements, such as combining multiple criteria like recency and frequency. |
| Multi-Level Caching | Uses multiple layers of caching (e.g., in-memory and disk-based) to optimize performance and resource utilization by distributing data across different storage mediums. |
| Cache Preloading | Loads frequently used or common prompts into the cache ahead of time to reduce processing latency at runtime. |
| Dynamic Cache Sizing | Adjusts the cache size dynamically based on workload, memory availability, or access patterns to maintain performance under varying conditions. |
| Automatic Cache Optimization | Analyzes cache usage metrics to automatically fine-tune caching strategies, such as adjusting cache size or replacement policies. |
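Of the strategies above, LRU is the simplest to prototype. As a minimal sketch (not code from this PR; the class name and API here are illustrative), Python's `collections.OrderedDict` gives recency ordering for free:

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache sketch: evicts the least recently
    accessed entry once capacity is exceeded."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, key, default=None):
        if key not in self._store:
            return default
        # Mark as most recently used by moving it to the back.
        self._store.move_to_end(key)
        return self._store[key]

    def put(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = value
        if len(self._store) > self.capacity:
            # Evict the least recently used entry (front of the dict).
            self._store.popitem(last=False)
```

A KV-cache variant would store key/value tensors per layer rather than plain values, but the eviction logic is the same shape.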

- Replaced the zip/range-based loop with `itertools.islice` in the `create_generator` function for a small efficiency gain.
- Ensured that `detokenizer.finalize()` is always called, even if an exception occurs during the token-generation loop.
- Implemented basic cache-metrics gathering so you can see how the cache is typically used, to inform improvements down the line.