---
title: "[Final] ScaleRep: Reproducing and benchmarking scalability bugs hiding in cloud systems"
summary:
authors: ["imzahra"]
tags: ["osre24", "reproducibility", "scalability"]
categories: ["SummerofReproducibility24"]
date: 2024-09-18
lastmod: 2024-09-18
featured: false
draft: false

# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder.
# Focal points: Smart, Center, TopLeft, Top, TopRight, Left, Right, BottomLeft, Bottom, BottomRight.
image:
  caption: ""
  focal_point: ""
  preview_only: false
---

Hello everyone,

As part of my Summer of Reproducibility (SoR) 2024 work on the [ScaleRep project](/project/osre24/osu/scalerep/), under the mentorship of {{% mention bogdanstoica %}} and {{% mention wang.7564 %}}, I set out to tackle the reproducibility challenges posed by scalability bugs in large-scale distributed systems. I’m excited to share our final progress and the insights we gathered along the way. Below is a summary of the bugs we investigated and what we learned from them.

## Project Overview

As you may recall, ScaleRep set out to tackle scalability bugs: insidious issues that arise in large-scale distributed systems under heavy workloads. When triggered, these bugs can cause downtime, performance bottlenecks, and even data loss. They are particularly difficult to catch with traditional testing methods, which rarely exercise a system at the scale where these bugs appear.

Our primary focus was on reproducing these bugs, documenting the challenges involved, and providing insights into how they manifest under various conditions. This documentation is meant to help researchers identify, benchmark, and resolve similar issues in the future.
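
Reproducing a scalability bug often comes down to finding the scale at which it first shows up. As an illustration, here is a minimal sketch of a scale sweep, assuming you have a hook that deploys the system at a given size, runs a workload, and reports a symptom metric. The names `find_trigger_scale` and `run_workload` and the latency threshold are hypothetical; they are not part of ScaleRep's artifacts or any Apache Ignite API.

```python
from typing import Callable, Iterable, Optional


def find_trigger_scale(
    scales: Iterable[int],
    run_workload: Callable[[int], float],
    threshold: float,
    repeats: int = 3,
) -> Optional[int]:
    """Return the smallest scale whose metric exceeds `threshold` in every repeat.

    `run_workload` is a hypothetical hook: it deploys the system at the given
    scale (for example, node or client count), drives the workload, and returns
    a symptom metric such as p99 latency in milliseconds. Requiring every repeat
    to exceed the threshold filters out one-off noise.
    """
    for scale in sorted(scales):
        # all() stops at the first run that stays under the threshold.
        if all(run_workload(scale) > threshold for _ in range(repeats)):
            return scale
    return None  # the symptom never appeared at the scales tried


if __name__ == "__main__":
    # Hypothetical stand-in workload: latency jumps once the cluster reaches 64 nodes.
    def fake_workload(nodes: int) -> float:
        return 5.0 if nodes < 64 else 500.0

    print(find_trigger_scale([8, 16, 32, 64, 128], fake_workload, threshold=100.0))  # 64
```

Repeating each run and requiring every repetition to cross the threshold trades extra runtime for confidence that the symptom is reproducible rather than noise, which matters when each run means provisioning a fresh cluster.
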

## Progress

Since the midterm update, we have investigated several Apache Ignite bugs. Some of them were successfully reproduced and uploaded to Trovi, where the research community can access and reuse them. The bugs we investigated are listed below.

### Bugs Investigated
1. **[IGNITE-20614](https://issues.apache.org/jira/browse/IGNITE-20614)**
2. **[IGNITE-17407](https://issues.apache.org/jira/browse/IGNITE-17407)**
3. **[IGNITE-20602](https://issues.apache.org/jira/browse/IGNITE-20602)**
4. **[IGNITE-16600](https://issues.apache.org/jira/browse/IGNITE-16600)**
5. **[IGNITE-16072](https://issues.apache.org/jira/browse/IGNITE-16072)**
6. **[IGNITE-16582](https://issues.apache.org/jira/browse/IGNITE-16582)**
7. **[IGNITE-16581](https://issues.apache.org/jira/browse/IGNITE-16581)**

## Key Insights & Challenges

1. **Complexity of Scalability Bugs:** Many scalability bugs involve subtle, complex interactions that are not easily detected in standard testing environments. For instance, IGNITE-20602 manifested only under certain high-load conditions and required a specific workload and environment to trigger reliably. This highlights the importance of testing at scale when investigating scalability issues.

2. **Dependency and Documentation Gaps:** Outdated dependencies and incomplete documentation posed significant challenges, particularly for older bugs such as IGNITE-16072. In these cases, reproducing the bug either required extensive modifications or was not feasible without a disproportionate effort to update dependencies.

3. **Effectiveness of Trovi and Chameleon:** Packaging and sharing our reproducible investigations through Trovi and Chameleon has proven highly effective. By giving researchers pre-configured environments along with detailed documentation, we have laid the groundwork for future collaboration and further research on these bugs. We expect this to benefit others attempting to reproduce similar issues.

4. **Impact of Speed-Based Throttling:** Our investigation into IGNITE-16600 shed light on speed-based throttling and how it affects system performance under high load. By analyzing checkpoint starvation and the thread-throttling mechanism, we identified areas for improvement in the latest Ignite releases.
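
To make the general idea behind speed-based throttling concrete, here is a toy model. It assumes a simplified setting in which writer threads dirty pages at a steady rate and a checkpointer flushes them at a measured rate; when writers are faster, each one parks for a computed delay so that its effective speed matches the checkpointer's. The class and function names are hypothetical, and this is neither Ignite's actual throttling algorithm nor a description of the mechanism analyzed in IGNITE-16600.

```python
from dataclasses import dataclass


@dataclass
class SpeedBasedThrottle:
    """Toy model of speed-based write throttling (hypothetical, not Ignite's code).

    Writers dirty pages at `mark_rate` pages/s while a checkpointer flushes them
    at `cp_rate` pages/s. If writers outpace the checkpointer, each writer parks
    before dirtying its next page so its effective speed drops to `cp_rate`.
    """

    cp_rate: float  # checkpoint write speed in pages/s (assumed measured elsewhere)

    def __post_init__(self) -> None:
        if self.cp_rate <= 0:
            raise ValueError("checkpoint rate must be positive")

    def park_nanos(self, mark_rate: float) -> int:
        """Nanoseconds a writer should park before dirtying its next page."""
        if mark_rate <= self.cp_rate:
            return 0  # the checkpointer keeps up, so no throttling is needed
        # Stretch the per-page time from 1/mark_rate up to 1/cp_rate.
        delay_sec = 1.0 / self.cp_rate - 1.0 / mark_rate
        return round(delay_sec * 1_000_000_000)

    def effective_rate(self, mark_rate: float) -> float:
        """Writer speed in pages/s once the park delay is applied."""
        per_page_sec = 1.0 / mark_rate + self.park_nanos(mark_rate) / 1_000_000_000
        return 1.0 / per_page_sec


def backlog_after(seconds: float, mark_rate: float, cp_rate: float, throttled: bool) -> float:
    """Dirty pages left unflushed after `seconds` (simplified: starts empty, constant rates)."""
    rate = SpeedBasedThrottle(cp_rate).effective_rate(mark_rate) if throttled else mark_rate
    return max(0.0, (rate - cp_rate) * seconds)


if __name__ == "__main__":
    throttle = SpeedBasedThrottle(cp_rate=1000.0)
    print(throttle.park_nanos(2000.0))  # 500000 ns, i.e. 0.5 ms per page
    print(backlog_after(10, 2000.0, 1000.0, throttled=False))  # 10000.0 pages
    print(backlog_after(10, 2000.0, 1000.0, throttled=True))  # approximately 0
```

In this model, too little throttling lets the dirty-page backlog grow faster than checkpoints can drain it, while parking writers longer than necessary wastes throughput. Balancing the two is the kind of trade-off that makes throttling behavior hard to get right under high load.
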

## Next Steps

**Expanding Collaboration:** The packaged bugs and replayable Trovi experiments will be made available to the broader research community to encourage further investigation of, and improvements to, large-scale distributed systems.

Working on ScaleRep has been an exciting journey into the world of scalability bugs and the practical limits of reproducing and benchmarking them. The project underscored how much rigorous, at-scale testing and thorough documentation matter for improving the reliability of distributed systems.