README

This repository contains the research prototype of PFault, which has been used in the following works:

"A Study of Failure Recovery and Logging of High-Performance Parallel File Systems", Runzhou Han, Om R. Gatla, Mai Zheng, Jinrui Cao, Di Zhang, Dong Dai, Yong Chen, and Jonathan Cook. ACM Transactions on Storage (TOS), Volume 18, Issue 2, 2022.
"SentiLog: Anomaly Detection on Parallel File Systems via Log-based Sentiment Analysis", Di Zhang, Dong Dai, Runzhou Han, and Mai Zheng. Proceedings of the 13th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage), 2021. [Best Paper Nominee]
"Fingerprinting the Checker Policies of Parallel File Systems", Runzhou Han, Duo Zhang, and Mai Zheng. Proceedings of the 5th ACM/IEEE International Parallel Data Systems Workshop (PDSW) at ACM/IEEE Supercomputing (SC), 2020.
"PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems", Jinrui Cao, Om Rameshwar Gatla, Mai Zheng, Dong Dai, Vidya Eswarappa, Yan Mu, and Yong Chen. Proceedings of the 32nd ACM/SIGARCH International Conference on Supercomputing (ICS), 2018.
"A Generic Framework for Testing Parallel File Systems", Jinrui Cao, Simeng Wang, Dong Dai, Mai Zheng, and Yong Chen. Proceedings of the 1st ACM/IEEE Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems (PDSW) at ACM/IEEE Supercomputing (SC), 2016.

Introduction to PFault

The description of each component is as follows:

Failure State Emulator:

Comprises the Virtual Device Manager and the Fault Models.

Virtual Device Manager:
- Source code in "pf_virtual_device_manager" folder
- Manages the persistent state of the target Parallel File System (PFS)
Fault Models:
- Source code in "pf_failure_state_emulator" folder
- Injects faults based on the following fault models (see the paper for their descriptions):
  - Whole Device Failure
  - Network Partitioning
  - Global Inconsistency

PFS Worker:

Source code in "pf_pfs_worker" folder
Generates I/O operations to age the file system and checks the correctness of the recovery

PFS Checker:

Source code in "pf_pfs_checker" folder
Invokes the default FSCK component of the target PFS

Orchestrator:

Source code in "pf_orchestrator" folder
Controls and coordinates the overall workflow of PFault automatically

For more information, refer to our research paper at http://ics2018.ict.ac.cn/essay/ics18-cameraready-submitted.pdf

PFault Initiation Guidance

Steps to initiate the tool:

1. (Optional) Make sure the current user is able to run sudo commands.
2. Build tgtd first:

    cd /path/to/pfault/pf_virtual_device_manager/iscsi
    make

3. Set up password-less ssh for all the servers and clients.
4. Copy the configuration template:

    cd /path/to/pfault/configuration
    cp configuration_template.sh configuration.sh

5. Fill in configuration.sh with the required Lustre setup.
6. Run the "Virtual Device Manager" to build the Lustre cluster:

    cd /path/to/pfault/pf_virtual_device_manager/
    ./vdm.sh

7. Run any workloads on the cluster. For example, a few workloads are provided in the folder /path/to/pfault/workload_example.
8. Select any of the fault models in the "Failure State Emulator".
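
The password-less ssh step above is not spelled out in this repository. As a minimal sketch, assuming a standard OpenSSH installation, the helper below prints the commands that would generate a key pair and copy the public key to each node. The node names in NODES and the function name print_ssh_setup are hypothetical placeholders, not part of PFault. The commands are printed rather than executed so that they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical node names; replace with the servers and clients of your cluster.
NODES="mds1 oss1 client1"

# Print (do not execute) the commands for setting up key-based ssh login.
# $1: path of the private key to create (defaults to ~/.ssh/id_rsa).
print_ssh_setup() {
    key="${1:-$HOME/.ssh/id_rsa}"
    # Generate a key pair with an empty passphrase (only needed once).
    echo "ssh-keygen -t rsa -N '' -f $key"
    # Copy the public key to every node in the list.
    for node in $NODES; do
        echo "ssh-copy-id -i $key.pub $node"
    done
}

print_ssh_setup
```

Once the printed commands look right for your cluster, they can be run by hand or piped to a shell.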

Users may also use the orchestrator to run the experiments automatically with the following steps:

1. In configuration.sh, select a fault model by setting the variable "FAULT_MODEL", and set the other orchestrator variables as well.
2. Run the orchestrator:

    /path/to/pfault/pf_orchestrator/orchestrator

The orchestrator first runs the aging workload on the client node, then injects faults according to the selected fault model, and finally runs the checking workload. Three log files are created during each run.
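
The three-phase order described above can be pictured with the following sketch. It is not the orchestrator's actual code: the function names and the FAULT_MODEL values used here are hypothetical stand-ins (the accepted values are defined by configuration.sh in this repository), and each phase only prints a message instead of touching a cluster.

```shell
#!/bin/sh
# Illustrative only: every name and FAULT_MODEL value below is an assumption.
FAULT_MODEL="${FAULT_MODEL:-whole_device_failure}"

# Phase 1: age the file system with an I/O workload.
run_aging() { echo "phase 1: aging workload on client node"; }

# Phase 2: inject a fault according to the selected model.
inject_fault() {
    case "$1" in
        whole_device_failure|network_partitioning|global_inconsistency)
            echo "phase 2: inject fault ($1)" ;;
        *)
            echo "unknown fault model: $1" >&2
            return 1 ;;
    esac
}

# Phase 3: run the checking workload after the fault.
run_checking() { echo "phase 3: checking workload"; }

# Run the phases in order; stop at the first phase that fails.
run_experiment() {
    run_aging && inject_fault "$1" && run_checking
}

run_experiment "$FAULT_MODEL"
```

Chaining the phases with && means that an unrecognized fault model stops the run before the checking phase, which is one simple way to avoid checking a cluster that was never faulted.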

We also provide the log traces generated during our experiments. The log files, along with their descriptions, are provided in the folders under '/log trace'.

Contact

Contact: [email protected] [email protected]
