Unit 3: Data Storage #43
Lesson Introduction: Major Data Storage Layouts
Much of the success of computer technology comes from the tremendous progress made in data storage. When dealing with big data, the amount of data directly influences the performance of the system.
In this unit, we will learn more about the memory hierarchy and about both internal and external data storage.
We will be able to relate the need for external data storage to large internet-based applications that require new and more scalable storage solutions. One example is cloud-based storage, such as the storage services offered by AWS.
Topic: Introduction to Data Storage
How is data stored?
Normally, data is stored on a secondary storage device such as a hard disk. To process the data, the computer must first load it into main memory.
Processing Speed
From fastest to slowest: CPU register => CPU cache => main memory => hard disk (secondary storage)
Internal Data Storage
A hard disk is a mechanical device, which is why it is very slow.
In the past, people used tape because it was very cheap.
Today, people use SSDs, which are much faster than hard disks.
Data on External Storage
Why do we have a buffer manager?
Main memory is smaller than disk, so we cannot load all of the data into main memory at once; we can only load it page by page. The buffer manager decides which pages are held in memory at any given time.
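The page-by-page loading described above can be illustrated with a toy buffer pool. This is only a minimal sketch under stated assumptions: the class name BufferManager, the least-recently-used eviction policy, and the tiny page size are all choices made for illustration, not something these notes prescribe.

```python
from collections import OrderedDict

PAGE_SIZE = 4  # records per page (tiny, for illustration only)


class BufferManager:
    """Hypothetical toy buffer pool: holds at most `capacity` pages in memory."""

    def __init__(self, disk_pages, capacity):
        self.disk_pages = disk_pages  # stand-in for the file stored on disk
        self.capacity = capacity      # number of page frames in main memory
        self.frames = OrderedDict()   # page_id -> page contents, oldest first
        self.disk_reads = 0           # pages actually fetched from "disk"

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: the page is already in memory; mark it most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        # Miss: if the pool is full, evict the least recently used page
        # (LRU is an assumed policy, chosen here for simplicity).
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)
        self.disk_reads += 1
        self.frames[page_id] = self.disk_pages[page_id]
        return self.frames[page_id]


# A "disk" holding 10 pages of 4 records each.
disk = [list(range(i * PAGE_SIZE, (i + 1) * PAGE_SIZE)) for i in range(10)]
pool = BufferManager(disk, capacity=3)

# Scan the whole file one page at a time using only 3 frames of memory.
total = sum(sum(pool.get_page(pid)) for pid in range(len(disk)))
```

Even though the file has 10 pages, at most 3 are in memory at any moment; the scan still sees every record because pages are brought in and evicted as needed.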
Lesson Introduction: Alternative File Organizations
In addition to traditional data storage, there are alternative file organizations. Many alternatives exist, each ideal for some situations and not so good for others. In this topic we will explore heap (random order) files, sorted files, and indexes.
The Cost Model
The number of page accesses is a cost measure.
Reasoning
Page access cost is usually the dominant cost of database operations. An accurate model is too complex for analyzing algorithms.
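Reading 3 consecutive pages in one sequential request is actually less time-consuming than reading 3 pages one at a time, so counting page accesses only approximates the real I/O cost.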
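To see how counting page accesses lets us compare file organizations, here is a rough sketch for an equality search. It assumes a file of B pages, a worst-case full scan for a heap file, and a binary search over pages for a sorted file; the function names are hypothetical, and the model deliberately ignores CPU time and sequential-read gains.

```python
def heap_search_cost(num_pages):
    """Worst-case page accesses for an equality search in a heap file:
    the record may be on the last page, so every page is read."""
    return num_pages


def sorted_search_cost(num_pages):
    """Worst-case page accesses for a binary search over a sorted file.
    For B >= 1 pages this is floor(log2(B)) + 1, i.e. B.bit_length()."""
    return num_pages.bit_length()


for b in (10, 1_000, 100_000):
    print(f"B={b}: heap={heap_search_cost(b)}, sorted={sorted_search_cost(b)}")
```

The gap between the two columns grows quickly with file size, which is the kind of overall trend the page-access cost model is meant to expose.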
Heap File Advantage / Disadvantage
Advantage:
Disadvantages:
Indexes
B+ Tree Indexes
The most popular index structure in database systems.
Non-leaf pages contain index entries, which are used only to direct searches.
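The role of non-leaf pages can be sketched with a small, hand-built two-level tree. This is a simplified illustration, not a full B+ tree: the Leaf and Internal classes and the search function are hypothetical names, there is no insertion or page splitting, and it assumes keys equal to a separator are routed to the right child.

```python
from bisect import bisect_right


class Leaf:
    """Leaf page: holds the actual keys and their associated values."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values


class Internal:
    """Non-leaf page: holds separator keys and child pointers only."""

    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1
        self.keys = keys
        self.children = children


def search(node, key):
    """Return (value or None, number of pages visited)."""
    pages_read = 1
    while isinstance(node, Internal):
        # Index entries only steer the search toward the right child.
        node = node.children[bisect_right(node.keys, key)]
        pages_read += 1
    if key in node.keys:
        return node.values[node.keys.index(key)], pages_read
    return None, pages_read


leaves = [
    Leaf([1, 4], ["a", "b"]),
    Leaf([7, 9], ["c", "d"]),
    Leaf([12, 15], ["e", "f"]),
]
root = Internal([7, 12], leaves)

value, pages = search(root, 9)
```

Every lookup touches one page per level of the tree, so in this two-level example each search reads exactly two pages, whether or not the key is present.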
Knowledge Check: Data Storage