Unit 3: Data Storage #43

Open
cheatsheet1999 opened this issue Sep 29, 2021 · 0 comments

Lesson Introduction: Major Data Storage Layouts

Much of the success of computer technology is owed to the tremendous progress made in data storage. When dealing with big data, the amount of data directly influences the performance of the system.

In this Unit, we will learn more about Memory Hierarchy and both internal and external data storage.

We will be able to relate the need for external data storage to large internet-based applications that require new and more scalable storage solutions. Cloud-based storage systems, such as those offered by AWS, are one example.


Topic: Introduction to Data Storage

How is data stored?
Normally, data is stored on a secondary storage device such as a hard disk. To process the data, the computer must first load it into main memory.

Processing Speed (fastest to slowest)
CPU Register => CPU Cache => Main memory => Hard disk (Secondary Storage)

Internal Data Storage
A hard disk is a mechanical device, which is why it is very slow.
In the past, people used tape because it was very cheap.
Today, people use solid-state drives (SSDs), which are much faster than hard disks.

Data on External Storage

  • File
    • A logical collection of data, physically stored as a set of pages
  • File Organization
    • Method of arranging a file of records on external storage; each record has a record ID (rid) that is sufficient to locate it physically
  • Architecture
    • Buffer manager stages pages from external storage to the main memory buffer pool.
    • File and index layers make calls to the buffer manager.

Why do we have a buffer manager?
Main memory is smaller than disk, so we cannot load all of the data into main memory at once. The data can only be brought in page by page.
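
To make the page-by-page idea concrete, here is a minimal sketch of a buffer pool. It is only an illustration: the `BufferPool` class, the dictionary standing in for the disk, and the least-recently-used (LRU) eviction policy are all assumptions made for this example, and a real buffer manager also handles pinning and writing modified pages back to disk.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: holds at most `capacity` pages in memory."""

    def __init__(self, disk, capacity):
        self.disk = disk             # hypothetical disk: dict of page_id -> contents
        self.capacity = capacity     # number of frames in the buffer pool
        self.frames = OrderedDict()  # page_id -> contents, least recently used first
        self.disk_reads = 0          # how many requests had to go to disk

    def get_page(self, page_id):
        if page_id in self.frames:
            # Hit: the page is already in memory, mark it most recently used.
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            # Pool is full: evict the least recently used page (assumed policy).
            self.frames.popitem(last=False)
        # Miss: stage the page from "disk" into the pool.
        self.disk_reads += 1
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


disk = {i: f"page-{i}" for i in range(5)}
pool = BufferPool(disk, capacity=2)
for pid in [0, 1, 0, 2, 1]:
    pool.get_page(pid)
print(pool.disk_reads)  # 4 of the 5 requests needed a disk read
```

With only two frames, the second request for page 0 is served from memory, while the later request for page 1 has to go back to disk because page 1 was evicted to make room for page 2.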

Lesson Introduction: Alternative File Organizations

In addition to traditional data storage, there are alternative file organizations. Many alternatives exist, each ideal for some situations and poorly suited to others. This topic explores heap (random order) files, sorted files, and indexes.


The Cost Model
The number of page accesses is used as the cost measure.
Reasoning
Page access cost is usually the dominant cost of database operations, and a more accurate model is too complex for analyzing algorithms.
Counting pages is only an approximation: reading 3 consecutive pages can take barely longer than reading a single page, because the seek and rotational delay are paid only once.
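
As a rough worked example of counting page accesses, the sketch below compares a heap file with a sorted file of `B` pages. The formulas are the usual textbook approximations rather than something stated in these notes: a full scan reads all `B` pages, an equality search on a unique key in a heap file reads about half the pages on average, and binary search on a sorted file reads about log2(B) pages. The function names are made up for this illustration.

```python
import math


def heap_scan_cost(num_pages):
    # A full scan must read every page.
    return num_pages


def heap_equality_cost(num_pages):
    # Average case for a unique key: the match is found halfway through.
    return 0.5 * num_pages


def sorted_equality_cost(num_pages):
    # Binary search over the pages of a sorted file.
    return math.ceil(math.log2(num_pages))


B = 1024  # assumed file size in pages
print(heap_scan_cost(B))        # 1024
print(heap_equality_cost(B))    # 512.0
print(sorted_equality_cost(B))  # 10
```

The gap between 512 and 10 page accesses is the kind of difference the cost model is meant to expose, even though it ignores CPU time and sequential-read effects.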

Heap File Advantages / Disadvantages
Advantages:

  • Efficient
    • for bulk loading data (order does not matter, so records are simply appended)
    • for relatively small relations, as indexing overheads are avoided
    • when queries need to fetch a large proportion of the stored records

Disadvantages:

  • Not efficient
    • for selective queries, since every page may have to be scanned
    • for sorting, which is time-consuming
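
The trade-off above can be sketched with a toy heap file. This is an illustration under stated assumptions, not a real storage engine: the `HeapFile` class, the page capacity of 4 records, and the `(page_no, slot_no)` form of the rid are all chosen for the example.

```python
PAGE_CAPACITY = 4  # records per page; arbitrary value for illustration


class HeapFile:
    """Toy heap file: records are kept in insertion order, page by page."""

    def __init__(self):
        self.pages = [[]]

    def insert(self, record):
        # Cheap insert: append to the last page, starting a new page when full.
        if len(self.pages[-1]) == PAGE_CAPACITY:
            self.pages.append([])
        self.pages[-1].append(record)
        # Return a hypothetical rid: (page number, slot number).
        return (len(self.pages) - 1, len(self.pages[-1]) - 1)

    def search(self, predicate):
        # Selective query: with no ordering, every page has to be read.
        matches, pages_read = [], 0
        for page in self.pages:
            pages_read += 1
            matches.extend(r for r in page if predicate(r))
        return matches, pages_read


heap = HeapFile()
rids = [heap.insert({"id": i, "name": f"user{i}"}) for i in range(10)]
matches, pages_read = heap.search(lambda r: r["id"] == 7)
print(rids[7], matches, pages_read)
```

Inserting never looks at existing data, which is why bulk loading is cheap, but finding the single record with `id == 7` still reads all three pages.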

Indexes

  • File Index
    • Speeds up selections on the search key fields
    • Any subset of the fields of a relation can be the search key for an index on the relation
    • An index contains a collection of data entries and supports efficient retrieval of all data entries k* with a given search key value k
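
A minimal sketch of the idea of data entries follows, assuming each data entry k* is stored as the key value k together with the rids of the matching records. The in-memory dictionary, the sample records, and the `build_index` helper are hypothetical and stand in for a real on-disk index structure.

```python
from collections import defaultdict

# Hypothetical relation: rid (page_no, slot_no) -> record.
records = {
    (0, 0): {"name": "Ann", "age": 30},
    (0, 1): {"name": "Bob", "age": 25},
    (1, 0): {"name": "Cy", "age": 30},
}


def build_index(records, search_key):
    """Map each search key value k to the rids of records having that value."""
    index = defaultdict(list)
    for rid, record in records.items():
        index[record[search_key]].append(rid)
    return index


age_index = build_index(records, "age")

# Retrieve all data entries for k = 30, then fetch only those records.
rids_for_30 = age_index.get(30, [])
print([records[rid]["name"] for rid in rids_for_30])  # ['Ann', 'Cy']
```

Any field, or combination of fields, could serve as `search_key`; the index on `age` says nothing about how the records themselves are ordered.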

B+ Tree Indexes
The most popular index structure in database systems.
Leaf pages contain the data entries. Non-leaf pages contain index entries, which are used only to direct searches.
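
How index entries direct a search can be sketched as below. This is a simplified, read-only illustration: the `Leaf` and `Internal` classes, the sample keys, and the string rids are assumptions for the example, and insertion, node splitting, and duplicate keys that span several leaves are not handled.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, entries):
        self.entries = entries    # sorted (key, rid) data entries


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # sorted separator keys (index entries)
        self.children = children  # always len(keys) + 1 child pointers


def search(node, key):
    # Non-leaf pages only route the search toward the right child.
    while isinstance(node, Internal):
        # Child i covers keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect_right(node.keys, key)]
    # Only the leaf holds actual data entries.
    return [rid for k, rid in node.entries if k == key]


root = Internal(
    keys=[13, 24],
    children=[
        Leaf([(2, "r2"), (5, "r5")]),
        Leaf([(13, "r13"), (17, "r17")]),
        Leaf([(24, "r24"), (30, "r30")]),
    ],
)

print(search(root, 17))  # ['r17']
print(search(root, 7))   # []
```

Each lookup touches one page per level of the tree, which is what keeps the page-access cost low for selective queries.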


Knowledge Check: Data Storage

  1. Where is the database stored in a computer?
  • Central Processing Unit
  • [Correct] Hard disk (A database is stored on the hard disk of a computer.)
  • Memory
  • Cache
  2. What is the correct order of processing speed of major units in a computer from the fastest to slowest?
  • CPU, cache, memory, hard disk
  3. Why is the processing speed of a traditional computer hard disk lower than that of a modern solid-state drive (SSD)?
  • [Correct] Because a hard disk is a mechanical device. (Unlike a solid-state drive, a hard disk has to spin and spends more time locating a requested byte of data.)
  • Because a solid-state drive is a mechanical device.
  • Because the size of a solid-state drive is bigger than that of a hard disk.
  • Because a hard disk can only read pages in sequence.
  4. What is the name of the software component in a computer that loads pages from hard disk into memory?
  • Memory Manager
  • [Correct] Buffer Manager (The buffer manager loads pages from hard disk into memory.)
  • Load Manager
  • Index Manager