Skip to content
This repository has been archived by the owner on Jun 4, 2021. It is now read-only.

Properly handle opaque whiteout entries in layer flattener #110

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

JoshRosen
Copy link

@JoshRosen JoshRosen commented Sep 22, 2018

This PR fixes a bug in the flattener's handling of layers with opaque whiteout entries (.wh..wh..opq files). This bug could lead to deleted files re-appearing in flattened output.

Bug symptoms

According to the OCI spec on whiteouts:

An opaque whiteout entry is a file with the name .wh..wh..opq indicating that all siblings are hidden in the lower layer.

The existing flattening logic does not take these files into account, leading to weird anomalies when flattening images where directories are deleted in lower layers and re-created in higher layers. As an example, imagine that I have a Dockerfile which looks something like this:

FROM somewhere
RUN mkdir /directory && touch /directory/old1 /directory/old2
RUN rm -r directory && mkdir /directory && touch /directory/new

Per the spec, these filesystem changes can be represented via the following tar layers:

directory/
directory/old1
directory/old2

and

directory/
directory/.wh..wh..opq
directory/new

The expected flattened layer output is

directory/new

The current flattening implementation does not respect the .wh..wh..opq file, so the flattened output incorrectly contains both the old and new files:

directory/new
directory/old1
directory/old2

Handling of opaque whiteout files seems to be a problem for other projects, too; see sylabs/singularity#1962 and moby/moby#34300, for example.

Solution

This PR addresses this problem by modifying the client v2.2's flattening function to properly account for opaque whiteout files. The flattener now tracks the set of directories in higher layers which have opaque whiteouts.

Testing

I added a new unit test suite client_v2_2 and wrote tests for the extract() flattening function.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

@JoshRosen JoshRosen force-pushed the handle-opaque-whiteouts-in-flattener branch from 7ea40dc to 8e4699d Compare September 22, 2018 07:18
@JoshRosen JoshRosen force-pushed the handle-opaque-whiteouts-in-flattener branch from 8e4699d to 496d266 Compare September 22, 2018 07:19
@JoshRosen
Copy link
Author

JoshRosen commented Sep 22, 2018

I consider this a work-in-progress until the following TODOs and questions are resolved:

  • Does a similar change need to be made in the older client versions, or is it okay to change only in V2.2?
  • Does this need to be feature-flagged? Is it possible that anyone was relying on the old, incorrect behavior?
  • Add comments to explain the new code in extract().
  • Add a simple test of file contents (to make sure that the flattener picks the topmost version of a given file).
  • Is it possible for a tar to contain an opaque whiteout file for a directory but to add no new files under that directory? In this case, should the directory be present-but-empty or should it be removed? I need to see how other projects handle this corner-case. I may be able to test this behavior by simulating this case with my Docker environment and then ensuring that the flattener output agrees with my Docker output.

@JoshRosen JoshRosen changed the title Properly handle opaque whiteout entries in flattener Properly handle opaque whiteout entries in Docker image layer flattener Sep 24, 2018
@JoshRosen JoshRosen changed the title Properly handle opaque whiteout entries in Docker image layer flattener Properly handle opaque whiteout entries in layer flattener Sep 24, 2018
JoshRosen added a commit to databricks/containerregistry that referenced this pull request Sep 24, 2018
@JoshRosen
Copy link
Author

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

@kevininderhees
Copy link

@KaylaNguyen, do you have any time to take a look at this? We ran into this bug again the other day.

@ahirreddy
Copy link

Bump on this PR

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants