Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZON #20271

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open

ZON #20271

Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
8524dcb
Squashes all zon work into one commit for easier rebase
MasonRemaley Nov 4, 2024
3cf23e2
Rebases and gets runtime zon import tests passing, comptime import ne…
MasonRemaley Nov 5, 2024
1e40033
Gets some import zon behavior tests working, skips failing tests for now
MasonRemaley Nov 5, 2024
787e7f6
Gets strings working again
MasonRemaley Nov 5, 2024
b10457c
Starts moving parsing into known result type
MasonRemaley Nov 5, 2024
20f1f00
Starts moving optionals to known result type parsing
MasonRemaley Nov 5, 2024
d02fc84
Parses integers to known result types. Not yet tested as those tests …
MasonRemaley Nov 6, 2024
2752c8c
Pulls out separate types into their own functions to make easier to j…
MasonRemaley Nov 6, 2024
4e4db38
Parses arrays to known result type
MasonRemaley Nov 6, 2024
006800c
Parses enums to known result type
MasonRemaley Nov 6, 2024
0476192
WIP tuples: crash on string literals right now
MasonRemaley Nov 6, 2024
62dad41
Parses tuples to known result types
MasonRemaley Nov 6, 2024
b6b9b2d
Parses strings to known result type
MasonRemaley Nov 6, 2024
46d2a4d
Parses unions to explicit return types
MasonRemaley Nov 12, 2024
cfc59e7
Fixes bug in explicit union tag parsing
MasonRemaley Nov 12, 2024
279145b
Parses void union fields from enum literals
MasonRemaley Nov 13, 2024
4dfc1f5
Parses structs to known result types
MasonRemaley Nov 13, 2024
0ca4bf5
Fixes parsing integers as floats
MasonRemaley Nov 13, 2024
531579a
Removes old implementation, renames parse to lower
MasonRemaley Nov 26, 2024
622d85c
Implements slice coercion
MasonRemaley Nov 27, 2024
05f1fe3
Starts work on compile error tests
MasonRemaley Nov 27, 2024
18284f5
Fixes more failing compile error tests
MasonRemaley Dec 3, 2024
cc7105a
Fixes more zon compile error tests
MasonRemaley Dec 3, 2024
083e233
Gets all ZON tests passing locally
MasonRemaley Dec 3, 2024
56b17fd
Removes platform specific separator from test case
MasonRemaley Dec 3, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion build.zig
Original file line number Diff line number Diff line change
Expand Up @@ -428,7 +428,7 @@ pub fn build(b: *std.Build) !void {
const optimization_modes = chosen_opt_modes_buf[0..chosen_mode_index];

const fmt_include_paths = &.{ "lib", "src", "test", "tools", "build.zig", "build.zig.zon" };
const fmt_exclude_paths = &.{"test/cases"};
const fmt_exclude_paths = &.{ "test/cases", "test/behavior/zon" };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these excluded from formatting?

Copy link
Contributor Author

@MasonRemaley MasonRemaley Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto format is good on code for readability, but normalizing the zon test data to standard formatting reduces coverage. (Tests cover invalid syntax and non standard formatting)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I think incorrect formatting is something we could better test with some kind of automated testing (something to jiggle around whitespace in all of the behavior tests and check they work the same), and I don't see any particularly compelling reason to treat the ZON behavior tests differently to the other Zig behavior tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m open to more thorough tests being added for this in the future, but in the meantime, normalizing test data strictly decreases the quality of the tests and provides no upsides

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see your point -- I'll defer to @andrewrk's opinion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @mlugg's idea exactly describes fuzz testing combined with using these existing tests as an input corpus. If we had that sub-project already finished and fuzzing done regularly, then I think it would make sense to go with that suggestion. Until that is implemented, I think @MasonRemaley's stance makes more sense. I'm actively working on it.

const do_fmt = b.addFmt(.{
.paths = fmt_include_paths,
.exclude_paths = fmt_exclude_paths,
Expand Down
3 changes: 2 additions & 1 deletion lib/std/fmt.zig
Original file line number Diff line number Diff line change
Expand Up @@ -1581,7 +1581,8 @@ test parseInt {
try std.testing.expectEqual(@as(i5, -16), try std.fmt.parseInt(i5, "-10", 16));
}

fn parseIntWithSign(
/// Like `parseIntWithGenericCharacter`, but with a sign argument.
pub fn parseIntWithSign(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a generally useful function IMO, and I needed access to it from the ZON parser, so I made it public

comptime Result: type,
comptime Character: type,
buf: []const Character,
Expand Down
1 change: 1 addition & 0 deletions lib/std/std.zig
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ pub const Thread = @import("Thread.zig");
pub const Treap = @import("treap.zig").Treap;
pub const Tz = tz.Tz;
pub const Uri = @import("Uri.zig");
pub const zon = @import("zon.zig");

pub const array_hash_map = @import("array_hash_map.zig");
pub const atomic = @import("atomic.zig");
Expand Down
3 changes: 2 additions & 1 deletion lib/std/zig/Ast.zig
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,13 @@
/// Reference to externally-owned data.
source: [:0]const u8,

mode: Mode,

tokens: TokenList.Slice,
/// The root AST node is assumed to be index 0. Since there can be no
/// references to the root node, this means 0 is available to indicate null.
nodes: NodeList.Slice,
extra_data: []Node.Index,
mode: Mode = .zig,

errors: []const Error,

Expand Down
181 changes: 50 additions & 131 deletions lib/std/zig/AstGen.zig
Original file line number Diff line number Diff line change
Expand Up @@ -130,6 +130,8 @@ fn appendRefsAssumeCapacity(astgen: *AstGen, refs: []const Zir.Inst.Ref) void {
}

pub fn generate(gpa: Allocator, tree: Ast) Allocator.Error!Zir {
assert(tree.mode == .zig);

var arena = std.heap.ArenaAllocator.init(gpa);
defer arena.deinit();

Expand Down Expand Up @@ -8812,36 +8814,22 @@ fn numberLiteral(gz: *GenZir, ri: ResultInfo, node: Ast.Node.Index, source_node:
}
}

fn failWithNumberError(astgen: *AstGen, err: std.zig.number_literal.Error, token: Ast.TokenIndex, bytes: []const u8) InnerError {
const is_float = std.mem.indexOfScalar(u8, bytes, '.') != null;
switch (err) {
.leading_zero => if (is_float) {
return astgen.failTok(token, "number '{s}' has leading zero", .{bytes});
} else {
return astgen.failTokNotes(token, "number '{s}' has leading zero", .{bytes}, &.{
try astgen.errNoteTok(token, "use '0o' prefix for octal literals", .{}),
});
},
.digit_after_base => return astgen.failTok(token, "expected a digit after base prefix", .{}),
.upper_case_base => |i| return astgen.failOff(token, @intCast(i), "base prefix must be lowercase", .{}),
.invalid_float_base => |i| return astgen.failOff(token, @intCast(i), "invalid base for float literal", .{}),
.repeated_underscore => |i| return astgen.failOff(token, @intCast(i), "repeated digit separator", .{}),
.invalid_underscore_after_special => |i| return astgen.failOff(token, @intCast(i), "expected digit before digit separator", .{}),
.invalid_digit => |info| return astgen.failOff(token, @intCast(info.i), "invalid digit '{c}' for {s} base", .{ bytes[info.i], @tagName(info.base) }),
.invalid_digit_exponent => |i| return astgen.failOff(token, @intCast(i), "invalid digit '{c}' in exponent", .{bytes[i]}),
.duplicate_exponent => |i| return astgen.failOff(token, @intCast(i), "duplicate exponent", .{}),
.exponent_after_underscore => |i| return astgen.failOff(token, @intCast(i), "expected digit before exponent", .{}),
.special_after_underscore => |i| return astgen.failOff(token, @intCast(i), "expected digit before '{c}'", .{bytes[i]}),
.trailing_special => |i| return astgen.failOff(token, @intCast(i), "expected digit after '{c}'", .{bytes[i - 1]}),
.trailing_underscore => |i| return astgen.failOff(token, @intCast(i), "trailing digit separator", .{}),
.duplicate_period => unreachable, // Validated by tokenizer
.invalid_character => unreachable, // Validated by tokenizer
.invalid_exponent_sign => |i| {
assert(bytes.len >= 2 and bytes[0] == '0' and bytes[1] == 'x'); // Validated by tokenizer
return astgen.failOff(token, @intCast(i), "sign '{c}' cannot follow digit '{c}' in hex base", .{ bytes[i], bytes[i - 1] });
},
.period_after_exponent => |i| return astgen.failOff(token, @intCast(i), "unexpected period after exponent", .{}),
}
fn failWithNumberError(
Copy link
Contributor Author

@MasonRemaley MasonRemaley Jul 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a TODO master to DRY these string/error functions with a duplicate in Manifest.zig.

I also needed them, both in runtime and compile time ZON parsing, so rather than duplicate it further I resolved the TODO.

astgen: *AstGen,
err: std.zig.number_literal.Error,
token: Ast.TokenIndex,
bytes: []const u8,
) InnerError {
const note = err.noteWithSource(bytes);
const notes: []const u32 = if (note) |n| &.{try astgen.errNoteTok(token, "{s}", .{n})} else &.{};
try astgen.appendErrorTokNotesOff(
token,
@as(u32, @intCast(err.offset())),
"{}",
.{err.fmtWithSource(bytes)},
notes,
);
return error.AnalysisFail;
}

fn asmExpr(
Expand Down Expand Up @@ -9336,7 +9324,18 @@ fn builtinCall(
} else if (str.len == 0) {
return astgen.failTok(str_lit_token, "import path cannot be empty", .{});
}
const result = try gz.addStrTok(.import, str.index, str_lit_token);
const res_ty = try ri.rl.resultType(gz, node) orelse .none;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports now store result types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible ZIR optimization here would now be to replace the rvalue call below with special handling in the ty case, since Sema will do the coercion for us. Not a merge blocker, just worth noting!

const payload_index = try addExtra(gz.astgen, Zir.Inst.Import{
.res_ty = res_ty,
.path = str.index,
});
const result = try gz.add(.{
.tag = .import,
.data = .{ .pl_tok = .{
.src_tok = gz.tokenIndexToRelative(str_lit_token),
.payload_index = payload_index,
} },
});
const gop = try astgen.imports.getOrPut(astgen.gpa, str.index);
if (!gop.found_existing) {
gop.value_ptr.* = str_lit_token;
Expand Down Expand Up @@ -11422,85 +11421,20 @@ fn parseStrLit(
}
}

fn failWithStrLitError(astgen: *AstGen, err: std.zig.string_literal.Error, token: Ast.TokenIndex, bytes: []const u8, offset: u32) InnerError {
fn failWithStrLitError(
astgen: *AstGen,
err: std.zig.string_literal.Error,
token: Ast.TokenIndex,
bytes: []const u8,
offset: u32,
) InnerError {
const raw_string = bytes[offset..];
switch (err) {
.invalid_escape_character => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"invalid escape character: '{c}'",
.{raw_string[bad_index]},
);
},
.expected_hex_digit => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"expected hex digit, found '{c}'",
.{raw_string[bad_index]},
);
},
.empty_unicode_escape_sequence => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"empty unicode escape sequence",
.{},
);
},
.expected_hex_digit_or_rbrace => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"expected hex digit or '}}', found '{c}'",
.{raw_string[bad_index]},
);
},
.invalid_unicode_codepoint => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"unicode escape does not correspond to a valid unicode scalar value",
.{},
);
},
.expected_lbrace => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"expected '{{', found '{c}",
.{raw_string[bad_index]},
);
},
.expected_rbrace => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"expected '}}', found '{c}",
.{raw_string[bad_index]},
);
},
.expected_single_quote => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"expected single quote ('), found '{c}",
.{raw_string[bad_index]},
);
},
.invalid_character => |bad_index| {
return astgen.failOff(
token,
offset + @as(u32, @intCast(bad_index)),
"invalid byte in string or character literal: '{c}'",
.{raw_string[bad_index]},
);
},
.empty_char_literal => {
return astgen.failOff(token, offset, "empty character literal", .{});
},
}
return astgen.failOff(
token,
offset + @as(u32, @intCast(err.offset())),
"{}",
.{err.fmtWithSource(raw_string)},
);
}

fn failNode(
Expand Down Expand Up @@ -11618,7 +11552,7 @@ fn appendErrorTokNotesOff(
comptime format: []const u8,
args: anytype,
notes: []const u32,
) !void {
) Allocator.Error!void {
@branchHint(.cold);
const gpa = astgen.gpa;
const string_bytes = &astgen.string_bytes;
Expand Down Expand Up @@ -11814,32 +11748,17 @@ fn strLitAsString(astgen: *AstGen, str_lit_token: Ast.TokenIndex) !IndexSlice {
}

fn strLitNodeAsString(astgen: *AstGen, node: Ast.Node.Index) !IndexSlice {
const tree = astgen.tree;
const node_datas = tree.nodes.items(.data);

const start = node_datas[node].lhs;
const end = node_datas[node].rhs;

const gpa = astgen.gpa;
const data = astgen.tree.nodes.items(.data);
const string_bytes = &astgen.string_bytes;
const str_index = string_bytes.items.len;

// First line: do not append a newline.
var tok_i = start;
{
const slice = tree.tokenSlice(tok_i);
const line_bytes = slice[2..];
try string_bytes.appendSlice(gpa, line_bytes);
tok_i += 1;
}
// Following lines: each line prepends a newline.
while (tok_i <= end) : (tok_i += 1) {
const slice = tree.tokenSlice(tok_i);
const line_bytes = slice[2..];
try string_bytes.ensureUnusedCapacity(gpa, line_bytes.len + 1);
string_bytes.appendAssumeCapacity('\n');
string_bytes.appendSliceAssumeCapacity(line_bytes);
var parser = std.zig.string_literal.multilineParser(string_bytes.writer(gpa));
var tok_i = data[node].lhs;
while (tok_i <= data[node].rhs) : (tok_i += 1) {
try parser.line(astgen.tree.tokenSlice(tok_i));
}

const len = string_bytes.items.len - str_index;
try string_bytes.append(gpa, 0);
return IndexSlice{
Expand Down
9 changes: 8 additions & 1 deletion lib/std/zig/Zir.zig
Original file line number Diff line number Diff line change
Expand Up @@ -1667,7 +1667,7 @@ pub const Inst = struct {
.func = .pl_node,
.func_inferred = .pl_node,
.func_fancy = .pl_node,
.import = .str_tok,
.import = .pl_tok,
.int = .int,
.int_big = .str,
.float = .float,
Expand Down Expand Up @@ -3574,6 +3574,13 @@ pub const Inst = struct {
/// If `.none`, restore unconditionally.
operand: Ref,
};

pub const Import = struct {
/// The result type of the import, or `.none` if none was available.
res_ty: Ref,
/// The import path.
path: NullTerminatedString,
};
};

pub const SpecialProng = enum { none, @"else", under };
Expand Down
75 changes: 75 additions & 0 deletions lib/std/zig/number_literal.zig
Original file line number Diff line number Diff line change
Expand Up @@ -58,8 +58,83 @@ pub const Error = union(enum) {
invalid_exponent_sign: usize,
/// Period comes directly after exponent.
period_after_exponent: usize,

pub fn fmtWithSource(self: Error, bytes: []const u8) std.fmt.Formatter(formatErrorWithSource) {
return .{ .data = .{ .err = self, .bytes = bytes } };
}

pub fn noteWithSource(self: Error, bytes: []const u8) ?[]const u8 {
if (self == .leading_zero) {
const is_float = std.mem.indexOfScalar(u8, bytes, '.') != null;
if (!is_float) return "use '0o' prefix for octal literals";
}
return null;
}

pub fn offset(self: Error) usize {
return switch (self) {
.leading_zero => 0,
.digit_after_base => 0,
.upper_case_base => |i| i,
.invalid_float_base => |i| i,
.repeated_underscore => |i| i,
.invalid_underscore_after_special => |i| i,
.invalid_digit => |e| e.i,
.invalid_digit_exponent => |i| i,
.duplicate_period => 0,
.duplicate_exponent => |i| i,
.exponent_after_underscore => |i| i,
.special_after_underscore => |i| i,
.trailing_special => |i| i,
.trailing_underscore => |i| i,
.invalid_character => |i| i,
.invalid_exponent_sign => |i| i,
.period_after_exponent => |i| i,
};
}
};

const FormatWithSource = struct {
bytes: []const u8,
err: Error,
};

fn formatErrorWithSource(
self: FormatWithSource,
comptime fmt: []const u8,
options: std.fmt.FormatOptions,
writer: anytype,
) !void {
_ = options;
_ = fmt;
switch (self.err) {
.leading_zero => try writer.print("number '{s}' has leading zero", .{self.bytes}),
.digit_after_base => try writer.writeAll("expected a digit after base prefix"),
.upper_case_base => try writer.writeAll("base prefix must be lowercase"),
.invalid_float_base => try writer.writeAll("invalid base for float literal"),
.repeated_underscore => try writer.writeAll("repeated digit separator"),
.invalid_underscore_after_special => try writer.writeAll("expected digit before digit separator"),
.invalid_digit => |info| try writer.print("invalid digit '{c}' for {s} base", .{ self.bytes[info.i], @tagName(info.base) }),
.invalid_digit_exponent => |i| try writer.print("invalid digit '{c}' in exponent", .{self.bytes[i]}),
.duplicate_exponent => try writer.writeAll("duplicate exponent"),
.exponent_after_underscore => try writer.writeAll("expected digit before exponent"),
.special_after_underscore => |i| try writer.print("expected digit before '{c}'", .{self.bytes[i]}),
.trailing_special => |i| try writer.print("expected digit after '{c}'", .{self.bytes[i - 1]}),
.trailing_underscore => try writer.writeAll("trailing digit separator"),
.duplicate_period => try writer.writeAll("duplicate period"),
.invalid_character => try writer.writeAll("invalid character"),
.invalid_exponent_sign => |i| {
const hex = self.bytes.len >= 2 and self.bytes[0] == '0' and self.bytes[1] == 'x';
if (hex) {
try writer.print("sign '{c}' cannot follow digit '{c}' in hex base", .{ self.bytes[i], self.bytes[i - 1] });
} else {
try writer.print("sign '{c}' cannot follow digit '{c}' in current base", .{ self.bytes[i], self.bytes[i - 1] });
}
},
.period_after_exponent => try writer.writeAll("unexpected period after exponent"),
}
}

/// Parse Zig number literal accepted by fmt.parseInt, fmt.parseFloat and big_int.setString.
/// Valid for any input.
pub fn parseNumberLiteral(bytes: []const u8) Result {
Expand Down
Loading
Loading