`df.col = scalar` and `df[:, col] = scalar` are disallowed in DataFrames. My understanding is that this single rule is an unnecessary limitation that, combined with an ambition to make the API otherwise easy to use, has unfortunately cascaded into indexing and property rules that are unique (i.e. `df.col .= ...` behaves like nothing else) and that deviate from the intended behaviour described in Base Julia: `.=` exists in the Julia language for the purpose of in-place operations (`x[...] .=` is also supposed to be in-place).

If `... = scalar` were implemented, one could greatly simplify the indexing rules and avoid surprises for beginners and advanced users alike, improving consistency and ease of use. The changes in this PR are:

- `=` allocates new columns by copying the RHS. Scalars `fill`.
- `.=` for in-place assignment.
- `@alias df.col = ...` provides explicit aliasing. This is a new exported macro (previously suggested to be called `@nocopy` before it was dropped). Unintended aliasing of columns (where `df.a === df.b`) has been a common source of bugs.
- `df[r, ...] =` assignment into a subset of rows promotes column types to store both old and new values.
- `.=` to something that doesn't exist should error. Incompatible eltypes also error with `.=`. I think that a user advanced enough to bother with in-place assignment cares about precise behaviour. In a migration period these could still be allowed with a deprecation message "consider using = instead of .=", since DataFrames users are currently told to use `.=` when they expect `=` to work.

These changes would make
`=` behave like `.=` currently does, for `df[!, ...]` and `df.col`. Unfortunately this is a breaking change for code that relies on either of these (albeit obscure) features.

- Replace `df.col = v` with `@alias df.col = v` if `df.col === v` is assumed true after the assignment.
- Replace `.=` with `=`.

In addition to my own code base I tested my changes on these repos:
- `df.col = v` and later tested identity with `v`. Patched with `@alias df.col = v`. The implementation outside of unit tests did not need to be patched.

"Wouldn't `... = scalar` just break another rule instead?"

Whereas `.=` is supposed to always be in-place, `=` may on the other hand convert and copy according to the Julia docs.

- `=` even if the LHS and RHS types are the same.
- `fill` until later (i.e. until `df[row, col] = ...`)

Additional steps not currently included in this PR:
- `.=` isn't identical: "consider using = instead of .="
- `@alias df[:, ...] = v`. It would then be possible to remove `df[!, ...]` from the LHS without loss of functionality (as far as I can tell). That could imho lessen confusion about the multitude of overlapping indexing rules.
- `@alias v = df[:, ...]`. This would make it possible to remove `df[!, ...]` from the RHS without loss of functionality.

By conforming to Base Julia the complete indexing rules would then be greatly simplified to just:
Example
After this PR a DataFrame would behave like this:
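As a rough illustration of the assignment rules proposed in this PR, here is a toy model in plain Julia. It is only a sketch under stated assumptions: it is not DataFrames.jl code and not the PR's implementation. `ToyFrame`, `assign!`, `assign_inplace!` and `alias!` are hypothetical names standing in for `df.col = x`, `df.col .= x` and `@alias df.col = x` respectively. Type promotion on row-subset assignment is not modelled.

```julia
# Toy model only: NOT DataFrames.jl and not the PR's implementation.
# All names below are hypothetical and exist just for this sketch.
struct ToyFrame
    cols::Dict{Symbol,Vector}   # column name => column storage
    nrows::Int
end
ToyFrame(nrows::Int) = ToyFrame(Dict{Symbol,Vector}(), nrows)

# Stand-in for `df.col = x`: always allocate a fresh column.
# Vectors are copied, scalars are expanded with `fill`.
function assign!(tf::ToyFrame, name::Symbol, value)
    col = value isa AbstractVector ? collect(value) : fill(value, tf.nrows)
    length(col) == tf.nrows ||
        throw(DimensionMismatch("expected $(tf.nrows) rows, got $(length(col))"))
    tf.cols[name] = col
    return tf
end

# Stand-in for `df.col .= x`: write into the existing storage.
# Errors if the column is missing; Base Julia itself throws if `value`
# cannot be converted to the column's eltype.
function assign_inplace!(tf::ToyFrame, name::Symbol, value)
    haskey(tf.cols, name) || throw(ArgumentError("column $name does not exist"))
    col = tf.cols[name]
    col .= value
    return tf
end

# Stand-in for `@alias df.col = v`: store the very same vector, no copy.
function alias!(tf::ToyFrame, name::Symbol, v::Vector)
    length(v) == tf.nrows ||
        throw(DimensionMismatch("expected $(tf.nrows) rows, got $(length(v))"))
    tf.cols[name] = v
    return tf
end

tf = ToyFrame(3)
v = [1, 2, 3]

assign!(tf, :a, v)            # like `df.a = v`: copied, so no aliasing
@assert tf.cols[:a] == v && tf.cols[:a] !== v

assign!(tf, :b, 0.5)          # like `df.b = 0.5`: scalar is filled
@assert tf.cols[:b] == [0.5, 0.5, 0.5]

alias!(tf, :c, v)             # like `@alias df.c = v`: same object
@assert tf.cols[:c] === v

old_a = tf.cols[:a]
assign_inplace!(tf, :a, 7)    # like `df.a .= 7`: storage is reused
@assert tf.cols[:a] === old_a && old_a == [7, 7, 7]
```

In this sketch, `assign_inplace!(tf, :nope, 1)` throws an `ArgumentError` because the column does not exist, and `assign_inplace!(tf, :a, 1.5)` throws an `InexactError` because `1.5` cannot be stored in an `Int` column. That mirrors the "`.=` should error" rule in the list of changes.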