Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrange not reporting syntax errors on .by_group #6980

Open
py9mrg opened this issue Jan 8, 2024 · 1 comment
Open

arrange not reporting syntax errors on .by_group #6980

py9mrg opened this issue Jan 8, 2024 · 1 comment

Comments

@py9mrg
Copy link

py9mrg commented Jan 8, 2024

Hello,

When using arrange on grouped data (and wanting to respect the grouping), we need to use the .by_group argument. However, if you make a syntax error - e.g. by_group or .by_groups - then the arranging fails to respect the groups silently. This has the potential to lead to serious errors in analysis.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# correct behaviour
mtcars |>
  group_by(cyl) |>
  arrange(mpg, .by_group = TRUE)
#> # A tibble: 32 × 11
#> # Groups:   cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2
#>  2  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#>  3  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  4  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#>  5  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  6  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  7  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  8  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  9  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#> 10  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#> # ℹ 22 more rows

# syntax error leads to no grouping, but happens silently
mtcars |>
  group_by(cyl) |>
  arrange(mpg, .by_groups = TRUE)
#> # A tibble: 32 × 11
#> # Groups:   cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
#>  2  10.4     8  460    215  3     5.42  17.8     0     0     3     4
#>  3  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
#>  4  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  5  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
#>  6  15       8  301    335  3.54  3.57  14.6     0     1     5     8
#>  7  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
#>  8  15.2     8  304    150  3.15  3.44  17.3     0     0     3     2
#>  9  15.5     8  318    150  2.76  3.52  16.9     0     0     3     2
#> 10  15.8     8  351    264  4.22  3.17  14.5     0     1     5     4
#> # ℹ 22 more rows

Created on 2024-01-08 with reprex v2.0.2

I assume this is happening because of the ... argument combined with passing an object set to TRUE. I think there needs to be something here that either checks for arguments similar to .by_group but miss-spelt, or some other method to warn / throw an error in such circumstances. Failing to group silently seems a bit dangerous.

@philibe
Copy link

philibe commented Jan 8, 2024

For me your .by_groups is understood as unknown parameter, and maybe as column, and skipped in the condition below.

arrange.data.frame <- function(.data,

arrange.data.frame <- function(.data,
                               ...,
                               .by_group = FALSE,
                               .locale = NULL) {
  dots <- enquos(...)

  if (.by_group) {  ## my remark : skipped if .by_groups
    dots <- c(quos(!!!groups(.data)), dots)
  }

  loc <- arrange_rows(.data, dots = dots, locale = .locale)
  dplyr_row_slice(.data, loc)
}

Why to have a warning for everything ? To type .by_groups is a typo. End of the story.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants