Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump sqlfluff from 1.4.5 to 3.0.6 in /src #3656

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

dependabot[bot]
Copy link
Contributor

@dependabot dependabot bot commented on behalf of github May 7, 2024

Bumps sqlfluff from 1.4.5 to 3.0.6.

Release notes

Sourced from sqlfluff's releases.

[3.0.6] - 2024-05-06

Highlights

This release primarily fixes an issue introduced by the recent dbt 1.7.14 release, and better support for dbt 1.7+. It also includes a range of dialect improvements and CLI refinements.

This release also includes the groundwork for linting the unrendered sections of Jinja templates. More documentation on this will be released in due course when it's ready for beta testing.

Thanks also to @​padraic00 & @​burhanyasar who made their first contributions in this release. 🎉🎉🏆🎉🎉

What’s Changed

New Contributors

... (truncated)

Changelog

Sourced from sqlfluff's changelog.

[3.0.6] - 2024-05-06

Highlights

This release primarily fixes an issue introduced by the recent dbt 1.7.14 release, and better support for dbt 1.7+. It also includes a range of dialect improvements and CLI refinements.

This release also includes the groundwork for linting the unrendered sections of Jinja templates. More documentation on this will be released in due course when it's ready for beta testing.

Thanks also to @​padraic00 & @​burhanyasar who made their first contributions in this release. 🎉🎉🏆🎉🎉

What’s Changed

... (truncated)

Commits

Dependabot compatibility score

You can trigger a rebase of this PR by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note
Automatic rebases have been disabled on this pull request as it has been open for over 30 days.

@dependabot dependabot bot added dependencies Pull requests that update a dependency file python Pull requests that update Python code labels May 7, 2024
Bumps [sqlfluff](https://github.com/sqlfluff/sqlfluff) from 1.4.5 to 3.0.6.
- [Release notes](https://github.com/sqlfluff/sqlfluff/releases)
- [Changelog](https://github.com/sqlfluff/sqlfluff/blob/main/CHANGELOG.md)
- [Commits](sqlfluff/sqlfluff@1.4.5...3.0.6)

---
updated-dependencies:
- dependency-name: sqlfluff
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot force-pushed the dependabot/pip/src/sqlfluff-3.0.6 branch from 76d96df to 9ebd94c Compare May 7, 2024 17:44
@max-ostapenko
Copy link
Contributor

See some SQL formatting example as a result of: $ sqlfluff fix ../sql/2022/privacy. All makes sense to me.

There are a few issues in 2022 that require a manual fix though:

(.venv) maxostapenko@Maxs-MBP src % sqlfluff lint ../sql/2022 
== [../sql/2022/caching/header_trends.sql] FAIL                                                                                                   
L:  53 | P:  48 | AL08 | Reuse of column alias 'total_using_both' from line 42.                                                                   
                       | [aliasing.unique.column]
L:  54 | P:  56 | AL08 | Reuse of column alias 'total_using_neither' from line
                       | 43. [aliasing.unique.column]
L:  71 | P:  59 | AL08 | Reuse of column alias 'pct_using_both' from line 60.
                       | [aliasing.unique.column]
L:  72 | P:  67 | AL08 | Reuse of column alias 'pct_using_neither' from line 61.
                       | [aliasing.unique.column]
== [../sql/2022/css/meta_unknown_properties.sql] FAIL                                                                                             
L:  51 | P:  45 | AL08 | Reuse of column alias 'total' from line 48.                                                                              
                       | [aliasing.unique.column]
== [../sql/2022/media/video_source_formats.sql] FAIL                                                                                              
L:  11 | P:   5 | AL08 | Reuse of column alias 'video_source_format_type' from                                                                    
                       | line 9. [aliasing.unique.column]
== [../sql/2022/performance/lcp_element_data.sql] FAIL                                                                                            
L: 112 | P:  15 | AL08 | Reuse of column alias 'total' from line 100.                                                                             
                       | [aliasing.unique.column]
== [../sql/2022/security/feature_adoption_by_country.sql] FAIL                                                                                    
L:  28 | P: 116 | AL08 | Reuse of column alias 'pct_csp' from line 27.                                                                            
                       | [aliasing.unique.column]
== [../sql/2022/seo/lighthouse_unused_css_js_by_rank.sql] FAIL                                                                                    
L:  31 | P:   5 | CV09 | Use of blocked regex                                                                                                     
                       | '`httparchive.lighthouse.2022_07_01_*`'.
                       | [convention.blocked_words]
== [../sql/2022/seo/pages-canonical-stats.sql] FAIL                                                                                               
L: 167 | P:   5 | CV09 | Use of blocked regex '`httparchive.pages.2022_07_01_*`'.                                                                 
                       | [convention.blocked_words]
== [../sql/2022/seo/robots-text-size.sql] FAIL                                                                                                    
L:  40 | P:   5 | CV09 | Use of blocked regex '`httparchive.pages.2022_07_01_*`'.                                                                 
                       | [convention.blocked_words]
All Finished 📜 🎉!      

Should we maybe just ignore past years?

@tunetheweb
Copy link
Member

tunetheweb commented May 19, 2024

@rviscomi wasn't happy moving from this:

JOIN (
  SELECT
    _TABLE_SUFFIX AS client,
    COUNT(0) AS total
  FROM
    `httparchive.summary_pages.2022_06_01_*`
  GROUP BY
    _TABLE_SUFFIX)

to this style which v2 and above insists upon:

JOIN (
  SELECT
    _TABLE_SUFFIX AS client,
    COUNT(0) AS total
  FROM
    `httparchive.summary_pages.2022_06_01_*`
  GROUP BY
    _TABLE_SUFFIX
)

And I've had no luck so far getting the SQLFluff maintainers (of which I'm one btw, but don't do much with it these days) to support Rick's preferred bracketing, even optionally.

So that's a sticking point for the upgrade.

Other than that we'd need to update the noqa comments to avoid above flagging and should probably do more changes to the config (as noted in this branch)

@max-ostapenko
Copy link
Contributor

Thanks for the context.
So the issue is formatting of subquery end bracket.

  1. I see the formatting you're talking about is raising error on layout.indent rule:

Expected line break and no indent before ')'.

This rule is critical for indentation formatting.
But I don't see from it's description why is it taking care of newlines.

  1. layout.cte_bracket (another case of formatting the closing bracket) doesn't work as expected as well and has a conflict with layout.indent.
    See Conflict between layout.cte_bracket and layout.indent sqlfluff/sqlfluff#5893.
    This conflict may impact the subqueries brackets too.
    @tunetheweb What do you think?

  2. Additionally we could introduce custom rule layout.subquery_bracket that would take care of this scenario. But again, the conflict with indentation needs to be resolved.

@tunetheweb
Copy link
Member

Yeah the layout code is complex (which is why I haven't gone and fixed it myself!). I think we need to add line_position support to [sqlfluff:layout:type:end_bracket] config.

But to be honest I don't disagree with SQLFluff and think I actually prefer it on a new line (like we do with CTEs).

@rviscomi
Copy link
Member

@tunetheweb just to clarify, I actually prefer the closing parenthesis to be on its own line. What I said in #3364 (comment) was I wasn't (and still am not) a fan of changing from K&R:

FROM (
  SELECT
    _TABLE_SUFFIX AS client,
    hasHeading(payload) AS has_heading
  FROM
    `httparchive.pages.2019_07_01_*`
)

to GNU:

FROM
  (
    SELECT
      _TABLE_SUFFIX AS client,
      hasHeading(payload) AS has_heading
    FROM
      `httparchive.pages.2019_07_01_*`
  )

(Specifically, the extra indentation)

@tunetheweb
Copy link
Member

OK then I think we might be OK then as think that was made optional. Let me try and run fix and see what it looks like.

@tunetheweb
Copy link
Member

@rviscomi can you spot check a few files?

Will be good to get this upgraded as long been on my list to resolve this issue!

Comment on lines 19 to 22
(
REGEXP_CONTAINS(body, r'(?is)<meta[^><]*Accept-CH\b') OR
REGEXP_CONTAINS(respOtherHeaders, r'(?is)Accept-CH = ')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use K&R

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean move the opening bracket up to previous line?

WHERE
      date = '2019-07-01' AND
      firstHtml AND (
        REGEXP_CONTAINS(body, r'(?is)<meta[^><]*Accept-CH\b') OR
        REGEXP_CONTAINS(respOtherHeaders, r'(?is)Accept-CH = ')
      )

Personally I find that harder to read. We have three distinct AND clauses but decide to put the start of the third one at the end of line 2.

To me this is clearer:

WHERE
      date = '2019-07-01' AND
      firstHtml AND
      (
        REGEXP_CONTAINS(body, r'(?is)<meta[^><]*Accept-CH\b') OR
        REGEXP_CONTAINS(respOtherHeaders, r'(?is)Accept-CH = ')
      )

But the linter is fine with either (through prefers the latter, hence why the auto fix moved to that when it had to fix the file for other reasons) so can correct to that if that's the preference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and the wikipedia entry has this (emphasis my own):

A multi-statement block inside a function, however, has its opening brace on the same line as its control clause while the closing brace remains on its own line

So while it makes sense to have this:

while {
  ...
}

as the { is controlled by the while and is part of that clause, I'm less convinced about independent clauses.

Surely you wouldn't do this as arguments to a function that use brackets, when they are spread over separate lines:

myfunction(
  a, (
    b+c
  ),
  d
) {

But instead would do this?:

myfunction(
  a,
  (b+c),
  d
) {

And that's exactly what we have here: independent clauses separated by an AND (instead of a comma).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah since (b+c) can fit on one line that's exactly what I'd do. Otherwise, I'd prefer to be consistent and use K&R.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying if they couldn't fit on one line you'd do this:

myfunction(
  a, (
    b +
    c
  ),
  d
) {

That's just weird to me, and not K&R, but OK can do that if we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should it be like this?

myfunction(
  a,
  (b +
    c),
  d
) {

Confused as to what it wanted here?

Comment on lines 23 to 26
(
REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K&R

Comment on lines 28 to 33
(
SELECT domain
FROM `httparchive.almanac.third_parties`
WHERE date = '2020-08-01' AND
category != 'hosting')
category != 'hosting'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K&R

Copy link
Contributor

@max-ostapenko max-ostapenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have some ideas after looking into config update (excluded rules in particular).

  1. I would re-enable LT09 - Select targets on new lines.
    There is a lot of variety in styles across chapters. Adding this would also make multicolumn selects more readable.

Examples re-enabled rule formatting:

  • back on same line
SELECT DISTINCT NET.REG_DOMAIN(domain) AS domain
FROM
  `httparchive.almanac.cname_tracking`,
  UNNEST(SPLIT(SUBSTRING(domains, 2, LENGTH(domains) - 2))) AS domain
SELECT *
FROM pages
  • now with newlines
SELECT
  client,
  url AS site,
  jsonToKeyValueArray(events) AS events_per_site
FROM pages_events
  1. LT08 - Blank line separators for CTEs.
    In case of CTEs it's much more readable with separators.
    And we currently have most of CTEs with blank lines. Except markup and fonts chapters in 2022.
    I would include it.

@tunetheweb
Copy link
Member

I would re-enable LT09 - Select targets on new lines.

While I agree with this for top-level queries, it breaks subqueries like this:

NET.HOST(url) IN (SELECT domain FROM `httparchive.almanac.third_parties` WHERE date = '2019-07-01' AND category != 'hosting')

And IIRC we have a lot of those subqueries and separating them all out would make the code very verbose.

LT08 - Blank line separators for CTEs.

This looks OK to enable. We used to have a lot of these:

WITH cte1 AS (
    SELECT a FROM table1
),
cte2 AS (
    SELECT a FROM table2
)

SELECT * FROM cte1

And we didn't want the space between the CTEs but looks like most of them have spaces now and I added when adding the linter! So looks like we can enable this one.

@max-ostapenko
Copy link
Contributor

Looked into LT09 again - what do you mean by 'breaks subqueries'?

I've found 4 files that will be definitely broken by this rule enabled, e.g. sql/2019/performance/07_03d.sql.

But in all other cases these inline subqueries are as if the query says 'no need to read this one, move on'
Their FROMs are partly obscured (and surely WHEREs, if present) due to width.

@tunetheweb
Copy link
Member

Yeah "breaks" is a bit strong. And it is a lot more readable with it enforced:

 FROM
   `httparchive.pages.2019_07_01_*`
 JOIN
-  (SELECT _TABLE_SUFFIX, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
+  (SELECT
+    _TABLE_SUFFIX,
+    COUNT(0) AS total
+  FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
 USING (_TABLE_SUFFIX),
   UNNEST(getElements(payload)) AS element

So maybe it is OK to enable this rule?

For the few we don't want this enabled for we can ignore it with -- noqa: disable=LT09:

#standardSQL
# 07_03d: % fast FCP per PSI by geo
WITH geos AS ( -- noqa: disable=LT09
  SELECT *, 'af' AS geo_code, 'Afghanistan' AS geo, 'Asia' AS region, 'Southern Asia' AS subregion FROM `chrome-ux-report.country_af.201907`
  UNION ALL

@tunetheweb
Copy link
Member

Actually looking over the changes with sqlfluff fix for this rule I'm not loving the changes to be honest. The fact it ONLY works on SELECT targets and not the other clauses only half fixes the issues and doesn't really improve readability IMHO.

Examples:

   FROM
     `httparchive.almanac.requests`
   JOIN (
-    SELECT _TABLE_SUFFIX AS client, url AS page, app
+    SELECT
+      _TABLE_SUFFIX AS client,
+      url AS page,
+      app
     FROM `httparchive.technologies.2022_06_01_*`
     WHERE category = 'Ecommerce'
   )
 #standardSQL
 # Top JS frameworks and libraries combinations
-SELECT
-  *
+SELECT *
 FROM (
   SELECT
     client,
 third_party_domains AS (
-  SELECT DISTINCT
-    NET.HOST(domain) AS host
+  SELECT DISTINCT NET.HOST(domain) AS host
   FROM
     `httparchive.almanac.third_parties`
 ),

In fact in there's very few cases where I do think it improves things. Maybe this one cause it only has a FROM clause (but that's very much the exception and most will have WHERE clauses or GROUP BY)?

 FROM
-  (SELECT _TABLE_SUFFIX AS client, url AS page, getCompliantElements(payload) AS compliant_elements FROM `httparchive.pages.2019_07_01_*`)
+  (SELECT
+    _TABLE_SUFFIX AS client,
+    url AS page,
+    getCompliantElements(payload) AS compliant_elements
+  FROM `httparchive.pages.2019_07_01_*`)
 JOIN

@max-ostapenko
Copy link
Contributor

max-ostapenko commented May 22, 2024

Regarding indentation discussion, here's a config to get to the expected result:

[sqlfluff:layout:type:start_bracket]
spacing_before = single:inline

If seems OK, I'll push the changes and we'll have a fresh look.

And while I agree with @tunetheweb that inline opening bracket seems strange in expressions,
I don't know a way to distinguish between expressions and subqueries.
Moving opening bracket inline also decreases the overall amount of indents (GNU -> K&R).

So we'll get:

sql/2019/media/04_09c.sql b/sql/2019/media/04_09c.sql
-          `httparchive.almanac.summary_response_bodies`
-        WHERE
-          date = '2019-07-01' AND
-          firstHtml AND
-          (
-            REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
-            REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
-          )
+      `httparchive.almanac.summary_response_bodies`
+    WHERE
+      date = '2019-07-01' AND
+      firstHtml AND (
+        REGEXP_CONTAINS(body, r'(?im)<meta[^><]*Accept-CH\b') OR
+        REGEXP_CONTAINS(respOtherHeaders, r'(?im)Accept-CH = ')
+      )

Some more examples (a lot of things go inline):
1.

2019/accessibility/09_02.sql b/sql/2019/accessibility/09_02.sql
-FROM
-  (SELECT _TABLE_SUFFIX AS client, url AS page, getMainCount(payload) AS main_elements FROM `httparchive.pages.2019_07_01_*`)
-JOIN
-  (SELECT client, page, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)role=[\'"]?main')) AS main_roles FROM `httparchive.almanac.summary_response_bodies` WHERE date = '2019-07-01' AND firstHtml)
-USING
-  (client, page)
-JOIN
-  (SELECT _TABLE_SUFFIX AS client, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
+FROM (SELECT _TABLE_SUFFIX AS client, url AS page, getMainCount(payload) AS main_elements FROM `httparchive.pages.2019_07_01_*`)
+JOIN (SELECT client, page, ARRAY_LENGTH(REGEXP_EXTRACT_ALL(body, '(?i)role=[\'"]?main')) AS main_roles FROM `httparchive.almanac.summary_response_bodies` WHERE date = '2019-07-01' AND firstHtml)
+USING (client, page)
+JOIN (SELECT _TABLE_SUFFIX AS client, COUNT(0) AS total FROM `httparchive.pages.2019_07_01_*` GROUP BY _TABLE_SUFFIX)
2020/markup/pages_element_count_by_device_and_custom_dash_elements.sql
-CREATE TEMP FUNCTION AS_PERCENT(freq FLOAT64, total FLOAT64) RETURNS FLOAT64 AS (
-  ROUND(SAFE_DIVIDE(freq, total), 4)
+CREATE TEMP FUNCTION AS_PERCENT(freq FLOAT64, total FLOAT64) RETURNS FLOAT64 AS (ROUND(SAFE_DIVIDE(freq, total), 4)
2019/performance/07_22.sql
   SELECT
-    _TABLE_SUFFIX AS client,
-    (
+    _TABLE_SUFFIX AS client, (
       CAST(IFNULL(JSON_EXTRACT(payload, "$['_cpu.Paint']"), '0') AS INT64) +
       CAST(IFNULL(JSON_EXTRACT(payload, "$['_cpu.UpdateLayerTree']"), '0') AS INT64)
     ) AS paint_cpu_time
2020/markup/pages_markup_by_device.sql
-FROM
-  (
-    SELECT
-      _TABLE_SUFFIX AS client,
-      get_markup_info(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS markup_info
-    FROM
-      `httparchive.pages.2020_08_01_*`
-  )
+FROM (
+  SELECT
+    _TABLE_SUFFIX AS client,
+    get_markup_info(JSON_EXTRACT_SCALAR(payload, '$._markup')) AS markup_info
+  FROM
+    `httparchive.pages.2020_08_01_*`
+)

Copy link
Contributor Author

dependabot bot commented on behalf of github May 23, 2024

A newer version of sqlfluff exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal once this PR is merged.

@max-ostapenko
Copy link
Contributor

Pushed all the file updates corresponding to new indents.
@tunetheweb @rviscomi thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants