-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Null safe equality inconsistency #4289
Comments
Yes, agree this is unfortunate. Worse — So I think we need to consider that either columns are compared differently to scalars, or we have to go back to three-valued logic like SQL does... |
BigQuery supports it: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#is_distinct
It is not only the columns, it is any expression which differs from literal NULL: let b = null + 1
let c = null
from test
derive {
a != null,
a != b,
a != c,
} SELECT
*,
a IS NOT NULL,
a <> NULL + 1,
a IS NOT NULL
FROM
test Special-casing I agree implementing this feature for every database might be complicated, and the universally-supported syntax is not good, and might feel clunky in many cases, but maybe this comparison should not be the default behavior? Maybe the compiler should display error when it sees obviously wrong comparison (Which is currently being translated to IS NULL/IS NOT NULL), and suggest to use other method for NULL checking? Maybe even some different operator, I.e |
Thanks, my mistake
Yes, great examples.
Though how would be tell if a user is doing something obviously wrong?
Yup. the exact operator doesn't matter so much — I think this is just a tradeoff of "inherit small annoyances of SQL" vs. "try and create a better world, but then try and plug leaky abstractions"... |
Well, If == will become completly not null-safe (Which it already is, it only handles null-safety in one case), then the code which is currently being translated to
|
Oh yes totally! Agree if we change to another operator. So I think we're left with the decisions between:
Any thoughts @PRQL/prql-team ? |
Here is one of the past discussions I remember: #905 On a related note, does the PRQL need to consider the treatment of NaN? |
As I tried to argue in #905 (comment), I think having a representation of "missing" is key to data analysis and the default behaviour should be that My preferred solution would be: Introduce
|
I too think that let x = 1
let y = 2
let z = null
x == y # false
z == z # true
(x + z) == (y + z) # true!? |
An except about null handling from page 246 in The Third Manifesto.
Which suggest that the PRQL query: from Cats
filter Name == "Garfield" should translate into: SELECT * FROM Cats WHERE Name = 'Garfield' OR Name IS NULL; |
@vanillajonathan I think you have a typo in your example. I don't think what you've written should be true but I think that one could reasonably expect that from Cats
filter Name != "Garfield" should translate into: SELECT * FROM Cats WHERE Name != 'Garfield' OR Name IS NULL; For someone only familiar with boolean logic where the Law of the Excluded Middle holds, that would certainly be the expectation. Stated differently, in such a world you would expect the following query to give you back the original dataset let data = [{a=0}, {a=1}, {a=null}]
let where_true = (from data | filter a==1)
let where_false = (from data | filter a!=1)
from where_true
append where_false However if you run this in the playground you will find that that's not the case. The introduction of "missingness" into data analysis takes the view that when an observation is missing, we can't say much about it. So if we have a cat for which we don't know the name, we can't say whether the name is "Garfield" or not, but likewise we can't guarantee that it is not "Garfield". Therefore we need to introduce a 3 Value Logic where propositions are either TRUE, FALSE, or NULL. So in order to recover our orginal dataset let where_null = (from data | filter a==null)
from where_true
append where_false
append where_null Another way to think about it is that if some of the missing data is filled in over time, you might expect your For example, say you have a table of mushroom species with an attribute let good_to_eat = (
from mushrooms
filter is_poisonous != true
) You don't want your With the introduction of the from data
filter a===1
append (from data | filter !== 1) should recover the original set |
I like the mushroom example. 👍️ |
What happened?
PRQL is not consistent with null comparisons, it transforms
== null
and!= null
toIS NULL
andIS NOT NULL
, yet does not emit similar things for two column comparisons.Given that == is already null safe in some contexts, it should also be null-safe in others, I.e using
Sqlite's IS (https://www.sqlite.org/lang_expr.html#isisnot)it also supports standardIS NOT DISTINCT FROM
PRQL input
SQL output
Expected SQL output
MVCE confirmation
Anything else?
No response
The text was updated successfully, but these errors were encountered: