-
-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should addcol-window pad first list with None to indicate no rows above first row? #2279
Comments
I think you're right, @daviewales. Thanks for putting the windowing feature through its paces! |
Thanks! |
I relied on the old behavior to detect the first or last values in the sheet, checking when the window list becomes 1 element shorter. It's 1 element shorter at the first row for windows like An example where I used this was to rank columns by groups. However, that specific use case will be made unnecessary by #2417 So here is a good place to mention alternate methods to test for the first/last rows. |
@midichef, reminds me of: |
Is there any chance this could be undone? This behavior is not like other tools that provide window functions, for example neither postgres nor pandas pad the window. It makes it quite difficult to determine how many values were actually available for the window when it was created. You can rely on tricks like adding a row ID, and comparing the row ID to the windows size. However, it is dependent on the row id being kept in every instance where you want to evaluate the window function. Trying to inspect the window itself to figure it out is a problem as well, since we can't determine if padded value is there because of some conversion that resulted in None, or because of a missing value. Why it matters: Computations that rely on the number of items, (avg/mean in particular) will not work correctly. |
Well, we now have 2 votes for the old behavior (fewer values for the first and last rows), and 1 vote for the new behavior (None-padded to the same number of values for each row). I'm inclined to revert this change as requested (sorry @daviewales), and I'd like to come to a final decision before the release, which will happen within the next week. Please register your vote on this comment; 👍 to revert to the old behavior and 👎 to keep the new behavior. |
For my specific use case, I was trying to essentially create a lag function. For example, see the SQLite docs:
I can see the benefit of the window function returning only the actual values without padding, to ensure that aggregation functions work correctly. Perhaps we could add a lag (and lead) function to support my use case too? |
One more surprise with the current implementation is that if you specify a windows size larger than the total number of rows in the table, it will gladly pad up to your specified size. For example, to create a cumulative sum, I add a window >= the number of rows in the table. If you make the before rows a LOT bigger, it will be padded with that many Nones. |
@daviewales I was experimenting getting the delta between rows.
Does that work? It doesn't handle the partitioning part of a lag function, however. |
Yes. If we revert to the old behaviour, I could probably just do:
|
I had a play, and came up with a basic from visidata.features.window import WindowColumn
def lag(col):
return col[0] if len(col) > 1 else None
class LagColumn(WindowColumn):
def getValue(self, row):
window = self.windowrows.get(id(row), None)
return lag(window)
@Sheet.api
def addcol_lag(sheet, curcol):
before, after = 1, 0
newcol = LagColumn(
curcol.name+"_lag",
sourcecol=curcol,
before=before,
after=after
)
sheet.addColumnAtCursor(newcol)
Sheet.addCommand('', 'addcol-lag', 'addcol_lag(cursorCol)', 'add column where each row contains the value from the previous row of the current column') Naively hacked up from here: visidata/visidata/features/window.py Line 39 in 545886a
|
Okay, I'm reverting this commit, if only to have less churn from release to release. Thanks to everyone for chiming in! |
Small description
Given a column of values, I want to create a column containing the values of this column from the previous row.
I press
w
, and input1 0
to create a column with 1 previous value, and 0 subsequent values:I now create a new column with the previous values using
=test_window[0]
:This yields exactly what I want, except in the first row.
Rather than the previous (
NULL
) value, I get the value of the current row:Expected result
I expect the window list to always contain 2 values, and be padded with
None
when the previous value does not exist, such as for the first window record. This might look like the following:Steps to reproduce with sample data and a .vd
test-window.zip
Additional context
Please include the version of VisiData and Python.
saul.pw/VisiData v3.0.1
Python 3.10.12
The text was updated successfully, but these errors were encountered: