Should addcol-window pad first list with None to indicate no rows above first row? #2279

daviewales · 2024-01-31T00:25:56Z

Small description

Given a column of values, I want to create a column containing the values of this column from the previous row.

test
A
B
C
D

I press w, and input 1 0 to create a column with 1 previous value, and 0 subsequent values:

I now create a new column with the previous values using =test_window[0]:
This yields exactly what I want, except in the first row.
Rather than the previous (NULL) value, I get the value of the current row:

Expected result

I expect the window list to always contain 2 values, and be padded with None when the previous value does not exist, such as for the first window record. This might look like the following:

Steps to reproduce with sample data and a .vd

test-window.zip

Additional context
Please include the version of VisiData and Python.

saul.pw/VisiData v3.0.1
Python 3.10.12

The text was updated successfully, but these errors were encountered:

saulpw · 2024-09-05T04:51:26Z

I think you're right, @daviewales. Thanks for putting the windowing feature through its paces!

daviewales · 2024-09-05T07:47:50Z

Thanks!

midichef · 2024-09-09T09:34:26Z

I relied on the old behavior to detect the first or last values in the sheet, checking when the window list becomes 1 element shorter. It's 1 element shorter at the first row for windows like w 1 0, or at the last row for windows like w 0 1, or both, for windows like w 1 1.

An example where I used this was to rank columns by groups. However, that specific use case will be made unnecessary by #2417

So here is a good place to mention alternate methods to test for the first/last rows.
To identify the first row, use addcol-incr (by pressing i) and test for rows where A == 1, by making a column expression: = A == 1.
And to identify the last row: A == len(sheet.rows). After that you may want to run freeze-col by pressing '. Otherwise the True value for the last row will move or disappear when rows above it are inserted/deleted.

daviewales · 2024-09-09T09:49:33Z

@midichef, reminds me of:
https://xkcd.com/1172/

thejud · 2024-10-09T00:22:02Z

Is there any chance this could be undone? This behavior is not like other tools that provide window functions, for example neither postgres nor pandas pad the window. It makes it quite difficult to determine how many values were actually available for the window when it was created. You can rely on tricks like adding a row ID, and comparing the row ID to the windows size. However, it is dependent on the row id being kept in every instance where you want to evaluate the window function.

Trying to inspect the window itself to figure it out is a problem as well, since we can't determine if padded value is there because of some conversion that resulted in None, or because of a missing value.

Why it matters: Computations that rely on the number of items, (avg/mean in particular) will not work correctly.

saulpw · 2024-10-09T00:37:39Z

Well, we now have 2 votes for the old behavior (fewer values for the first and last rows), and 1 vote for the new behavior (None-padded to the same number of values for each row). I'm inclined to revert this change as requested (sorry @daviewales), and I'd like to come to a final decision before the release, which will happen within the next week. Please register your vote on this comment; 👍 to revert to the old behavior and 👎 to keep the new behavior.

daviewales · 2024-10-09T07:15:08Z

For my specific use case, I was trying to essentially create a lag function. For example, see the SQLite docs:

The first form of the lag() function returns the result of evaluating expression expr against the previous row in the partition. Or, if there is no previous row (because the current row is the first), NULL.

I can see the benefit of the window function returning only the actual values without padding, to ensure that aggregation functions work correctly.

Perhaps we could add a lag (and lead) function to support my use case too?

thejud · 2024-10-09T07:16:20Z

One more surprise with the current implementation is that if you specify a windows size larger than the total number of rows in the table, it will gladly pad up to your specified size. For example, to create a cumulative sum, I add a window >= the number of rows in the table. If you make the before rows a LOT bigger, it will be padded with that many Nones.

thejud · 2024-10-09T07:23:55Z

@daviewales I was experimenting getting the delta between rows.
I created a window with a of size 1, 0, which puts both the current and previous (if present) value into the window. Then, for instance for a delta (lag of 1), I add an expression column (with a window named win) like:

=win[1] - win[0] if len(win) > 1 else None

Does that work? It doesn't handle the partitioning part of a lag function, however.

daviewales · 2024-10-09T09:27:27Z

Yes. If we revert to the old behaviour, I could probably just do:

win[0] if len(win) > 1 else None

daviewales · 2024-10-09T11:35:23Z

I had a play, and came up with a basic lag function for my .visidatarc:

from visidata.features.window import WindowColumn

def lag(col):
    return col[0] if len(col) > 1 else None

class LagColumn(WindowColumn):
    def getValue(self, row):
        window = self.windowrows.get(id(row), None)
        return lag(window)

@Sheet.api
def addcol_lag(sheet, curcol):
    before, after = 1, 0
    newcol = LagColumn(
        curcol.name+"_lag",
        sourcecol=curcol,
        before=before,
        after=after
    )
    sheet.addColumnAtCursor(newcol)

Sheet.addCommand('', 'addcol-lag', 'addcol_lag(cursorCol)', 'add column where each row contains the value from the previous row of the current column')

Naively hacked up from here:

visidata/visidata/features/window.py

Line 39 in 545886a

class WindowColumn(Column):

This reverts commit bbfbb45.

saulpw · 2024-10-10T22:49:26Z

Okay, I'm reverting this commit, if only to have less churn from release to release. Thanks to everyone for chiming in!

daviewales added the bug label Jan 31, 2024

saulpw added a commit that referenced this issue Sep 5, 2024

[window] pad lists with None so all are same size #2279

bbfbb45

saulpw closed this as completed Sep 5, 2024

saulpw added fixed wart Little warts and quirky bumps labels Sep 5, 2024

saulpw reopened this Oct 9, 2024

saulpw added 3.1 on deck and removed bug fixed wart Little warts and quirky bumps labels Oct 9, 2024

saulpw changed the title ~~addcol-window should pad first list with None to indicate no rows above first row~~ Should addcol-window pad first list with None to indicate no rows above first row? Oct 9, 2024

saulpw added a commit that referenced this issue Oct 10, 2024

Revert "[window] pad lists with None so all are same size #2279"

a24ef18

This reverts commit bbfbb45.

saulpw closed this as completed Oct 10, 2024

saulpw removed the on deck label Oct 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should addcol-window pad first list with None to indicate no rows above first row? #2279

Should addcol-window pad first list with None to indicate no rows above first row? #2279

daviewales commented Jan 31, 2024

saulpw commented Sep 5, 2024

daviewales commented Sep 5, 2024

midichef commented Sep 9, 2024

daviewales commented Sep 9, 2024

thejud commented Oct 9, 2024

saulpw commented Oct 9, 2024

daviewales commented Oct 9, 2024

thejud commented Oct 9, 2024 •

edited

Loading

thejud commented Oct 9, 2024

daviewales commented Oct 9, 2024 •

edited

Loading

daviewales commented Oct 9, 2024

saulpw commented Oct 10, 2024

Should addcol-window pad first list with None to indicate no rows above first row? #2279

Should addcol-window pad first list with None to indicate no rows above first row? #2279

Comments

daviewales commented Jan 31, 2024

saulpw commented Sep 5, 2024

daviewales commented Sep 5, 2024

midichef commented Sep 9, 2024

daviewales commented Sep 9, 2024

thejud commented Oct 9, 2024

saulpw commented Oct 9, 2024

daviewales commented Oct 9, 2024

thejud commented Oct 9, 2024 • edited Loading

thejud commented Oct 9, 2024

daviewales commented Oct 9, 2024 • edited Loading

daviewales commented Oct 9, 2024

saulpw commented Oct 10, 2024

thejud commented Oct 9, 2024 •

edited

Loading

daviewales commented Oct 9, 2024 •

edited

Loading