This repository has been archived by the owner on Sep 7, 2022. It is now read-only.

Handle Missing Data #20

Open
sharmaashish opened this issue May 3, 2017 · 6 comments

Comments

@sharmaashish
Contributor

We need to set up our filters and visualizations so that we can exclude missing/invalid data.

@birm birm self-assigned this May 3, 2017
@birm
Member

birm commented May 3, 2017

I'm not sure how we'd mark missing data, but I'm sure there's a standard. My thought is to use dataDescription.json to signify whether to enforce datatype and how to handle missing data.
Currently, for example, "datatype":"int" seems to allow decimal numbers.

@sharmaashish
Contributor Author

Here's what happens. Say we have a numeric attribute called age, where missing data is labeled NA or null, or has no value at all. Because of those entries, we start treating the attribute as a string. That is a mistake: it should still be treated as a number. Open to suggestions.
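
One way to picture the fix is to map the missing markers to a null value before parsing, so the rest of the column stays numeric. This is only a minimal sketch in Python: the function name `parse_numeric` and the marker set are assumptions taken from the examples in this comment (NA, null, empty), not existing project code.

```python
# Markers treated as "missing" are an assumption for this sketch,
# based on the examples in the comment (NA, null, no value).
MISSING_MARKERS = {"", "na", "null"}


def parse_numeric(raw):
    """Return a float for numeric input, or None when the value is missing."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    # Anything else must parse as a number; invalid input raises ValueError
    # instead of silently turning the attribute into a string.
    return float(text)


raw_ages = ["34", "NA", None, "", "51.5", "null"]
ages = [parse_numeric(v) for v in raw_ages]

# Filters and charts can then skip the missing entries explicitly.
present_ages = [a for a in ages if a is not None]
```

With this approach the attribute keeps its numeric type, and excluding missing rows becomes an explicit `is not None` check rather than a side effect of string comparison.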

@birm
Member

birm commented May 4, 2017

It seems like this should be done in dataDescription.json. I was envisioning something like:

...{
    "attributeName": "Fare",
    "attributeType": [
        "filtering",
        "visual"
    ],
    "dataProvider": "",
    "datatype": "integer",
    "enforce": {"present"|"integer"|"numeric"|{some regex}}
}...

where I'm adding the enforce field.
A solution like this seems necessary, because what counts as invalid may differ between attributes or be very specific.
I'm not sure how people will think of missing/invalid data here, though. Maybe it makes more sense to describe what missing data looks like instead of present data?
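
To make the proposal concrete, here is a rough sketch of how a single value might be checked against an `enforce` rule. Everything in it is an assumption: the function name `satisfies` is hypothetical, and the meaning given to "present", "integer", and "numeric" is just one possible reading of the field above, with any other string treated as a regex.

```python
import re


def satisfies(value, enforce):
    """Check one raw value against a hypothetical "enforce" rule."""
    text = "" if value is None else str(value).strip()
    if enforce == "present":
        # Assumed meaning: any non-empty value counts as present.
        return text != ""
    if enforce == "integer":
        # Digits with an optional sign only, so "7.25" is rejected.
        return re.fullmatch(r"[+-]?\d+", text) is not None
    if enforce == "numeric":
        try:
            float(text)
            return True
        except ValueError:
            return False
    # Any other rule is treated as a regex the whole value must match.
    return re.fullmatch(enforce, text) is not None


fares = ["7", "7.25", "NA", ""]
valid_integers = [f for f in fares if satisfies(f, "integer")]
valid_numbers = [f for f in fares if satisfies(f, "numeric")]
```

Note that a strict "integer" rule like this would also address the case where "datatype":"int" currently lets decimal numbers through. One caveat of using `float` for "numeric" is that it accepts strings such as "nan" and "inf", which a real implementation would probably want to rule out.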

@birm
Member

birm commented May 8, 2017

From the scrum, it seems we need a somewhat more nuanced approach than filtering out data that doesn't match a specified pattern.
So, it sounds like this issue now has two parts: marking data as missing/invalid, and rendering that data. In my "missing data" branch, I'm working on this strategy, starting with marking the data.

@birm
Member

birm commented May 8, 2017

So as not to flood this issue with comments, I'm making a document: here

@birm
Member

birm commented May 23, 2017

Need to have a distinction between missing and Not Applicable.
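
As an illustration of that distinction, a value could be tagged with one of three states instead of a single missing flag. This is a sketch under stated assumptions: the `Status` enum, the `classify` function, and both marker sets are hypothetical, and a real dataset would need to declare its own markers (a bare "NA" in particular is ambiguous and is treated as missing here).

```python
from enum import Enum


class Status(Enum):
    PRESENT = "present"
    MISSING = "missing"                # a value should exist but was not recorded
    NOT_APPLICABLE = "not_applicable"  # no value is expected for this record


# Both marker sets are assumptions made for this sketch only.
MISSING_MARKERS = {"", "na", "null"}
NOT_APPLICABLE_MARKERS = {"n/a", "not applicable"}


def classify(raw):
    """Tag a raw value as present, missing, or not applicable."""
    if raw is None:
        return Status.MISSING
    text = str(raw).strip().lower()
    if text in NOT_APPLICABLE_MARKERS:
        return Status.NOT_APPLICABLE
    if text in MISSING_MARKERS:
        return Status.MISSING
    return Status.PRESENT


statuses = [classify(v) for v in ["42", "N/A", None, " NA "]]
```

Keeping the two states separate lets a visualization render them differently, for example by greying out missing values while leaving not-applicable records out of the chart entirely.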

@birm birm added this to the 0.3 milestone May 23, 2018

2 participants