Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Column names are not set correctly (MIMIC III) #1738

Open
1990PACO opened this issue Apr 20, 2024 · 1 comment
Open

Column names are not set correctly (MIMIC III) #1738

1990PACO opened this issue Apr 20, 2024 · 1 comment

Comments

@1990PACO
Copy link

Prerequisites

Description

I had implement MIMIC III as Postgres SQL localy.
When fetching the column names in Python with pyodbc they partially dont match to the .csv files in compairsion.

def query(table, sql):
"""
table: input tablename to query \n
sql: input SQL query (select * from mimiciii.table) \n
return --> Pandas Dataframe
"""
cnstring = f'DRIVER={{PostgreSQL ODBC Driver(UNICODE)}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'
cnxn = pyodbc.connect(cnstring)
cursor = cnxn.cursor()
colnames = cursor.execute(f"SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table}';").fetchall()
rows = cursor.execute(sql).fetchall()
df = pd.DataFrame.from_records(data=rows, columns=[colname[0] for colname in colnames])
df.columns = [str(i).upper() for i in df.columns]
cursor.close()
return df

Example:
Wrong -- > ['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME',
'DEATHTIME', 'EDREGTIME', 'EDOUTTIME', 'HOSPITAL_EXPIRE_FLAG',
'HAS_CHARTEVENTS_DATA', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS',
'ETHNICITY', 'DIAGNOSIS', 'ADMISSION_TYPE', 'ADMISSION_LOCATION',
'DISCHARGE_LOCATION', 'INSURANCE']

Correct --> ['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME',
'DEATHTIME', 'EDREGTIME', 'EDOUTTIME', 'DISCHARGE_LOCATION',
'INSURANCE', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS',
'ETHNICITY', 'DIAGNOSIS', 'ADMISSION_TYPE', 'ADMISSION_LOCATION',
'HOSPITAL_EXPIRE_FLAG', 'HAS_CHARTEVENTS_DATA']

@alistairewj
Copy link
Member

The query:

colnames = cursor.execute(f"SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table}';").fetchall()

Returns column names in a non-deterministic order, so you wouldn't expect the order to match exactly. The two groups look the same, just in a different order. I would verify that the sets of columns are equal, in which case this is expected and you just need to re-order your columns as necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants