-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Column names are not set correctly (MIMIC III) #1738
Comments
The query: colnames = cursor.execute(f"SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table}';").fetchall() Returns column names in a non-deterministic order, so you wouldn't expect the order to match exactly. The two groups look the same, just in a different order. I would verify that the sets of columns are equal, in which case this is expected and you just need to re-order your columns as necessary. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Prerequisites
Description
I had implement MIMIC III as Postgres SQL localy.
When fetching the column names in Python with pyodbc they partially dont match to the .csv files in compairsion.
def query(table, sql):
"""
table: input tablename to query \n
sql: input SQL query (select * from mimiciii.table) \n
return --> Pandas Dataframe
"""
cnstring = f'DRIVER={{PostgreSQL ODBC Driver(UNICODE)}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'
cnxn = pyodbc.connect(cnstring)
cursor = cnxn.cursor()
colnames = cursor.execute(f"SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table}';").fetchall()
rows = cursor.execute(sql).fetchall()
df = pd.DataFrame.from_records(data=rows, columns=[colname[0] for colname in colnames])
df.columns = [str(i).upper() for i in df.columns]
cursor.close()
return df
Example:
Wrong -- > ['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME',
'DEATHTIME', 'EDREGTIME', 'EDOUTTIME', 'HOSPITAL_EXPIRE_FLAG',
'HAS_CHARTEVENTS_DATA', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS',
'ETHNICITY', 'DIAGNOSIS', 'ADMISSION_TYPE', 'ADMISSION_LOCATION',
'DISCHARGE_LOCATION', 'INSURANCE']
Correct --> ['ROW_ID', 'SUBJECT_ID', 'HADM_ID', 'ADMITTIME', 'DISCHTIME',
'DEATHTIME', 'EDREGTIME', 'EDOUTTIME', 'DISCHARGE_LOCATION',
'INSURANCE', 'LANGUAGE', 'RELIGION', 'MARITAL_STATUS',
'ETHNICITY', 'DIAGNOSIS', 'ADMISSION_TYPE', 'ADMISSION_LOCATION',
'HOSPITAL_EXPIRE_FLAG', 'HAS_CHARTEVENTS_DATA']
The text was updated successfully, but these errors were encountered: