Experimental Data Guidelines

Randy McDermott edited this page Feb 21, 2020 · 12 revisions

Guidelines for experimental data files

The MaCFP repository stores both experimental data and computational results. There are many files to be processed and compared. To keep the processing scripts as simple as possible, please follow these rules when submitting data.

  1. Submit all data in ASCII comma-delimited column format (.csv files).

  2. Do not put spaces in the filenames. Use an underscore _ or a hyphen - instead.

  3. Include a README.md (Markdown format) to briefly describe the files in a given directory. This file will be automatically interpreted by GitHub as Markdown text and serve as a website for your data.

  4. Use unique header names. For example, the following may seem well organized in an Excel spreadsheet, but it is problematic for script processing. Please do not do this:

                T (K)    T (K)    T (K)    T (K)
    ht (m)      1D       2D       1D       2D
    X O2        0.15     0.15     0.21     0.21
    Time (s)
    0           300      300      300      300
    1           310      320      330      350

    Instead, do this:

    s           K        K        K        K
    Time        1D_p15   2D_p15   1D_p21   2D_p21
    0           300      300      300      300
    1           310      320      330      350

    It is fine for units to have their own row, but other information needs to be compressed into a single unique header name. The file should be labeled to identify the data, "temperature" in this case. It is also fine to break the data up into multiple files, so long as the column headers are unique in each file.

  5. Please do not use commas as part of a column header name, even if contained inside quotes. Excel may be fine with this, but many other parsers are not. Something like "Radius 1 cm, Height 1 m" will look like two columns to a Python script that simply splits each line on commas, for example (unless more complicated scripts are written). Other characters that may need to be escaped in a parser should also be avoided: quotation marks, apostrophes, ampersands, etc. Keep it simple. "Radius 1 cm Height 1 m" is fine. Parentheses and hyphens are also fine, for example, "Temp (K)-Z=1 m".

  6. Be reasonable with the precision of the data that you submit. If the data are time series, for example, what sampling frequency do you need? Reduce the amount of data as much as possible without compromising its utility.

  7. If possible, set line endings in text files to Unix (LF). Here is an article describing ways to do this. This can also be done using the vim editor, which is available in the Git BASH shell that comes with Git for Windows (highly recommended over other Git platforms for Windows). The Git BASH shell operates like a Linux terminal. Open the file with $ vi -b filename; if you see "^M" at the end of the lines, the file has Windows (CRLF) line endings. These can be converted with a sed command, $ sed -i "s/^M//g" filename, where ^M must be entered by typing ctrl-v followed by ctrl-m. Finally, do not stress too much about the line endings. If none of the above works for you, just forward the files and we will take care of it before adding them to the repo.
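
Several of the rules above can be checked automatically before a file is submitted. The following is a minimal Python sketch of such a check, using only the standard library. It is not part of the MaCFP processing scripts: the function names `check_submission` and `to_unix_line_endings` and the `header_row` parameter are hypothetical and were chosen for this illustration only. It covers rules 1, 2, 4, 5 and 7; rules 3 and 6 require human judgment.

```python
import csv
import io


def to_unix_line_endings(raw_bytes):
    """Convert Windows (CRLF) and bare CR line endings to Unix (LF)."""
    return raw_bytes.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def check_submission(filename, raw_bytes, header_row=0):
    """Return a list of problems found in one data file (empty if none).

    filename   -- file name as it would appear in the repository
    raw_bytes  -- file contents as bytes
    header_row -- zero-based index of the row holding the column names
                  (hypothetical parameter; use 1 if row 0 holds units)
    """
    problems = []

    # Rule 1: comma-delimited .csv files
    if not filename.lower().endswith(".csv"):
        problems.append("file extension is not .csv")

    # Rule 2: no spaces in filenames
    if " " in filename:
        problems.append("filename contains spaces")

    # Rule 7: Unix (LF) line endings are preferred
    if b"\r" in raw_bytes:
        problems.append("file contains carriage returns (CR)")

    # Rule 1: ASCII content
    try:
        text = raw_bytes.decode("ascii")
    except UnicodeDecodeError:
        problems.append("file contains non-ASCII characters")
        text = raw_bytes.decode("utf-8", errors="replace")

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if len(rows) <= header_row:
        problems.append("header row not found")
        return problems
    headers = [h.strip() for h in rows[header_row]]

    # Rule 4: header names must be unique
    seen = set()
    duplicates = []
    for h in headers:
        if h in seen and h not in duplicates:
            duplicates.append(h)
        seen.add(h)
    if duplicates:
        problems.append("duplicate header names: " + ", ".join(duplicates))

    # Rule 5: avoid characters that parsers may need to escape
    for h in headers:
        bad = [c for c in ",\"'&" if c in h]
        if bad:
            problems.append(
                f"header {h!r} contains discouraged characters: {' '.join(bad)}"
            )

    return problems
```

For a file laid out like the recommended example in rule 4, with units in the first row and names in the second, the check would be called as `check_submission("temperature.csv", data, header_row=1)`. A file that only fails the line-ending check can be passed through `to_unix_line_endings` and written back in binary mode.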