Spreadsheets and data entry

Mike’s rules (suggestions) for data entry with spreadsheets. Applies to Microsoft Excel, LibreOffice Calc, and Google Sheets. General principles outlined apply to all spreadsheet apps, including Apple Numbers.

Before collecting data, discuss and come to agreement — what is to be measured? How do we record data (paper lab notebook, digital notebook, spreadsheet)?

If multiple reporters of data are involved, then spend time with a data entry form. Google Forms is easy to use and it permits linking to Google Sheets. Once the form is done, simply add data via the form and the spreadsheet file is automatically populated with your data.

Because this page is about data entry and spreadsheets, we’ll move past general discussions about how to measure and to record data to focus on how to do data entry.

  1. Before working with spreadsheet data, turn off
    • automatic completion/input features. LibreOffice Calc: Tools → AutoInput; Microsoft Excel: File → Options → Advanced and clear the Enable AutoComplete for cell values check box; Google Sheets: Tools → Autocomplete, then deselect Enable autocomplete.
    • automatic correction features. LibreOffice Calc: Tools → Auto Correct options → Options and Local options tabs, then deselect selections; Microsoft Excel: File → Options → Proofing and clear the various check boxes; Google Sheets apparently lacks auto correct (?).
    • Why? Because you need complete control of data entry. Auto complete functions GUESS your intent; Auto correct may change entries like DAta to Data. Yes, that’s probably a mistake, but these “predictive text” features are known to cause data entry problems in science (see Ziemann and Abeysooriya 2021).
      • So yeah, proofing your data entries needs to be done and so it’s a good idea to share data entry with some one — one  person types the other proof reads.
      • After data entry is complete, you can turn back on Spell check etc and partially automate proof reading.
  2. Spend time with the meta data, also known as the data dictionary or codebook.
    • Enter names of observers and their contact information
    • Enter date and time of observations
    • Include statement about the time format used, e.g., month-day-year is common in USA, but not other countries.
    • Similarly for time, include statement of time format. Spreadsheets are notorious for
    • List the variable by name (remember, short names are best, no spaces, no special characters) and include a description of the variable name, even if it seems obvious.
    • Declare the data type and units for each variable.
    • Enter date and time of data entry — which can be different than date/time of observation and data collection.
    • Provide a brief overview of what the data is about — what purpose was the data collected for?
    • Record names of persons who did data entry and proofing.
  3. After creating the sheet with meta data, create a second sheet for the raw data or primary data.
    • Set up the spreadsheet as rows and columns. Columns are for variables, rows are for observations. Observations typically are the sampling unit.
      • For example, repeat observations of blood pressure on a subject, the row starts with the subject id, then each column is a single reading of blood pressure.
        • For composite measures like blood pressure, there should be a Systolic column and a Diastolic column — avoid writing composite measures as 120/80.
    • Do not mix type: a column should have either numbers or text, not both.
      • For example, 120 mmHg is not a good entry because both number and text are included. Thus, the entire column is treated as text.
        • And besides, you reported units in the meta data, didn’t you?
  4. Create a third worksheet for data wrangling, the process of converting raw data into a usable form. For example, our raw data for calorimeter includes mass of water, initial and final water temperatures, initial and final mass of food sample. We want the kilocalories per gram of food stuff; formulas to accomplish these calculations should be in this third worksheet.
    • Continuing on this theme, plots belong in their own worksheets, not the sheets with raw data.
  5. This is just a start, and rather than reading me, go to Preparing Data in Excel at College of Public Health at University of Nebraska Medical Center. Their site, and many others, provide good guidelines that will stand you well.