Spreadsheets and data entry

This page describes Mike’s rules (suggestions) for data entry with spreadsheets.

These suggestions are general and apply to Microsoft Excel, LibreOffice Calc, and Google Sheets. General principles outlined apply to all spreadsheet apps, including Apple Numbers.

As a team, decide and record roles for each team member: roles may include reporter, data entry, data verification, data calculator

Before collecting data, discuss and come to agreement — What is the primary aim (question) of the project? What is to be measured? How do we record data (paper lab notebook, digital notebook, spreadsheet)?

If multiple reporters of data are involved, then spend time with a data entry form. Google Forms is easy to use and it permits linking to Google Sheets. Once the form is done, simply add data via the form and the spreadsheet file is automatically populated with your data.

General scheme for a spreadsheet data file

A general scheme for a spreadsheet data file with five sheets (workbooks). For this example, the workbooks consist of a single metadata sheet and sheets for data of the two projects. Each project is organized into one sheet for raw data and one sheet for any processed data.

Figure 1. A general scheme for a spreadsheet data file with five sheets (workbooks). For this example, the workbooks consist of a single metadata sheet and sheets for data of the two projects. Each project is organized into one sheet for raw data and one sheet for any processed data.

Data entry and data file checklist

Note: Lots of items here and clearly, some or more essential to include than others.

DescriptionPertains toCriteria
File extensionFileFile saved as proper file type
FilenameFileFollows proper file naming protocols
One project per worksheetFileIf multiple projects, then each reported in its own sheet
WorksheetsFileEach sheet renamed
Censored dataMetadataHow are censored data marked
Describe measureMetadataFor each measure or observation, include brief description of how measure was made
Instruments usedMetadataInclude with description the instrument used, including manufacturer if relevant
Metadata sheet includedMetadataIncludes one data dictionary sheet with appropriate documentation
Missing value definedMetadataReport how missing values treated. Do not use 0 for missing value
Numerical entry
MetadataReport, cell type (decimal, fraction)
Project dateMetadataDate of project recorded
Project descriptionMetadataIdentify the project and briefly describe purpose or question(s) asked
Project timeMetadataIf relevant, start and end time of project recorded
Primary data recordedMetadataClearly identify raw vs processed numbers
Record keepingMetadataIdentify format of original data and archiving strategy.
Significant figuresMetadataSpecifies number of significant figures used
Subject IDMetadataData entry used subject ID
Team membersMetadataMetadata contains list of personnel and roles
Truncated dataMetadataHow are truncated data marked?
Units definedMetadataMetadata contains list of units for each observation or measurement type
Consistent data organizationSheetData are organized in stacked (narrow) or unstacked (wide) format
No formattingSheetNo borders, no color, no bold type, etc.
No mixed data typesSheetEach column contains numbers or text, not mix of both
Observation CommentsSheetInclude notes needed to clarify observations
Observation dateSheetIf relevant, date observation recorded
Observation timeSheetIf relevant, time observation recorded
Rows by ColumnsSheetData oriented by rows (individual subject) and columns: (variables)
Subject ID vs ObserverSheetObserver and subject clearly defined
Variable namesSheetProper variable names, one word or acronym, no spaces

Tips for using spreadsheet apps for data entry

Because this page is about data entry and spreadsheets, we’ll move past general discussions about how to measure and to record data to focus on how to do data entry. Scroll to end of the page for a checklist (rubric).

  1. Before working with spreadsheet data, turn off
    • automatic completion/input features. LibreOffice Calc: Tools → AutoInput; Microsoft Excel: File → Options → Advanced and clear the Enable AutoComplete for cell values check box; Google Sheets: Tools → Autocomplete, then deselect Enable autocomplete.
    • automatic correction features. LibreOffice Calc: Tools → Auto Correct options → Options and Local options tabs, then deselect selections; Microsoft Excel: File → Options → Proofing and clear the various check boxes; Google Sheets apparently lacks auto correct (?).
    • Why? Because you need complete control of data entry. Auto complete functions GUESS your intent; Auto correct may change entries like DAta to Data. Yes, that’s probably what you intended, but these “predictive text” features are known to cause data entry problems in science (see Ziemann and Abeysooriya 2021).
      • So yeah, proofing your data entries needs to be done and so it’s a good idea to share data entry with some one — one  person types the other proof reads.
      • After data entry is complete, you can turn back on Spell check etc and partially automate proof reading.
  2. Spend time with the metadata, also known as the data dictionary or codebook.
    • Enter names of observers and their contact information
    • Enter date and time of observations
    • Include statement about the time format used, e.g., month-day-year is common in USA, but not other countries.
    • Similarly for time, include statement of time format. Spreadsheets are notorious for
    • List the variable by name (remember, short names are best, no spaces, no special characters) and include a description of the variable name, even if it seems obvious.
    • Declare the data type and units for each variable.
    • Enter date and time of data entry — which can be different than date/time of observation and data collection.
    • Provide a brief overview of what the data is about — what purpose was the data collected for?
    • Record names of persons who did data entry and proofing.
  3. After creating the sheet with metadata, create a second sheet for the raw data or primary data.
    • Set up the spreadsheet as rows and columns. Columns are for variables, rows are for observations. Observations typically are the sampling unit.
      • For example, repeat observations of blood pressure on a subject, the row starts with the subject id, then each column is a single reading of blood pressure.
        • For composite measures like blood pressure, there should be a Systolic column and a Diastolic column — avoid writing composite measures as 120/80.
    • Do not mix type: a column should have either numbers or text, not both.
      • For example, 120 mmHg is not a good entry because both number and text are included. Thus, the entire column is treated as text.
        • And besides, you reported units in the metadata, didn’t you?
  4. Create a third worksheet for data wrangling, the process of converting raw data into a usable form. For example, our raw data for calorimeter includes mass of water, initial and final water temperatures, initial and final mass of food sample. We want the kilocalories per gram of food stuff; formulas to accomplish these calculations should be in this third worksheet.
    • Continuing on this theme, plots belong in their own worksheets, not the sheets with raw data.
  5. This is just a start, and rather than reading me, go to Preparing Data in Excel at College of Public Health at University of Nebraska Medical Center. Their site, and many others, provide good guidelines that will stand you well.