Spreadsheets and data entry
This page describes Mike’s rules (suggestions) for data entry with spreadsheets.
These suggestions are general and apply to Microsoft Excel, LibreOffice Calc, and Google Sheets. General principles outlined apply to all spreadsheet apps, including Apple Numbers.
As a team, decide and record roles for each team member: roles may include reporter, data entry, data verification, data calculator
Before collecting data, discuss and come to agreement — What is the primary aim (question) of the project? What is to be measured? How do we record data (paper lab notebook, digital notebook, spreadsheet)?
If multiple reporters of data are involved, then spend time with a data entry form. Google Forms is easy to use and it permits linking to Google Sheets. Once the form is done, simply add data via the form and the spreadsheet file is automatically populated with your data.
General scheme for a spreadsheet data file
Figure 1. A general scheme for a spreadsheet data file with five sheets (workbooks). For this example, the workbooks consist of a single metadata sheet and sheets for data of the two projects. Each project is organized into one sheet for raw data and one sheet for any processed data.
Data entry and data file checklist
Note: Lots of items here and clearly, some or more essential to include than others.
Description | Pertains to | Criteria |
---|---|---|
File extension | File | File saved as proper file type |
Filename | File | Follows proper file naming protocols |
One project per worksheet | File | If multiple projects, then each reported in its own sheet |
Worksheets | File | Each sheet renamed |
Censored data | Metadata | How are censored data marked |
Describe measure | Metadata | For each measure or observation, include brief description of how measure was made |
Instruments used | Metadata | Include with description the instrument used, including manufacturer if relevant |
Metadata sheet included | Metadata | Includes one data dictionary sheet with appropriate documentation |
Missing value defined | Metadata | Report how missing values treated. Do not use 0 for missing value |
Numerical entry | Metadata | Report, cell type (decimal, fraction) |
Project date | Metadata | Date of project recorded |
Project description | Metadata | Identify the project and briefly describe purpose or question(s) asked |
Project time | Metadata | If relevant, start and end time of project recorded |
Primary data recorded | Metadata | Clearly identify raw vs processed numbers |
Record keeping | Metadata | Identify format of original data and archiving strategy. |
Significant figures | Metadata | Specifies number of significant figures used |
Subject ID | Metadata | Data entry used subject ID |
Team members | Metadata | Metadata contains list of personnel and roles |
Truncated data | Metadata | How are truncated data marked? |
Units defined | Metadata | Metadata contains list of units for each observation or measurement type |
Consistent data organization | Sheet | Data are organized in stacked (narrow) or unstacked (wide) format |
No formatting | Sheet | No borders, no color, no bold type, etc. |
No mixed data types | Sheet | Each column contains numbers or text, not mix of both |
Observation Comments | Sheet | Include notes needed to clarify observations |
Observation date | Sheet | If relevant, date observation recorded |
Observation time | Sheet | If relevant, time observation recorded |
Rows by Columns | Sheet | Data oriented by rows (individual subject) and columns: (variables) |
Subject ID vs Observer | Sheet | Observer and subject clearly defined |
Variable names | Sheet | Proper variable names, one word or acronym, no spaces |
Tips for using spreadsheet apps for data entry
Because this page is about data entry and spreadsheets, we’ll move past general discussions about how to measure and to record data to focus on how to do data entry. Scroll to end of the page for a checklist (rubric).
- Before working with spreadsheet data, turn off
- automatic completion/input features. LibreOffice Calc: Tools → AutoInput; Microsoft Excel: File → Options → Advanced and clear the Enable AutoComplete for cell values check box; Google Sheets: Tools → Autocomplete, then deselect Enable autocomplete.
- automatic correction features. LibreOffice Calc: Tools → Auto Correct options → Options and Local options tabs, then deselect selections; Microsoft Excel: File → Options → Proofing and clear the various check boxes; Google Sheets apparently lacks auto correct (?).
- Why? Because you need complete control of data entry. Auto complete functions GUESS your intent; Auto correct may change entries like
DAta
toData
. Yes, that’s probably what you intended, but these “predictive text” features are known to cause data entry problems in science (see Ziemann and Abeysooriya 2021).- So yeah, proofing your data entries needs to be done and so it’s a good idea to share data entry with some one — one person types the other proof reads.
- After data entry is complete, you can turn back on Spell check etc and partially automate proof reading.
- Spend time with the metadata, also known as the data dictionary or codebook.
- Enter names of observers and their contact information
- Enter date and time of observations
- Include statement about the time format used, e.g., month-day-year is common in USA, but not other countries.
- Similarly for time, include statement of time format. Spreadsheets are notorious for
- List the variable by name (remember, short names are best, no spaces, no special characters) and include a description of the variable name, even if it seems obvious.
- Declare the data type and units for each variable.
- Enter date and time of data entry — which can be different than date/time of observation and data collection.
- Provide a brief overview of what the data is about — what purpose was the data collected for?
- Record names of persons who did data entry and proofing.
- After creating the sheet with metadata, create a second sheet for the raw data or primary data.
- Set up the spreadsheet as rows and columns. Columns are for variables, rows are for observations. Observations typically are the sampling unit.
- For example, repeat observations of blood pressure on a subject, the row starts with the subject id, then each column is a single reading of blood pressure.
- For composite measures like blood pressure, there should be a Systolic column and a Diastolic column — avoid writing composite measures as 120/80.
- For example, repeat observations of blood pressure on a subject, the row starts with the subject id, then each column is a single reading of blood pressure.
- Do not mix type: a column should have either numbers or text, not both.
- For example, 120 mmHg is not a good entry because both number and text are included. Thus, the entire column is treated as text.
- And besides, you reported units in the metadata, didn’t you?
- For example, 120 mmHg is not a good entry because both number and text are included. Thus, the entire column is treated as text.
- Set up the spreadsheet as rows and columns. Columns are for variables, rows are for observations. Observations typically are the sampling unit.
- Create a third worksheet for data wrangling, the process of converting raw data into a usable form. For example, our raw data for calorimeter includes mass of water, initial and final water temperatures, initial and final mass of food sample. We want the kilocalories per gram of food stuff; formulas to accomplish these calculations should be in this third worksheet.
- Continuing on this theme, plots belong in their own worksheets, not the sheets with raw data.
- This is just a start, and rather than reading me, go to Preparing Data in Excel at College of Public Health at University of Nebraska Medical Center. Their site, and many others, provide good guidelines that will stand you well.