One of the most important parts of the scientific process is communicating results, and a major aspect of that is displaying the data. Often, that means some sort of graph. However, it’s not always easy to decide which type of graph best conveys your message. Graphs should tell the story of the data, and telling that story well takes attention to the type of graph as well as to formatting and style choices. You may need to try several graphs of the same data before you find the most effective way to communicate your story.
Here are a few examples to illustrate those concepts using GLOBE data collected during the total solar eclipse that occurred on August 21st, 2017, all created with commonly available spreadsheet software. You can find out more about how to access this dataset on the GLOBE Observer data analysis page. We also encourage teachers and students to use the eclipse data for their own projects, whether for the International Virtual Science Symposium and/or North American Student Research Symposia or just on your own, and you can get special recognition if you do. Find more information about that on the data analysis page, or email me with questions.
Bar or column graphs are used to compare values between different groups, including showing changes over time. If your x-axis will be categories or ranges, rather than a specific number, this type of graph is a good choice.
Consider the two graphs below. The first is a fairly standard column graph showing the cloud cover reported in different regions of Nebraska before, during, and after the eclipse. The color scale was chosen so that clear sky observations appear in lighter color, while a more overcast sky appears darker. You can get a sense that it was clearer to the west and cloudier to the east. However, because of the different numbers of observations, that message is still a bit hard to interpret.
Now look at the second graph. This one combines all of the observations from one time period into a single column divided proportionally, making it much clearer which type of cloud cover was reported most in each area for each time window. Also, the color scale has been tweaked to make the contrast between clear and cloudy even easier to see. From this, you can tell that western Nebraska was overall much clearer on the day of the eclipse, but also that all three regions got cloudier over the course of the day (see note 1).
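If you are working outside a spreadsheet, the proportional columns described above come down to converting each column's raw counts into percentages of that column's total. Here is a minimal Python sketch of that step. The function name, the category labels, and all of the counts are placeholders made up for illustration; they are not values from the GLOBE dataset.

```python
from typing import Dict


def to_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw observation counts into percentages of the column total."""
    total = sum(counts.values())
    if total == 0:
        # An empty column has no meaningful shares; report zeros.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Placeholder counts for one region (invented, NOT real GLOBE values).
example_counts = {
    "before": {"clear": 2, "scattered": 1, "overcast": 1},
    "during": {"clear": 10, "scattered": 6, "overcast": 4},
    "after": {"clear": 1, "scattered": 1, "overcast": 2},
}

# One proportional column per time period.
example_percentages = {
    period: to_percentages(counts) for period, counts in example_counts.items()
}

for period, shares in example_percentages.items():
    print(period, {category: round(share, 1) for category, share in shares.items()})
```

In this made-up example, the "before" and "during" columns both come out as half clear, even though one rests on 4 observations and the other on 20. That is exactly the kind of information a proportional column hides.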
Scatterplots are useful when you are trying to determine the relationship between two different variables. If both the x-axis and the y-axis show specific values, not ranges, and especially if you have multiple points that share the same x-axis value, this is a good type of graph to use.
In the case of the eclipse, we may want to know how the temperature varied during the day compared to when the maximum obscuration (maximum darkness) occurred. Multiple observers in a region may have been taking measurements at the same time, so we can’t use a standard line graph. The scatterplot below displays data collected in western Nebraska, with air temperature on the y-axis and the time in minutes before or after the maximum eclipse on the x-axis. (In this location, the maximum eclipse was 100% obscuration, or totality.) The scattered points by themselves begin to tell the story of the dip in temperature during the eclipse, but adding a line with the temperature averaged over ten-minute periods helps make that trend even clearer.
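The averaged line can also be computed outside a spreadsheet. The sketch below shows one way to do it in Python, assuming each reading is a pair of (minutes relative to maximum eclipse, air temperature). The function name, the bin boundaries (multiples of ten minutes), and the sample readings are my own assumptions for illustration, not the method or the values behind the graph.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bin_averages(
    readings: Iterable[Tuple[float, float]], width: float = 10.0
) -> List[Tuple[float, float]]:
    """Average temperature within fixed-width time bins.

    Each reading is (minutes relative to maximum eclipse, temperature).
    Returns (bin start, mean temperature) pairs sorted by time.
    """
    bins: Dict[float, List[float]] = defaultdict(list)
    for minutes, temperature in readings:
        # floor() keeps negative times (before maximum) in the correct bin.
        start = math.floor(minutes / width) * width
        bins[start].append(temperature)
    return [(start, sum(temps) / len(temps)) for start, temps in sorted(bins.items())]


# Placeholder readings (invented, NOT real GLOBE values).
example_readings = [
    (-14, 26.0), (-12, 25.0),  # fall in the bin starting at -20
    (-3, 23.0), (-3, 24.0),    # two observers at the same time, bin -10
    (2, 21.0), (8, 22.0),      # fall in the bin starting at 0
]

example_line = bin_averages(example_readings)
for start, mean_temp in example_line:
    print(f"{start:+.0f} min: {mean_temp:.1f}")
```

Each bin is labeled by its starting time here; labeling by the bin midpoint instead would shift the line by five minutes, so it is worth stating which convention a graph uses.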
Like bar graphs, line graphs are useful for comparing changes over time, but they are a better choice when the data is continuous (not in distinct, separate categories), meaning it makes sense to connect the dots. In addition, placing several lines on the same graph lets you visually compare data from different groups or locations over the same period of time.
In the final graph I’ll talk about today, you can see the average temperatures from three different regions in Nebraska, plotted against the time relative to maximum obscuration. You might notice that the minimum temperature for all three regions was not at the point of maximum eclipse, but actually a few minutes after that, something that is easier to see with the multiple lines all on the same graph. This type of graph helps highlight that detail of the story.
I’ve added a text box giving the drop in average air temperature during the eclipse itself for all three regions (a shorter time than the full data appearing on the graph), calculated from my data table. This information is useful but not easy to read directly from the graph. In this case, it helps tell the story that even though western Nebraska was coolest overall, the drop in temperature there was actually the largest among the three regions. Based on our previous analysis, that region also had the clearest skies. Could there be a connection? One location isn’t enough to make the determination one way or another, but by examining other data collected during the eclipse, I could see if my hypothesis is supported or not. The possibilities for data exploration are practically endless, all ready to be conveyed with just the right graph!
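For anyone who wants to reproduce that kind of text-box number from their own table of averages, here is one possible Python sketch. It assumes the drop is defined as the first average inside the eclipse window minus the lowest average inside that window, and it also reports when the minimum occurred. That definition, the function name, the window of 60 minutes on either side of maximum, and the sample averages are all assumptions for illustration; they are not the calculation or the values behind the graph.

```python
from typing import List, Tuple


def temperature_drop(
    averages: List[Tuple[float, float]], start: float, end: float
) -> Tuple[float, float]:
    """Return (drop, time of minimum) for averages inside [start, end].

    Assumed definition: drop = first average in the window minus the
    lowest average in the window. Times are minutes relative to maximum.
    """
    window = [(t, temp) for t, temp in sorted(averages) if start <= t <= end]
    if not window:
        raise ValueError("no averages fall inside the window")
    first_temp = window[0][1]
    min_time, min_temp = min(window, key=lambda pair: pair[1])
    return first_temp - min_temp, min_time


# Placeholder averages for one region (invented, NOT real GLOBE values).
example_averages = [
    (-90, 27.0), (-60, 26.5), (-30, 25.0), (0, 23.0),
    (10, 22.5), (30, 24.0), (60, 26.0), (90, 27.5),
]

# Hypothetical eclipse window: 60 minutes either side of maximum.
example_drop, example_min_time = temperature_drop(example_averages, start=-60, end=60)
print(f"drop: {example_drop:.1f} degrees, minimum at {example_min_time:+.0f} min")
```

Running the same function on each region's averages gives numbers that can be compared side by side, including whether the minimum falls after the moment of maximum eclipse.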
References:
National Center for Education Statistics, “How to Choose Which Type of Graph to Use?”, https://nces.ed.gov/nceskids/help/user_guide/graph/whentouse.asp
TED Ed Lesson, “Choosing the Right Graph,” https://ed.ted.com/on/GV5hkNIA
Notes:
1. It’s worth noting that some information is lost in this depiction of the data: we can no longer tell that significantly more observations were reported during the eclipse than before or after. Because every column is the same height, each observation in the smaller before and after groups gets more weight in its column, so a single inaccurate observation there is more likely to skew the picture. Trade-offs in what gets emphasized are part of choosing between graphs.