A data scientist’s views on data literacy
Kirk Borne on the value of universal understanding of data and its effect on society
Jeff Alford, SAS Insights Editor
As we continue to battle the coronavirus pandemic, it’s difficult to recall a time when we have been more bombarded with facts, figures and statistics. Statistical curves (and the hoped-for flattening of them), infection rates and, sadly, mortality rates have been difficult to compare, contrast and parse.
This means being data literate is more important than ever. Without some understanding of the ways data can be presented, it’s difficult to distinguish between good and bad analyses. For instance, how do you know which of the differing reports to believe about how long the virus lives on different types of surfaces or even in the air? Being data literate provides us with another weapon to bring to bear on COVID-19 and its effects on us all.
We recently had a chance to pick Kirk Borne’s brain on this important topic of data literacy – what it is, why it’s important in our lives and how you can live a better life by being more data literate.
He is a data scientist and astrophysicist and has worked at global technology and consulting firm Booz Allen Hamilton since 2015. His roles include principal data scientist, data science fellow and executive advisor. He provides thought leadership, mentoring, training and consulting activities in data science, machine learning and AI across multiple disciplines. Previously, he was a professor of astrophysics at George Mason University for 12 years in the graduate and undergraduate data science programs. Prior to that, he spent nearly 20 years supporting data systems activities for NASA space science programs, including a role as NASA’s Data Archive Project scientist for the Hubble Space Telescope and as a contract manager in NASA's Astronomy Data Center and Space Science Data Operations Office.
What is data literacy?
Data literacy has several components, which together add up to someone becoming a data-literate person. One of the formal definitions states that data literacy is "the ability to read, work with, analyze, and argue with data." Being data literate means possessing an understanding of what data is and its characteristics (sources, types, formats and data features), data applications (for analysis, business intelligence, data science, decision support, artificial intelligence, automation and analytics), data techniques (such as pattern discovery, pattern recognition and prediction), and data communication (for instance, storytelling, evidence-based reasoning, decision support and visualization).
Why is data literacy gaining in importance and why are we hearing more about it now?
Data literacy is growing in importance for multiple reasons. I group these reasons into three categories:
Individuals. There are huge career opportunities and numerous job openings. My own experience in teaching also shows me that most students become fascinated with the topic once they understand what it is and why it is important.
Organizations. For organizations, there is enormous pressure to use their massive stockpiles of data for business insight, innovation and value creation. An organization's data is now one of its most valuable assets, and it is a renewable asset: the same data can be used and reused in different applications to fuel various projects and to enrich multiple value streams.
Market forces. In addition, market forces are rewarding organizations that are data-driven and that have a data-literate workforce. Organizations that lag in these areas are also beginning to lag in competitiveness, recruiting top talent and market value.
What are good first steps to becoming data literate?
The first thing to do is to recognize that data is everywhere, that nearly everything is digital and that those digital things produce and consume data.
Examples include chatbots, online recommendations, autonomous vehicles, predictive modeling, predictive maintenance, fraud detection, claims processing, social sentiment analysis, fake news detection, facial recognition (a nice feature for touch-free login on your smartphone) and text message auto-complete, just to name a few. Awareness of how much data and data applications permeate our daily life is the first step toward data literacy.
The next step is to realize that nearly every person, thing and activity in the world is producing data, and those data sources are the input to processes that generate value (e.g., products, decisions and actions) for individuals and organizations in almost every industry, job and market. Hopefully people can start to envision themselves as contributors to and consumers of data.
The third step to becoming data literate is that people must see that they can learn about and be part of the digital transformation of the world. I teach the broad concepts of data science, machine learning and AI to general audiences by comparing these "complex" things to the analogous normal cognitive abilities of pattern detection, pattern recognition and evidence-based decision-making.
My audiences are amazed that it is really that simple for them to reach a first level of understanding of topics that otherwise appear complex and unreachable. If those steps take place, then people are motivated to learn more. If that does not work, then I try to motivate them to read and view targeted content on these topics within the context of things they personally care about. That could be health, finance, online shopping, sports, entertainment, recreation, travel, science or anything else.
For example, when I taught a graduate course on data science at George Mason University, I had a unit on geospatial databases and spatial analytics. As part of the material, I covered geographic information systems (GIS). GIS can be a highly technical topic for those unfamiliar with it, so I asked the students to complete a simple exercise: open a web browser, search for "GIS geospatial" plus anything else that interests them (preferably within the domain of science and technical topics), then report back what they found. I taught that course every year for over 10 years, and each year my students and I were surprised and entertained by what we discovered.
How can we use data literacy as responsible citizens?
I taught a course on data ethics at GMU. I could just as easily have named the course "Data Literacy." In my instruction, I included excerpts from three books: How to Lie with Statistics, How to Lie with Maps, and Visual and Statistical Thinking. I chose these books to demonstrate how we can, intentionally or accidentally, be both producers and consumers of biased data results.
I used good and bad examples of charts, graphs and statistical results to demonstrate to my students how they should start to think about these things as responsible citizens. Responsible citizenship these days hinges on having some data, statistical and information literacy, in order to fight bias, misinterpretations and misleading hypotheses from uses of data in public discourse.
The famous author H.G. Wells said it best more than 100 years ago: "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." That statement would now include data literacy and analytic thinking. One of my most amusing exercises in that data ethics course was the first-day class activity. I asked the students to give me their reaction to a statement that I heard in the news more than 20 years ago from a famous politician, who said: "I am shocked that half the students in this country score below average on their standardized tests." This class exercise led to some very interesting conversations about statistics and averages (means, medians, and modes) in different types of data distributions.
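To make the point behind that first-day exercise concrete, here is a minimal Python sketch using made-up test scores (the numbers and the helper name share_below are illustrative assumptions, not data from any real test). It compares how many students fall below the mean versus below the median in a right-skewed distribution.

```python
from statistics import mean, median

# Hypothetical, right-skewed test scores (made-up numbers for illustration):
# most students cluster in the 55-70 range and two score very high.
scores = [55, 60, 62, 64, 65, 66, 68, 70, 95, 98]


def share_below(values, threshold):
    """Return the fraction of values strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


avg = mean(scores)    # arithmetic mean, pulled upward by the two high scores
mid = median(scores)  # middle value (here, the average of the two middle scores)

print(f"mean = {avg}, median = {mid}")
print(f"share below mean:   {share_below(scores, avg):.0%}")    # shows 80%
print(f"share below median: {share_below(scores, mid):.0%}")    # shows 50%
```

In this sketch, 80% of the students score below the mean, while 50% score below the median by construction. If "average" means the median, the politician's statement is close to a tautology; if it means the mean, the share below average depends on the shape of the distribution and need not be half at all.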
I must admit that whenever a new student approached me (in my role as undergraduate advisor for the data science degree program) asking whether they should take my data ethics class or the general university ethics class, I told them that the general ethics class was fine, but in my class I would teach them how to lie (humor intended!). They were hooked; they signed up for my class every time! Specifically, I taught my students the various ways in which people and organizations can and do lie with statistics, whether intentionally or inadvertently. I explained to my students that I did this for three reasons:
- Help students to recognize statistical biases and fallacies in the world.
- Show them how to address these issues when they encounter them.
- Demonstrate how to avoid similar problems in their own data-related activities.
These exercises blended statistical literacy with data literacy because there are common biases in applications of data for statistics, data science, machine learning and AI.
How does data literacy affect the success of an organization?
Data literacy is an essential component of the larger concept of data democratization. Data democratization affects the success of organizations in at least five dimensions:
Data awareness – employees grow in their awareness of the ubiquity and the types of data that the organization uses (or can use).
Data relevance – employees begin to see the connection between the data and their own role in the business.
Data literacy – employees learn to read, work with, analyze and argue with role-appropriate data sources.
Data science – most (if not all) employees then learn how to gain insights and infer understanding from data (pattern discovery, pattern recognition, pattern exploration and pattern exploitation).
Data imperative – employees ultimately realize that the inability to use and analyze data is crippling businesses (and potentially their own career longevity).
Do you think businesses truly understand the importance of data literacy, and are they offering data literacy training opportunities to their employees?
Many organizations are now at that stage, but many more are not. Fortunately, such programs are springing up everywhere. Those who are not yet on board need to see all of the benefits that can come from having a data-literate workforce. I have direct personal experience with this. Several years ago, I was invited by a small company (under 100 employees) to present a two-day training course on data science, which covered the five dimensions of data democratization that I just described. What was impressive about this event was that the owners of the company required every one of their employees to attend, not just the technical and business staff. One of the most excited attendees was the front office receptionist, who loved all the new things she was learning. Those business owners truly understood the importance of this data literacy training opportunity for their employees and for their business. The validation came a couple of years later when they successfully sold their company to a larger corporation.
What cultural changes need to happen in society to address data literacy?
First, society needs to realize that data has value. What I mean is that data is often presented as something invasive, destructive or far too complicated for the common person. Second, there need to be more positive examples, such as data hackathons for social good, business analytics examples and examples in the palm of our hand (our smartphones). Data should be advocated in education, the news, business communications and normal conversation. Third, there needs to be discussion about how businesses are creating jobs, markets, opportunities and new benefits to society with data. Fourth, the education system must introduce numbers, stats, data, pattern detection and scientific hypothesis generation from evidence much more intensely, deliberately and creatively in all courses and curricula (at an age-appropriate level, of course) because the world is digital, and it will only become more digital.
Parting thought
I offer this reflection. Data permeates our daily lives through all conceivable digital technologies, handheld devices, business activities and personal activities. Through data, the world is computable. The focus of data literacy should not be on the mathematics, the algorithms or the engineering. Instead, the focus should be on demonstrating that data science and analytics are universally appealing, data literacy is accessible and data fluency is achievable for all. The democratization of data assets and data literacy is essential for all organizations. Teams of data-literate professionals have the power to understand numerous, diverse data sources, to understand what the data is telling them and to drive new outcomes, successes, and value for any organization. Data literacy is not a math skill – it is a life skill.