Data Analytics Terminology, Part 2

Introduction

This is the second article in a series explaining data analytics terminology. Without a firm understanding of data and data analytics, there can be no firm understanding of Artificial Intelligence or of how best to maximise its value.

This topic can be somewhat ‘dry,’ so to introduce a lighter element to this article, why not test your knowledge of how AI affects life now and its possible impacts in the near future?

The BBC website recently ran eight multiple-choice questions on AI, and there are some very interesting answers!

Much of this two-part article is based on an excellent blog post by Pratibha Kumari Jha, Director of Digital Strategy and Analytics at DataThick. The original article can be accessed here: https://www.linkedin.com/pulse/data-analytics-terminologies-pratibha-kumari-jha/

Qualitative Data

Qualitative Data: Qualitative data refers to non-numerical data that is descriptive in nature. It captures qualities, characteristics, opinions, and subjective information. This type of data is often obtained through methods such as interviews, observations, surveys with open-ended questions, and focus groups.

Qualitative data can be in the form of text, images, audio recordings, or video. It is typically analysed using techniques like thematic analysis, content analysis, or narrative analysis to identify patterns, themes, and underlying meanings.
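
As a rough illustration of content analysis, the simplest form is counting how often particular words occur across a set of text responses. The sketch below uses only the Python standard library. The survey responses, the stop-word list and the keyword_counts function are invented for this example and do not come from the original article; real content analysis normally involves a carefully designed coding scheme rather than raw word counts.

```python
from collections import Counter
import re

# Hypothetical open-ended survey responses (invented for illustration).
responses = [
    "The app is easy to use but the fees are high",
    "High fees, slow support",
    "Easy sign-up and friendly support",
]

# Words assumed too common to carry meaning in this toy example.
STOP_WORDS = {"the", "is", "to", "but", "are", "and", "a"}


def keyword_counts(texts):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and keep runs of letters only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


counts = keyword_counts(responses)

# Show the most frequent words, which hint at recurring themes.
for word, n in counts.most_common(4):
    print(word, n)
```

Words that recur across responses, such as "fees" and "support" here, suggest candidate themes that an analyst would then examine in context.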

Subgroups of Qualitative Data

Within qualitative data, there are several subgroups or types that help categorise and analyse the data. Some common subgroups of qualitative data include:

Interviews: Data obtained through one-on-one or group interviews, where participants provide detailed responses to open-ended questions. This can include structured, semi-structured, or unstructured interviews.
Observations: Data collected through direct observations of individuals, groups, or events. It involves systematically recording behaviours, interactions, and contextual information.
Focus Groups: Data gathered from a small group of individuals who participate in a guided discussion or conversation on a specific topic. The interactions and discussions among group members generate qualitative data.
Textual Data: Data in the form of written or printed text, such as documents, articles, books, transcripts, emails, social media posts, or any other textual sources.
Visual Data: Data in the form of visual content, including photographs, drawings, videos, or any other visual representations.
Audio Data: Data captured through audio recordings, such as interviews, conversations, or recordings of events or meetings.
Case Studies: In-depth investigations of specific individuals, organisations, or events, often involving multiple sources of qualitative data, such as interviews, documents, and observations.
Diaries or Journals: Data obtained from personal diaries or journals, where individuals record their thoughts, experiences, or reflections over a period of time.
Artefacts: Data derived from physical objects, such as artwork, historical artefacts, or any tangible items that hold meaning and relevance to the research topic.

Qualitative Data Summary

These subgroups represent different sources and formats of qualitative data and may require specific methods and techniques for analysis. Researchers often employ multiple subgroups to gather a rich and comprehensive understanding of a phenomenon or research question.

Quantitative Data

Quantitative Data: Quantitative data refers to numerical data that can be measured and analysed using statistical techniques. It involves collecting data through structured methods such as surveys with closed-ended questions, experiments, or measurements.

Quantitative data provides information about quantities, amounts, frequencies, or statistical relationships. It can be analysed using various statistical methods, including descriptive statistics (mean, median, standard deviation), inferential statistics (t-tests, regression analysis), and data visualisation techniques (bar charts, scatter plots, histograms).

Quantitative data allows for statistical comparisons, generalisations, and predictions based on numerical patterns and relationships.
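
The descriptive statistics mentioned above can be sketched with Python's built-in statistics module. The loan amounts are invented for illustration, and the choice of the sample (rather than population) standard deviation is an assumption of this example.

```python
import statistics

# Hypothetical loan amounts (invented for illustration).
loan_amounts = [1200, 1500, 1500, 1800, 2500]

mean_amount = statistics.mean(loan_amounts)      # arithmetic average
median_amount = statistics.median(loan_amounts)  # middle value when sorted
std_dev = statistics.stdev(loan_amounts)         # sample standard deviation

print("mean:", mean_amount)
print("median:", median_amount)
print("standard deviation:", round(std_dev, 2))
```

The mean sits above the median in this data because the single large value (2500) pulls the average upwards, which is one reason both measures are usually reported together.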

Subgroups of Quantitative Data

Quantitative data has no subgroups in the way that qualitative data does. Instead, it is categorised by the type of variable being measured. The common types of variables in quantitative research are listed below. The first two are numerical, while the last two are categorical variables that are often coded as numbers for analysis:

Continuous Variables: These variables can take any numerical value within a specific range. Examples include height, weight, temperature, or time.
Discrete Variables: These variables can only take specific numerical values. Examples include the number of children in a family, the number of cars in a parking lot, or the number of correct answers on a test.
Nominal Variables: These variables represent categories or groups with no inherent numerical value. Examples include gender, ethnicity, marital status, or political affiliation. Nominal variables can be represented using numerical codes, but the numbers themselves do not have any mathematical meaning.
Ordinal Variables: These variables represent categories or groups that have a natural order or ranking. Examples include rating scales (e.g. Likert scales), educational levels (e.g. primary, high school, college), or income brackets (e.g. low, medium, high).
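
The point that numerical codes for nominal variables carry no mathematical meaning, while ordinal categories do carry an order, can be shown with a minimal sketch. The coding scheme and the sample values below are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical coding scheme: the numbers are labels only (nominal).
MARITAL_STATUS_CODES = {1: "single", 2: "married", 3: "divorced"}

# Ordinal categories listed from lowest to highest rank.
INCOME_BRACKETS = ["low", "medium", "high"]

coded_statuses = [1, 1, 2, 3]                 # nominal, stored as codes
brackets = ["high", "low", "medium", "low"]   # ordinal

# Valid for nominal data: count how often each category occurs.
status_counts = Counter(MARITAL_STATUS_CODES[c] for c in coded_statuses)

# Arithmetic on the codes runs without error, but the result is
# meaningless: no marital status corresponds to the code 1.75.
meaningless_average = sum(coded_statuses) / len(coded_statuses)

# Valid for ordinal data: sort by rank rather than alphabetically.
ranked = sorted(brackets, key=INCOME_BRACKETS.index)

print(status_counts)
print(meaningless_average)
print(ranked)
```

Software will happily average the codes, so it is up to the analyst to know which operations make sense for each variable type.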

Quantitative Data Summary

These categorisations help determine the appropriate statistical analyses and methods to apply to the quantitative data.

Data Levels of Measurement in Data Analytics

In data analytics, data is often classified into different levels of measurement, which determine the types of statistical analyses and operations that can be performed on the data. The four commonly recognised levels of measurement are:

Nominal Level: Data at the nominal level consists of categories or labels with no inherent order or numerical meaning. Examples include gender (male, female), marital status (single, married, divorced), or eye colour (blue, green, brown). Nominal data can only be categorised and compared based on equality or inequality. Statistical measures used with nominal data include frequency counts and mode.
Ordinal Level: Data at the ordinal level represents categories or labels that have a natural order or ranking. However, the intervals between the categories may not be equal or meaningful. Examples include rating scales (e.g. Likert scales), educational levels (e.g. primary, high school, college), or satisfaction levels (e.g. very dissatisfied, neutral, very satisfied). With ordinal data, you can compare the relative rank or order of the categories. Statistical measures used with ordinal data include median, mode, and rank correlation.
Interval Level: Data at the interval level have ordered categories with equal intervals between them. However, there is no meaningful zero point. Examples include temperature measured in Celsius or Fahrenheit. With interval data, you can perform addition and subtraction operations, calculate measures like mean and standard deviation, and compute the differences between values. Statistical measures used with interval data include mean, standard deviation, and correlation.
Ratio Level: Data at the ratio level have ordered categories with equal intervals between them, and they possess a meaningful zero point. Examples include height, weight, time, or income. Ratio data allow for all mathematical operations, including multiplication and division. Statistical measures used with ratio data include mean, standard deviation, correlation, and ratio comparisons.
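
The four levels above can be illustrated with one small example each, again using only the Python standard library. All values are invented for illustration. The Kelvin conversion is not part of the original article; it is included here on the assumption that it helps show the difference between an interval scale (Celsius) and a ratio scale with a true zero.

```python
import statistics

# Nominal: only equality and counts make sense, so use the mode.
eye_colours = ["blue", "brown", "brown", "green"]
most_common_colour = statistics.mode(eye_colours)

# Ordinal: map each answer to its rank, then take the median rank.
SATISFACTION_ORDER = ["very dissatisfied", "neutral", "very satisfied"]
answers = ["neutral", "very satisfied", "very dissatisfied",
           "very satisfied", "neutral"]
ranks = [SATISFACTION_ORDER.index(a) for a in answers]
median_answer = SATISFACTION_ORDER[statistics.median_low(ranks)]

# Interval: differences are meaningful, ratios are not.
celsius = [10.0, 20.0]
difference = celsius[1] - celsius[0]   # 10 degrees warmer: meaningful
naive_ratio = celsius[1] / celsius[0]  # 2.0, yet 20 °C is not "twice as hot"

# Ratio: Kelvin has a true zero, so dividing values is meaningful.
kelvin = [c + 273.15 for c in celsius]
true_ratio = kelvin[1] / kelvin[0]

print(most_common_colour)
print(median_answer)
print(difference, naive_ratio)
print(round(true_ratio, 3))
```

The same two temperatures give a ratio of 2.0 in Celsius but only about 1.035 in Kelvin, which is why multiplication and division are reserved for ratio-level data.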

Level of Measurement Summary

The level of measurement determines the appropriate statistical techniques and visualisations that can be applied to analyse and interpret the data effectively.

Source: Pratibha Kumari Jha, Director of Digital Strategy and Analytics at DataThick. The original article can be accessed at https://www.linkedin.com/pulse/data-analytics-terminologies-pratibha-kumari-jha/

Summary

This article focused on the two main types of data, qualitative and quantitative, and on the levels of measurement. A famous author summarised the importance of data more than a century before AI became such an important influence on business and modern society:

“Data! Data! Data! I can’t make bricks without clay.”

Arthur Conan Doyle, Author and Physician.

About the Author

Stephen John Leonard is the founder of ADEPT Decisions and has held a wide range of roles in the banking and risk industry since 1985.

About ADEPT Decisions

We disrupt the status quo in the lending industry by providing lenders with customer decisioning, credit risk consulting and advanced analytics to level the playing field, promote financial inclusion and support a new generation of financial products.