Data Analytics Terminology, Part 1


Just a few years ago, “blockchain” and “Bitcoin” were the words on many people’s lips. This year, with the rise of ChatGPT and similar tools, the new phrase is “Artificial Intelligence” (“AI”).

A good friend of mine travels to the USA twice a year on business and contrasted his last trip, in Q4 of 2022, with his most recent, in Q2 of 2023. Previously, CEOs would talk generally about the importance of AI and how they “needed to do something about it” towards the end of each meeting.

On the most recent trip, “AI” was always the first topic of conversation, as CEOs always have one eye on the share price and stockholders’ questions. Vendors have not been slow to catch on to this AI buzz, and we now see a proliferation in the use of the term.

AI is here to stay, and its impact on everything we know will be profound. However, the purpose of this article is to look at the building blocks behind it and examine what the myriad of associated terms means. Without a firm understanding of data and data analytics, there can be no firm understanding of Artificial Intelligence and how best to maximise its value.

Much of this two-part article is based on an excellent blog post by Pratibha Kumari Jha, Director of Digital Strategy and Analytics at DataThick.

What is Data Analytics?

Data Analytics refers to the process of extracting, transforming, and analysing raw data to uncover valuable insights, patterns, and trends that can inform decision making and drive business outcomes. It involves utilising various techniques, tools, and methodologies to make sense of data and derive meaningful information from it. Data analytics encompasses several stages, including data collection, data cleaning and preparation, data analysis, and data visualisation. (The collection and preparation stages are often referred to collectively as ETL – extract, transform and load.)
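To make the stages concrete, here is a minimal sketch of an ETL-style pipeline in Python. The sales figures, region names, and table layout are all invented for illustration; real pipelines would read from source systems and load into a data warehouse rather than an in-memory database.

```python
import csv
import io
import sqlite3

# Hypothetical raw records, as they might arrive from a source system
# (note the inconsistent whitespace that needs cleaning).
raw_csv = """region,amount
North, 1200
South,950
North,  780
"""

# Extract: read the raw records from the source.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: clean whitespace and convert amounts to numbers.
cleaned = [(r["region"].strip(), float(r["amount"])) for r in rows]

# Load: write the cleaned records into a target store (here, SQLite).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)

# The loaded data can then be analysed, e.g. total sales per region.
totals = dict(db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)  # {'North': 1980.0, 'South': 950.0}
```

The analysis and visualisation stages would then work from the loaded, cleaned data rather than the raw source.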

Data analytics involves both quantitative and qualitative approaches, depending on the nature of the data and the research objectives. It leverages statistical analysis, machine learning, data mining, and other computational methods to explore and interpret patterns within the data. The goal of data analytics is to extract actionable insights, discover hidden relationships, make predictions, and support evidence-based decision making.

Businesses use data analytics to gain a competitive edge, optimise operations, improve customer experience, and drive innovation. It is applied in areas such as business intelligence, marketing analytics, financial analysis, risk management, supply chain optimisation, healthcare analytics, and many other domains. Data analytics plays a vital role in enabling data-driven decision making and uncovering valuable insights that can lead to improved business performance and outcomes.

10 Data Analytics Terms

  1. Data Analytics: The process of examining raw data to extract meaningful insights and draw conclusions.
  2. Big Data: Refers to extremely large and complex datasets that cannot be easily managed or analysed using traditional data processing techniques.
  3. Data Mining: The practice of exploring large datasets to discover patterns, relationships, and useful information.
  4. Descriptive Analytics: Analysing historical data to gain insights into what has happened in the past.
  5. Predictive Analytics: Using historical data and statistical techniques to make predictions about future events or outcomes.
  6. Prescriptive Analytics: Utilising advanced analytics techniques to recommend or prescribe actions that can optimise future outcomes.
  7. Machine Learning: A subset of artificial intelligence (AI) that involves training computer systems to learn and make predictions or decisions without being explicitly programmed.
  8. Artificial Intelligence (AI): The field of computer science that focuses on creating intelligent machines capable of performing tasks that typically require human intelligence.
  9. Data Visualisation: The representation of data in a visual format, such as charts, graphs, and maps, to facilitate understanding and communication of insights.
  10. Business Intelligence (BI): A set of tools, technologies, and practices used to collect, integrate, analyse, and present business information to support decision-making processes.
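The distinction between descriptive and predictive analytics (terms 4 and 5) can be sketched with a toy example. The monthly sales figures below are invented, and the least-squares line is one simple predictive technique among many:

```python
from statistics import mean

# Hypothetical monthly sales figures for months 1-6.
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 110.0, 125.0, 130.0, 145.0, 150.0]

# Descriptive analytics: summarise what has already happened.
average_sales = mean(sales)        # average over the period
total_growth = sales[-1] - sales[0]  # 50.0 growth from month 1 to 6

# Predictive analytics: fit a least-squares line y = a + b*x to the
# history and use it to forecast month 7.
xbar, ybar = mean(months), mean(sales)
b = sum((x - xbar) * (y - ybar) for x, y in zip(months, sales)) / \
    sum((x - xbar) ** 2 for x in months)
a = ybar - b * xbar
forecast_month_7 = a + b * 7  # approximately 162.7
```

Descriptive analytics looks backwards at the same data that predictive analytics projects forwards; prescriptive analytics (term 6) would go a step further and recommend an action based on the forecast.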

Data Analytics Additional Terms

  • Data Cleansing: The process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in datasets.
  • Data Warehouse: A centralised repository that stores structured and organised data from various sources, enabling efficient querying, analysis, and reporting.
  • ETL (Extract, Transform, Load): The process of extracting data from different sources, transforming it into a consistent format, and loading it into a target destination, such as a data warehouse.
  • Data Governance: The overall management and control of data assets within an organisation, including policies, procedures, and responsibilities for data quality, security, and privacy.
  • Key Performance Indicators (KPIs): Quantifiable measures used to assess the performance or success of an organisation, department, or individual against specific objectives.
  • Data-driven Decision Making: Making informed decisions based on analysis and interpretation of data rather than relying solely on intuition or personal judgment.
  • Exploratory Data Analysis (EDA): An approach to analysing data that focuses on understanding its main characteristics, patterns, and relationships through visualisations and summary statistics.
  • Regression Analysis: A statistical technique used to identify and model the relationship between a dependent variable and one or more independent variables.
  • Clustering: The process of grouping similar data points together based on their characteristics or proximity in order to discover patterns or segments within the data.
  • Natural Language Processing (NLP): A branch of AI that focuses on the interaction between computers and human language, enabling machines to understand, interpret, and generate human language text or speech.
  • Observation: An individual data point or record collected during a study or analysis. It represents a specific instance or measurement within a dataset.
  • Data Sampling: The process of selecting a subset of data from a larger population or dataset for analysis. Sampling is often done to reduce the computational complexity or cost of analysing the entire dataset while still maintaining statistical representativeness.
  • Data Set: A collection of related data points or observations. It refers to the entire set of data that is used for analysis or modelling, including all variables and records.
  • Prediction: The process of using historical data and statistical or machine learning techniques to make an estimate or forecast about future events or outcomes. Predictions are based on patterns, trends, and relationships discovered in the data.
  • Structured Data: Data that is organised and formatted in a consistent and predefined manner. It is typically stored in databases or structured file formats, such as tables with clearly defined rows and columns. Structured data is highly organised and easily searchable, making it suitable for traditional relational databases and analysis using SQL queries.
  • Unstructured Data: Data that does not have a predefined structure or organisation. It includes text documents, emails, social media posts, images, videos, audio recordings, and other forms of data that lack a standardised format. Unstructured data is typically large in volume and requires advanced techniques, such as natural language processing (NLP) or image recognition, for analysis and extraction of insights.
  • Semi-Structured Data: Data that has some structure but does not fit neatly into a traditional structured format. It contains elements of both structured and unstructured data. Semi-structured data may have tags, labels, or markers that provide some organisation, but the overall format may vary. Examples of semi-structured data include XML files, JSON files, log files, and web pages with HTML tags. Analysing semi-structured data often involves a combination of techniques, such as parsing and extracting relevant information using regular expressions or specialised tools.
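The contrast between semi-structured and structured data in the last two terms can be illustrated with a short sketch: parsing variable JSON log records into a fixed-column table. The log entries and field names below are hypothetical.

```python
import json

# Hypothetical semi-structured log entries: each line is valid JSON,
# but the fields present vary from record to record.
log_lines = [
    '{"user": "alice", "event": "login", "device": "mobile"}',
    '{"user": "bob", "event": "purchase", "amount": 19.99}',
    '{"user": "alice", "event": "logout"}',
]

# Parsing imposes structure: extract a fixed set of columns,
# filling gaps where a record lacks a field.
columns = ["user", "event", "amount"]
table = [
    tuple(record.get(col) for col in columns)
    for record in map(json.loads, log_lines)
]
print(table)
# [('alice', 'login', None), ('bob', 'purchase', 19.99), ('alice', 'logout', None)]
```

Once in this fixed row-and-column form, the data is structured and can be loaded into a relational database and queried with SQL.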

Source: Pratibha Kumari Jha, Director of Digital Strategy and Analytics at DataThick.


The second part of this article will be published later this month and will focus on Qualitative and Quantitative Data and their levels of measurement.

About the Author

Stephen John Leonard is the founder of ADEPT Decisions and has held a wide range of roles in the banking and risk industry since 1985.

About ADEPT Decisions

We disrupt the status quo in the lending industry by providing lenders with customer decisioning, credit risk consulting and advanced analytics to level the playing field, promote financial inclusion and support a new generation of financial products.