Aug 16, 2023

Technical facts reference library

This is a collection of statistics that ercule maintains on behalf of its own SEO and technical writing content. It focuses mainly on the following areas:

  • Data and data management
  • Compliance
  • Digital transformation
  • IT security
  • Cloud platform adoption, usage, and growth
  • Artificial Intelligence (AI)
  • Software Development Lifecycle (SDLC)
  • Open-source software
  • Software quality

This list isn’t meant to be comprehensive. It’s merely a selection of the most relevant statistics for our clients, whose businesses are spread across the categories listed above.

How to use this resource

You can browse the list or use your browser’s search function to find a relevant fact to cite in your technical content or technical marketing materials.

Every report is linked to the original source of the statistic instead of an intermediary page. The Report metadata section at the bottom contains information about the reports themselves, including:

  • The methodology used to collect or calculate data
  • How often the report is updated
  • Any other pertinent facts that might be relevant. For example, whether a report breaks out its answers further by industry, region, etc.

We endeavor to keep these statistics up to date. If you notice a discrepancy or have a link to a later version of a cited report, please let us know.

Guidelines for using statistics to engage readers and build trust

  • Use a couple of key statistics relevant to your article topic in the introduction and throughout the article to keep readers engaged. Cite relevant statistics related to the problem your product solves. For example, if your product is data-related, cite statistics around the growing volume of data, data quality issues, compliance penalty costs, etc.
  • Reference authority links, i.e., the original source of the statistic. Referencing authority links is a tried-and-true SEO tactic that boosts search engine placement. But it also builds trust with readers, as they can see where you obtained your information from.
  • Don’t cite statistics without references. In particular, don’t cite Web sites that publish statistics without links to the original sources.
    • In particular, avoid citing Statista as a “source.” Statista is a statistics collation site and its information is based on other sources, which it hides from non-subscribers. Only reference one of their sources if you have a Statista subscription and can see the underlying source, or can find the primary source elsewhere.
  • Always check the sample size and regionality of the data to ensure the information is relevant to your target audience. How many people were surveyed to produce a statistic? Were they all sampled from a single geographic region? Do the numbers look different if you break them down by region instead?
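
One rough way to sanity-check a survey's sample size is the standard margin-of-error approximation for a proportion. The sketch below assumes a simple random sample, a 95% confidence level (z of about 1.96), and the worst-case proportion of 0.5. Real surveys rarely meet those assumptions exactly, so treat the result as a ballpark. The function name is illustrative, and the sample sizes are the ones listed in the Report metadata section (the Matillion/IDG survey reports "more than 200", so 200 is used as a lower bound).

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a surveyed proportion.

    Assumes simple random sampling; z=1.96 corresponds to roughly 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Sample sizes as listed in the Report metadata section.
for name, n in [("Matillion/IDG", 200), ("Flexera", 750), ("Fortinet", 782)]:
    points = margin_of_error(n) * 100
    print(f"{name}: n={n}, margin of error of about +/-{points:.1f} points")
```

Under these assumptions, a survey of about 750 respondents carries a margin of roughly 3.6 percentage points either way, so small differences between two reported percentages may not be meaningful.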

Data and data management

Volume of data

IDC data report (November 2018)

  • Amount of data by 2025: 175 zettabytes

IDC - Global Data Report 2022 (March 2022)

  • Global datasphere to double between 2022 and 2026

Matillion - Matillion and IDG Survey: Data Growth is Real, and 3 Other Key Findings (January 2022)

  • Data volumes growing an average 63 percent per month or more.
  • Companies drawing from an average of 400 or more sources, with 20 percent or more drawing from 1,000 or more sources.

Acumen Research and Consulting - Big Data Market Size Set to Achieve USD 473.6 Billion by 2030 growing at 12.7% CAGR - Exclusive Report by Acumen Research and Consulting (December 2022)

  • One trend driving the growth of big data is the rapid growth of social media.

WaveStone - DATA AND ANALYTICS LEADERSHIP ANNUAL EXECUTIVE SURVEY 2023 (2023)

  • 87.8% of orgs surveyed reported increasing their investment in data in 2022
  • However, only 23.9% of companies consider themselves “data-driven”; only 20.6% say they have created a data culture in their organization.

Cost of data

Our World in Data (via computer scientist John McCallum) (2022)

In 1985, storing 1TB of data would have cost you $31.39M. In 2022, it cost an average of $14.30 to store 1TB on magnetic media ($49.50 for solid-state storage).
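
To put the two price points above in perspective, the sketch below estimates how often the cost per terabyte halved between 1985 and 2022. It assumes a constant exponential decline, which is a simplification, and the function name is illustrative. Only the two magnetic-media figures quoted above are used.

```python
import math


def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Years needed for cost to halve, assuming a constant exponential decline."""
    if start_cost <= 0 or end_cost <= 0 or end_cost >= start_cost:
        raise ValueError("expected positive costs with end_cost < start_cost")
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


# Figures quoted above: $31.39M per TB in 1985, $14.30 per TB in 2022.
start, end = 31_390_000, 14.30
print(f"Cost fell by a factor of about {start / end:,.0f}")
print(f"Halving time: about {halving_time_years(start, end, 2022 - 1985):.2f} years")
```

Under that assumption, the quoted figures imply the cost of magnetic storage halved roughly every 1.8 years.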

Data maintenance

Enterprise Strategy Group - 2022 State of Data Governance and Empowerment (July, 2022)

  • 42% of respondents indicated at least half of their data is dark data.

Veritas - Dark data (August 2023) [NOTE: not well-sourced]

  • Over 50% of a company’s data is “dark” - i.e., not used and not maintained.

Matillion - Matillion and IDG Survey: Data Growth is Real, and 3 Other Key Findings (January 2022)

  • Over 90% of IT leaders surveyed said it was challenging transforming data for analytics.

Data governance and compliance

Data governance management and investment

WaveStone - DATA AND ANALYTICS LEADERSHIP ANNUAL EXECUTIVE SURVEY 2023 (2023)

  • 82.6% of companies have a Chief Data Officer - explosive growth from 2012, when only 12% did.

Gartner - Gartner Identifies Top Five Trends in Privacy Through 2024 (May 31st, 2022)

  • By 2024, 75% of the world’s population will have its personal data covered under privacy regulations
  • Up from 10% in 2020 and an estimated 65% by EoY 2023

Compliance rates

Flexera - State of the Cloud Report 2023 (2023)

  • Around 70% of all businesses say that compliance and governance are top concerns they have with the cloud. 71% of Small and Medium Businesses (SMBs) say that Compliance is a concern with cloud systems.

Fortinet 2023 Cloud Security Report

  • 30% of respondents say legal & regulatory compliance are holding back their plans for software cloud usage.

DataOps

Enterprise Strategy Group - 2022 State of Data Governance and Empowerment (July, 2022)

  • 90% of organizations asked believe that DataOps is improving their data quality

Value of data

Enterprise Strategy Group - 2022 State of Data Governance and Empowerment (July, 2022)

  • 46% of respondents said identifying the quality of source data is a major impediment in effectively using it.

Gartner - How to Improve Your Data Quality (July 2021)

  • Poor quality data (bad data) costs organizations $12.9M annually

Costs of data governance and compliance

Investopedia - compliance costs

  • It costs a company $5.5 million to achieve compliance
  • The cost of non-compliance averages $15 million.
  • It costs an average of $5.5 million to get compliant but an average of $15 million for noncompliance - a savings of $9.5 million in the long run.

VMware - The Escalation: From Heist to Hijack, From Dwell to Destruction (2021)

  • 130 financial sector CISOs were set to expand their spending on compliance by 20 to 30% between 2021 and 2022.

Regulatory fines for mismanaged data

Meta GDPR fine (May 22nd, 2023)

  • Meta was fined 1.2 billion Euros by the Irish Data Protection Commission for transferring EU data to the US for storage and processing. Also ordered to stop processing all such data within six months.

Amazon GDPR fine (July 30th, 2021)

  • Amazon was fined 746 million Euros by the Luxembourg National Commission for Data Protection (CNDP). The ruling came after 10,000 people said that Amazon did not obtain proper consent for the processing of certain data in the EU.

Danske Bank (April 4th, 2022)

  • The Danish Data Supervisory Authority fined the bank 1.3 million Euros (DKK 10 million) for not deleting consumer personal data after it no longer had a legitimate business reason to process it.

Data breaches

Capital One Data Breach - largest ever? Insider data breach (July 20th, 2019)

  • Paige Thompson of Seattle broke into a Capital One server and stole 140,000 US Social Security numbers, 1 million Canadian Social Insurance numbers, and 80,000 bank account numbers. She was a former employee of Amazon Web Services (AWS) who used her insider knowledge to exploit a misconfigured firewall. This incident shows the risk that insiders can pose to a company, even after termination.

IBM - Cost of a data breach in 2023 (2023)

  • The global average cost of a data breach in 2023 was $4.45 million.
  • 82% of data breaches (in 2022?) involved data stored in the cloud.

Ermetic - Ermetic Reports Nearly 100% of Companies Experienced a Cloud Data Breach in Past 18 Months (June 2021)

  • “…98% of the companies surveyed had experienced at least one cloud data breach in the past 18 months compared to 79% last year. Meanwhile, 67% reported three or more such breaches, and 63% said they had sensitive data exposed.”

IT

Digital transformation projects

McKinsey - digital transformation failures (April 11th, 2023)

  • Only 30 percent of banks successfully implemented their digital transformation projects.
  • 70 percent of all digital transformation projects went over budget.

McKinsey - Digital transformation stats (December 7th, 2021)

  • 70 percent of all digital transformation projects fail.
  • BUT 70 percent of digital transformation projects SUCCEED when people feel a sense of ownership over the process

MuleSoft - 2023 Connectivity Benchmark Report (2023)

  • 80% of IT leaders say integration issues hinder their digital transformation projects.

Finding information

Coveo - Workplace Relevance Report 2023 (2023)

  • Employees spend 3.6 hours/day looking for information. IT employees spend 4.2 hours.
  • 89.6% of workers say they have to search 1 to 6 separate sources to find information. For 52% of tech/IT workers, it’s between 4 and 6.
  • 44% of workers say what slows them down the most is that information is stored across multiple applications. 31% say that outdated company Intranet information slows them down the most.
  • 45% of respondents say information they find internally is irrelevant.

Artificial intelligence

CapGemini - AI’s impact on knowledge workers (July 6th, 2023)

  • 74% of executives believe the benefits of AI will outweigh the associated concerns
  • 70% think AI will bolster productivity for knowledge workers
  • 71% of executives think AI will make customer experience more active and engaging

McKinsey - Generative AI’s impact to the economy (June 21st, 2023)

  • Generative AI could add up to $4.4 trillion annually to the global economy
  • Generative AI could boost labor productivity growth by 0.1 to 0.6 percent annually through 2040
  • Generative AI could automate activities that consume 60 to 70 percent of workers’ time today

Capital One - Straight-Through Receivables Reconciliations: AI and Machine Learning Boost Efficiency and Working Capital (2018)

  • Capital One uses predictive analytics to catch potential regulatory infringements before they happen - a use of AI combined with active data governance.

IBM - Cost savings from using AI to improve security (2023)

  • Organizations that employ security AI and related automation can save an additional $1.76 million over those that don’t.

Red Hat - The State of Enterprise Open Source 2022 (February 22nd, 2022)

  • 71% of IT leaders say they are leveraging Machine Learning (ML) and Artificial Intelligence (AI) technologies.

CNBC - ChatGPT and generative AI are booming, but the costs can be extraordinary (March 13th, 2023)

  • GPT-3 reportedly cost an estimated $4 million to train

Business Insider - ChatGPT could cost over $700,000 per day to operate. Microsoft is reportedly trying to make it cheaper. (April 20th, 2023)

  • An analyst told The Information, as reported by Business Insider, that ChatGPT may cost up to $700,000 a day to run.

Stack Overflow - Developer Sentiment Around AI/ML (2023)

  • 77% of developers feel favorably toward AI tools.
  • 42% of developers trust the output from AI tools.
  • Majority see the greatest benefit of AI tools in the development process as increasing productivity.

IT Security

Consortium for Information & Software Quality - The Cost of Poor Software Quality in the US: a 2022 Report (2022)

  • Cybercrime losses rose 64% between 2020 and 2021.

Forbes - One Stolen Password Took Down The Colonial Pipeline — Is Your Business Next? (September 14th, 2021)

  • A hacker used stolen credentials and a VPN connection to hold Colonial Pipeline to ransom for $2 million. The case shows the importance of using Multi-Factor Authentication (MFA).

Sophos - The Active Adversary Playbook 2021 (May 18th, 2021)

  • Remote Desktop Protocol (RDP) was used in about 30% of all successful attacks. In 41% of cases, it was used mostly for internal, lateral movement around the network.
  • RDP used in Equinix ransomware breach.

Fortinet 2023 Cloud Security Report

  • 59% of respondents say the biggest threat to cloud security is misconfiguration of the cloud platform or improper setup. The next biggest threats are insecure APIs and exfiltration of sensitive data (both 51%) and unauthorized access (49%).
  • 60% of organizations surveyed planned to increase their budgets for cloud security.

Internal security threats

Netskope - Hey You Get Out of My Cloud (July 2021)

  • Statistics show a spike in data downloads from one-third of departing employees - a sign the employees were taking data with them as they left.
  • 97% of cloud apps used - primarily sharing and collaboration apps - are unmonitored Shadow IT.

Cloud Platforms

Flexera - State of the Cloud Report 2023 (2023)

  • 87% of organizations are embracing a multi-cloud strategy.
  • The most used public cloud service is the data warehouse (51%).
  • 82% of all organizations surveyed said their top challenge was cloud spend. The 2nd challenge was security (79%) followed by Lack of resources/expertise (78%). Only 47% of SMBs (Small to Medium Businesses) said expertise was a problem; 71% said it was compliance.

CloudZero - State of Cloud Cost Intelligence (2022)

  • Half of organizations say their cloud spend is too high - but only 3 out of 10 know what they’re actually spending money on.
  • 41% of respondents say cloud costs have disrupted work by one week or more; 11% say it’s disrupted an entire sprint.

Fortinet 2023 Cloud Security Report

  • 58% of respondents plan to run over 50% of their workloads in the cloud in 2023 and beyond.
  • 69% of respondents are multi-cloud - i.e., they use two or more cloud providers.
  • 53% of respondents say the greatest benefit of the cloud is more flexible capacity/scalability.

Software development

Software marketplace

Fortune Business Insights - Software as a Service (SaaS) market size (June 2023)

“The global Software as a Service (SaaS) market is projected to grow from $273.55 billion in 2023 to $908.21 billion by 2030, at a CAGR of 18.7%.”
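
A quoted CAGR is easy to verify before you cite it. The sketch below applies the standard compound annual growth rate formula to the two market-size figures in the quote above. The function name is illustrative, and the seven-year span (2023 to 2030) is taken from the quote.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the quote above, in billions of US dollars.
rate = cagr(273.55, 908.21, 2030 - 2023)
print(f"Implied CAGR: {rate:.1%}")
```

The implied rate works out to about 18.7%, which matches the figure stated in the report. The same check is useful for any projection that gives a start value, an end value, and a growth rate.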

IEEE Spectrum - The Top Programming Languages 2023 (August 29th, 2023)

Python remained #1, but in terms of jobs, SQL is in greatest demand.


Languages and frameworks

Stack Overflow Developer Survey 2023

  • JavaScript, HTML/CSS, Python, SQL, TypeScript top languages
  • However, top-paying languages are Zig, Erlang, F#, Ruby, and Clojure
  • Docker the top-used “other” tool

Low-code and no-code software development

Gartner - Gartner Forecasts Worldwide Low-Code Development Technologies Market to Grow 20% in 2023 (December 13th, 2022)

  • The total low code development market will increase to $26.9 billion in 2023, a 19.6% increase from 2022, predicts Gartner.
  • Investment in hyperautomation to increase to $720 billion.

Software quality

Consortium for Information & Software Quality - The Cost of Poor Software Quality in the US: a 2022 Report (2022)

  • Accumulated software Technical Debt (TD) is now $1.2 trillion.
  • Software developers on average spend 33% of their time every week addressing Technical Debt.

Rand Group - How much does 1 hour of downtime cost the average business?

  • One hour of downtime can cost anywhere from $100,000 to as much as $1 million to $5 million.

Software Development Lifecycle (SDLC)

NIST - Cost of fixing bugs in production (January 5th, 2023) (NOT ORIGINAL REPORT)

  • It can be 30x to 100x more expensive to fix a bug in the deployment/maintenance phase of the SDLC than in earlier phases

IBM System Science Institute - Relative cost of fixing bugs in maintenance (January 2010) (NOT ORIGINAL REPORT)

  • It can be 100x more expensive to fix a bug in maintenance than earlier in the SDLC

Software security

Argon - software supply chain attacks in 2021 (2021) (NOTE - find original report)

  • Software supply chain attacks increased by 300% in 2021

Consortium for Information & Software Quality - The Cost of Poor Software Quality in the US: a 2022 Report (2022)

  • Between 2020 and 2021, open-source parts of the software supply chain saw a 600% increase in attacks.

Open source software

Consortium for Information & Software Quality - The Cost of Poor Software Quality in the US: a 2022 Report (2022)

  • In 2021, the number of organizations using open-source software rose 77%.

Red Hat - The State of Enterprise Open Source 2022 (February 22nd, 2022)

  • 89% of IT leaders see enterprise open source software as being as secure as or more secure than proprietary software.
  • Enterprise open-source software is expected to grow from 29% to 34% in two years.
  • 36% of IT leaders say that concerns over support of open-source software limit its use.
  • 32% of industry leaders say that the benefits of open-source software include both better security and higher-quality software.

Containerization/Kubernetes

Red Hat - The State of Enterprise Open Source 2022 (February 22nd, 2022)

  • 68% of IT leaders say they are leveraging containers and containerization technology
  • 70% of IT leaders say they work in an organization that uses Kubernetes.
  • 43% of IT leaders say that they lack the necessary skills to adopt containers.
  • 39% of IT leaders say they don’t have the necessary staff to adopt container technology.

Report metadata

IDC Data Report - 2018

How often updated: There appears to be a new 2022 report that supersedes the one above. Access costs $4,500. You may be able to get some of these statistics using a Statista subscription. Many sites quoting the zettabytes prediction figure appear to still reference the 2018 statistic.

Flexera - State of the Cloud Report 2023 (2023)

Who they asked: 750 cloud decision-makers

Red Hat - The State of Enterprise Open Source 2022 (February 22nd, 2022)

Who they asked: 1,296 interviews with IT leaders, most in Europe and the United States.

Other attributes of the report: Answers are broken out by region of the world so you can see differences between regions.

Consortium for Information & Software Quality - The Cost of Poor Software Quality in the US: a 2022 Report (2022)

Who they asked: Methodology broken down per question within the report, including how they calculated technical debt costs.

How often updated: Every two years. Previous reports were in 2018 and 2020.

Matillion and IDG - Optimizing Business Analytics by Transforming Data in the Cloud (January 2022)

Who they asked: “The survey polled more than 200 IT, data science, and data engineering professionals at North American organizations with at least 1,000 employees. Respondents work across several industries, including technology, finance, retail, and healthcare.”

Stack Overflow Developer Survey

How often updated: Yearly

Who they asked: 90,000 developers who use Stack Overflow

CloudZero - State of Cloud Cost Intelligence (2022)

Who they asked: 1,000 engineering and finance professionals

How often updated: Yearly?

Netskope - Hey You Get Out of My Cloud (July 2021)

No clear sourcing information found.

Fortinet 2023 Cloud Security Report

How often updated: Yearly

Who they asked: 782 cybersecurity professionals

Other resources

G2 - Big Data Statistics

NOTE: Facts in this resource are not aligned well with their sources. In addition, some sources are dated. Mine for data as we have here but use cautiously.

DLA Piper Data Protection Laws of the World

Not an official resource but a useful world overview with a visual map and a ranking of privacy law strictness.

Official resources for key standards and regulations

General Data Protection Regulation (GDPR) - https://gdpr.eu/

California Consumer Privacy Act (CCPA) - https://oag.ca.gov/privacy/ccpa

PCI Payment Security Standards

Home page: https://www.pcisecuritystandards.org/

PCI DSS overview: https://listings.pcisecuritystandards.org/documents/PCI_DSS-QRG-v3_2_1.pdf

Health Insurance Portability & Accountability Act (HIPAA)

Home page: https://www.hhs.gov/hipaa/index.html

Technical specifications

JavaScript Object Notation (JSON) Data Interchange Format

YAML (YAML Ain’t Markup Language)

OpenAPI Initiative (Related: Swagger toolset)

HTTP/2 Specification (RFC 7540, since obsoleted by RFC 9113)

QUIC and HTTP/3 (QUIC Working Group)

