Concentric Report Shows Huge Increase in Oversharing Sensitive Documents

Concentric Inc., a vendor of AI-based solutions for protecting business-critical data, announced the availability of its Q1 2021 Data Risk Report, which highlights a continued rise in the oversharing of business-critical and sensitive documents.

The report revealed the number of overshared files rose 450 percent compared to the same quarter in 2020, highlighting the significant impact of the pandemic and remote work on data security.

Using its Semantic Intelligence solution, Concentric captured user data in production deployments from companies in the technology, financial and health care sectors to reveal how organizations create, use, and manage data.

The company scanned more than 110 million unstructured data files to discover business-critical and sensitive documents that are overshared via link sharing, inappropriate external sharing, internal permission misconfigurations and incomplete/incorrect document classifications. Oversharing increases the risk an organization will lose data, violate compliance or privacy mandates, or experience cybercrime.

Statistics highlighted in the report reveal that organizations average 439,000 at-risk files due to oversharing. That translates to 210 at-risk files per employee, up from 38 files per employee in Q1 2020, a 452 percent increase. Link-based risky sharing rose to 65,000 documents per enterprise, from 56,000 in Q4 2020. (The company started tracking link-sharing risks in Q3 2020.)
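The growth figures above can be checked with a short calculation. This is a minimal sketch using only the numbers quoted in the report summary; the function and variable names are illustrative and do not come from Concentric.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100.0


# Figures quoted in the report summary above.
files_per_employee_q1_2020 = 38
files_per_employee_q1_2021 = 210
link_shared_docs_q4_2020 = 56_000
link_shared_docs_q1_2021 = 65_000

per_employee_growth = percent_increase(
    files_per_employee_q1_2020, files_per_employee_q1_2021
)
link_sharing_growth = percent_increase(
    link_shared_docs_q4_2020, link_shared_docs_q1_2021
)

print(f"At-risk files per employee, year over year: {per_employee_growth:.1f}%")
print(f"Link-shared documents, quarter over quarter: {link_sharing_growth:.1f}%")
```

The per-employee result is roughly 452.6 percent, in line with the 452 percent the report cites. The quarter-over-quarter rise in link-shared documents works out to about 16 percent.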

Concentric is the first company to identify and quantify risk in structured and unstructured data using deep learning. Its solution autonomously provides an accurate and detailed semantic understanding of the millions of contracts, financial documents, payroll, M&A plans, product roadmaps and source code files used by organizations every day. Like previous Data Risk Reports, this new study analyzed production data and reflects actual user practices and real-world data risk exposures. Additional statistics in the Q1 2021 report include:

  • Nearly 35 percent of unstructured data is business-critical – that’s 3.1 million files in an average organization. Of those business-critical files, 14 percent can be seen by internal or external users who should not have access.
  • 229,000 business-critical files were classified erroneously and were inappropriately accessible by other employees. To illustrate, nearly 23 percent of all unstructured data contained personally identifiable information (PII) and was not marked appropriately.
  • More than 33 percent of files processed were duplicates (15 percent) or near-duplicates (20 percent). Maintaining multiple variant copies of sensitive information (often with insecure file permissions, in prohibited locations or with improper file classifications) can create legal and regulatory risks, as well as significant unnecessary storage costs.
  • 85 percent of at-risk files were overshared with users or groups within the company, while 15 percent of business-critical files were overshared with external third parties.

Concentric’s Risk Distance analysis evaluates business criticality based on contextualized content, file ownership, document metadata, presence of PII, and peer file comparisons. Business criticality is vital to security assessment and to understanding which files must not be overshared. Product files accounted for the leading share of business-critical documents (44 percent) analyzed for this report, followed by financial files (27 percent), legal files (13 percent), and partner documents (10 percent).

To compile the report, Concentric leveraged its Semantic Intelligence solution to categorize and assess documents created and managed by end users. The full report is available from Concentric free of charge at

Concentric’s Semantic Intelligence automates unstructured and structured data security using deep learning to categorize data, uncover business criticality and reduce risk. Its Risk Distance analysis technology uses the baseline security practices observed for each data category to spot security anomalies in individual files. It compares documents with peers in the same category to identify risk from oversharing, third-party access, and wrong location or misclassification. Organizations benefit from the expertise of content owners without intrusive classification mandates, with no rules, regex or policy maintenance needed.
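The general idea of comparing a file with its peers can be illustrated with a toy example. This is not Concentric's Risk Distance algorithm, which the company describes as based on deep learning and whose details are not given here. It is a simplified sketch under stated assumptions: each file already has a category label, sharing is represented as a set of group names, and the "baseline" for a category is any group that at least half of the files in that category are shared with. The function name, the field names and the 0.5 threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def flag_overshared(files, min_peer_share=0.5):
    """Flag files shared with groups that are unusual for their category.

    files: list of dicts with 'name', 'category' and 'shared_with' (a set).
    Returns {file name: set of groups outside the category baseline}.
    Hypothetical illustration only, not a vendor algorithm.
    """
    by_category = defaultdict(list)
    for f in files:
        by_category[f["category"]].append(f)

    flagged = {}
    for peers in by_category.values():
        # Count how many peer files are shared with each group.
        counts = Counter(g for f in peers for g in f["shared_with"])
        # Baseline: groups that a sufficient fraction of peers share with.
        baseline = {
            g for g, n in counts.items() if n / len(peers) >= min_peer_share
        }
        for f in peers:
            extra = f["shared_with"] - baseline
            if extra:
                flagged[f["name"]] = extra
    return flagged


sample = [
    {"name": "a.docx", "category": "contract", "shared_with": {"legal"}},
    {"name": "b.docx", "category": "contract", "shared_with": {"legal"}},
    {"name": "c.docx", "category": "contract", "shared_with": {"legal", "everyone"}},
]
result = flag_overshared(sample)
print(result)
```

In this sample, only c.docx is flagged, because it is the one contract shared with the "everyone" group while its peers are restricted to "legal". No hand-written rule about contracts is needed; the baseline comes from how peer files are actually shared.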