We have improved upon standard Differential Privacy frameworks to anonymize and release individual-level data, whether numerical or categorical. Our algorithm anonymizes both small and large datasets in a way that satisfies Differential Privacy while preserving the statistical integrity and analytical utility of the dataset. The anonymization process takes minimal computational power and does not require large datasets. In our demo, it anonymizes a dataset of 1,000 rows, with both numerical and categorical variables, within seconds while preserving the dataset's statistical integrity, yielding classification and regression accuracy close to what the original sensitive dataset would produce.
In a nutshell, our software takes in raw individual-level data and outputs anonymized individual-level data that minimizes the risk of de-anonymization while preserving much of the data's value. We break away from the Differential Privacy norm of releasing only statistics on groups of data: we can show individual records along with all the columns needed for various analyses (excluding columns that hold unique identifiers, such as Social Security Numbers). Furthermore, our algorithm compares favorably with Synthetic Data methods in that it requires neither big data nor a lot of computational power.
This makes it a game-changer for privacy-preserving data, for two reasons:
1. We preserve much of the data's value while making it compliant with privacy regulations such as GDPR.
2. We are computationally efficient and can work on datasets of any size.
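
The text above does not spell out the algorithm, so the following is only a minimal sketch of the general idea of record-level Differential Privacy for mixed numerical and categorical columns. It assumes two textbook mechanisms (Laplace noise on clipped numerical values, and k-ary randomized response on categorical values) and an even split of the privacy budget across columns. The function names, the schema format, and the example columns are all hypothetical and chosen for illustration.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_numeric(value, lo, hi, epsilon, rng):
    # Clip to the assumed public range [lo, hi] so the sensitivity of a
    # single value is (hi - lo), then add Laplace noise scaled to it.
    clipped = min(max(value, lo), hi)
    return clipped + laplace_noise((hi - lo) / epsilon, rng)


def privatize_category(value, categories, epsilon, rng):
    # k-ary randomized response: keep the true category with probability
    # e^eps / (e^eps + k - 1), otherwise report one of the other
    # categories uniformly at random. Assumes at least two categories.
    k = len(categories)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([c for c in categories if c != value])


def privatize_row(row, schema, epsilon, rng):
    # Columns missing from the schema (e.g. unique identifiers) are dropped.
    # The budget is split evenly across the remaining columns
    # (sequential composition), an assumption made for simplicity.
    eps_col = epsilon / len(schema)
    out = {}
    for name, spec in schema.items():
        if spec[0] == "numeric":
            _, lo, hi = spec
            out[name] = privatize_numeric(row[name], lo, hi, eps_col, rng)
        else:
            _, categories = spec
            out[name] = privatize_category(row[name], categories, eps_col, rng)
    return out


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical schema: column name -> type plus public bounds or categories.
    schema = {
        "age": ("numeric", 0.0, 100.0),
        "plan": ("categorical", ["basic", "plus", "pro"]),
    }
    row = {"ssn": "000-00-0000", "age": 37, "plan": "plus"}
    print(privatize_row(row, schema, epsilon=2.0, rng=rng))
```

In this sketch each output row still has one value per non-identifier column, which is what makes downstream classification or regression possible. How much accuracy survives depends heavily on the chosen epsilon and on the mechanism, and simple per-value noise like this generally loses more utility than a purpose-built method.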