Enabling privacy and choice for customers in data system design
You may have heard the expression “data is the new oil” or remember the Economist [1] cover stating, “The world’s most valuable resource is no longer oil, but data.” While these may be true in the general macro sense, for many organizations, their data is more akin to their lifeblood. Their data is precious, absolutely essential for their functioning, and the consequences of data loss or leakage can severely threaten the health of the company.
This article addresses privacy in the context of hosting data and considers how privacy by design can be incorporated into the data architecture. This is particularly relevant when the data potentially includes user information, and the architecture must ensure hosting of the data complies with customer preferences or regulatory requirements regarding where the data is hosted.
When architecting data systems, a key philosophy is keeping customer privacy front and center, both in the design choices made and in the options presented to the user, while ensuring the ability to meet business needs and service criteria.
What is privacy?
In the first article in this series, the topic of privacy by design is covered in some detail, particularly regarding the design of products for enterprise security and data hosting.
This article addresses practical implementations of data systems that maximize privacy and put control in the hands of the customer/user of the tool. Enabling the customer to configure what information they wish to host or transfer, and where, empowers them to make choices that align with their business objectives and the regulatory requirements that apply to their business. This is particularly relevant to businesses operating in jurisdictions with strong privacy rules (e.g., EU, UK, and Switzerland), and in regulated markets including healthcare information (protected under the Health Insurance Portability and Accountability Act [HIPAA] in the US) or financial information (protected under the Gramm-Leach-Bliley Act [GLBA] in the US).
What regulations require privacy by design?
The GDPR requires privacy by design and by default. Specifically, Article 25 of the GDPR, titled “Data protection by design and by default,” and Recital 78 — which notes that “the controller should adopt internal policies and implement measures which meet in particular the principles of data protection by design and data protection by default” — clearly state this.
Why is this relevant to customers of security providers?
Companies and organizations that are established in the European Union, or that target residents of the European Union, are subject to the GDPR. This is referred to as the extraterritorial effect [2] of the GDPR: Article 3(1) sets out the Establishment criterion, and Article 3(2) sets out the Targeting criterion.
This means that a provider of tools or services that are used by European Union residents, or that process the personal information of European Union residents, is subject to the GDPR. The same applies to UK residents under the UK GDPR [3].
What regional data requirements or preferences should be considered?
Many customers have preferences as to where and how their data is hosted. In some cases these preferences may be due to regulatory requirements, and in other cases they may be due to the customer’s own risk appetite, privacy decisions, and/or internal guidelines. There may also be business reasons, such as choosing a nearby region to minimize network latency, or ensuring redundancy across regions in the event of an outage or force majeure disruption of service. Whatever the reason, service providers can accommodate these preferences by making data hosting configurable, including how and where the data is hosted.
How can privacy by design be incorporated into these internal processes?
By giving customers the options and granularity to select what data is hosted, where it is hosted, and for how long the data is held, privacy can be built into the product by design. The default settings should be privacy-preserving: the default configuration should be as protective of privacy as reasonably possible, and the customer should then be able to make a setting less restrictive in accordance with their own preferences.
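The idea of protective defaults that a customer can explicitly relax can be sketched as follows. This is a minimal illustration only: the `HostingConfig` class, its field names, and the 30-day retention value are assumptions made for the example, not Lacework's actual configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HostingConfig:
    """Hypothetical per-customer hosting settings.

    Every default is the most protective choice, so a customer who
    configures nothing still gets the strongest privacy posture.
    """
    home_region: str
    allow_cross_region_transfer: bool = False  # data stays in the home region unless the customer opts in
    mask_personal_fields: bool = True          # identifiable fields are masked by default
    retention_days: int = 30                   # illustrative value for a short default retention period


# A new customer receives the protective defaults.
default_cfg = HostingConfig(home_region="EU")

# Relaxing a setting is an explicit, customer-initiated change.
relaxed_cfg = replace(default_cfg, allow_cross_region_transfer=True)
```

Making the settings object immutable means a less restrictive configuration can only come from a deliberate copy-and-change step, which is easy to audit.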
What is the Lacework philosophy on privacy by design for our internal data access?
Our philosophy is to provide our customers with the power to specify where and how their data is hosted, with a default configuration of strong privacy. Customers can then make their own selection as to what specific data elements they want hosted within their own region, and what (if any) they want hosted outside their region.
How is this philosophy implemented in practice?
At Lacework, customer database (CDB) information sits in multiple separate data warehouse instances (a.k.a. shards) dispersed across the US, EU, and Australia. By default, EU customer data is hosted in the EU warehouse shard (based in Frankfurt, Germany). Similarly, APAC customers have their data hosted in the Australia warehouse region. In many cases we see that customers prefer to have their data stored and managed locally in their home region, for reasons of both regulatory compliance and business preference.
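The default region-to-shard assignment described above can be sketched as a simple lookup. The shard identifiers and the `shard_for_customer` helper are hypothetical; the article names the regions but not how the shards are labeled or selected internally.

```python
# Hypothetical shard identifiers for the three regions named in the article.
SHARD_BY_REGION = {
    "US": "warehouse-us",
    "EU": "warehouse-eu-frankfurt",
    "APAC": "warehouse-au",
}


def shard_for_customer(home_region: str) -> str:
    """Return the shard that hosts a customer's data by default."""
    try:
        return SHARD_BY_REGION[home_region]
    except KeyError:
        # Fail closed: never fall back silently to a shard in another region.
        raise ValueError(f"no shard configured for region {home_region!r}") from None
```

Raising an error for an unknown region, rather than defaulting to some other shard, keeps data from landing outside the customer's home region by accident.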
Using a data orchestration tool such as Apache Airflow [4], an open source [5] tool for managing data engineering pipelines, customer data (CDB) is parsed in a local data warehouse within the customer's home region. This local parsing involves identifying and either removing or masking any user-identifiable information. Because Lacework knows which data fields are potentially present in the original customer data, Lacework can either exclude such data fields, or mask the field using a SHA-256 [6] one-way hash to anonymize the data field so that it cannot be tied back to an identifiable data subject. This is privacy preserving by design and by default, as the SHA-256 one-way hash cannot be reversed to unmask the original data in cleartext (human readable) format. The SHA-256 hashing algorithm is considered secure for commercial purposes, with NIST stating [7], “NIST encourages application and protocol designers to implement SHA-256 at a minimum for any applications of hash functions requiring interoperability.”
In some scenarios, after the data is hosted and sanitized locally as described above, the sanitized data is replicated to a central warehouse account based in the US. Replication may be performed using the warehouse recovery tool, which runs over secure infrastructure with end-to-end encryption.
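The exclude-or-mask step can be sketched in plain Python using the standard `hashlib` module. The field names and the `sanitize` helper are assumptions made for illustration; the article does not list the actual fields, and in a pipeline such a function would typically run as one task in the orchestration tool.

```python
import hashlib

# Hypothetical field classification; the real field names are not given in the article.
EXCLUDED_FIELDS = {"email"}                # dropped entirely
MASKED_FIELDS = {"username", "hostname"}   # replaced by a SHA-256 digest


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def sanitize(record: dict) -> dict:
    """Return a copy of the record with identifiable fields excluded or masked."""
    clean = {}
    for key, value in record.items():
        if key in EXCLUDED_FIELDS:
            continue  # exclude the field from the output
        if key in MASKED_FIELDS:
            clean[key] = sha256_hex(str(value))  # one-way mask
        else:
            clean[key] = value  # non-identifying fields pass through unchanged
    return clean
```

Because the same input always produces the same digest, masked values can still be counted and joined on downstream without exposing the cleartext.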
Giving the right users access to the right data
As an extension to the private by design architecture described above, and to ensure that internal team members can access data only for their respective customers, Lacework implemented segregation of curated data and marts, identity groups, and row level security controls.
Curated data are highly granular points of data which, while useful individually, make it difficult to derive a “big picture” view. In addition, the high granularity of curated data can result in performance bottlenecks.
A mart is a group of aggregated tables (e.g., with counts of resources and vulnerabilities) derived by aggregating curated data into a higher level “big picture” or “zoom-out” view. This view enables effective reporting and analysis using business intelligence tools such as ThoughtSpot or similar tools.
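The relationship between curated data and a mart can be sketched as a simple aggregation. The row layout, the sample values, and the `build_mart` helper are hypothetical; in practice this aggregation would run inside the data warehouse rather than in application code.

```python
from collections import defaultdict

# Hypothetical curated rows: one row per resource, with its vulnerability count.
curated_rows = [
    {"customer_id": "c1", "resource_id": "r1", "vulnerabilities": 2},
    {"customer_id": "c1", "resource_id": "r2", "vulnerabilities": 0},
    {"customer_id": "c2", "resource_id": "r3", "vulnerabilities": 5},
]


def build_mart(rows):
    """Aggregate granular rows into per-customer counts of resources and vulnerabilities."""
    mart = defaultdict(lambda: {"resources": 0, "vulnerabilities": 0})
    for row in rows:
        summary = mart[row["customer_id"]]
        summary["resources"] += 1
        summary["vulnerabilities"] += row["vulnerabilities"]
    return dict(mart)


mart = build_mart(curated_rows)
```

The mart keeps only the counts, so a reader of the mart never sees individual resource identifiers from the curated layer.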
Using the privacy by design approach described above, limited roles are assigned to business users who need to derive business insights, without giving them access to the underlying granular data. Business users can therefore make effective use of insights from the data in their roles, in a way that does not expose the underlying granular data to them. This is a “need to know” segregation of control that protects the privacy and confidentiality of the underlying granular customer data. The business role described here is highly customizable, so that business users can access only the data required for their role. As a side benefit, working with aggregated data gives them a fast and responsive experience with the dataset.
Role based access controls for enhanced security and privacy
As a further demonstration of our commitment to protecting the privacy and confidentiality of our customer data, Lacework implements role based access controls in its business intelligence (BI) tools, governed by our identity and access management applications.
In practice, access to all the applications is provisioned and controlled via central SSO, using groups based on user role and org. As members join or leave the team, or change roles within the team, the access granted to each member stays appropriately scoped for their present role, and Lacework users are automatically provisioned under the right groups with the relevant permissions applied by default.
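Group-based provisioning of this kind can be sketched as a mapping from SSO groups to the datasets each group may read. The group names, dataset names, and `allowed_datasets` helper are hypothetical and are not taken from Lacework's identity configuration.

```python
# Hypothetical SSO groups and the datasets each group may read.
GROUP_PERMISSIONS = {
    "sales-emea": {"mart_customer_summary"},
    "data-engineering": {"mart_customer_summary", "curated_events"},
}


def allowed_datasets(user_groups):
    """Return the union of datasets granted by a user's current SSO groups."""
    allowed = set()
    for group in user_groups:
        # Unknown groups grant nothing, so access defaults to deny.
        allowed |= GROUP_PERMISSIONS.get(group, set())
    return allowed
```

Because permissions are derived from current group membership, a role change only requires moving the user between groups; no per-user grants need to be edited.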
As a further enhancement, Lacework has implemented row level security so that two members of the same team each see only the data for their respective customers. This prevents role crossover and enforces “need to know” in terms of data access.
Taken together, these steps enable Lacework to provide closely governed access to data, which improves the overall customer experience while protecting the privacy and confidentiality of the underlying customer data.
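The effect of row level security can be sketched as a per-user filter on customer identifiers. The user names, the assignment table, and the `visible_rows` helper are hypothetical; in a real deployment the filter would be enforced by the warehouse or BI tool itself rather than by application code.

```python
# Hypothetical assignment of team members to customer accounts.
ASSIGNED_CUSTOMERS = {
    "alice": {"c1"},
    "bob": {"c2"},
}

rows = [
    {"customer_id": "c1", "open_vulnerabilities": 2},
    {"customer_id": "c2", "open_vulnerabilities": 5},
]


def visible_rows(user, all_rows):
    """Return only the rows belonging to customers assigned to this user."""
    allowed = ASSIGNED_CUSTOMERS.get(user, set())  # unassigned users see nothing
    return [row for row in all_rows if row["customer_id"] in allowed]
```

Here `alice` and `bob` may hold the same role and query the same table, yet each receives a different subset of rows.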
What are the customer privacy benefits of doing this?
By providing a customer with choices regarding masking any potential personal information (or other important information fields) contained within data, the customer is empowered to make a determination about what to transfer (or not transfer) outside of their environment. This permits a customer to select the data being transferred, in accordance with the customer’s preference and any regulatory requirements the customer may be subject to.
What are suggested best practices for teams considering data hosting in different geographies?
As a best practice, the authors recommend incorporating privacy by design into all aspects of product and data planning, to ensure that only the minimum amount of personal data is transferred, only for the minimum duration required, and that the user of the tool (customer) has clear and granular control over what data to transfer and to where it may be transferred.
Conclusion
This article describes the steps Lacework takes to ensure strong data governance controls utilizing a privacy by design approach to protect the privacy and confidentiality of our customers. These steps go beyond the initial product design phase, and form a core element incorporated throughout the processes and internal technology stack at Lacework.
At Lacework, we believe that strong data governance controls should be a core consideration of every system design: the diversity of use cases and internal stakeholders that use data on a regular basis demands a thoughtful approach to data management and control. A key element in the privacy by design approach to system architecture is allowing customers to configure what information they host or transfer, to ensure alignment with both their business objectives and their regulatory requirements. This is especially relevant for businesses operating in regulated industries such as healthcare and financial services, or under data protection regulations such as GDPR.
By giving customers options over how and where their data is hosted, Lacework provides them with confidence and control over the privacy of their key asset. This is how Lacework puts customers first — by ensuring protection and control of customer data is at the core of our technology and data architecture design.
About the authors
Michael Moore: As Vice President of Privacy and IP, Michael is responsible for privacy and cybersecurity, procurement, product counseling, transactional support, patents and intellectual property strategy, open-source software, and other matters. Michael is a seasoned attorney with more than a decade of privacy, cloud, transactional, software and hardware counseling and patent and IP experience, which follows his technical career in logic design and software engineering. Michael holds the IAPP privacy qualifications of CIPP-US, CIPP-E, CIPP-C, CIPM, and CIPT.
Diogo Ribeiro: As Vice President of Data, Diogo is responsible for data science, analytics and data engineering functions at Lacework. Diogo steers the company’s data strategy and vision, upholds stringent data governance and compliance standards, and cultivates a data-driven culture underpinning product innovation and operational excellence. Diogo has over a decade of experience as a data leader, having served as VP of Data at ThousandEyes (acquired by Cisco Systems), and led the Google for Education data team.
Alan Mulvaney: As Senior Manager, Legal at Lacework, Alan has more than 20 years of experience in the IT industry, specializing in the negotiation of commercial contracts with a particular focus on Europe. Alan has practical day-to-day experience and knowledge of the highly privacy sensitive markets of Germany and France, which gives him insight into the key concerns of privacy sensitive customers, especially in highly regulated industries. Alan holds a Practitioner Certificate in Data Protection, in addition to his Bachelor of Business Studies and his Associated Membership of the Chartered Institute of Personnel and Development.
Kanak Chavda: Kanak is a versatile data professional with 22+ years of experience, driven by a product mindset and a passion for creating impact through the intrinsic value of data. Kanak has delivered some of the most impactful and complex data projects at Lacework, Netflix, Intuit, NHS England, and Mastek, spanning US, UK and IN geographies. Kanak has built data warehousing, feature engineering, machine learning, streaming, privacy and protection, business intelligence, and data integration solutions in cloud and data center environments. Kanak’s previous roles include data engineer, architect, modeler, analyst, profiler, visualization engineer, performance specialist, and data management practice head.
[4] https://airflow.apache.org/docs/apache-airflow/stable/index.html
[5] https://github.com/apache/airflow
[6] https://csrc.nist.gov/glossary/term/sha_256
[7] https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions