Contract Accord 14: Data
Accord Revision Date: April 2019
Page Updated: January 2020
©2020 University-Industry Demonstration Partnership (UIDP). Please refer to the copyright and disclosure statement for UIDP Contract Accords usage and rights.
BACKGROUND AND OVERVIEW
Data is often a necessary component or the result of a research project. For purposes of this Contract Accord, Data means a set of recorded information that is provided by one party to another party for use under defined conditions. Data can take many forms, originate from various sources, and have different levels of sensitivity due to a variety of factors. Data may be governed by the provisions of a separate agreement, often referred to as a Data Use Agreement (DUA), or by relevant provisions contained in another agreement, such as a Sponsored Research Agreement (SRA) or Materials Transfer Agreement (MTA). For ease of reference, both the separate DUA and Data provisions in other agreements are referred to as Data Clauses. In either case, the provisions will, at least, define or describe the Data, the terms and conditions of use of the Data, and the rights and obligations of the parties related to the use of the Data.
Data Clauses are both a means of informing the recipient or user of the Data (“User”) of requirements regarding the Data and a means of obtaining the User’s agreement to abide by these requirements. Virtually any organization can be a provider (“Provider”) or User of Data, depending on the context. Data may be unstructured or structured. Examples include technical Data pertaining to the operation of a motor, device, or system; financial Data; economic Data; proprietary business information; records from governmental agencies or corporations; student record information; human research subject Data; and healthcare Data. Data could refer to the source Data, a set of Data, or compilations of Data (databases).
Data is often proprietary to the Provider, so many of the same elements and concerns that are present in confidential disclosure agreements (CDAs) or MTAs are also commonly addressed in Data Clauses.1 Unlike CDAs and MTAs, however, where the Provider claims ownership or exclusive control of the confidential information being disclosed or materials being transferred, Data ownership may be difficult to ascertain, particularly if the Data is in the form of a database that contains raw or source Data obtained from different sources. In those cases, Data Clauses often do not contain Data ownership provisions, and the Provider takes on a stewardship role that assumes control of the Data and the right to enter into Data Clauses regarding its use.
PROVIDING AND RECEIVING DATA
To determine whether Data can be shared, the Provider needs to know:
- Where the Data came from (e.g., derived from laboratory tests, results of interviews with human study participants, or provided by others);
- Who needs or wants the Data (e.g., students, foreign nationals, clinicians, academic researchers);
- What the User wants to do with the Data (e.g., comparative research, validation, marketing, patient support);
- What the proprietary value of the Data is to the Provider (e.g., a database that would be costly to replicate or that has unique copyrightable formatting); and
- What institutional, legal or regulatory requirements apply to the provision of the Data (e.g., HIPAA2, Common Rule3, Export Controls4, General Data Protection Regulation5, FERPA6).
The User also needs to consider the following to determine if they can receive and use the Data:
- What Data is needed to accomplish a desired purpose (e.g., aggregate Data or source Data, personally identifiable or de-identified Data);
- What is the scope of the intended use (e.g., research only, commercial use, redistribution, public access);
- Whether consent from the owner, controller, research participant or someone else is needed to allow use of the Data as the User intends, and if so, whether the consent contains conditions or restrictions applicable to disclosure and use;
- Who will need access to the Data (e.g., requestor only, students, other researchers, subcontractors, Companies, journal editors);
- What legal or regulatory requirements related to use of the Data are in place (e.g., Institutional Review Board (IRB) approval, Privacy Board approval under HIPAA, license under export control regulations); and
- Which portions of the Data will need to be disclosed if the User publishes results of their use of the Data.
DATA SECURITY PROVISIONS
Once the Provider and User have agreed on what Data is being shared and the scope of the intended use, they can address questions about terms needed to protect the sensitivity of the Data. Data security provisions fall into four general categories:
- Authorization or privilege management—identification of individual Users who are allowed to use the Data;
- Authentication or identity management—confirmation that the authorized User is really the authorized User;
- Monitoring and enforcement—validation and assurance that use of the Data is consistent with authorized use and conditions of use, such as keeping the Data separate from other Data, in a secure location or enclave, or not on a linked computer or laptop; and
- Data protection—instructions regarding any special infrastructure required to store and restrict access to the Data (e.g., dedicated and isolated servers and locked cabinets); special control processes to protect the integrity of the Data, track the location(s) of the Data, track the release of the Data and the reasons for its release; and archiving or disposing of the Data at the prescribed time.
Security requirements imposed on the User should be proportionate to the value of the Data and to the harm that could be caused by improper use or release of the Data. Data security requirements often reference standards and guidance promulgated by the National Institute of Standards and Technology.7
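As an illustration only, the authorization and monitoring categories above can be sketched in a few lines of Python. The class name, field names, user names, and purposes are all hypothetical and do not come from this Contract Accord or from any NIST publication; authentication and physical data protection are assumed to be handled elsewhere.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataAccessPolicy:
    """Hypothetical record of who may use a data set and for what purpose."""
    authorized_users: set[str]
    permitted_purposes: set[str]
    audit_log: list[dict] = field(default_factory=list)

    def request_access(self, user: str, purpose: str) -> bool:
        """Grant access only if both the user and the purpose are authorized.

        Every request, granted or denied, is appended to the audit log so
        that later monitoring can compare actual use with permitted use.
        """
        granted = (
            user in self.authorized_users
            and purpose in self.permitted_purposes
        )
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "purpose": purpose,
            "granted": granted,
        })
        return granted


# Example values are invented for illustration.
policy = DataAccessPolicy(
    authorized_users={"researcher_a", "student_b"},
    permitted_purposes={"comparative research"},
)
allowed = policy.request_access("researcher_a", "comparative research")
denied = policy.request_access("researcher_a", "marketing")
```

Logging denied requests as well as granted ones is a design choice in this sketch: it gives the Provider a record to review if use appears to drift outside the agreed scope.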
Additional considerations are needed in projects involving Data derived from humans. The Data Clauses assure that the use of the Data is consistent with the informed consent obtained from the human participants or the confidentiality assurances provided to non-clinical human participants. Data Clauses help to prevent the inappropriate use of protected or confidential information that could cause harm to the research participants providing the Data (or that the Data is about), or to the research participant’s family.8
Some Data is made available for secondary research, which is research conducted using the Data by someone other than the person who originally collected it. The Common Rule preamble (not the rule itself) defines secondary research as “reusing identifiable information and identifiable biospecimens that were collected for a different, primary purpose” and allows for some secondary research to be conducted without the subject’s consent under certain conditions.9
Some considerations for secondary research use:
- If individual authorization was obtained initially, is the secondary use encompassed in the initial authorization?
- Is a waiver of authorization for research appropriate?
- Can a limited Data set10 be used for the secondary purpose?
- Can de-identified Data be used for the secondary purpose instead of identifiable Data?
- Does a business associate agreement authorize a registry operator that is a business associate of the Provider to make the secondary use or disclosure (e.g., to create de-identified Data for a research purpose)?11
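To make the idea of substituting less identifiable Data concrete, the following Python sketch removes a set of identifier fields from a record before it is shared. The field names and the sample record are hypothetical, and the set of fields is not the regulatory list of identifiers; the applicable regulation and the Data Clause determine what must actually be removed, and dropping fields alone may not be sufficient to de-identify Data.

```python
# Hypothetical field names treated as direct identifiers in this sketch.
# This is NOT the regulatory list of identifiers.
DIRECT_IDENTIFIER_FIELDS = {
    "name",
    "email",
    "phone",
    "street_address",
    "medical_record_number",
}


def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed identifier fields."""
    return {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIER_FIELDS
    }


# Invented sample record for illustration only.
source_record = {
    "name": "Jane Example",
    "email": "jane@example.org",
    "medical_record_number": "000000",
    "diagnosis_code": "E11.9",
    "visit_year": 2018,
}
shared_record = strip_direct_identifiers(source_record)
```

The function returns a new dictionary and leaves the source record untouched, so the Provider's original Data is not altered by preparing a copy for the User.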
Data about students may be subject to federal laws that protect a student’s right to privacy about grades, behavior, and other factors. Student or parental consent may be required prior to disclosure of such information by a Provider even if the User’s intended purpose is non-profit research.12 These requirements become particularly complex for Universities when a student is the User of Data as part of their coursework (i.e., evaluation of the student’s use of the Data becomes part of the student’s education record).
Databases are collections of Data that are organized to allow for easier access to Data that has some factors that make the Data cohesive.13 Databases are used internally by organizations (e.g., personnel records) but, in the context of research, are developed for use by investigators who share a common interest in the Data, even though it may be used for different projects and purposes.
Though Data as facts or ideas is generally unprotectable under copyright law,14 databases may be protected by copyright law as compilations (defined as a collection and assembling of preexisting materials or of Data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship).15 Databases are often coupled with software (called a database management system) that makes them easier to use. Access to these databases may require a license to the database management system.
Databases may be repositories of Data contributed by various Providers who are members of a particular group that agrees to contribute and use Data pursuant to a common DUA.16 Others are compilations of information from other Data sets.17 Archives of Databases allow Providers to deposit Data and retrieve Data. Some archives require membership and are widely used for the convenience they present, as well as the tools, access, and curation services they make available.18
The term registry is commonly used to refer to a database that is focused on a particular disease or condition.19 Some registries collect Data that is self-reported by patients. Others collect Data derived from medical or research records.
Some databases are maintained by a government agency, and Data deposit is required under federal regulations. Perhaps the best known of these is clinicaltrials.gov, a publicly available database containing information on publicly and privately supported clinical studies on a wide range of diseases and conditions.20
Providers and Users with shared interests may enter into consortium membership agreements that describe the conditions under which members may deposit, access, use, store, and share Data. Membership in a Data consortium may be free, or the consortium may charge for membership or services. Various Data consortia are maintained by for-profit and non-profit organizations as well as by federal and some state governments.21
Data consortia provide an expedited way to provide and receive Data among a limited group of trusted colleagues. Consortium agreements often also describe conditions for review of publications resulting from use of the Data and occasionally address intellectual property rights resulting from use of the Data. The consortium may be subject to bylaws or other use provisions that are referenced in the terms of service or other conditions of use on a related website but are not fully contained in the consortium agreement. Providers intending to join a Data consortium should read all relevant conditions of membership and Data use before making a commitment, both to understand how membership facilitates their objectives and to avoid conflicts with obligations in other agreements regarding the Data.
FEATURES OF DATA CLAUSES
Data Clauses generally address the following points, at minimum. The parties should consider these provisions in the context of the anticipated Data transfer and use and include terms that are relevant in the context of the Provider’s and User’s considerations described above:
- A clear description of Data to be provided;
- Permitted uses for the Data and any regulatory requirements that the Provider needs to have in place;
- Names or general descriptions of individuals who can access or receive the Data;
- Conditions under which the User can provide the Data to other Users;
- Length of time the Data is to be made available by the Provider and retained or used by the User;
- Method of Data disposal at the end of the use period (returned or destroyed);
- The User’s obligations regarding new Data generated that is based on the Data originally provided;
- Management of new intellectual property created using the Data;
- Instructions on how the Data should be aggregated, encrypted, anonymized, or de-identified;
- Safeguards required to protect confidential, private, and sensitive information;
- Process for review by the Provider of publications resulting from use of the Data;
- Practical aspects of the Data transfer (e.g., where, when, how); and
- Statement of ownership of the Data if it is proprietary, and the provenance and authenticity of the Data if that requires confirmation.
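One way to track these points while drafting is a simple checklist structure. The Python sketch below is an assumption-laden illustration: the class, the field names, and the sample values are hypothetical, each field loosely mirrors one point in the list above, and nothing here is a prescribed UIDP format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataClauseTerms:
    """Hypothetical checklist; each field mirrors one point in the list."""
    data_description: Optional[str] = None
    permitted_uses: Optional[str] = None
    authorized_recipients: Optional[str] = None
    onward_sharing_conditions: Optional[str] = None
    availability_and_retention_period: Optional[str] = None
    disposal_method: Optional[str] = None
    new_data_obligations: Optional[str] = None
    intellectual_property: Optional[str] = None
    deidentification_instructions: Optional[str] = None
    safeguards: Optional[str] = None
    publication_review: Optional[str] = None
    transfer_logistics: Optional[str] = None
    ownership_and_provenance: Optional[str] = None


def missing_terms(terms: DataClauseTerms) -> list[str]:
    """List the names of checklist items that are still empty."""
    return [f.name for f in fields(terms) if not getattr(terms, f.name)]


# Invented draft values for illustration only.
draft = DataClauseTerms(
    data_description="De-identified survey responses",
    permitted_uses="Non-commercial research only",
    disposal_method="Destroyed at end of use period",
)
open_items = missing_terms(draft)
```

Here `open_items` names the points the parties have not yet addressed. Not every point is relevant to every transfer, so an empty field is a prompt for discussion rather than a defect in the agreement.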
DATA CLAUSES MAY NOT BE REQUIRED IN SOME CIRCUMSTANCES
When Data is publicly accessible and in the public domain (i.e., the Provider has dedicated any copyrights that may exist in the compilation of the Data to the public), it may be downloaded from the internet or received from a Provider without restrictions on use or redistribution.22 Public accessibility is not equivalent to being in the public domain, however, and Users should carefully read the copyright, privacy terms, and other information about potential use restrictions that may be described on a website before using or redistributing such Data. Data that is not subject to legal, regulatory, or other restrictions on use may be made available by the Provider without a Data Clause.
- The Data Provider is responsible for analyzing the source, sensitivity, legal, and regulatory aspects of the Data to determine what provisions are needed in the DUA and its related obligations in providing the Data for the User’s intended purpose.
- The User is responsible for assuring that Data use is compliant with applicable regulations and can meet the requirements imposed by the Provider in the Data Clauses.
- The User should clearly explain the intended use of the Data to the Provider.
- Data Clauses that involve performance of research by a University should include a process to allow the Provider to preview publications before public disclosure to identify and modify or remove any sensitive Data that the Provider does not want published. (See UIDP Contract Accord 3: Publication.)
- The Provider and User should clearly describe any special requirements (e.g., privacy, confidentiality, information security standards) that the User is expected to meet.
The following topics are not covered in this Contract Accord:
- Classified Data, technical Data, and export restrictions on Data (See UIDP Contract Accord 7: Export Controls);
- Residual information (information retained in unaided memory);
- Fair Use of copyrighted Databases (See UIDP Contract Accord 8: Copyrights.);
- Data management plans and federal agency requirements;
- Data storage and information security requirements;
- Implications for University facilities and administrative costs;
- Tangible material or samples, e.g., geological samples (See UIDP Contract Accord 10: Material Transfer Agreements and UIDP Contract Accord 4: Other Research Results.); and
- Biometric identifiers as Data.
 Some of the provisions of DUAs are similar to confidentiality agreements, and in some cases, a Confidential Disclosure Agreement (CDA) format may be used to transfer Data. See UIDP Contract Accord 9: Disclosure and Protection of Confidential Information. Other provisions may be similar to those in Materials Transfer Agreements (MTAs). See UIDP Contract Accord 10: Materials Transfer Agreements.
 Health Insurance Portability and Accountability Act of 1996 (HIPAA; Pub.L. 104–191, 110 Stat. 1936, enacted August 21, 1996); HIPAA overview: https://www.hhs.gov/hipaa/for-professionals/covered-entities/index.html
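 The EU General Data Protection Regulation is Regulation (EU) 2016/679, which repealed and replaced the Data Protection Directive, 95/46/EC. For the text of the repealed Directive, see https://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A31995L0046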
 Family Educational Rights and Privacy Act, 20 U.S.C. § 1232g; 34 CFR Part 99, see https://www2.ed.gov/policy/gen/guid/fpco/ferpa/index.html
 See NIST Special Publication 800-series General Information, “Publications in NIST’s Special Publication (SP) 800 series present information of interest to the computer security community. The series comprises guidelines, recommendations, technical specifications, and annual reports of NIST’s cybersecurity activities.” https://www.nist.gov/itl/publications-0/nist-special-publication-800-series-general-information
 See National Human Genome Research Institute, “Privacy in Genomics,” for a summary of use of genomic information, https://www.genome.gov/about-genomics/policy-issues/Privacy
 Secondary research for which consent is not required: Secondary research uses of identifiable private information or identifiable biospecimens, if at least one of the following criteria is met:
(i) The identifiable private information or identifiable biospecimens are publicly available;
(ii) Information, which may include information about biospecimens, is recorded by the investigator in such a manner that the identity of the human subjects cannot readily be ascertained directly or through identifiers linked to the subjects, the investigator does not contact the subjects, and the investigator will not re-identify subjects;
(iii) The research involves only information collection and analysis involving the investigator’s use of identifiable health information when that use is regulated under 45 CFR parts 160 and 164, subparts A and E, for the purposes of “health care operations” or “research” as those terms are defined at 45 CFR 164.501 or for “public health activities and purposes” as described under 45 CFR 164.512(b); or
(iv) The research is conducted by, or on behalf of, a Federal department or agency using government-generated or government-collected information obtained for nonresearch activities, if the research generates identifiable private information that is or will be maintained on information technology that is subject to and in compliance with section 208(b) of the E-Government Act of 2002, 44 U.S.C. 3501 note, if all of the identifiable private information collected, used, or generated as part of the activity will be maintained in systems of records subject to the Privacy Act of 1974, 5 U.S.C. 552a, and, if applicable, the information used in the research was collected subject to the Paperwork Reduction Act of 1995, 44 U.S.C. 3501 et seq.
 A limited data set (LDS) excludes 16 of the direct identifiers included in the definition of protected health information under HIPAA and may be used for research purposes if the Provider and User enter into a DUA that meets the criteria in the regulation. 45 CFR 164.514(e)
 SACHRP Updated FAQs on Informed Consent for Use of Biospecimens and Data https://www.hhs.gov/ohrp/sachrp-committee/recommendations/attachment-c-faqs-recommendations-and-glossary-informed-consent-and-research-use-of-biospecimens-and-associated-data/index.html
 See FERPA, FN 6. The Family Educational Rights and Privacy Act (FERPA) requires a written agreement to disclose Personally Identifiable Information (PII) from education records without consent. These written agreements must meet the requirements of 34 CFR 99.31(a)(6)(iii)(C) or 99.35(a)(3).
 Babbi, G., Martelli, P.L., Profiti, G. et al. eDGAR: a database of Disease-Gene Associations with annotated Relationships among genes. BMC Genomics 18, 554 (2017). doi:10.1186/s12864-017-3911-3. “eDGAR, a database of gene-disease associations, supplemented with the annotations of intergenic relationships in heterogeneous and polygenic diseases.”
 See e.g., the Inter-university Consortium for Political and Social Research (ICPSR), “ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.” https://www.icpsr.umich.edu/icpsrweb/
 For more discussion, see https://www.nih.gov/health-information/nih-clinical-research-trials-you/list-registries
 https://clinicaltrials.gov/ct2/about-site/background, 42 CFR Part 11
 Examples of Data consortia include: IXI Services database; Higher Education Data Sharing Consortium; Linguistic Data Consortium; Inter-university Consortium for Political and Social Research (ICPSR); The Public Health Data Standards Consortium (PHDSC); The Encyclopedia of DNA Elements (ENCODE) Project Consortium; The International Cancer Genome Consortium (ICGC); The Material Data Management Consortium (MDMC); National Institute on Drug Abuse Genetics Consortium.
 Acknowledging the source of the copyrighted material does not substitute for obtaining permission. The safest course is to get permission from the copyright owner before using copyrighted material regardless of the form of the Data. See UIDP Contract Accord 8: Copyrights.