Real Names Discovery Pilot

Executive Summary

 

The Real Names Discovery Pilot (RNDP) is a US/EU collaboration coordinated by Stephen Friend at Sage Bionetworks and Jamie Heywood at PatientsLikeMe, involving the ParkinsonNet and Ashoka organizations. All patients involved in the Pilot will agree to contribute their data in affiliation with their name and will be eager to play an active role in the project.
The goals of the pilot are to enroll 50-200 patients from the Parkinson’s and Amyotrophic Lateral Sclerosis (ALS) communities, as well as potentially additional disease communities, and collect the following data:
  1. time series phenotypic data from patient-contributed questionnaire responses
  2. whole genome sequence for each patient (from whole blood)
  3. serial draw whole blood transcriptomics data
  4. serial draw blood serum proteomics data
  5. serial draw blood serum metabolomics data

Attributes of this pilot that make it unique:

The current project will be run as a pilot to assess:

  1. The enthusiasm and willingness of patients to contribute their personal data, to have their name associated with that data, and to play an active role on the project.
  2. Whether a successful Real Names Discovery Pilot would allow the subsequent “Real Names Pilot” (RNP) to serve as a robust hypothesis generator once it is structured and populated with large amounts of patient-correlated data, with all participants able to play multiple roles on the project (data provider, data analyzer and project funder).
  3. The educational and consent requirements needed to run a Real Names Discovery Pilot, so that patients are clearly informed and their data is handled in an ethically sound manner.
  4. Whether the benefits of sharing real names outweigh the risks.
  5. The technical feasibility of generating patient-contributed phenotypic data linked to patient-specific molecular data across multiple technology platforms, and of making that data available for public research.
  6. The timing of longitudinal sampling needed to provide a meaningful picture of a particular disease or patient status.

The first iterations of RNDP will focus on understanding what happens when people affiliate their real names with their data and then have an active voice in determining what happens with that data. We will invite patients who intend to join to act as co-designers of the project, to help ensure that it is genuinely jointly led. In its early stages, we do not expect RNDP to provide robust scientific insights, but we will learn important things about the risks, benefits and economic implications of democratizing biomedical research.

 

If the pilot is deemed successful, larger efforts involving hundreds or even thousands of patients across many diseases would follow.

 

The pilot is currently in the planning stages. It will be formally described at a thematic level at the Sage Congress on April 20-21, 2012, and will begin as soon as operational details and IRB approval for the trial are finalized. Initial funding will be secured to support a launch of the pilot. Mt. Sinai School of Medicine, BGI and the New York Genome Center have offered to help underwrite a portion of the costs of human genome sequencing for the first 200 patients.

 

RNDP Project Proposal

 

Overview:

The goal is to connect motivated individuals living with Parkinson’s, Amyotrophic Lateral Sclerosis (ALS) and potentially other diseases to scientists and clinicians who have the skills and interest to work on understanding disease by analyzing an unprecedented multivariate dataset.
This would function in a non-traditional, game-changing manner: patients would be “involved” in, and in control at all times of, the research that is done with the samples and information they provide. That control would be exercised through a “Portable Legal Consent (PLC)” mechanism that aims to make the longitudinal and cross-sectional information generated by the project broadly available.

Concept:

Open Issues to be resolved:

Recruitment from the extended Parkinson's Community

The idea is not to recruit through institutions or hospitals, but via well-established patient communities such as ParkinsonNet (The Netherlands) and PatientsLikeMe (US).
In the Netherlands, there are two main communities to mention, both of which are hosted by MijnZorgNET. The first is the Parkinson Community, which has over 900 active members, including both patients and caregivers. The second is the ParkinsonNet community, which has over 1,000 members, all of whom are health care professionals working with PD patients. Both communities can be used to recruit patients for the Real Names project.
To assess the willingness of patients to contribute to this project, a blog post was recently published on the Parkinson Community. Within a day, 19 patients responded. Most were interested in participating, although some raised concerns about restricting access to the data or about the repeated hospital visits. Interestingly, the 20th respondent was a speech therapist who asked all the PD patients she saw that day and forwarded a generally positive response; this illustrates that the allied health care workers who are part of ParkinsonNet can also help get patients involved.
Out of over 6,000 individuals in the Parkinson’s Community on PatientsLikeMe, over 800 identify as “public patients.” PatientsLikeMe believes a large number of these patients, if eligible for the research project described as the Real Names Pilot, would be willing to consent and participate in a study.

Portable Legal Consent (PLC)

Summary: Portable Legal Consent (PLC) is a standardized informed consent system for anyone who has obtained data relevant to their health and would like to donate that data for research purposes. PLC works by running volunteers through a short process in which they learn about informed consent, sign an IRB-approved informed consent form, and then upload the data they have chosen for donation. The existing PLC system does not transmit “identified” data, but donors must still indicate that they understand there are some risks of re-identification and harm in volunteering for donation. For the purposes of the RNDP, it will be necessary to rewrite the PLC to recognize that all RNDP participants will willingly provide their own names and genomic data.
Opportunity: The capacity of an individual to gather data about himself or herself has exploded. All of us are now capable of being sensors, or interfaces, or both, to data that could in theory be used to understand and intervene in health: our genotypes, our medical records, information about our lifestyles gleaned from phones, online surveys, and more. At the same time, the ability of an individual research scientist to generate data from individual biological samples has exploded as well, with new assays and machines launching yearly, drawing us ever closer to the $100 genome and related services. We have the opportunity to draw all this big data together, at scale, and begin meaningful analysis of which genetic variations in a population correlate to diseases, to drug response, to long life. The cost of creating, storing, querying, and distributing that data is dropping to a commodity price level, one within the reach of both traditional scientists in research institutions and a growing legion of citizen scientists. This means we can vastly increase the number of people seeking solutions and creating knowledge about our genomes and our health, without massive increases in research investment – simply as a carried benefit of the internet.
Problem: The opportunity faces a hard reality: privacy regulations. Privacy laws, on health data as well as other kinds of individually identifiable information, strictly regulate the kinds of actors who can use data about individuals as well as the kinds of uses they can perform. These laws were built to protect people from harm, a laudable goal. But they are increasingly being left behind by the technical capacity to re-identify individuals in massive data sets, and they are also stifling the ability of researchers to creatively leverage computational infrastructure to begin making correlations between genomes and health. This is a problem that emerges from a mixture of good intentions and rapid technological development, but it’s a problem nonetheless. Scientists currently must create consent approaches acting as artisans, a new consent system for every study, at a very high cost in time and money. Worse, the outputs of one consent system are almost never interoperable with the outputs of another consent system – meaning that the sort of exploratory computational inquiry we take for granted in the consumer world, where we can Google the entire web again and again, is not possible.
Solution: The solution here is actually quite simple. First, we must act within the existing consent system (at least until the laws change) but there is no reason we cannot create a single, standardized consent form that we give away for free. Second, we must find a body of volunteers who are willing to be named and would like to donate their data without restrictions beyond those of “do not harm me” – a set of people who understand the uncertainty of genomic research, and want to donate anyway. Third, we must make it easy for the researcher to find, access, and leverage the data that is donated for research purposes.
The consent form itself is embedded in a web-based process, akin to a software “wizard”: the user encounters the key ideas behind PLC in an easy-to-understand framework, and can only move forward from step to step after checking key elements on the screen. This mirrors the real-world experience a citizen would have when enrolling in a more traditional study, where a physician or clinician explains the process and the consent. PLC is gathering data on the efficacy of the web-based process to ensure that no volunteer donates data without understanding the implications, for good and for bad, of donation.
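The step-gating behavior described above can be illustrated with a minimal sketch. This is not the actual PLC implementation; the class names, method names and example statements are all hypothetical, and the sketch assumes only the rule stated in the text: a volunteer cannot move to the next screen until every key element on the current screen has been checked.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentStep:
    """One screen of a hypothetical consent wizard."""
    title: str
    required_checks: tuple  # statements the volunteer must tick on this screen


@dataclass
class ConsentWizard:
    """Hypothetical wizard: advancing is blocked until all statements are ticked."""
    steps: list
    current: int = 0
    # Maps a step index to the set of statements ticked on that step.
    acknowledged: dict = field(default_factory=dict)

    def is_complete(self) -> bool:
        return self.current >= len(self.steps)

    def tick(self, statement: str) -> None:
        """Record that the volunteer ticked a statement on the current screen."""
        if self.is_complete():
            raise RuntimeError("wizard already complete")
        step = self.steps[self.current]
        if statement not in step.required_checks:
            raise ValueError(f"unknown statement for step {step.title!r}")
        self.acknowledged.setdefault(self.current, set()).add(statement)

    def can_advance(self) -> bool:
        """True only when every required statement on this screen is ticked."""
        if self.is_complete():
            return False
        required = set(self.steps[self.current].required_checks)
        return required <= self.acknowledged.get(self.current, set())

    def advance(self) -> None:
        """Move to the next screen, refusing if anything is left unticked."""
        if self.is_complete():
            raise RuntimeError("wizard already complete")
        if not self.can_advance():
            raise PermissionError("all statements on this screen must be ticked first")
        self.current += 1


# Illustrative statements only; the real consent text would be IRB-approved.
example = ConsentWizard(steps=[
    ConsentStep("Risks", ("I understand my name will be linked to my data",)),
    ConsentStep("Signature", ("I agree to donate my data for research",)),
])
```

Keeping the gate in `advance` rather than in the user interface means the rule holds no matter how the screens are rendered, and the `acknowledged` record could later support the efficacy measurements the text mentions.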

Phenotypic Data Collection: Parkinson's Example

The actual starting point of recruitment for the initial Parkinson’s component of the RNDP will be an accurate diagnosis of PD; merely asking patients from the abovementioned communities to participate is not sufficiently reliable. We therefore need movement disorder experts involved who a) confirm the diagnosis, and b) re-assess patients during infrequent hospital visits (i.e. every six months) using clinimetrics such as the UPDRS, the timed up-and-go test, etc. Central to the phenotyping process, however, should be self-assessment scales and questionnaires that patients can complete online at a time and place that suits them best. These can be supplemented by telemetrics, such as data from an electronic pegboard test and accelerometers. On the one hand, the exact list of phenotypic data to collect depends on the pre-specified research questions. On the other hand, the analysis of ‘bluntly’ collected data can also generate new hypotheses and ideas, and one therefore should not start out too restricted. A preliminary poll of Parkinson’s patients did indicate that frequent hospital visits might be a barrier to participation, and therefore hospital visits should occur no more than twice a year. The self-assessment tools, however, which include not only known instruments for sleep disorders and depression but also questionnaires on side-effects, activities and diet (amongst others), could be completed at monthly intervals. The hospital visits can also be used to gather samples for biomarker data such as serum and CSF (e.g. for alpha-synuclein measurements) and to acquire structural brain MRIs. However, the final and detailed phenotypic assessment protocol should be generated by a small scientific think tank that advises the project coordinators.
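As a rough illustration of the cadence described above, the sketch below lays out a per-patient calendar. It assumes monthly online self-assessments and a hospital visit every six months, with month 0 as the baseline visit at which the diagnosis is confirmed. The function name, parameters and activity labels are hypothetical; the final protocol would be set by the scientific advisors.

```python
def build_schedule(months=12, questionnaire_every=1, visit_every=6):
    """Return {month: [activities]} for study months 0..months (inclusive).

    Month 0 is treated as the baseline. The default intervals are assumptions
    taken from the cadence sketched in the text, not a final protocol.
    """
    schedule = {}
    for month in range(months + 1):
        activities = []
        if month % questionnaire_every == 0:
            activities.append("online self-assessment")
        if month % visit_every == 0:
            # Clinimetrics and sample collection would happen at these visits.
            activities.append("hospital visit")
        schedule[month] = activities
    return schedule


def hospital_visits_after_baseline(schedule):
    """Count hospital visits excluding the month-0 baseline visit."""
    return sum(
        1
        for month, activities in schedule.items()
        if month > 0 and "hospital visit" in activities
    )


if __name__ == "__main__":
    plan = build_schedule(months=12)
    for month, activities in plan.items():
        print(month, ", ".join(activities))
```

Making the intervals parameters lets the advisors try alternative cadences, such as quarterly questionnaires, without changing the logic.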

Longitudinal Blood Sample Collection

Executing a “first-of-its-kind” statistical vector field analysis on mixed cross-sectional and longitudinal data, across profiling technology platforms and phenotypic databases, will require a robust means of collecting blood samples of multiple types without creating insurmountable or undesirable hurdles for patients who have enrolled in the Project. As stated above, frequent visits to hospitals and clinics can be challenging for patients. One of the central tenets of this project is that patients should be able to participate at times and places of their choosing, to the greatest degree possible without sacrificing sample and subsequent data integrity. This project will establish fixed protocols for trained and licensed phlebotomists to draw blood samples for analysis across multiple profiling platforms. Central to this arm of the Project is the integrity of the data generated. Pilot experiments will be run to validate sample collection, storage, and shipping protocols before samples from enrolled Parkinson’s patients are analyzed on any of the selected technology platforms. We will need to identify a group of advisors with expertise across the technology platforms to work closely with the project coordinators to help guide this process.

"-omics" Data Generation

Technology platforms and potential contract or collaborative partners will be selected to generate the genome, transcriptome, proteome and metabolome datasets. The Project advisors will include a team of individuals with specific, multidisciplinary experience in profiling technologies, who will work closely with the project coordinators to make these decisions judiciously.

Scientific Questions

Part of the disruptive nature of this pilot project is the concept that patients have control of the scientific questions that could and should be addressed with regard to diseases such as Parkinson’s Disease; as such, the focus should be on those things that deliver direct benefit to these patients. The idea is to build a community of patients who are answering key questions about their illness but, more importantly, raising additional questions worth studying and solving through the Traitwise technology. The question generation phase will be open for a defined period of time to aggregate insights from the community on key questions that need solving.
As a starting point, below are some high-level ideas about the types of fundamental questions and issues that PD patients face and that current knowledge does not address:
Diagnosis: There is currently no objective test that can diagnose PD. Given what we know about the genetics of PD, it is unlikely that whole genome sequencing (WGS) will deliver a definitive diagnostic test based on primary DNA-level changes, although it is possible that epigenetics might. More likely, this information would come from “biomarker”-focused studies that monitor transcripts, proteins, metabolites, brain images, etc. (preferably over time, starting from pre-symptomatic stages) and identify molecular or structural brain changes uniquely associated with PD.
Prognosis and Progression: There is currently no way of predicting how rapidly a newly diagnosed patient with PD will progress or which aspects of the disease they will express. As with diagnosis, progress in this area is most likely to be achieved by prospective longitudinal monitoring of individuals from early or pre-symptomatic PD onward, to identify molecular or brain imaging markers that correlate with progression and can therefore be used to predict it. This could and should be a part of the prospective aspect of this project. Note that other initiatives are under way to address this question, such as the PPMI project sponsored by the Michael J. Fox Foundation for Parkinson’s Research.
Therapies: With regard to current therapies, there is a potential need to identify patients who will respond to, and/or show side-effects from, particular available drugs. To the extent that response and side-effect information is available for the cohort enrolled for WGS, we can readily perform analyses to look for genetic predictors (there are notable successes in identifying genetic predictors of drug response, for example for warfarin). The broader opportunity, however, will be in identifying new drug targets through a better molecular understanding of the disease pathophysiology gained through the WGS. In this case, the development of specific “molecular phenotypes” from longitudinal collection and analysis of bio-specimens will provide a framework for relating DNA changes to the downstream molecular changes that drive disease phenotypes. This will allow hypotheses to be generated about the pathways affected in disease and the targets that might be candidates for drug development.

The Time For RNDP is Now

Powerful advances in computing technology, social media and molecular data collection have set the stage for the emergence of revolutionary approaches to science and medicine. Citizens and scientists can work together to form communities around diseases, lifestyles and therapies; this would not have been possible even five years ago. Citizens can get their own data, quickly and cheaply, on the consumer market. Infrastructure like Synapse, a computational platform from Sage Bionetworks where donated data can be accessed and queried by researchers, means that donated data has a place to go where citizens know it will be used. And a growing media awareness of “That’s My Data,” a campaign urging citizens to ask for a copy of their own data whenever they provide a sample, means that the movement into which PLC fits will not stay long in the shadows.
As has already occurred for physics, medical sciences are now poised to enter a new and important phase of maturity: one where massive molecular data collection is distinct and separate from data analysis. Our ultimate vision is that by applying a statistical vector field analysis to the data of many patients across many diseases, a new approach to medicine emerges: one where each of us has access to our own living “contour map” of health. We can update the “map” with new information from ongoing treatments, and we can compare our map to the aggregated maps of others like us and in a real sense, map out regions of the future that could emerge for us based upon the decisions we take today.