By Dr. Prakriteswar Santikary
Vice President and Global Chief Data Officer
Today, more than ever, pharmaceutical and biotech companies are under a great deal of pressure to make drug development more efficient and cost-effective so they can bring life-saving therapies to market faster. But the cost and time needed to commercialize a new drug or therapy continue to escalate – currently exceeding eight years and $2 billion. A big factor behind these staggering figures is the increasing complexity of clinical trials, driven in large part by trial sponsors needing to evaluate more endpoints to demonstrate product value. Collecting, ingesting and analyzing high-quality data at scale and in near real time from the disparate clinical systems that capture these endpoints is therefore a huge focus area, as sponsors aim to minimize data quality issues and escape the current point-to-point data integration nightmare.
Clinical trial sponsors and pharmaceutical companies are also facing increasing challenges based on the need for more complex study protocols and larger digitized data sets to support the next medical breakthrough. When we couple this with the geographic growth of clinical trials, many of which spread across multiple countries to target just the right patient populations, it’s no wonder that we’ve reached a point where humans are struggling to keep up. And it’s not just increasing data volume that is keeping trial sponsors awake at night; data velocity, data variety and data veracity are contributing to this challenge as well.
Against this backdrop, the clinical trials industry needs disruption more than ever before. This is where the dynamic trio of modern data platforms, cloud and AI comes in.
Data Integration Challenges in the Clinical Trials Industry:
A clinical trial, in its simplest terms, is a study of human volunteers designed to answer specific questions about a drug or therapy. Clinical trials are essential to the development and safety of new drugs, therapies and medical devices.
Conducting a clinical trial, however, is a very complex process, consisting of activities such as protocol preparation, patient recruitment, site selection, the meticulous collection, analysis, reporting and management of data, and approval by various regulatory authorities.
Today’s modern clinical trials also generate vast amounts of data – clinical data, study design data, device and logistics data, imaging data, activity-related data, lab data, and operational data – from disparate sources, clinical sites and vendors. New data types are constantly being added to increase the safety and efficacy of drugs. These include mHealth (mobile health) devices, Internet-of-Things (IoT) sensors, home activity monitors, and wearables, just to name a few. New frontiers such as precision medicine and site-less trials are adding further complexity to data collection and integration. This cumulative data set is not just voluminous; it is also fast data arriving at the platform at tremendous velocity, requiring a new approach to architecting data platforms that can scale on demand, both horizontally and vertically, as the business grows and complexity compounds. Then, when we add unstructured data from EHR (Electronic Health Record) and EMR (Electronic Medical Record) systems, as well as lab, image, and social data, to this ever-increasing mix of data variety, this truly becomes a big data problem that is screaming for scale and performance.
Because clinical trial data comes from myriad sources with varying volume, velocity, variety and veracity, and because these systems and vendors are “information silos” with virtually no interoperability, data integration at scale is a real challenge, requiring an innovative platform-centric approach.
Trial monitoring, risk analysis and patient compliance are also huge issues during the course of clinical trials. Any preemptive way to raise potential risks associated with patient compliance, visit compliance and data quality is a huge win, which is why AI and predictive analytics are playing a pivotal role in this space.
Patient retention is another big challenge. Patients need to be engaged throughout the clinical trial process. This is another area where conversational AI, such as voicebots and chatbots, is already playing a large role – helping patients with visit reminders, medication and site visit schedules, interaction with patient communities, and more.
Last but not least, effective clinical trials must have an advanced data management and centralization mechanism to ensure accurate data collection, data entry, reporting/analytics, data management, data quality measurements, trial monitoring, risk assessments and validation ─ not just for compliance and regulatory reasons, but also to drive trial success within budget and accelerate time-to-market. An integrated, harmonized view of clinical data from various sources ─ integrated with operational data ─ is of paramount importance when it comes to measuring compound safety and efficacy and achieving these clinical development goals.
Our Approach at ERT – A modern cloud-based data and AI platform that scales as the business scales.
ERT’s modern data platform collects, ingests, integrates and analyzes any type of data at any velocity. It employs a modern microservices architecture and serverless computing. The platform scales dynamically as demand increases and enables AI and machine learning based on the high-quality, curated, enriched and integrated data housed inside it. The architecture is event-driven and exposes an open data API, enabling not only rapid data ingestion at scale but also rapid data delivery at scale for data consumers. Table 1 and Figure 1 summarize the quality attributes of ERT’s modern cloud-based data and AI platform.
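To make the event-driven, serverless ingestion pattern concrete, here is a minimal sketch of the kind of handler such a platform might run on each inbound event. This is an illustration only, not ERT's actual code: the field names (`study_id`, `site_id`, `payload`) and the validate-then-enrich flow are assumptions.

```python
from datetime import datetime, timezone

# Hypothetical serverless-style handler: validate an incoming clinical data
# event and enrich it with ingestion metadata before it lands in the platform.
REQUIRED_FIELDS = {"study_id", "site_id", "payload"}

def ingest_handler(event: dict) -> dict:
    """Validate an inbound event and tag it with ingestion metadata."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        # Reject early so malformed data never reaches the curated zone
        return {"status": "rejected", "missing": sorted(missing)}
    enriched = dict(event)
    enriched["ingested_at"] = datetime.now(timezone.utc).isoformat()
    enriched["source"] = event.get("source", "unknown")
    return {"status": "accepted", "record": enriched}
```

Because each invocation is stateless, a function like this can be deployed on a serverless runtime and scaled out automatically as event volume spikes – the "scales dynamically as demand increases" property described above.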
Figure 1: Quality attributes of ERT’s Modern Data Platform
Best Practices from ERT’s Data Office on executing an enterprise data, AI and cloud strategy:
Executing an enterprise data lake, AI and cloud strategy is a journey. Our data office at ERT has learned a lot while building our enterprise data and AI platform using cloud technologies as part of our architecture modernization initiative. Below we summarize our lessons learned. These best practices can be applied irrespective of industry.
- View enterprise data as a shared asset and treat it as such – Enterprises such as ERT that start with a vision of data as a shared asset ultimately outperform their competition. Instead of allowing departmental data silos to persist, ERT ensures that all stakeholders have a complete view of the company, its customers and its partners. The result is improved corporate and operational efficiency, cost savings across multiple fronts and increased customer satisfaction. Every organization wants to monetize its data, but very few succeed. Data is the new oil, and viewing it as an asset is critical to determining its true value.
- Provide the right interfaces for users to consume the data – Putting data in one place in a scalable cloud-based data store isn’t enough to achieve the vision of a data-driven organization. In order for people (and systems) to benefit from such a data platform investment, we need to provide interfaces that make it easy for users to consume that data. In the end, it’s about letting your people work with the tools they know and that are right for the job they need to perform. A data platform must support every channel of data access, including APIs.
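As an illustration of such a consumption interface, the sketch below shows a thin, filterable read function over a curated store – the shape of query an open data API endpoint might expose. The in-memory list and its field names are hypothetical stand-ins for the real curated zone.

```python
# Hypothetical stand-in for the curated data store behind a data API.
STORE = [
    {"study_id": "S-001", "site_id": "42", "metric": "heart_rate", "value": 61},
    {"study_id": "S-001", "site_id": "7", "metric": "heart_rate", "value": 74},
    {"study_id": "S-002", "site_id": "42", "metric": "spo2", "value": 98},
]

def query(study_id=None, site_id=None, metric=None):
    """Return curated records matching whichever filters were supplied."""
    def match(rec):
        return ((study_id is None or rec["study_id"] == study_id) and
                (site_id is None or rec["site_id"] == site_id) and
                (metric is None or rec["metric"] == metric))
    return [rec for rec in STORE if match(rec)]
```

The same function could back a REST endpoint, a BI tool connector, or a notebook helper – one curated store, many consumption channels.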
- Ensure data security, privacy, protection and access controls – Ensure data security is built into the data architecture from the get-go and does not become an afterthought. It is rarely easy to retrofit data security, so the best path is to incorporate security best practices into the data platform architecture from the start. With the emergence of GDPR, it is even more important to tackle this right out of the gate. Access control and security must be part of the data architecture, as must data protection and privacy. At ERT, our platform houses clinical data and our industry is highly regulated, so our modern cloud-based data platform takes data privacy, security, protection and access control very seriously – it is all baked into the platform, not bolted on afterward.
- Ensure master data management, data cataloguing and data governance are core pieces of your enterprise data architecture – You can have the most scalable data platform in the world, but if the data inside that data lake is not governed via tools, processes and technologies, the lake will soon become a data swamp, so invest in data governance and master data management. Invest in a data catalog as well, for documenting metadata and establishing a common data vocabulary. Data governance is key to maintaining sustained data quality within the platform. Data owners and stewards must be enrolled in the enterprise data strategy initiative, and their roles must be defined. At ERT, we have rolled out a cross-functional data governance team with members from individual product lines as well as R&D and operations. We have also established a data architecture council as a cross-functional team to ensure data silos are minimized and the right database technologies are employed to meet business demands.
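A data catalog at its core is just a registry tying each dataset to an owner, a steward and common-vocabulary terms. The minimal sketch below shows that shape; the entry fields and example terms are illustrative assumptions, and a real catalog (commercial or open source) would add lineage, quality scores and search.

```python
from dataclasses import dataclass, field

# Hypothetical minimal data catalog: every registered dataset carries an
# owner, a steward and business glossary terms from a common vocabulary.
@dataclass
class CatalogEntry:
    name: str
    owner: str
    steward: str
    description: str
    glossary_terms: list = field(default_factory=list)

CATALOG: dict = {}

def register(entry: CatalogEntry) -> None:
    if entry.name in CATALOG:
        raise ValueError(f"dataset {entry.name!r} already registered")
    CATALOG[entry.name] = entry

def lookup(term: str) -> list:
    """Find datasets tagged with a common-vocabulary term."""
    return [e.name for e in CATALOG.values() if term in e.glossary_terms]
```

Making registration mandatory before a dataset enters the lake is one simple governance gate that keeps the lake from drifting into a swamp.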
- Eliminate data copies and data movement – document data lineage – Every time data moves from one database to another there is an impact on cost, accuracy, quality and time. The fewer times data has to be moved, the better. By eliminating the need for additional data movement, a modern enterprise data architecture can reduce cost (time, effort, accuracy), increase “data freshness” and optimize overall enterprise data agility. Data lineage becomes increasingly hard to track when data hops from one place to another, so reducing or eliminating data copies should be a big part of the enterprise data strategy. Metadata management is a big piece of this puzzle, as are master data management, the data catalog and data governance.
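Where a copy is unavoidable, documenting the hop keeps lineage recoverable. A minimal sketch, under the assumption that each dataset has a single upstream source: record every movement as a (source, target, transform) triple, then walk the chain backwards.

```python
# Hypothetical lineage ledger: one entry per data movement.
LINEAGE = []

def record_hop(source: str, target: str, transform: str) -> None:
    LINEAGE.append({"source": source, "target": target, "transform": transform})

def trace(dataset: str) -> list:
    """Walk lineage backwards from a dataset to its upstream sources."""
    chain, current = [], dataset
    while True:
        hop = next((h for h in LINEAGE if h["target"] == current), None)
        if hop is None:
            return chain
        chain.append(hop)
        current = hop["source"]
```

With such a ledger, an auditor can answer "where did this analytics table come from, and what was done to it?" – exactly the question regulators and data quality teams ask.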
- Choose a fit-for-purpose technology stack and take a platform-centric approach – There was a time when one database technology did the trick, but with data volume, variety and veracity increasing, and access to integrated information transitioning from static batch-oriented processing to real-time data ingestion, integration and decision-making, scale, agility and performance are now critical quality attributes of architecture modernization. Choose the tech stack that fits your business problem. In the clinical trials industry in particular, since we deal with unstructured and binary data, traditional relational databases are no longer the only option. We employ a microservices architecture in which each of our services is a full-stack application with its own UI, middle tier and data store, and we practice polyglot persistence and programming. Since our platform is cloud-hosted, we make extensive use of the cloud vendor’s managed services and employ serverless computing for scale and cost containment. This approach helps us scale our platform across all dimensions, not just the data tier. It also keeps our engineers engaged and motivated, as they get to innovate with modern techniques, tools and technologies.
- Provide flexible and nimble data architecture – Our data office views data architecture as built not to last forever but to change. This is primarily because business demands and requirements are constantly evolving, and mergers and acquisitions are a big part of many growth strategies. It is therefore naïve to think that the architecture will support all your present and future needs without any changes. Since changes are inevitable, the architecture must be agile and nimble so it can accommodate them quickly and efficiently. Such flexibility and agility also open opportunities to employ cloud technologies, open source technologies and different types of backend databases for scale and performance, enabling rapid innovation.
- Enable AI, ML and data science – The architecture must support an analytics sandbox, enabling quick data integration and data preparation for predictive modeling and AI. This means the architecture must support a scalable data pipeline that can quickly move data from one place to another, transform it, and provide a platform for training, testing and validating multiple models quickly and efficiently. The architecture must also enable test automation and a continuous integration and delivery pipeline, establishing DevOps and DataOps for efficiency and innovation. Time to market is key to meeting market demands and staying ahead of the competition.
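The "train, test and validate multiple models quickly" loop the sandbox must support can be sketched in a few lines. The models below are deliberately trivial (a mean predictor and a last-value predictor) so the example stays self-contained; in practice they would be real estimators, and the loop, metric and data are all illustrative assumptions.

```python
# Two toy "models": each fit function returns a zero-argument predictor.
def mean_model(train):
    avg = sum(train) / len(train)
    return lambda: avg

def last_value_model(train):
    last = train[-1]
    return lambda: last

def mae(model, holdout):
    """Mean absolute error of a constant predictor on held-out values."""
    return sum(abs(model() - y) for y in holdout) / len(holdout)

def select_best(train, holdout, candidates):
    """Fit each candidate on train, score on holdout, return best name + scores."""
    scores = {name: mae(fit(train), holdout) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores
```

The point is the shape of the loop, not the models: a sandbox that makes fitting and scoring candidates this cheap is what lets data science teams iterate at the pace the business demands.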
- Provide data visualization – As data complexity increases, the importance of data visualization increases with it. Data visualization is how results are communicated to all stakeholders: it allows us to take complex findings and present them in a way that is informative and engaging. It becomes even more important, for example, when explaining complex AI models for transparency. A modern platform must support tools and technologies for advanced data visualization.
- Ensure regulatory compliance – Since data is king in today’s decision-making, the data architecture must not only implement data security, privacy and access control but also ensure the platform is compliant from a regulatory perspective. This is particularly critical for the clinical trials industry. Proper controls and data protection must be in place as part of the platform architecture.
Regardless of your industry, the role you play in your organization, or where you are in your journey to modernize your company’s data and analytics foundation, I encourage you to adopt and share these principles as a means of establishing a sound foundation for building a modern data architecture. ERT’s modern data architecture, with AI enablement built in, is one example of this. While the path can seem long and challenging, with the right framework and principles you can successfully make this transformation sooner.