When Brian Martin co-founded Rare Hope NFP, a nonprofit focused on giving the public access to hypotheses for rare disease treatment, the organization needed a way to fulfill its purpose despite lacking the millions of dollars and resources of big pharmaceutical companies.
“For any nonprofit to be able to do this type of thing is generally an unreasonable proposition,” Martin said in an interview at the Gartner Data & Analytics Summit in Orlando last week. He noted that the well-known nonprofit Every Cure, which seeks to use FDA-approved medicines to treat rare diseases, has raised about $76 million in funding, underscoring the significant capital needed for organizations with a similar mission.
However, because Martin already had experience with the hybrid data and AI vendor Cloudera, he believed the vendor could help Rare Hope execute its mission without the high costs that big pharmaceutical companies incur when releasing such hypotheses on rare diseases to the public. Martin did not disclose how much Rare Hope spends on the Cloudera platform.
“It’s an opportunity to do something and to put that type of content in patients’ and doctors’ hands that we couldn’t ever do without millions and millions of dollars,” Martin said.
The Cloudera Effect
One way Cloudera was instrumental in helping Washington, D.C.-based Rare Hope fulfill its mission was by enabling the nonprofit to use the data and AI platform to gain insight from diverse types of data.
With the platform, Rare Hope was able to extract knowledge from research papers, medical images, and other documentation, identifying correlations and patterns that would have taken years to discover, Martin said.
Using Cloudera, Rare Hope created data pipelines that process unstructured data, such as scientific papers, and transform it into structured data. With PySpark, a tool in Cloudera for building data engineering and machine learning pipelines, Rare Hope extracts knowledge from scientific data, transforms it from unstructured to structured form, and then either uses the transformed data in tools and platforms outside Cloudera or runs analyses to find correlations between concepts such as a disease and a drug. Rare Hope then brings the resulting hypothesis back into the Cloudera platform for further study, using a large language model (LLM) to generate an analysis or hypothesis that the organization presents to the public.
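The unstructured-to-structured step described above can be illustrated with a minimal plain-Python sketch. Rare Hope uses PySpark for this at scale; the function, term lists, and sample text below are illustrative assumptions, not the nonprofit's actual pipeline:

```python
import re

# Illustrative term lists -- a real pipeline would draw these from
# curated medical ontologies, not hard-coded strings.
DISEASES = ["cystinosis", "pompe disease"]
DRUGS = ["cysteamine", "alglucosidase alfa"]

def extract_pairs(paper_text: str) -> list[dict]:
    """Turn unstructured paper text into structured disease-drug records."""
    text = paper_text.lower()
    records = []
    for disease in DISEASES:
        for drug in DRUGS:
            # Count sentences mentioning both terms as a crude co-occurrence signal.
            hits = sum(
                1 for sentence in re.split(r"[.!?]", text)
                if disease in sentence and drug in sentence
            )
            if hits:
                records.append({"disease": disease, "drug": drug, "mentions": hits})
    return records

abstract = (
    "Cysteamine depletes lysosomal cystine in cystinosis. "
    "Long-term cysteamine therapy slows cystinosis progression."
)
print(extract_pairs(abstract))
# -> [{'disease': 'cystinosis', 'drug': 'cysteamine', 'mentions': 2}]
```

The structured records that come out of a step like this are what downstream tools can analyze for disease-drug correlations.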
“That data, information, knowledge, insight, wisdom and impact chain, that’s a pretty well-established hierarchy,” Martin said. “We use Cloudera to automate that base part, that human axis, that wisdom link, to deliver the impact.”
Cloudera and Models
As for generative AI models, Rare Hope is not committed to any specific model.
For its part, Cloudera does not require its customers to use a specific model. However, the vendor has integrated Nvidia NIM microservices into its infrastructure, enabling it to deploy and manage LLMs. Nvidia NIM microservices are prebuilt, packaged containers that include an AI model, inference engines, standard APIs, and other tools enterprises need to deploy AI models.
“Cloudera doesn’t make a model and sell it to you,” said David Dichmann, vice president of product marketing and evangelism at Cloudera. “Choose your model, choose your model well, and we recognize you want freedom of choice. Use the right model for the right use case. Do not try to fit everything into one kind of model.”
Rare Hope also recognizes that because different models work better for different tasks and applications, it is important to have access to a range of models. Model choice in Cloudera is an added benefit to the nonprofit, Martin said. The nonprofit does not have to build the infrastructure to access the models, provide them with data, and then bring the results back into the Cloudera platform.
“The Nvidia NIM infrastructure gives us the ability to run some of that stuff directly natively,” Martin said.
While Cloudera already saves Rare Hope a significant amount of time in delivering hypotheses on various diseases to the public through its published research and white paper findings, the nonprofit is now looking at how to monitor changes to the data when a new research paper is published.
“How do we handle different change events within those pipelines to know what the different downstream effects are?” Martin said. “Those types of things save an immense amount of time because instead of rerunning the entire process over again every time there’s new data, we can run incremental processes to analyze the changes and the differences.”
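The incremental approach Martin describes rests on a generic change-detection pattern: fingerprint each input document and reprocess only the ones whose content has changed since the last run. A minimal sketch, assuming hypothetical function names rather than Rare Hope's implementation:

```python
import hashlib

def fingerprint(text: str) -> str:
    """Stable content hash used to detect new or changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def incremental_update(corpus: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return only the doc IDs that are new or changed since the last run.

    corpus: doc_id -> current text; seen: doc_id -> hash from the previous run.
    Mutates `seen` so the next run starts from the updated state.
    """
    changed = []
    for doc_id, text in corpus.items():
        digest = fingerprint(text)
        if seen.get(doc_id) != digest:
            changed.append(doc_id)   # downstream analysis reruns only for these
            seen[doc_id] = digest
    return changed

seen: dict[str, str] = {}
papers = {"paper-1": "original abstract", "paper-2": "another abstract"}
print(incremental_update(papers, seen))   # first run: everything is new
papers["paper-2"] = "revised abstract"
print(incremental_update(papers, seen))   # second run: only the revision
```

The savings Martin points to come from the second run: instead of reanalyzing the whole corpus, downstream steps fire only for the documents the change-detection pass flags.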

