Coffee Science methodology Episode 13: Evidence Hierarchies - the healthy scientific dialogue and progression model

Coffee Science for CoffeePreneurs by CoffeeMind

This podcast is our playground for discussing how Coffeepreneurs can leverage scientific methods to lead successful businesses that enrich the lives of everybody involved, inside and outside the business. When running a business, you have a committed purpose. You need to spend your time where it matters in order to lubricate your organization so it can deliver the best products to your audience. If you spend time on something that slows you down or misleads you, that is precious time wasted. Unfortunately, the global coffee roasting education tradition is a big patchwork with more focus on storytelling than on scientific simplicity. In CoffeeMind, we live and breathe scientific simplicity. The founder, Morten Münchow, has a master's degree in theory of science and more than 5 years of experience teaching research design and statistics at the University of Copenhagen. CoffeeMind's approach to coffee science and sensory science builds on this solid foundation of theory of science and research design in everything we do, and we focus on simple and actionable models for skills improvement in product development and quality control.

This podcast is for those in our audience who set aside the time to hang out with us to understand our scientific approach at a deeper level, and who intuitively understand that time spent on methodology is rewarded with better decisions. Those decisions make you a better servant for your audience, with less time wasted on things that matter neither to you nor to your audience. We will take you behind the scenes on all the whys and hows of our scientific projects and business practices so that you can implement our way of thinking in your own organization.

June 24, 2024 • Morten • Episode 22

Running time: 1:15:49

Welcome to this episode of Coffee Science for CoffeePreneurs! In today's episode, we delve into an essential concept that has been overlooked in our previous discussions on the theory of science, research design, and statistics: the "Evidence Hierarchy." I’m Morten Münchow, and I’m excited to explore how this model can transform our education systems and collaborations with research institutions in the specialty coffee business.

Highlights
Introduction to Evidence Hierarchy: Understand the relationship between expert experience, observation, and scientific scrutiny. Discover how applying this model correctly can revolutionize our approach to coffee education and research.

World of Coffee Seminars:
If you're attending the World of Coffee event in Copenhagen in June 2024, don't miss our free seminars in Room 6, Hall B. Topics include "How to Start a Roastery," "Cupping Form Confusion," "Rate of Rise Irrelevance," and "Improving Sensory Skills." If you can't make it in person, tune in to our live stream or catch the recorded sessions on our YouTube channel.

Expert Opinions vs. Scientific Research:
Learn why the current reliance on expert opinions in coffee education is problematic and how transitioning to a more evidence-based approach can benefit the community.

Case Studies and Research:
We discuss our own research, including the controversial topic of organic acids in coffee tasting and the Rate of Rise theory in roasting. Discover the gaps between anecdotal evidence and scientific validation and why it’s crucial to address these issues.

Challenges in Coffee Science:
Explore the barriers to collaboration between the specialty coffee business and universities, and how bureaucracy and politics hinder scientific progress in our community.

Call for Collaboration:
We invite educators and researchers to join us in a critical, open dialogue about the state of scientific methodology in the global coffee education community. Let's push for progress and refine our knowledge and practices.

Show Notes and Links:

UC Davis article: Acids in coffee: A review of sensory measurements and meta-analysis of chemical composition
SCA’s article about UC Davis’ work: Acids in Coffee: A Review of Sensory Measurements and Meta-Analysis of Chemical Composition
CoffeeMind’s article: Acids in brewed coffees: Chemical composition and sensory threshold
Creativity and commerce: a shifting balance for specialty foods and beverages
Kat Melheim’s (roasterkat) Instagram post “Same coffee roasted four (very similar) ways.”
Comparison of Chemical Compounds and Their Influence on The Taste of Coffee Depending on Green Beans Storage Conditions
“Quality does not sell itself”: Divergence between “objective” product quality and preference for coffee in naïve consumers (British Food Journal), also summarized by Morten at ReCo in Gothenburg in 2015 in this YouTube video

By recognizing and applying the evidence hierarchy, we can better navigate the complex landscape of expert claims and scientific evidence, ultimately improving our educational systems and collaborative efforts in the coffee community. Join us in this insightful episode to become part of this transformative dialogue.

Evidence hierarchy

Welcome to this episode of Coffee Science for CoffeePreneurs, where we will dive into another subject from research design that was not covered in the previous series, “the theory of science, research design, and statistics”. I’m Morten Münchow, and today we’re exploring a topic that extends the Coffee Science methodology series you will find in previous episodes of this podcast: an exciting concept called “the evidence hierarchy”. This is a valuable model for understanding the relationship between expert experience, observation, and subsequent scientific scrutiny. If applied correctly, it will revolutionise our education systems and our collaboration with research institutions in the speciality coffee business, a collaboration that, to my great surprise, hardly exists, as it seems bureaucracy and politics have captured our community. And as always, in CoffeeMind, we pride ourselves on not inventing anything. Our mission is to bring the best of what has already been developed over the last 2000 years in science to the specialty coffee community.

If you are listening to this before World of Coffee in Copenhagen in June 2024, make sure not to miss the free seminars we will give in Room 6, Hall B, or tune in to the live stream of these seminars. To attend, go to our website and pick either a free onsite ticket or a free virtual ticket. We will discuss in-depth topics like “How to start a roastery”, “Cupping form confusion”, “Rate of Rise irrelevance and what to do instead”, “The irrelevance of Organic Acids in coffee tastings” and “How to improve your sensory skills”. We have done actual scientific research to back up all of these topics, so consider this podcast a behind-the-scenes look at the research community in which topics like these are explored. We have recorded all the seminars, so if you listen to this after WOC 2024, you can find the lectures on our YouTube channel. Go to our website and register for free, or find the seminars on YouTube, to get deeper insights into these topics.

Just a little caveat about what is to come. CoffeeMind is a small organisation with few individuals, no earmarked funding, and no organisational or community responsibilities other than those we set ourselves, so we can do whatever we find most interesting and valuable for the community. On the other hand, we have no impact beyond what we happen to achieve through our educational and marketing campaigns, which are neither very efficient nor dominant on an international scale. This makes it easy for us to do what we want and point fingers at methods and organisations that might not have the same freedom of thought because they are already committed to community and/or organisational funding. So, in many cases, the conservatism we criticise might be a prudent and pragmatic strategy of making slow progress rather than tearing apart relationships and organisational momentum just because somebody comes along with an ideological point of view that might change the next day across the stakeholder network. So, rather than hearing what is to come as universally valid personal criticism of specific people and organisations, see it as an attempt to add our voice to a global dialogue about the state of scientific methodology in the global coffee education community, to support progress rather than stagnation. We believe progress is slower if annoying people like us don’t speak up directly, back our claims with state-of-the-art methodology, and stand ready for the justified pushback from the people and organisations we are constructively critical of. Part of the reason for making this podcast is that, for all the controversial findings we have tried to communicate, we rarely get constructive criticism on the subject matter; most often, we get no feedback at all.
Beyond the silence, the few comments we have received have mostly been personal or have been counterclaims that are simply wrong from a research design perspective. I have reached out personally to all the people and organisations involved with the theories we are sceptical about and offered our time and resources to clarify the subject in a scientific study. Still, until now, few have wanted to collaborate. Even if people don’t like us personally (maybe even for good reasons!), investing in clarifying relevant research areas should, for the sake of the community, supersede that kind of personal dislike. So, honestly, we miss the qualified dialogue that could bring the community to the next level. The scientific tradition has a structure that fosters the kind of dynamic we think should be the hallmark of this dialogue. This is called the Evidence Hierarchy. As you will see, this progressive critical approach is needed to refine the knowledge and practices in any scientific discipline and its corresponding community. The system is designed to refine thoughts and claims in a dynamic, critical, but respectful dialogue as they move through the steps of the evidence hierarchy. After hearing this podcast episode, you can be part of this dialogue, because you will have developed a constructively critical way of thinking that helps you navigate the jungle of expert claims and scientific evidence.

Speciality coffee education has traditionally been rooted in expert opinion formed over the last 50 years. As a scientist, I have been struck by how often it is based on anecdotal evidence that lacks critical scientific concepts and scientific examination. The problem is not only that most of these subjects have not been through expensive scientific experiments but, more importantly, that the fundamental concepts used in these anecdotal ‘theories’ are not aligned with scientific methodology in the first place. You can hear more about this in earlier episodes of this podcast, where I talk about Ockham’s Razor, Positivism and Critical Rationalism, and how these classical approaches to the theory of science can help us understand why the following widespread training focus areas are in direct conflict with scientific methodology:

1) The SCA 2004 cupping form scoring ‘quality’ parameter

2) The claim that the Rate of Rise of the bean curve during roasting is a good reference point for optimising the flavour of a roast profile.

3) Training and testing students in sensory skills courses to identify individual organic acids in coffee.

For a claim to be testable, it must align with fundamental research design principles, which means formulating it with the simplest possible concepts and the fewest unclear assumptions (Ockham’s Razor). This is crucial for making it completely clear which cause and effect are being investigated, which in turn is needed to define the independent and dependent parameters of the project or the theory behind it, as both Positivism and Critical Rationalism explicitly require.

Considering how much the community talks about science and demands it, I think it is striking how slow the collaboration between the speciality coffee business and the universities is, in particular when it comes to setting up research projects that are explicitly relevant for investigating and clarifying the concepts and methods used in education and in the daily lives of coffee people. Having helped establish the research initiative in SCAE in 2013 (before the merger of the American and European organisations into SCA), I have come to see that the slow development is driven by a blind spot in the setup: a big organisation was established that can only keep going by getting big money from big companies. If this is how it operates, there is little incentive to scrutinise the education systems dealing with the practices and challenges of small companies and to eliminate all the time-wasting concepts and practices that we, as trainers and students, spend our time on daily. Time is your most valuable non-renewable asset, and if we care for our community, we care about how we make each other spend our time. We at CoffeeMind feel that creating time-wasting concepts, or even leaving the community to keep using them, is unethical, as it drains small companies' resources and hence makes our community weaker. Of course, exciting research is already coming out of the community and the established universities and organisations, but I do think the blind spot above is a barrier to better and more frequent research relevant to our community.

I might be missing the point that focusing on big companies with big budgets and establishing a foundation will leave us better off in 20 years than my grassroots approach to scientific research. Over the last 16 years, we in CoffeeMind have completed more than 30 research projects fuelled by our spare time and pocket money, 15 of which have resulted in scientific publications. This might not be a sustainable model to scale globally across different organisations. So, I’m not trying to paint a picture of CoffeeMind as the only entity doing research relevant to the community. Still, I am getting increasingly impatient with how slowly the scientific method is spreading into the daily life of the speciality coffee community. We do, however, feel a bit alone in focusing on the science behind the day-to-day problems small companies face in product development and quality control, as part of our mission to build the best evidence base for the courses we teach to small companies. I want it to happen now, and I wonder where all the collaborators in this mission are.

What does it mean to pursue the truth? What dynamic are we juggling when we explore what is the case and what is not? There is a quite specific way of looking at this. Something either is the case or is not the case, and we can think it is the case, or we can think it is not the case. The challenge is that we can be wrong either way. If we think it is the case, we can be right or wrong, and if we think it is not the case, we can also be right or wrong. What we are striving for is to be right about the things that actually are the case and right about the things that actually are not. This way of laying out the outcome space in statistics, along with the mathematics to investigate it, was formally introduced in Neyman and Pearson’s 1933 paper “On the Problem of the Most Efficient Tests of Statistical Hypotheses”. Formally, the situation where you think something is the case and in reality it is not is called a Type 1 error, and the situation where you think something is not the case and you are wrong is called a Type 2 error. Neyman and Pearson systematised the mathematics for calculating the probabilities of Type 1 and Type 2 errors.

Most people will have COVID-19 in mind as a recent medical example. You either have it or you don’t. But you can think you have it and be right or wrong, and you can also think you don’t have it and be right or wrong. Here, a Type 1 error is a false positive test, and a Type 2 error is a false negative test.

Anyone who has had kids knows that there can either be an actual need to change the nappy or no need. However, you can also think a change is needed, only to find that it was just volatiles, which is a Type 1 error, or you can miss that the nappy needs changing, which is a Type 2 error.

In the legal system, somebody is either guilty or not, and in principle it should be straightforward to just ask the person. But since people are not always truthful, we spend enormous amounts of time and money questioning people, and despite all this effort, innocent people are sometimes convicted (Type 1 error) and guilty people go free (Type 2 error).

Either ghosts exist or they don’t. If they do and we don’t acknowledge it, we are closed-minded and guilty of a Type 2 error. If they don’t exist and we think they do, we are superstitious and guilty of a Type 1 error. But of course, it can also be that they don’t exist and we don’t believe in them, in which case we are right, or that they exist and we think they do, in which case we are also right.

No list of daily-life examples is complete without marriage: if your partner is faithful but you suspect an affair, you are a jealous partner making a Type 1 error. However, if you don’t suspect anything and your partner is unfaithful, you miss that an affair is going on, in which case you make a Type 2 error.
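The outcome space behind all of these examples is a two-by-two table of belief against reality. Here is a minimal Python sketch of that table. `Outcome` and `classify` are hypothetical names of our own, and "believing the claim" plays the role of a positive test result.

```python
from enum import Enum


class Outcome(Enum):
    """The four combinations of what we believe and what is actually the case."""
    CORRECT_POSITIVE = "we think it is the case, and it is"
    CORRECT_NEGATIVE = "we think it is not the case, and it is not"
    TYPE_1_ERROR = "we think it is the case, but it is not (false positive)"
    TYPE_2_ERROR = "we think it is not the case, but it is (false negative)"


def classify(we_believe_it: bool, it_is_the_case: bool) -> Outcome:
    """Map a belief about a claim and the actual state of reality to one of the four outcomes."""
    if we_believe_it:
        return Outcome.CORRECT_POSITIVE if it_is_the_case else Outcome.TYPE_1_ERROR
    return Outcome.TYPE_2_ERROR if it_is_the_case else Outcome.CORRECT_NEGATIVE


if __name__ == "__main__":
    # Legal example: the claim is "the defendant is guilty"; believing it means convicting.
    print(classify(we_believe_it=True, it_is_the_case=False).value)  # innocent person convicted
    print(classify(we_believe_it=False, it_is_the_case=True).value)  # guilty person goes free
```

Each example above only changes what the claim is ("I have COVID-19", "the nappy needs changing", "ghosts exist"). The table itself stays the same.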

A good example of relevant research for educators and small businesses not reaching the community is our scientific paper, published in March 2023, about the questionable practice of teaching and testing students to identify individual organic acids in coffee tasting, which is done by the most prominent education organisations we have. Now, more than a year later, the only reaction we have received is some angry comments on our Instagram claiming that students passing the exam proves the relevance of training and testing sensory skills in identifying organic acids. I can’t think of a better example of biased data than claiming that students passing a test is scientific proof! That argument only holds if all the science behind the test is clearly stated and publicly documented, and despite an enormous effort over 10 years to gather scientific projects on this, it is now clear to us that no public research had been done on this subject before we published our work. I proposed to present our work at the SCA seminars at World of Coffee in Athens last year, but it was not selected for the presentation series. Meanwhile, the promoter of the idea and seller of the organic acids, Joseph Rivera from Coffee Chemistry, says this in a blog post at Barista Hustle in April 2023: “I think the study was well done and comprehensive,” he says. “However, the biggest issue is that neither in the Q course nor in my course [the Coffee Science Certificate (CSC)] do we grade students on their ability to identify each individual acid.” “There is a blank line [in the CQI and CSC tests] where students can write in their guess on acid, but it’s not part of the final grading,” he adds. “For me as an instructor, I find it is more important that students understand the underlying chemistry than grade their sensorial abilities.”

It was therefore a bit surprising to find the article “Using Organic Acids as a Training Tool” (originally published in Roast Magazine, July 2020) reprinted in the Book of Roast Volume 2 from May 2024, with an introduction that clearly says: “…without the proper training and understanding of science, a cupper will not be able to understand taste chemistry. In this column, we will explore how to use organic acids as an additional tool to further your knowledge and fine-tune your tasting skills.”

In the article, the background for doing this is explained: “I used years of prior research to develop the module on organic acids and chemistry of coffee. This module opened a discussion and never-ending exploration of how these seemingly simple molecules play a significant role in modulating flavour and providing a foundation for more advanced sensory training”. The article then explains the difference between discriminative and descriptive sensory analysis as if this methodology were behind the research, but no sensory test data are shown. Significance can have two meanings. The first is practical: the difference would be an obvious sensory fact for anybody tasting it. The second is statistical: you can find statistically significant differences in a group of cuppers even if the individual cupper is not too sure about perceiving a difference. According to our research, the organic acids show neither kind of significant difference. So, even if the claim rests only on statistical significance in a group of tasters, I would like to see the data!

For around 10 years, we have been searching through scientific and publicly available information on organic acids in coffee, but we’ve found no supporting evidence for the claims made. When we requested references to the purported ‘years of research’, we learned that this research is kept private because it was done for a private company, which, as far as we understand, is the company that is now the CQI. Scientific research should inherently be public, especially when it forms the foundation of major sensory education systems worldwide. Why keep it a secret? If I had conducted research that I profited from, both by training students and by selling the required materials, I would be eager and proud to share it. This secrecy is baffling. Just this morning, I spoke with my childhood friend Nicolai, who now lives in Dubai. He noted that workers from less affluent backgrounds, such as Filipinos, often save for years to attend these courses and particularly fear the organic acid test, which is known for its high failure rate. No, I can’t shut up about this! Research is public, education is public, and I’m only citing what is already public in magazines, articles and blog posts. If there is anything science has no room for, it is secrets and conflicting public claims, and this is what we are trying to bring into the light here.

Nobody from SCA or CQI has reached out to challenge our claim with data that goes against our findings, or asked us to present it in any of their communication channels, so I assume nobody is busy implementing this in any education system. If we are right, this work could save many hours, flight tickets, and hotel expenses for retaking failed exams. If we are right, that is. I don’t care which way it turns out, as we would have benefited just as much if we had found that some of the acids are above the sensory threshold and vary enough in concentration to be used as sensory identifiers. We would have been the first to document this. The way we set up the test, we could not shield ourselves from that outcome if the panelists could detect the acids. We only wanted to know the truth. We wanted evidence on whether the current trends in education are making a Type 1 error or whether we, with our scepticism, were making a Type 2 error! Our setup could test exactly this (both Type 1 and Type 2 errors!), and our data strongly indicated that the current education systems are making a Type 1 error (thinking there is something there even though there is not!). But where is the dialogue? The public, truth-seeking, open and critical dialogue? And, more importantly, where are the data? The public data, that is! Please meet us with criticism and data, not silence and denial. We, as educators and scientists, can’t leave the global community with such contradictory claims as the quotes I just gave from Roast Magazine, Barista Hustle and CoffeeMind’s scientific paper.
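The two kinds of significance and the two error types can be made concrete with a triangle test. This is a standard discriminative method in which each taster gets three cups, two of them identical, and must pick the odd one, so a pure guess is right one time in three. The minimal Python sketch below analyses such a test with exact binomial probabilities. It is an illustration under that assumption, not a description of CoffeeMind's actual study design or data. The panel size of 30, the share of true discriminators `p_d`, and all function names are hypothetical.

```python
from math import comb

CHANCE = 1 / 3  # in a triangle test, a pure guess picks the odd cup one time in three


def tail_probability(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_value(correct: int, n: int) -> float:
    """One-sided p-value: chance of at least `correct` right answers if nobody perceives a difference."""
    return tail_probability(correct, n, CHANCE)


def critical_count(n: int, alpha: float = 0.05) -> int:
    """Smallest number of correct answers that is significant at `alpha`.
    Requiring at least this many keeps the Type 1 error risk at or below alpha."""
    for k in range(n + 1):
        if p_value(k, n) <= alpha:
            return k
    return n + 1  # the panel is too small for any result to be significant


def power(n: int, p_d: float, alpha: float = 0.05) -> float:
    """Chance of a significant result if a share p_d of tasters truly perceive the difference.
    1 - power is the Type 2 error risk."""
    p_correct = p_d + (1 - p_d) * CHANCE  # true discriminators answer right; the rest guess
    return tail_probability(critical_count(n, alpha), n, p_correct)


if __name__ == "__main__":
    n = 30  # hypothetical panel size
    print(f"critical count: {critical_count(n)} of {n}")
    print(f"p-value, 15 of 30 correct: {p_value(15, n):.3f}")
    print(f"p-value, 14 of 30 correct: {p_value(14, n):.3f}")
    for p_d in (0.2, 0.4):
        print(f"p_d={p_d}: power={power(n, p_d):.2f}, Type 2 risk={1 - power(n, p_d):.2f}")
```

With 30 hypothetical tasters, 15 correct answers give p ≈ 0.043, which is significant at 0.05, while 14 correct give p ≈ 0.090. So a group can show statistical significance even though half the panel picked the wrong cup. The `power` function covers the other side of the table: `1 - power` is the risk of missing a difference that a given share of tasters can really perceive.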

If you want insight into the details of the evidence (or lack thereof), please attend the seminars at World of Coffee in Copenhagen or find them on YouTube to see what it looks like to establish evidence for a scientific claim, or to fail to.

Anyway, let’s take the helicopter view to make this less of a personal battle between opinions and the people holding them, and look at what our 2000-year-long scientific tradition offers in terms of models for what a constructive dialogue and clarification process should look like. To us, the most crucial step seems to be looking at the dynamics between expert experience and observations on the one hand and scientific research processes on the other. Where do these meet, and what are the positive, progressive opportunities for clarification and growth in our understanding and, thereby, for improving our education systems worldwide?

Scientific investigation is an evidence-based, dynamic process of checking whether a hypothesis can be supported without making either a Type 1 or a Type 2 error, so that what we believe to be the case is the case, and what we believe not to be the case is not. Since reality and subject areas are complex, we can’t just guess or reason our way to the truth; we have to compile a lot of evidence in the area and think of ways of gathering new evidence where things are not yet clarified. This is an expensive and time-consuming process, and the best platform for investigating efficiently is a community of active, curious, critical, and open people who collaborate to seek the truth. To be as efficient as possible with resources and time, it is prudent to use the best practices in each area, which are the specific methods developed in each branch of science. People often focus on the object of science (physics has atoms and forces, chemistry has molecules, and so on), but the real value of science lies in the methods used to investigate that object. We should therefore focus on the methods of science rather than its objects, as it is the methods that help us navigate the jungle of Type 1 and Type 2 errors in our exploration of reality. One thing we find a bit annoying about the new SCA Coffee Value Assessment (CVA) system is that SCA seems more eager to create a new method it can call its own than to make the already proven and available scientific methods known and used in the community.
Sensory science was born in the 1970s, and microeconomics (where you distinguish between a generic product, the physical commodity, and a differentiated product, where the story or extra appreciated features create value on top of the commodity value) was formulated in the 1930s. Yet when SCA promotes the CVA system, they talk about it as bringing ‘recent advances in sensory science and economic’ to the community, and only if you count Jesus as ancient can these advances in sensory science and economics be called recent. It seems like they are trying to sell us a new, overcomplicated system, and I think it would be more prudent to use the already available rapid sensory methods from science (such as napping and simple consumer tests), which are meant to thrive in non-scientific communities. Why all the paper shuffling, trying to control centrally which overcomplicated routines we use in our daily lives? I think they should spend a bit more time with Ockham’s Razor before releasing anything to the public, and perhaps work a bit more with scientists in the process. We need the broadest battery of tests and investigation models, and here it would be nice to have an overview of which fundamental research designs are available and what the advantages and disadvantages of the different possibilities are. This model exists, and I hereby give you:

The Evidence Hierarchy

The evidence hierarchy is a model used in research design to understand and rank the different types of research designs we know of and put them into context, so that we know the value of each and how they can work together to create the best evidence base for a subject area such as speciality coffee. Each step has its advantages and disadvantages, typical weaknesses and/or errors, so the overall hierarchy, and how we as a community tap into the different steps at different times, is essential. Knowing about this is the first step, so as you listen to this, you can be happy that you are among the first movers in the speciality coffee business to know what an evidence hierarchy is 😊

The Evidence Hierarchy consists of the following steps (there are different versions out there; the one below is adapted to what is relevant for coffee science):

1. Expert Experience, Observations and Opinions

2. Case Reports / White papers / Investigative Blog Post

3. Cross-sectional studies (including Surveys)

4. Case-control studies

5. Cohort Studies (Retrospective and Prospective Observational Studies)

6. Randomized Controlled Trials (RCTs)

7. Systematic Reviews and Meta-Analyses

This sounds a bit overwhelming, perhaps, but let me explain each step. You will see that each step serves a purpose that you can clearly understand and recognise from what you have already heard and experienced regarding community claims and scientific findings.
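If you like to keep track of where a claim currently stands, the ranked list of steps can be encoded as an ordered type. This is a minimal Python sketch. The member names, the two helper functions and the example sources are hypothetical illustrations of our own. The ordering simply follows the numbering in the list of steps, and other versions of the hierarchy differ.

```python
from __future__ import annotations

from enum import IntEnum


class EvidenceLevel(IntEnum):
    """The seven steps of the evidence hierarchy as listed in this episode.
    A higher number means a design that is ranked higher in the hierarchy."""
    EXPERT_OPINION = 1     # expert experience, observations and opinions
    CASE_REPORT = 2        # case reports, white papers, investigative blog posts
    CROSS_SECTIONAL = 3    # cross-sectional studies, including surveys
    CASE_CONTROL = 4       # case-control studies
    COHORT = 5             # retrospective and prospective cohort studies
    RCT = 6                # randomized controlled trials
    SYSTEMATIC_REVIEW = 7  # systematic reviews and meta-analyses


def strongest_source(sources: dict[str, EvidenceLevel]) -> str:
    """Return the name of the source that sits highest in the hierarchy."""
    return max(sources, key=lambda name: sources[name])


def next_step(level: EvidenceLevel) -> EvidenceLevel | None:
    """The design one rung up from `level`; None at the top of the hierarchy."""
    if level == EvidenceLevel.SYSTEMATIC_REVIEW:
        return None
    return EvidenceLevel(level + 1)


if __name__ == "__main__":
    claims = {  # hypothetical sources for the same claim
        "hypothetical trainer's opinion": EvidenceLevel.EXPERT_OPINION,
        "hypothetical investigative blog post": EvidenceLevel.CASE_REPORT,
        "hypothetical survey of roasteries": EvidenceLevel.CROSS_SECTIONAL,
    }
    print(strongest_source(claims))
    print(next_step(EvidenceLevel.CASE_REPORT).name)
```

Using `IntEnum` makes the levels directly comparable (`EvidenceLevel.RCT > EvidenceLevel.CASE_REPORT`), which mirrors the idea of a ranking. The one-rung-up helper is only a convenience and does not imply that every claim must pass through every step.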

Expert Experience, Observations and Opinions

Expert experience is based on the experiences and insights of experts in the field. It provides valuable insights into what could be an exciting new area of investigation. Still, it is subjective and prone to bias and would therefore benefit from being investigated by independent scientists who have no stake in a particular outcome and want to investigate it more deeply regardless of what they find.

Typical Errors or Risks: Over-reliance on personal experience or bias (you are more interested in one outcome than another because you have made a business out of a certain claim that only survives if the experiment turns out in one way and not another), leading to anecdotal or un-generalisable conclusions.

It is essential to notice that this is an extremely valuable step in the hierarchy, as scientists need these observations to have anything to investigate at all! If a coffee expert goes through the hassle of writing up these observations, this coffee expert takes the observation to the next level, which is:

Case Reports / White Papers / Investigative Blog Post

Detailed presentations of individual cases or a series of instances are often backed up with collected data.

They can help identify new or rare phenomena without investing in expensive scientific equipment or heavy statistical data analysis, which, of course, limits their generalizability and does not provide the basis for hard conclusions.

But it is important to mention that this kind of project, even though the data are not ideally collected or analysed according to state-of-the-art research design and statistics, provides good and valuable insight into what could be the case and into how a more comprehensive analysis should be focused and designed. It is therefore a big help to the scientists who later scrutinise the subject more deeply: it helps them identify the subject and is deeply inspirational for how the final research design should look, as well as for which types of samples and sample treatments are worth focusing on.

Advantages: Cheap and quick in execution and deeply inspirational for scientists as coffee experts are out there observing what scientists would never observe nor have an intuition about.

Typical Error: Mistaking a rare or unique case for a general phenomenon, or errors in the fundamental research design (not applying the everything-else-equal principle, missing a confounding factor, mistaking random sample variation for an actual difference and so on). If you haven't already, listen to the first episode in our podcast series, on CoffeeMind's list of features of a good scientific theory, which goes into the specifics of all this.

Cross-sectional studies (including Surveys)

These are observational studies that analyse already existing data from a population at a specific point in time, which places them in the category of 'retrospective' research designs. Because they are retrospective, they are not experiments and can only establish correlation, not causation.

Advantages: It is suitable for hypothesis generation and is easy to execute because you don’t need to run a physical experiment with equipment and people. It is different from the first two steps in the hierarchy in that this level involves state-of-the-art data analysis in terms of both univariate and multivariate statistics, which makes it more expensive than a case study but less expensive than a prospective experiment that is covered later in the evidence hierarchy.

Typical Error: Confusing correlation with causation.

Case-Control Studies

Where a cross-sectional study looks broadly at data and tries to establish interesting correlations, a case-control study compares groups with a specific focus. More precisely, it sorts already existing data into cases with a specific condition and controls without it, and compares the two. It is suitable for studying rare events and can suggest associations by exploring possible causes when an experiment cannot be set up. In a case-control setup, you pick the outcomes and explore conditions backwards, whereas in an experiment, you choose the conditions and monitor the outcomes.

Typical Error: Selection and recall biases limit the ability to infer causation, so conclusions are restricted to correlations.
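Case-control results are commonly summarised as an odds ratio. As a minimal Python sketch (the counts are invented for illustration and do not come from any study), imagine sorting coffee lots into 'cases' with some hypothetical defect and 'controls' without it, then looking backwards at whether each lot had been exposed to a particular storage condition:

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds ratio from a 2x2 case-control table.

    Odds of exposure among cases divided by odds of exposure among controls.
    A value above 1 indicates an association, not a cause.
    """
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("a zero cell makes the odds ratio undefined")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts, invented for illustration only.
# Cases: lots with the defect; controls: comparable lots without it.
print(odds_ratio(exposed_cases=30, unexposed_cases=20,
                 exposed_controls=15, unexposed_controls=35))
```

With these made-up counts the odds ratio is 3.5: exposure is more common among the defective lots. As the typical error above says, selection and recall biases mean this cannot tell you that the storage condition caused the defect.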

Cohort Studies (Prospective Observational Studies)

A cohort study follows a group over time to see how specific factors affect outcomes. It is more robust and reliable than a case-control study for showing associations, but it is also more expensive in terms of equipment and people's time.

The disadvantage of this research design is that it is still vulnerable to confounding variables and only allows correlation conclusions, not causation conclusions, because you cannot assign samples (or, in medicine, people) to specific treatments; you only observe conditions and outcomes. Only at the next step in the hierarchy do you make the critical shift from correlation claims to causation claims, and that shift depends on being able to freely choose the treatment for each sample (or person, in medicine).

As you have heard thus far, there is a difference between retrospective designs (you collect data about events that have already happened, which is cheaper) and prospective designs (you plan to collect data about observations that have not happened yet, which is more expensive). But this is not the distinction that decides whether you are allowed to make causation claims rather than only correlation claims. What makes all the difference is that you can freely choose the treatment for each sample (or person, in medicine) and thereby eliminate possible confounding factors in the sample's environment.

An excellent example from medicine that shows how fortunate we are in coffee science is investigating whether there is a relationship between smoking and lung cancer. When dealing with medical conditions, you deal with people, and when you deal with people, there are obvious ethical constraints on what you can do. Ideally, you would randomly allocate each participant to either a smoking group or a non-smoking group to make sure that no confounder influences either the tendency to smoke or the tendency to develop lung cancer. The ethical problem is that if you insist on random allocation, you cannot avoid assigning some non-smokers to the group that is forced to take up smoking. You would also face a lot of practical problems with the smokers you randomly allocate to the non-smoking group: how do you guarantee compliance? So, in medical and financial research, where you often can't run a prospective randomised study, you are frequently limited to research designs from this step in the hierarchy and below.

Luckily, in food science, we have no ethical constraints on randomly allocating samples to different treatment groups (even though it sometimes hurts to roast some really lovely beans very dark!), so we can freely go to the next level in the hierarchy and apply proper random allocation. Random allocation is the condition for applying the everything-else-equal principle: if beans (or people, in medicine) are allocated at random, no confounding factor can be systematically transferred to one treatment group rather than another, so every group experiences the same prospective environment except for the deliberate experimental factors of the research project. This is what allows you to make causation claims and not just correlation claims.
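Random allocation itself is simple to carry out. A minimal Python sketch, assuming hypothetical bag labels and three roast-degree treatments (this is an illustration, not a specific CoffeeMind protocol): shuffle the samples, then deal them out in turn, so each group gets an equal share and every bag has the same chance of landing in any group, whatever hidden property it carries.

```python
import random
from typing import Optional


def allocate(samples: list[str], treatments: list[str],
             seed: Optional[int] = None) -> dict[str, list[str]]:
    """Randomly allocate samples to treatment groups of (near) equal size.

    Shuffling first and then dealing round-robin gives every sample the same
    chance of ending up in any group, independent of its hidden properties.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    groups: dict[str, list[str]] = {t: [] for t in treatments}
    for i, sample in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(sample)
    return groups


# Hypothetical: 12 sample bags split from one green-coffee lot, three roast degrees.
bags = [f"bag-{n:02d}" for n in range(1, 13)]
groups = allocate(bags, ["light", "medium", "dark"], seed=2024)
for roast, members in groups.items():
    print(roast, sorted(members))
```

Fixing the seed only makes the allocation reproducible for documentation. What matters is that the experimenter's preferences play no part in which bag gets which treatment.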

It is essential to notice that because medical and business researchers often can't do prospective randomised research, they compensate by being EXTREMELY SKILLED in analysing observational data. This became apparent to me when I spent a few months at Copenhagen Business School at the Department of Innovation and Organizational Economics, where I expected them to be strong mainly in management theory, only to realise that they are primarily statisticians who can do the most advanced statistical analysis and handle massive datasets in statistical software. To further strengthen their research, they are exceptionally skilled at reading literature: in financial research, you can't rely on the basics of atoms, chemistry and tissue as your foundation, so you must know the history of publications in the social studies associated with your field of investigation. CoffeeMind's latest research paper on the dynamics of coffee roastery business models was designed and conducted with Kristina Vaarst Andersen, whom I met at Copenhagen Business School. She is one of the most skilled researchers I have ever met because she handles hard-core statistics within a vast social science literature landscape. If you have dealt with the complexity of that kind of research, it is a relief to return to coffee science, which feels a bit less complex if you know your introductory chemistry, physics and research design; you don't have to read hundreds of articles and understand the ideas of many previous researchers on the social dynamics under investigation. So again, the hierarchy does not necessarily mean that one step is better than another, nor that people working at one level are better than people working at another. Each step has unique value of its own, just as each step has limitations and risks.

Randomised Controlled Trials (RCTs)

Participants (in medicine) or food substances (coffee beans!) are randomly assigned to receive different treatments. Because the assignment is random, this is the gold standard for establishing causation and reducing bias and confounding factors. So here you take some coffee (or, in medicine, some people), split it into groups at random and treat the groups differently. You could investigate the flavour modulation of different processing methods, different fermentation times, different roast degrees or time aspects of the roasting process, or different extraction percentages; you get my point. As mentioned earlier, in coffee science we are lucky, as we don't have the ethical barriers they have in medicine. There are no moral considerations in deciding whether a given bean will go through a Natural or a Washed process, whereas in a medical research project looking into the health effects of smoking, as already mentioned, you can't randomly allocate people to either a smoking group or a non-smoking group, because you would face the moral challenge of forcing a non-smoker to smoke every time a non-smoker is randomly allocated to the smoking group. Tobacco research is therefore limited to research designs 'lower' in the evidence hierarchy, where only correlation claims, not causation claims, can be made. This is something the tobacco industry was quick to point out every time a doctor talked about the possible health disadvantages of tobacco. If you can't separate the research objects (coffee beans or people), randomise them into different groups and remove them from the conditions they used to occupy, you can't know the actual cause of a difference in outcome.

There can be endless confounding factors other than smoking that cause lung cancer: socioeconomic factors, the type of house people live in, their diet. It could even plausibly be argued that a genetic variation simultaneously codes for a strong urge to smoke and for lung cancer, so that smoking as such is not the cause of lung cancer but something that co-appears with it because both are caused by the defective gene. In this scenario, smoking habits and lung cancer co-appear in retrospective data and even in prospective observational studies, and with those research designs it is impossible to establish causation, only correlation.
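The gene scenario can be simulated to show why observational data mislead here while randomisation does not. A minimal Python sketch, assuming made-up probabilities: 30% of people carry the hypothetical gene, which raises both the urge to smoke and the cancer risk, while smoking itself has no effect at all in the model.

```python
import random


def simulate(n: int, randomise: bool, seed: int = 0) -> tuple[float, float]:
    """Return (cancer risk among smokers, cancer risk among non-smokers).

    Hypothetical model: a hidden 'gene' raises both the urge to smoke and
    the cancer risk. Smoking itself has NO effect on cancer in this model.
    """
    rng = random.Random(seed)
    counts = {True: [0, 0], False: [0, 0]}  # smoker? -> [cancer cases, group size]
    for _ in range(n):
        gene = rng.random() < 0.30
        if randomise:
            smoker = rng.random() < 0.5  # coin flip: allocation ignores the gene
        else:
            smoker = rng.random() < (0.8 if gene else 0.2)  # the gene drives the habit
        cancer = rng.random() < (0.20 if gene else 0.02)  # only the gene matters
        counts[smoker][0] += int(cancer)
        counts[smoker][1] += 1
    smokers, non_smokers = counts[True], counts[False]
    return smokers[0] / smokers[1], non_smokers[0] / non_smokers[1]


obs_smokers, obs_non = simulate(100_000, randomise=False)
rct_smokers, rct_non = simulate(100_000, randomise=True)
print(f"observational: smokers {obs_smokers:.3f} vs non-smokers {obs_non:.3f}")
print(f"randomised:    smokers {rct_smokers:.3f} vs non-smokers {rct_non:.3f}")
```

In the observational run, smokers should show a clearly higher cancer risk even though smoking does nothing in this model. In the randomised run the two risks should come out roughly equal, because the coin flip spreads the gene evenly across both groups.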

A lot of economics research faces the same challenge, since economics largely deals with retrospective or prospective observational studies where randomisation and the establishment of different treatment groups, as in an experiment, are logistically impossible. You can, however, conduct actual experiments in economics, as we did in the behavioural economics study on quality perception published in the paper “Quality does not sell itself” in the British Food Journal.

The fact that in the coffee business we have the liberty to do actual randomised experiments, where we can establish real and solid 'everything-else-equal' situations for each experimental group, makes many of the reservations people have about research irrelevant. Many have felt let down by findings in medical and economic studies where randomisation, and therefore the everything-else-equal principle, can't be applied, leaving the conclusions at the level of correlation rather than causation. This is not our situation, and we should celebrate it by not being reluctant to run actual causation experiments in the coffee community!

Typical Error: Any conclusion from a controlled trial is strictly limited to the experimental setup: which beans were chosen and how they were treated. Journalists often report a wrong conclusion from a study because they conclude outside the scope of the experiment, which is strictly set by the inclusion and exclusion criteria and the research design of the project.

A good example is the meta-analysis by researchers at UC Davis in 2021 titled “Acids in coffee: A review of sensory measurements and meta-analysis of chemical composition”. SCA published a review article about the scientific work (link in show notes), which says in the first paragraph, “Knowing that acids are arguably one of the most important components in coffee, it was essential to collect any and all information out there about acids in coffee.”, as if it were a fact that knowing about the different acids in coffee is essential from a sensory perspective. When you read the scientific paper, it becomes apparent that none of the coffees were brewed, as only chemical concentrations in green and roasted beans were measured. Furthermore, the article focuses on comparing arabica and Robusta beans; there is hardly anything comparing acids among different arabica beans. This work is done by the Specialty Coffee Association and reported on their website. Since the Specialty Coffee Association's members spend most of their time on brewed arabica, non-brewed Robusta is hardly relevant. One would think it worth mentioning that this work does not at all support the relevance of training and testing students on Formic, Acetic, Lactic, Glycolic, Malic, Citric, Tartaric, Isovaleric and Phosphoric acids, as is done by the major training systems and implicitly made relevant in the SCA Flavour Wheel. But, sticking to the point here, this is not something you can blame the meta-analysis for. In a meta-analysis, you analyse existing data from existing published papers, and clear inclusion and exclusion criteria determine the set of papers included. If none of the papers compared different arabica beans and scrutinised concentration differences and corresponding sensory properties, that is not the fault of the meta-analysis, as the methodology behind any meta-analysis is completely clear and transparent.

So, my point is that something is lost in the 'translation' from the scientific article to the popularised version on SCA's website, which misleads readers about the conclusion and ultimately leaves out that nothing is said about acids among arabica coffees and nothing is said about their relevance. The relevance is assumed with no evidence to back it up, and one could wonder why there is not at least a critical question about this relevance in the introduction or summary of the scientific article, or in the popularised version on SCA's website. As it stands, readers do not get a relevant and clear picture of how this meta-analysis does NOT support the major education systems around the world, which have been wasting people's time for almost two decades by now. This is not a fault in the excellent meta-analysis, which is probably executed to the highest scientific standards. Still, SCA should be better at reflecting on the scope of the research and stating relevant reservations about its relevance (or lack of it) to the education systems, so that wrong conclusions do not arise.

Systematic Reviews and Meta-Analyses

Nature: These synthesise data from multiple studies, increasing the overall sample size and statistical power, and can provide the highest level of evidence in a subject area. This makes them good for summarising research and guiding practice.

Typical Error: The result is limited by the quality of the included studies, and studies with different types of data collection and research design can be wrongly merged (the garbage-in, garbage-out principle).
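To see why pooling raises power, and why garbage in means garbage out, here is a minimal Python sketch of fixed-effect, inverse-variance pooling. This is one common meta-analytic method, not necessarily the one used in any study mentioned in this episode, and the effect sizes and standard errors are invented:

```python
import math


def fixed_effect_pool(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Fixed-effect (inverse-variance) pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need at least one estimate and one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Invented effect sizes and standard errors from three hypothetical studies.
effects = [0.40, 0.25, 0.55]
ses = [0.20, 0.15, 0.30]
pooled, pooled_se = fixed_effect_pool(effects, ses)
print(f"pooled effect = {pooled:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is smaller than that of any single study, which is the gain in power. But the formula assumes every study estimates the same quantity in a comparable way; feed it studies with incompatible designs or measurements and it will still return a precise-looking number. Random-effects models relax part of that assumption, but they cannot repair poor input data either.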

From Expert Opinion to Research Questions

As is evident from what we have talked about so far, expert opinion is a crucial starting point. It sparks the well-founded initial questions and hypotheses that guide scientific inquiry. However, for our coffee education to evolve, we must subject these ideas to rigorous testing and refinement, and the first step is to let the claims be tested in impartial, blinded, actual experiments. Remember, we can freely do this in coffee, as we don't have any moral barriers to an authentic randomised research design, so we can draw actual causal conclusions, not just correlation conclusions as in some medical and most economic research. In my experience, there are a few barriers here. The first is often a combination of time, money and the competencies needed to have experienced scientists participate in the research. But even when these can be found, some educators have built their business on claims, which makes them reluctant to have those claims tested. I have offered to investigate several claims for free, and few have come on board.

I recalled the Evidence Hierarchy some months ago, and realised what a great model it is for having a critical but positive and progressive dialogue about findings, when I saw an Instagram post by Kat Melheim (aka roasterkat), who explored the sensory variation between four different (but very similar) roast profiles (link in show notes). I was excited to see that kind of content and appreciated all the work done to share insights that could be the basis for further scrutiny. When commenting on her Instagram post, I wanted to show my excitement and respect for the work. Still, with my scientific background, I could not just comment without also mentioning the reservations and the need for future studies to get to the next level. Torn between not wanting to be the person who criticises such findings for not being scientific and my urge to show my excitement, I felt that I could formulate my excitement better and still leave space for my thoughts on the next steps without sounding like an old grumpy scientist. It struck me that the Evidence Hierarchy, which I learned about when teaching medical statistics many years ago at the university, is precisely the model we can use to let coffee professionals freely publish their observations to the benefit of the community and take pride in contributing to the essential and qualified second step in the Evidence Hierarchy, namely the level of “Case Reports / White papers / Investigative Blog Post”. Expert observations are important and valuable, as you can't come up with something exciting or qualified without a lot of experience and hard work. If the work does not start here and get tracked by being published and discussed in the community, how would we ever get the funding for an experiment or for some of the more expensive steps of the Evidence Hierarchy?

Also, I have reached out to several educators with exotic theories that I disagree with and, leaving open the possibility that I'm wrong, I have suggested investigating the claims using a blinded and randomised research design where only nature can provide the conclusion. Few are even willing to collaborate, which makes me (even if I'm actually wrong) more suspicious that the claims are biased and/or confounded and might therefore be a waste of everybody's time. The only way we as an educator community can serve the busy and hard-working speciality coffee community is to let our theories and claims be scrutinised at different levels of the Evidence Hierarchy, giving us all more insight into what is the case and what is just a good story.

As is also evident from the exploration of each step in the Evidence Hierarchy in this podcast episode, no single study is comprehensive enough or necessarily bulletproof, so each article should be taken with a grain of salt. The scope (the inclusion and exclusion criteria for sample selection), as well as the research design and statistical data, could very well be more limited than you would think from just reading the title and the abstract, or, even worse, from reading about the scientific article in a magazine or online review article where journalists have failed to stay inside the scope of what is and is not said in the original work. Sometimes journalists are spot on and convey it really well, and sometimes they don't at all. This is where it would be great if you went to the source itself and made up your own mind. You could try grabbing the UC Davis meta-analysis and see how excellently it is executed from a scientific point of view (link in show notes to the full article), but also ask yourself whether it tells you anything at all about whether it is relevant to train and test students in differentiating and identifying Formic, Acetic, Lactic, Glycolic, Malic, Citric, Tartaric, Isovaleric and Phosphoric acids in an exam situation. This way, you will get my point first-hand.

The point that no research design is perfect or necessarily correctly conducted (as I have indicated with the risks and limitations of each step in the hierarchy) is not a good argument against trying to move the claims used in our education systems up the steps of the hierarchy. For claims to be well established, the data behind them need to be public and collected in setups where personal incentives can't affect the outcome (blinded and randomised settings conducted by a researcher who has no incentive for a particular outcome), and preferably scrutinised at different steps of the Hierarchy so that more aspects of the claim have been brought to light. Research needs to be public, and repeating the outcome in a different laboratory is a good thing, as it shows that the result of the first research project was not just a coincidence or a confounding error in the setup, but rather an objective truth, in the sense that the phenomenon investigated shows itself regardless of who executes the experiment. When we did the first organic acids project at the University of Copenhagen, we found that the low-acidity Brazilian coffee had double the amount of the dominant citric acid compared to the highly acidic Kenyan coffee, which made me dismiss the result as a labelling error by the student. I felt that we had not found anything, and nothing was published. After talking to Samo Smrke from ZHAW, who had seen similar data, we started to suspect that the student had not made a labelling error, and we found the courage to invest more time and money in another investigation. We did this at another university using different equipment, and this time it was conducted by scientists from the University of Southern Denmark rather than by students. When we found the same data with a different group of people on different equipment, we trusted that the first project's result was neither a coincidence, a confounding factor nor a labelling error.

Can you see how even randomised controlled trials cannot necessarily be trusted on their own, for many reasons? This is why a few projects looking at the same or similar objects with identical or similar questions and identical or different research designs, preferably by different groups of researchers at different universities, are needed before certainty on a subject starts to dawn in the community. If a claim is stuck at the expert experience level, it has not been scrutinised, and there is a high risk in basing too much of our education systems on claims stuck at this potentially highly biased level.
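The value of independent replication can be put in numbers. A minimal Python sketch, assuming each study uses the conventional 5% significance level, that there is in truth no effect, and that the studies are fully independent: the chance that all of them report a false positive shrinks geometrically, from 1 in 20 for one study to 0.0025 (1 in 400) for two.

```python
def chance_all_false_positive(alpha: float, n_studies: int) -> float:
    """Probability that n independent studies of a non-existent effect
    ALL come out 'significant' at level alpha."""
    if not 0.0 <= alpha <= 1.0 or n_studies < 1:
        raise ValueError("alpha must be in [0, 1] and n_studies >= 1")
    return alpha ** n_studies


# Assumed conventional significance level; not taken from any specific study.
ALPHA = 0.05
for n in range(1, 4):
    p = chance_all_false_positive(ALPHA, n)
    print(f"{n} independent stud{'y' if n == 1 else 'ies'}: {p:.6f} (about 1 in {round(1 / p)})")
```

Note what this does not cover: a shared systematic error, such as the same mislabelled samples or the same confounder in every setup, is not reduced by repetition. That is why different people on different equipment, as in the organic acids example, matter as much as the repetition itself.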

I have specified each weakness and risk per step in the Evidence Hierarchy, but there is one risk that permeates all levels, is fundamental to the peer review system and is worth mentioning as a last point. In scientific publishing, a system is in place to increase the probability that published work is at the highest international level. The way it works is that peers are chosen for being experts in the field, judged by their history of publications and the impact of their published work in general. As a publishing scientist (14 papers and counting), I'm often invited to review scientific papers about coffee roasting and sensory evaluation, which is great fun. I'm also on the other end of that stick when I'm publishing research with my research buddies. As the peer review process is the last stand for keeping the quality of published research at the highest level, peers are expected to be as critical as humanly possible. When I get a review, I often feel inadequate and annoyed because I think the reviewers are overly critical and don't see the value of the research. But I have never been part of such a process without the fierce critique of peers improving the quality of the research and the paper manifold, so it is an extremely valuable process for the quality of what gets through. I have even had work plainly rejected, to my immense disappointment and annoyance, but I reluctantly agreed, went back to the drawing board, redid the project or added more data, and ended up appreciating the rejection because it made the next version of the research and the paper even better. However, this process is not free of errors, as multidisciplinary projects sometimes fall into peers' blind spots, which often happens in research projects that combine chemical and sensory data.

As sensory science is still not broadly understood in the scientific community, least of all by chemists, those papers end up with peer reviewers who have a chemistry background and who neither take sensory science seriously nor understand its fundamental research designs, which leads them to accept the 2004 SCA cupping protocol as a protocol that can be used in scientific projects. I think enough has been said about that protocol and its complete lack of appropriateness from a scientific point of view (please listen to the earlier podcast episode on this subject to get a deep understanding of why we say this). It goes without saying that if I, as a reviewer, get a paper that investigates different physical and/or chemical aspects of coffee and uses the 2004 SCA protocol to measure sensory data, I tell the authors to take out the sensory data and see if they still have a story to publish; if they don't, there is nothing to publish. I feel a strong responsibility for the community, because if it is published in a respected peer-reviewed journal, the community will be even more confused, run with wrong conclusions and waste even more precious time. I have seen several scientific papers using the 2004 SCA cupping protocol and think none of them should have been published. Recently (2022), a paper was published on the influence of jute bags vs GrainPro on the sensory properties of coffee. Even though the majority of the article is about the chemical aspects of the coffees, the sensory data are published as if the 'quality data' should be taken seriously, and in the months following the publication, I heard people referencing it as the latest news about how different materials affected the quality of the coffee. This is one of many cases where the peer review process fails the community. So, as you see, there is no safe space here.

We need to understand all the claims, get an idea of the quality of each study, and see how many different studies there have been on the subject. Reading the article yourself while keeping my 8 points of a good theory in mind (listen to the first podcast episode on the theory of science), as well as being sceptical of flawed sensory methodology (the 2004 SCA protocol), is a good starting point. The next step would be to look at the reference list of the research paper to get an idea of how much research there has already been in the field and of how well populated the Evidence Hierarchy is for this research subject. If there is hardly anything, there is little evidence that the subject has been scrutinised by the global research community, and perhaps it is stuck at the expert experience level, which is a good start but not a reason to include it in global education systems.

I think SCA is making progress too slowly in this area, and that much of the research is done with a financial bias towards the bigger industrial companies that can afford to invest in it. This is a good case of what is called 'publication bias'. Unfortunately, this funding structure leaves the education systems serving small businesses worldwide in its blind spot, and for that reason progress is really slow.

- Expert Experience, Observations, and Opinions

Insights from experienced professionals offer valuable starting points for research but can be subjective and biased. They need further scientific testing to be validated, but they are a fast and inexpensive starting point, often based on extensive expert experience.

- Case Reports / White Papers / Investigative Blog Posts

Detailed accounts of specific cases or findings can highlight new areas of interest. These provide initial data but lack generalizability and rigorous statistical analysis.

- Cross-Sectional Studies (including Surveys)

Analyse data at a single point in time from a large group to identify correlations. Useful for generating hypotheses but can't establish cause and effect.

- Case-Control Studies

Compare groups with specific conditions to those without to find potential causes. These studies can suggest associations (correlations) but can't prove causation because of selection biases.

- Cohort Studies (Retrospective and Prospective Observational Studies)

Follow groups over time to study how different factors influence outcomes. These studies are more reliable than case-control but still limited to showing correlations, not causation.

- Randomized Controlled Trials (RCTs)

Participants are randomly assigned to different treatments to establish causation by eliminating confounding factors. Considered the gold standard for determining cause and effect and for isolating the causal factors behind an outcome, which is necessary to understand a system scientifically at the highest level.

- Systematic Reviews and Meta-Analyses

Combining data from multiple studies provides a high level of evidence. These methods increase reliability but depend on the quality of the included studies.

In this episode, we explored the evidence hierarchy, a crucial model for understanding how different levels of research provide varying degrees of reliability and validity. From expert opinions to systematic reviews, each step in the hierarchy helps us refine our knowledge and practices in speciality coffee. By recognising and applying this hierarchy, we can better navigate the complex landscape of expert claims and scientific evidence, ultimately improving our educational systems and collaborative efforts in the coffee community. But if we are not open, curious and critical at the same time, we get nowhere, and I must admit that I have been really disappointed by the lack of these scientific virtues in many of the organisations and influencers in the coffee community, which is why I feel the need to flesh out the subject matter in this episode.

My personal motivation and passion stem from my worry and annoyance that our community does not seem to be in a good position to know what is the case and what is not, and that too many Type 1 errors are flooding our education system because people have been more engaged in making good stories for marketing purposes than in making sure that what they said was actually correct. This is where I feel our education systems and associated research institutions have a special responsibility. But having been part of the formation of SCAE's research initiative in 2013, and having served on the research committee, the education committee and the curriculum creators group before and during the merger of SCAE and SCAA, I have been shocked by how little space the truth is given in processes involving organisations, universities and influencers, where people are happy with Type 1 errors as long as they do not hinder their income or power ambitions. I, too, am selling products and striving for influence, but I can't see why I would have to bend reality in that ambition, as I think science is a great way for companies (like my own) and communities to not waste their time. I think we have enough challenges and opportunities without bending reality, and we should honestly do what we can to make neither Type 1 nor Type 2 errors in that pursuit. I think too many are doing their marketing as if they are speed dating for a quickie. I think we as a community should market and sell to each other more as in a long-term relationship, where we must be curious about our own Type 1 and Type 2 errors to stand as firmly as possible on common ground, since bias would always be a destabilising factor that makes the relationship and our community weaker.

This kind of open, honest, critical but progressive communication is best, and it would naturally harvest interesting observations from the expert level over time, while collaboration with the scientific community would populate the evidence hierarchy to give us the best evidence in the area. But we need to get started and bring all the claims into the light so that we can start removing the bias caused by Type 1 and Type 2 errors and then disseminate this to the education systems, so as not to waste the time of people entering our community. I'm impatient, as I feel we are far from being there, and I don't see the big organisations working on this at the speed I think is possible.

To come back to the areas we said at the beginning of the episode that we are critical about: we feel that we need to clarify whether it is even relevant to teach, train, or test students in the identification of organic acids (which we think all evidence, including our own, indicates it is not!). The Rate of Rise theory needs to be formulated differently. Right now it is formulated more like a correlation observation by experts. In order to investigate it further, it needs to be stated so that it is clear what the independent and dependent variables are. Only then can the theory move beyond the anecdotal, correlative observation style it is formulated in now and enter the higher levels of experimental design. The SCA 2004 quality-oriented cupping protocol needs to just go away, and the new cupping form should be much simpler and clearer. If we are not totally sure where we are going, the probability of getting there is low, so we can’t have overcomplicated methods promoted by our own institutions. All of this would be exposed and corrected naturally and automatically if only we as a community had the habits and processes in place to purify our claims through a well-functioning collaboration between companies and research institutions, guided by the evidence hierarchy.

If you found this episode interesting, you can attend a free seminar at World of Coffee in Copenhagen, or find the recorded seminars on YouTube. In the seminars we dig deeper into the actual evidence on several topics, and you can see the thoughts from this podcast episode in action.

UC Davis article: Acids in coffee: A review of sensory measurements and meta-analysis of chemical composition: https://doi.org/10.1080/10408398.2021.1957767

SCA’s article about UC Davis’ work: Acids in Coffee: A Review of Sensory Measurements and Meta-Analysis of Chemical Composition https://sca.coffee/sca-news/2021/10/19/acids-in-coffee-a-review-of-sensory-measurements-and-meta-analysis-of-chemical-composition

CoffeeMind’s article: Acids in brewed coffees: Chemical composition and sensory threshold: https://doi.org/10.1016/j.crfs.2023.100485

Creativity and commerce: a shifting balance for specialty foods and beverages: https://doi.org/10.1108/jbs-02-2023-0022

Kat Melheim’s (roasterkat) Instagram post “Same coffee roasted four (very similar) ways.”

https://www.instagram.com/p/CvKWhKwsNi0/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==

Comparison of Chemical Compounds and Their Influence on The Taste of Coffee Depending on Green Beans Storage Conditions: https://www.researchsquare.com/article/rs-547987/v1

“Quality does not sell itself”: Divergence between “objective” product quality and preference for coffee in naïve consumers. British Food Journal. https://doi.org/10.1108/bfj-03-2016-0127, also summarised by Morten at ReCo in Gothenburg in 2015 in this YouTube video

Deleted passages

As a teacher of coffee roasting, I find the constantly declining rate of rise theory an excellent example of an unclear theory. It claims a correlative relationship between the shape of the rate of rise curve and the ‘optimum’ flavour, while the causal aspect is completely lacking. If you want to go deeper into this, you can check out our blog post on this subject or read our article about it in Roast Magazine.

The short version is that the Rate of Rise is a derived value of the second causal link in the causal chain of a coffee roaster. Link 1 is the heat source setting, link 2 is the conductor of heat between the heat source and the beans, and the beans are the 3rd link in the chain. This makes the Rate of Rise a highly fluffy and assumption-burdened concept to use as the cause/input/independent parameter in a research design. Likewise, the ‘quality’ of a roast is a really misleading effect/output/dependent parameter in a research setup, as preferences differ from person to person (if you want to hear more about this, revisit our critical podcast episode on the 2004 SCA cupping protocol).

One of the difficulties of using Rate of Rise as a cause/input/independent parameter is that keeping it constantly declining or not would involve changing both roast time and colour. Both of these are simple concepts, and at least one of them would need to be kept constant for an experiment to align with the ‘everything-else-equal’ principle. Since this cannot be done, the theory about the superiority of a constantly declining rate of rise fails to be a prediction theory. At best, it is a vague correlation claim with no real substance in the parameters it builds on, which makes it impossible to design solid experiments to test anything related to it.
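The following is a minimal Python sketch that makes the "derived value" point concrete. It is not CoffeeMind's method. It computes Rate of Rise from a bean-temperature log as the change in temperature per minute over a trailing window. The function names, the 30-second sampling interval, the window sizes and the temperature values are all hypothetical assumptions for illustration. Running the same log through two window choices shows that even whether a curve counts as "constantly declining" can depend on how RoR is computed.

```python
from typing import List

# Hypothetical bean-temperature log in °C, sampled every 30 seconds.
# Illustrative data only, not a real roast.
bean_temps_c: List[float] = [100.0, 115.0, 128.0, 138.0, 149.0, 157.0, 163.0]
SAMPLE_INTERVAL_S = 30.0  # assumed probe sampling interval


def rate_of_rise(temps_c: List[float], interval_s: float, window_samples: int) -> List[float]:
    """Return RoR in °C per minute using a trailing difference over `window_samples` samples.

    The window size is an analyst's choice (an assumption), not a property of the roast.
    """
    if window_samples < 1:
        raise ValueError("window_samples must be at least 1")
    window_minutes = window_samples * interval_s / 60.0
    return [
        (temps_c[i] - temps_c[i - window_samples]) / window_minutes
        for i in range(window_samples, len(temps_c))
    ]


def is_constantly_declining(ror: List[float]) -> bool:
    """True if every RoR value is strictly lower than the one before it."""
    return all(later < earlier for earlier, later in zip(ror, ror[1:]))


if __name__ == "__main__":
    for window in (1, 2):  # 30-second and 60-second trailing windows
        ror = rate_of_rise(bean_temps_c, SAMPLE_INTERVAL_S, window)
        print(f"window={window}: RoR={ror}, constantly declining={is_constantly_declining(ror)}")
```

With a 30-second window, the hypothetical curve has a small bump (from 20 to 22 °C/min) and fails the "constantly declining" test. With a 60-second window, the same data passes. The sketch only describes the temperature curve. It says nothing about flavour.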