Every engineering curriculum covers data analysis techniques, either as a separate course or within other engineering courses. All engineers are taught how to calculate the mean, variance, standard deviation, and standard error, the difference between sample statistics and population statistics, and so on. In engineering practice these calculations and concepts are used to confirm that the observed results are reliable enough to show that a product, process, or component is fit for the next step in development or manufacturing. In other words, they are used as confirmatory statistics. What is often overlooked is how to determine whether the data meets the criteria for using those standard confirmatory statistics. For an engineer working in the medical device or pharmaceutical industry, choosing the right type of statistical analysis is especially important, because every decision made in the development of a product is documented as part of the design history required by the FDA. Although statisticians are sometimes available, they are typically assigned to more complex problems, and waiting for their services can slow down the innovation process.
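As a brief refresher on the sample versus population distinction, the following minimal sketch uses Python's standard library `statistics` module. The measurement values are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical measurements (illustrative values only, not real test data).
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.4]

n = len(data)
mean = statistics.mean(data)

# Sample statistics divide by n - 1 (Bessel's correction); use them when the
# data is a sample drawn from a larger population.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

# Population statistics divide by n; use them only when the data IS the
# entire population of interest.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Standard error of the mean: how much the sample mean would vary from sample
# to sample, assuming the observations are independent.
std_err = sample_sd / math.sqrt(n)

print(f"n = {n}, mean = {mean:.3f}")
print(f"sample variance = {sample_var:.4f}, sample sd = {sample_sd:.4f}")
print(f"population variance = {pop_var:.4f}, population sd = {pop_sd:.4f}")
print(f"standard error of the mean = {std_err:.4f}")
```

Note that the standard error formula assumes independent observations. That is exactly the kind of precondition the rest of this article is concerned with checking.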

Having written the protocol, executed the testing, and obtained the data, how does an engineer know which of the standard techniques he has learned are applicable? It turns out there is a well-developed method. Its first step is part of the Exploratory Data Analysis (EDA) process, championed initially by John Tukey (1). The essence of EDA is to analyze the data with graphical techniques to determine its characteristics. Once those characteristics are understood, the scientist or engineer, with the appropriate background or in consultation with a statistician, can choose which confirmatory statistics are appropriate.

Why is this important? First, most biomedical engineers learn statistical techniques that require the data to be uncorrelated and normally distributed, with a mean (or equivalent deterministic component) and a variance (or equivalent random component) that do not change during the experiment. Second, biomedical engineers work in a world where any one or all of these requirements may not be met. This is particularly true during the first attempts at creating a new process or system, especially a complex one. Third, without understanding the data's characteristics, the engineer cannot make well-informed decisions about how to improve the product, process, or component.
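The independence requirement is easy to overlook, and violating it can quietly distort a standard calculation. The minimal simulation below illustrates this under stated assumptions: it uses a first-order autoregressive (AR(1)) model as a hypothetical stand-in for correlated process data, and the model, parameter values, and function names are illustrative choices rather than anything prescribed by the method. It compares the actual spread of the sample mean across repeated simulated experiments with the textbook standard error s/√n.

```python
import math
import random
import statistics


def ar1_series(n, phi, rng):
    """Hypothetical test data: an AR(1) series x[t] = phi * x[t-1] + e[t]
    with standard normal noise e[t]. phi = 0 gives independent data;
    phi near 1 gives strongly correlated data."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def se_check(phi, n=200, reps=500, seed=1):
    """Repeat a simulated experiment `reps` times and return
    (actual spread of the sample means, average textbook SE = s / sqrt(n))."""
    rng = random.Random(seed)
    means, naive_se = [], []
    for _ in range(reps):
        x = ar1_series(n, phi, rng)
        means.append(statistics.mean(x))
        naive_se.append(statistics.stdev(x) / math.sqrt(n))
    return statistics.stdev(means), statistics.mean(naive_se)


for phi in (0.0, 0.8):
    actual, naive = se_check(phi)
    # A ratio near 1 means s / sqrt(n) reflects the real uncertainty;
    # a ratio well above 1 means it understates it.
    print(f"phi = {phi}: actual / textbook standard error = {actual / naive:.2f}")
```

With phi = 0 the ratio should come out close to 1. With phi = 0.8 the textbook standard error understates the real uncertainty; for large samples theory puts the factor near sqrt((1 + phi) / (1 - phi)) = 3. The correlation is invisible in the mean and standard deviation themselves, which is why a graphical check such as a lag plot is needed.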

The first step of the Exploratory Data Analysis process is to create four different plots to determine the basic characteristics of the data: the run sequence plot, the lag plot, the histogram, and the normal probability plot. The run sequence plot shows whether the data has a fixed location (mean) and variation (variance) over time. The lag plot shows whether the data is random or has structure, such as correlation between successive points. The histogram shows whether the distribution is symmetric and approximately normal. The normal probability plot, when approximately linear, strongly suggests that the data has a normal distribution. These four plots form a powerful quartet because together they provide a succinct way to gain insight into the basic characteristics of the data, which the biomedical engineer can use to guide his decision-making. Without the insights provided by these four plots or their equivalent, an engineer who applies the standard statistical analysis is essentially guessing that the data meets all the preconditions for its use.
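The four plots are graphical by design, and a plotting library such as matplotlib is the usual way to draw them. As a supplement, the following minimal Python sketch (standard library only, Python 3.8 or later) computes a rough numeric counterpart for each plot. The function names, the four-block split, the ten histogram bins, and the Blom plotting positions are illustrative assumptions, and the data is simulated rather than taken from any real test.

```python
import math
import random
import statistics
from statistics import NormalDist


def _pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def run_sequence_summary(x, blocks=4):
    """Run sequence plot counterpart: (mean, std dev) of consecutive blocks.
    Roughly equal values suggest fixed location and variation over time."""
    size = len(x) // blocks
    chunks = [x[i * size:(i + 1) * size] for i in range(blocks)]
    return [(statistics.fmean(c), statistics.stdev(c)) for c in chunks]


def lag1_autocorrelation(x):
    """Lag plot counterpart: correlation of x[t] with x[t+1].
    Near 0 suggests randomness; a large magnitude suggests structure."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def histogram_counts(x, bins=10):
    """Histogram counterpart: counts in equal-width bins."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # guard against all-equal data
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max goes in last bin
    return counts


def normal_probability_r(x):
    """Normal probability plot counterpart: correlation between the sorted
    data and normal quantiles. Close to 1 means a nearly straight plot."""
    n = len(x)
    # Blom plotting positions (i - 0.375) / (n + 0.25); other choices exist.
    q = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return _pearson(sorted(x), q)


# Hypothetical data: 200 independent normal readings, then the same
# readings with a slow upward drift added.
rng = random.Random(42)
steady = [rng.gauss(10.0, 0.5) for _ in range(200)]
drifting = [v + 0.05 * i for i, v in enumerate(steady)]

for name, x in (("steady", steady), ("drifting", drifting)):
    blocks = [(round(m, 2), round(s, 2)) for m, s in run_sequence_summary(x)]
    print(name)
    print("  block (mean, sd):", blocks)
    print("  lag-1 autocorrelation:", round(lag1_autocorrelation(x), 3))
    print("  histogram counts:", histogram_counts(x))
    print("  normal probability r:", round(normal_probability_r(x), 4))
```

For the steady series the block means should agree closely, the lag-1 autocorrelation should be near zero, and the normal probability correlation should be close to 1. The drift shows up as block means that climb from one block to the next and a large lag-1 autocorrelation. These numbers complement the plots rather than replace them, since a single summary number can hide a pattern that a plot makes obvious.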

(1) Tukey, John W., Exploratory Data Analysis (1977).