Nosology is the discipline of the systematic classification of diseases. While the field has ancient roots, it was introduced into Western medicine by Thomas Sydenham during the 17th century [2]. The importance of nosology has continued to grow, and the field has become particularly relevant as technology plays an increasingly prominent role in the delivery of healthcare. ICD-9 codes are perhaps the most commonly used classification scheme in perioperative epidemiologic research. The generation of these codes is susceptible to error at several points along the path from patient admission to inclusion in a database [3]. The concern is that if researchers subsequently use these error-prone codes in studies, false conclusions may be drawn.
It has been suggested that validation studies be routinely performed to establish the accuracy of specific ICD-9 codes before using them in an analysis [4]. Such a study compares administrative codes against data abstracted from chart review. The work of Thomas et al. [1] falls short of invalidating codes for sepsis, since the authors did not investigate the accuracy of coding but rather examined its use over time. Thus, it is unclear what is responsible for the discrepancy they discovered; it could be that coding for sepsis simply became more accurate over time.
Validation studies are not a panacea for misclassification bias. First, validation studies are usually undertaken at a single center, since large national databases are typically de-identified. It is likely that coding practices differ across institutions, as coders have varying levels of training and experience between centers; thus, the generalizability of single-center validation studies is unclear. The issue becomes murkier for diseases that lack strict diagnostic criteria, such as acquired muscle weakness in the intensive care unit [5], which creates variation in clinician documentation as well.
There are no set criteria or cut-offs defining acceptable accuracy of a particular code for use in a study. The validity of a specific code can be described in terms of its sensitivity, specificity, positive predictive value, and negative predictive value, and which of these measures matters most depends on the question being asked of the data. Finally, some would argue that the level of accuracy is less important than the pattern of error. If misclassification is random or non-differential, it has traditionally been argued that this biases estimates towards the null, although this notion has been challenged [6].
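To make the four validity measures concrete, the sketch below computes them from a hypothetical 2×2 validation table comparing the presence of an ICD-9 code against chart review as the reference standard. All counts are invented for illustration and do not come from any study cited here.

```python
# Hypothetical validation-study counts (invented for illustration):
# rows = ICD-9 code present/absent, columns = chart review positive/negative.
tp = 80   # code present, chart review confirms the disease
fp = 20   # code present, chart review does not confirm
fn = 40   # code absent, chart review confirms the disease
tn = 860  # code absent, chart review does not confirm

sensitivity = tp / (tp + fn)  # proportion of true cases the code captures
specificity = tn / (tn + fp)  # proportion of non-cases correctly left uncoded
ppv = tp / (tp + fp)          # probability a coded patient truly has the disease
npv = tn / (tn + fn)          # probability an uncoded patient truly does not

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"PPV={ppv:.2f}, NPV={npv:.2f}")
```

Note that sensitivity and specificity are properties of the coding process itself, whereas PPV and NPV also depend on disease prevalence; a code validated at one center may therefore yield different predictive values at a center with a different case mix, which is one reason the generalizability of single-center validation studies is uncertain.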