Introduction
Although spine fractures do not occur in all trauma patients, their impact on patients’ lives is more significant than that of other injuries. Therefore, early diagnosis and proper treatment are necessary to avoid complications and neurological deficits in patients with thoracolumbar injury. Currently, 4 thoracolumbar injury classification systems are widely used: the Denis classification [1], the AO classification [2], the load-sharing classification [3], and the thoracolumbar injury classification and severity score (TLICS) [4]. Despite extensive research on these systems, none is fully satisfactory, and spine surgeons therefore choose a classification method according to personal preference [5, 6]. We focused on the TLICS for diagnosing thoracolumbar spine fractures and aimed to determine the reliability of this classification system among young neurosurgeons. Intra- and inter-observer agreement for the TLICS and the McAfee classification was compared to identify which method may better facilitate communication between physicians and help young neurosurgeons establish a treatment plan for patients with thoracolumbar injury.
Materials and Methods
Six young Korean neurosurgeons, all of whom had obtained their specialist certification in neurosurgery within the previous 4 years, reviewed patients with thoracolumbar spinal fractures who visited our hospital between January 2016 and October 2020. Patients with spinous process fractures, transverse process fractures alone, or pathologic fractures were excluded. The patients were categorized according to their level of injury: thoracic (T1-T10, n=27), thoracolumbar (T11-L2, n=131), and lumbar (L3-L5, n=43) spine injury. The 6 observers classified each patient’s fracture according to 2 classification methods, the TLICS and the McAfee classification. They were trained in both methods through a review of published studies and independently reviewed each patient’s history and neurologic, plain film, computed tomography (CT), and magnetic resonance imaging (MRI) findings. The same data were re-evaluated after 4 weeks to assess intra-observer reliability.
The TLICS is scored based on 3 categories: morphology of the injury, integrity of the posterior ligamentous complex (PLC), and the patient's neurologic status (Table 1). Widening of the interspinous space, diastasis of the facet joints, and facet perch or subluxation on X-ray and/or CT indicate an injured PLC [7, 8]. However, the most credible sign of PLC injury is discontinuity of the low-signal-intensity black stripe on sagittal T1- or T2-weighted MR images [9].
The McAfee classification comprises 6 categories: wedge compression, stable burst fracture, unstable burst fracture, Chance fracture, flexion-distraction injury, and translational injury. An unstable burst fracture is defined as a burst fracture with a neurologic deficit, three-column injury, kyphosis over 30 degrees, a decrease of anterior body height over 40%, and more than 50% canal compromise [10].
Intra- and inter-observer reliability were assessed using Cohen’s and Fleiss’ kappa values. Statistical analyses were performed using IBM SPSS ver. 20.0 (IBM Corp., Armonk, NY, USA).
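For readers who want to see what the two kappa statistics compute, the following is a minimal Python sketch of the standard textbook formulas. It is not the SPSS procedure used in this study; the function names and the input layout (plain lists of category labels) are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two rating sequences over the same cases
    (for example, one observer's first and second reading)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed proportion of cases with identical ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each sequence's category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical category throughout
    return (observed - expected) / (1.0 - expected)


def fleiss_kappa(ratings_per_case):
    """Fleiss' kappa; each inner list holds the ratings that all observers
    gave to one case (same number of observers for every case)."""
    if not ratings_per_case:
        raise ValueError("need at least one case")
    n_cases = len(ratings_per_case)
    n_raters = len(ratings_per_case[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings_per_case):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    category_totals = Counter()
    agreement_sum = 0.0
    for ratings in ratings_per_case:
        counts = Counter(ratings)
        category_totals.update(counts)
        # Proportion of agreeing observer pairs for this case.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    mean_agreement = agreement_sum / n_cases
    # Chance agreement from the overall share of each category.
    chance = sum((t / (n_cases * n_raters)) ** 2 for t in category_totals.values())
    if chance == 1.0:
        return 1.0  # a single category was used for every rating
    return (mean_agreement - chance) / (1.0 - chance)
```

In this sketch, `cohen_kappa` would take one observer's first and second readings of the same cases, and `fleiss_kappa` would take, for each case, the list of labels assigned by all 6 observers.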
The kappa value was interpreted using the Landis and Koch [11] grading system, which defines κ ≤0.20 as slight agreement, 0.21 to 0.40 as fair agreement, 0.41 to 0.60 as moderate agreement, 0.61 to 0.80 as substantial agreement, and 0.81 to 1.00 as almost perfect agreement.
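The grading bands can be written as a small lookup. The sketch below is illustrative only: the function name is hypothetical, the top band uses the label "almost perfect" as in the Results, values falling between two listed bands (for example 0.805) are assigned to the higher band, and the "poor" label for negative values comes from the original Landis and Koch scale rather than from this article.

```python
def landis_koch_grade(kappa):
    """Map a kappa value to a Landis and Koch agreement label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"  # not used in this article; from the original scale
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, `landis_koch_grade(0.52)` returns `"moderate"` and `landis_koch_grade(0.93)` returns `"almost perfect"`.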
Patients were informed of the use of medical information in the study and informed consent was obtained.
Results
During the study period, 201 patients (109 males and 92 females) visited our hospital. The mean age was 60.5 years (range, 18-88), with the age distribution as follows: 3 (10-19 years), 13 (20-29 years), 11 (30-39 years), 26 (40-49 years), 30 (50-59 years), 35 (60-69 years), 53 (70-79 years), and 30 (80-89 years). Thoracic spine fractures were noted in 27 (13.4%) patients, lumbar spine fractures in 43 (21.4%), and thoracolumbar spine fractures in 131 (65.2%). Spinal fusion surgery was performed in 115 (57.2%) patients and percutaneous vertebroplasty or kyphoplasty in 62 (30.8%), while 24 (11.9%) received conservative treatment with a thoracolumbosacral orthosis for 8 to 12 weeks (Table 2).
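The reported percentages follow directly from the counts and the total of 201 patients. A minimal Python sketch of that arithmetic is given below; the helper name and dictionary keys are hypothetical, and only the counts come from the text.

```python
def percentages(counts):
    """Return each count as a percentage of the total, rounded to 1 decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts as reported in the text (n=201).
injury_level = {"thoracic": 27, "lumbar": 43, "thoracolumbar": 131}
treatment = {"fusion": 115, "vertebroplasty_or_kyphoplasty": 62, "orthosis": 24}
age_bands = [3, 13, 11, 26, 30, 35, 53, 30]

level_pct = percentages(injury_level)
treatment_pct = percentages(treatment)
```

Because each value is rounded independently, the treatment percentages sum to 99.9% rather than exactly 100%.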
The intra-observer kappa value showed almost perfect agreement according to the Landis and Koch [11] grading system in each category of the TLICS, with κ=0.85 (morphology), κ=0.95 (neurologic status), κ=0.87 (integrity of the PLC), and κ=0.85 (total score of the TLICS), while the kappa value for the McAfee classification was κ=0.79, representing substantial agreement (Table 3).
The inter-observer kappa value in each category of the TLICS was κ=0.69 (morphology; substantial agreement according to the Landis and Koch [11] grading system), κ=0.93 (neurologic status; almost perfect agreement), κ=0.74 (integrity of the PLC; substantial agreement), and κ=0.72 (total score of the TLICS; substantial agreement), while the kappa value for the McAfee classification was lower, at κ=0.52, representing moderate agreement (Table 4).
Discussion
Spinal injury treatment remains a challenge despite developments in diagnostic and treatment systems for trauma patients. When accompanied by neurological deficits, thoracolumbar injuries have an emotional and economic effect on patients and their families, in addition to the physical disabilities they cause. Traumatic thoracolumbar injury is associated with a poor prognosis, even when aggressive rehabilitative treatments are provided. Numerous thoracolumbar spine injury classification systems have been introduced to aid clinical and surgical treatment [1-4, 10, 12-15]. The first classification of traumatic thoracolumbar injury was reported by Böhler and Böhler [12] in 1929. Watson-Jones [16] introduced a classification system based on the concept of “instability” in 1938, recognizing that posterior ligamentous integrity is a key element of spinal stability. In 1949, Nicoll [15] defined the concept of “stability” using an anatomic classification and identified 4 anatomical structures involved in mechanical stability: vertebral bodies, facet joints, posterior ligaments, and disks. Holdsworth [14] first proposed the modern classification of fractures based on a two-column theory, using plain X-rays. With the rapid development of radiology, Denis [1] proposed a three-column theory that complemented the two-column theory and divided the spine into anterior, middle, and posterior columns. This method focused on the stability of the middle column as assessed by CT. Depending on the injury mechanism and the degree of injury, this system classified spinal injuries into 4 groups: compression fractures, burst fractures, seatbelt injuries, and fracture-dislocations. However, McAfee et al. [10] pointed out that this system overestimates the influence of middle column stability, thereby increasing the number of unnecessary surgical procedures. McAfee et al. classified fractures into 6 types depending on the compression, distraction, and direct shearing forces acting on the middle column as determined on CT scans. Consequently, burst fractures were subclassified into stable and unstable fractures, and the latter were further subclassified based on the three-column theory, including posterior column injury. Magerl et al. [2] introduced the AO classification system and categorized thoracolumbar injuries into type A injuries caused by compression force, type B injuries caused by distraction force, and type C injuries caused by torsional force, with subcategories from group 1 to 3 based on the degree of injury.
Thus, many classification methods for thoracolumbar injuries have been introduced, but they have been found to show diagnostic variability depending on the observer’s point of view and low inter-observer reliability [7, 17]. Moreover, the need has emerged for a new classification system that uses MRI to reflect the importance of soft tissue such as the PLC. Therefore, Vaccaro et al. [4] developed the TLICS to overcome these limitations. The TLICS is based on injury morphology, neurological status, and PLC integrity (Table 1). Treatment plans are decided according to the total score summed across the 3 categories: ≤3 points indicates conservative care, ≥5 points indicates surgical treatment, and a score of 4 allows for either option. Existing classification systems emphasize mechanisms of injury inferred by observers, whereas the TLICS emphasizes objective analysis of the injury.
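The scoring rule can be sketched in a few lines of Python. This is an illustration, not a clinical tool: the function names and dictionary keys are hypothetical, and because Table 1 is not reproduced in this section, the point values are taken from the commonly published TLICS scheme of Vaccaro et al. [4] and should be checked against Table 1. Only the thresholds (≤3, 4, ≥5) come directly from the text.

```python
# Point values: assumed from the commonly published TLICS scheme,
# to be verified against Table 1 of this article.
MORPHOLOGY_POINTS = {
    "compression": 1,
    "burst": 2,
    "translation_rotation": 3,
    "distraction": 4,
}
NEUROLOGIC_POINTS = {
    "intact": 0,
    "nerve_root": 2,
    "cord_complete": 2,
    "cord_incomplete": 3,
    "cauda_equina": 3,
}
PLC_POINTS = {"intact": 0, "indeterminate": 2, "injured": 3}


def tlics_total(morphology, neurologic, plc):
    """Sum the points of the 3 TLICS categories."""
    return (
        MORPHOLOGY_POINTS[morphology]
        + NEUROLOGIC_POINTS[neurologic]
        + PLC_POINTS[plc]
    )


def tlics_recommendation(total):
    """Apply the thresholds stated in the text to a total score."""
    if total <= 3:
        return "conservative"
    if total >= 5:
        return "surgical"
    return "either"  # a total of exactly 4 allows either option
```

For instance, a burst fracture with intact neurology and an indeterminate PLC totals 4 points under these assumed values, so `tlics_recommendation` returns `"either"`.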
Many classification systems have been developed to offer better treatment to patients. Blauth et al. [5] reported only fair inter-observer reliability (κ=0.33) even when only the 3 main types (A, B, C) of the AO classification were used, with reliability decreasing further when the AO subtypes were included. Oner et al. [17] and Wood et al. [7] reported that the Denis classification system has higher inter-observer reliability than the AO classification system (Oner et al., κ=0.60 for Denis vs. 0.35 for AO; Wood et al., κ=0.606 vs. 0.475). However, both classification systems showed only fair to moderate inter-observer reliability. The AO classification system captures a great deal of information about the injury, but its complexity and consequent low reproducibility limit its clinical and surgical application [1, 5, 7, 17]. Although the Denis classification is simple, it does not consider anatomical and pathophysiologic factors such as PLC or nerve injury [18].
Given the limitations of these classification systems, the TLICS has been viewed favorably in many studies. Whang et al. [8] reported satisfactory reliability of the TLICS, with substantial agreement (κ=0.626) on injury morphology, moderate agreement (κ=0.447) on PLC integrity, and moderate agreement (κ=0.455) on the total score. However, the intra- and inter-observer reliability of the TLICS has not been well studied, especially among young neurosurgeons. Therefore, this study compared the intra- and inter-observer reliability of the TLICS and McAfee classification systems. Intra-observer reliability was substantial for the McAfee classification system and almost perfect for the TLICS (Table 3). Fleiss’ kappa for inter-observer reliability was high in all categories of the TLICS (injury morphology, κ=0.69; neurologic status, κ=0.93; PLC integrity, κ=0.74; total score, κ=0.72), whereas the kappa value for the McAfee classification was 0.52, representing only moderate reliability (Table 4, Fig. 1). Among the subcategories of the TLICS, injury morphology showed the lowest value, presumably because flexion-distraction and burst fractures accounted for a larger proportion of the 201 cases in this study than other fracture types. Accordingly, the TLICS showed higher agreement than the McAfee classification in terms of both consistency within observers and conformity between observers. The young neurosurgeons in this study found the TLICS to be more reliable than the McAfee classification in suggesting treatment plans for patients with thoracolumbar injury. Moreover, it can facilitate communication among young neurosurgeons.
This study has some limitations, however. First, it is a retrospective analysis based on clinical records. Records with insufficient information were therefore excluded to improve the accuracy of our results. Second, all observers in this study were trained at the same hospital; therefore, they do not represent the general neurosurgeon population.