Diagnostic potential of gut microbiota in Parkinson’s disease

Background. Considerable effort is currently directed at finding biomarkers of Parkinson's disease (PD), especially for its early recognition. The gut microbiota is one potential source of such biomarkers, and changes in its composition in PD are being actively studied. The aim of this study was to identify gut microbiota biomarkers of PD and to estimate their diagnostic accuracy, including for differential diagnosis relative to other neurological diseases, in patients from the Russian population. Material and methods. The study included 192 metagenomic profiles from patients with PD (n = 93), people with other neurological diagnoses (n = 33), and healthy controls (n = 66). The profiles were obtained by amplicon sequencing of bacterial 16S rRNA genes. Classification models were built using a naive Bayes classifier, an artificial neural network, a support vector machine, a generalized linear model, and partial least squares regression. Results. On the validation sample, the best classification of patients by gut microbiota composition (sensitivity 91.30%, specificity 91.67%, accuracy 91.49%) was achieved by a naive Bayes classifier that used the abundance of the following genera as predictors: Christensenella, Methanobrevibacter, Leuconostoc, Enterococcus, Catabacter, Desulfovibrio, Sphingomonas, Yokenella, Atopobium, Fusicatenibacter, Cloacibacillus, Bulleidia, Acetanaerobacterium, and Staphylococcus. Conclusions. Information on the taxonomic composition of the gut microbiota may be used in the differential diagnosis of Parkinson's disease.


INTRODUCTION
Among neurological diseases, Parkinson's disease (PD) occupies an important place: it is a chronic neurodegenerative disease that affects the dopaminergic neurons of the substantia nigra and manifests primarily as motor disorders such as rigidity, tremor, and postural instability. Currently, about 4 million people worldwide have PD, and its prevalence is projected to increase [1]. Because there is no curative therapy, on the one hand, and the disease has a long asymptomatic course, on the other, much attention is paid to approaches for the early diagnosis of PD and to the identification and study of new disease markers. Since the studies of H. Braak et al. [2] established the early involvement of intestinal neurons in neurodegeneration, the gastrointestinal tract has been considered a promising source of biomarkers.
The human intestine is home to a complex community of trillions of symbiotic microorganisms, including bacteria, archaea, protozoa, fungi, and viruses. This community, called the microbiota or microbiome, plays an important role in the life of the host: it takes part in digestion [3], produces vitamins and other biologically active substances [4], and maintains a physiological level of intestinal inflammation, thereby protecting against colonization by pathogens [5]. Changes in gut microbial composition have been shown for many diseases, both systemic and local [6]. PD is no exception: over the past 3 years, several studies have confirmed changes in the abundance of intestinal microorganisms in this disease [7][8][9][10][11][12]. Three papers proposed diagnostic algorithms for PD that use the microbiota as a predictor [9,11,12], but their applicability is limited. First, the lists of predictor microorganisms differed between populations, most likely because of ethnic and geographical differences in microbiota composition [13]. Second, the model coefficients were fitted and the accuracy of the algorithms was assessed on the same sample, which biases the estimate of the models' predictive performance [14]. It is also unknown whether the gut microbial landscape in PD is unique to this disease or whether similar changes occur in other severe neurodegenerative and neuroinflammatory diseases, which would affect the specificity of microbiota-based diagnostics.
The aim of this study was to identify microbiota biomarkers of PD in the Russian population and to estimate their diagnostic accuracy, including for differential diagnosis relative to other neurological diseases.

MATERIALS AND METHODS
The study included 192 metagenomic profiles of the intestinal microbiota, obtained by amplicon sequencing of the V3-V4 region of the bacterial 16S rRNA gene and analyzed and published previously by our team [15,16]. All samples were divided into two groups: an experimental group of 93 profiles from patients with a confirmed diagnosis of PD, and a control group. The control group comprised 66 samples from healthy individuals without signs of neurodegenerative or neuroinflammatory disease and 33 samples from patients with other neurological diseases: multiple sclerosis (15 patients), essential tremor (10), idiopathic familial dystonia (5), and one patient each with multiple system atrophy, diffuse Lewy body disease, and acute disseminated encephalomyelitis. The design and conduct of the study fully complied with the principles of Good Clinical Practice and the Declaration of Helsinki (including amendments). Written informed consent was obtained from all patients or from their close relatives or legally responsible representatives at the time of the study. Patients and their relatives were informed about the nature, purpose, and possible complications of the study and could withdraw from it at any time.
Sequencing data were analyzed in QIIME 1.9.1 [17]; reads were assigned to taxa using the Greengenes 13.5 and HITdb databases [18,19]. Statistical analysis was performed in R [20]. Beta diversity was computed as weighted UniFrac distances [21] after normalization with cumulative sum scaling (CSS) [22] and visualized by multidimensional scaling. The effects of diagnosis, gender, and age on overall microbiota composition were assessed with the ANOSIM algorithm and with permutational (nonparametric) multivariate analysis of variance, using 9999 permutations, in the vegan package [23]. Differences in taxonomic composition at the genus level were tested with the fitZig model of the metagenomeSeq package [22]. Differences were considered significant at p < 0.05 after Benjamini-Hochberg correction for multiple comparisons.
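The Benjamini-Hochberg step can be illustrated with a short Python function. This is a sketch, not the metagenomeSeq code; the genus names and p-values are invented for illustration. For m tests with p-values sorted in ascending order, the adjusted value at rank i is the minimum over j ≥ i of p_(j)·m/j, capped at 1, which should match R's `p.adjust(method = "BH")`.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genera (not study data).
raw = {"GenusA": 0.001, "GenusB": 0.008, "GenusC": 0.039, "GenusD": 0.041, "GenusE": 0.27}
adj = benjamini_hochberg(list(raw.values()))
significant = [genus for genus, q in zip(raw, adj) if q < 0.05]
print(significant)  # ['GenusA', 'GenusB']
```

GenusC and GenusD are nominally significant (p < 0.05) but both adjust to 0.05125 and drop out after correction, which is the kind of filtering applied before the genus list was used to choose predictors.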
Feature selection and model training were carried out with the caret package [24] in R. The following classification algorithms were used: a naive Bayes classifier, a single-layer artificial neural network, a support vector machine with a radial basis function kernel, a generalized linear model, and partial least squares regression. Before training, the data were centered and scaled. The full sample was divided into two unequal parts: a training set (145 samples, approximately 75% of the total) and a test set (47 samples, approximately 25%). Samples were assigned to the two sets at random, with the groups represented in the same proportions in each. On the training set, model parameters were tuned by maximizing classification accuracy (the number of correct predictions divided by the total number of predictions) under 10-fold cross-validation. After training, the models were evaluated on the test set, and the sensitivity and specificity of classification were calculated.
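A stratified 75/25 split followed by stratified 10-fold assignment, as described above, can be sketched in Python. This is not the caret code the authors used: the function names are hypothetical, and rounding each class's training count up is an assumption of this sketch. With the study's class sizes (93 PD and 99 control profiles), that rule yields 70 + 75 = 145 training and 23 + 24 = 47 test profiles.

```python
import random
from collections import defaultdict
from math import ceil

def stratified_split(labels, train_frac=0.75, seed=1):
    """Split indices into train/test by sampling within each class.

    Assumption of this sketch: the per-class training count is rounded up.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        idxs = idxs[:]
        rng.shuffle(idxs)
        n_train = ceil(len(idxs) * train_frac)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)

def stratified_folds(labels, indices, k=10, seed=1):
    """Deal the indices of each class round-robin into k cross-validation folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes so fold sizes stay balanced
    for lab in sorted(by_class):
        idxs = by_class[lab][:]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)
    return folds

# Class sizes from the study: 93 PD vs. 99 controls (66 healthy + 33 other diagnoses).
labels = ["PD"] * 93 + ["control"] * 99
train, test = stratified_split(labels)
folds = stratified_folds(labels, train, k=10)
print(len(train), len(test))  # 145 47
```

Each fold then serves once as the held-out set while a model is fitted on the other nine, and the ten accuracies are averaged; the test set is touched only after tuning is finished.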

RESULTS
When the beta diversity of the microbiota was visualized in the space of the first and second principal coordinates (Fig. 1), the experimental and control groups were not linearly separable; however, samples clustered by disease: patients with PD were located in the lower part of the plot, healthy controls in the upper left, and patients with other neurological diseases closer to the center. Permutational analysis of variance and ANOSIM confirmed this observation: the effect of the variable "diagnosis" was R² = 0.103 (p = 0.0001) and R = 0.123 (p = 0.0001), respectively. According to permutational analysis of variance, age also affected overall taxonomic composition, but the effect was small (R² = 0.019, p = 0.0007). Gender did not contribute to beta diversity in this sample. Given the contribution of age, differences in genus-level composition between the control and experimental groups, used to select candidate predictors, were tested with age included as a covariate. Compared with the combined control group, the microbiota of the experimental group showed increased abundance of the bacterial genera Acetanaerobacterium, Anaerococcus, Atopobium, Bulleidia, Cloacibacillus, Christensenella, Catabacter, Desulfovibrio, Staphylococcus, Succinivibrio-75, Yokenella, Sphingomonas, Papillibacter, Oxalobacter, and Leuconostoc, as well as of the archaeal genus Methanobrevibacter, and decreased abundance of Fusicatenibacter (Table 1).
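The ANOSIM statistic reported above compares ranked between-group and within-group distances: R = (mean between-group rank minus mean within-group rank) / (M/2), where M is the number of sample pairs. The sketch below computes R and a permutation p-value in pure Python. It is an illustration, not vegan's implementation; the toy distances stand in for weighted UniFrac distances, and the (count + 1) / (permutations + 1) p-value convention is an assumption of the sketch. Under that convention, 9999 permutations cannot give a p-value below 1/10000 = 0.0001, the value reported for diagnosis.

```python
import random
from itertools import combinations

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def anosim_statistic(pairs, ranks, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    between = [r for (a, b), r in zip(pairs, ranks) if groups[a] != groups[b]]
    within = [r for (a, b), r in zip(pairs, ranks) if groups[a] == groups[b]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)

def anosim(dist, groups, permutations=999, seed=0):
    """Return the observed R and a one-sided permutation p-value."""
    pairs = list(combinations(range(len(groups)), 2))
    # Distances are ranked once; permutations only reshuffle the group labels.
    ranks = average_ranks([dist[a][b] for a, b in pairs])
    observed = anosim_statistic(pairs, ranks, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if anosim_statistic(pairs, ranks, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# Toy 1-D data: two well-separated clusters of four samples each (illustrative only).
points = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
dist = [[abs(a - b) for b in points] for a in points]
groups = ["A"] * 4 + ["B"] * 4
r, p = anosim(dist, groups)
print(r)  # 1.0
```

R ranges from about -1 to 1. It equals 1 only when every between-group distance exceeds every within-group distance, so the reported R = 0.123 indicates weak but consistent grouping by diagnosis.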

Training the models on the training set gave the following results (Fig. 2, b). The support vector machine with a radial basis function kernel had the highest mean classification accuracy in cross-validation, 77%. The mean accuracy of the naive Bayes classifier with kernel density estimation was slightly lower, at 76%, while the median accuracy of both algorithms was the same, 79%. Partial least squares regression had the same mean accuracy as the naive Bayes classifier, but its median accuracy was 73%. A single-layer neural network with a similar median accuracy showed a lower mean accuracy of 75%. The generalized linear model had the lowest mean and median accuracy, 72% and 74%, respectively.

Table 2
Note. NB, naive Bayes classifier; SVM, support vector machine; PLS, partial least squares regression; NNET, artificial neural network; GLM, generalized linear model.

Testing the models on the validation sample showed that the naive Bayes classifier achieved the best overall combination of sensitivity, specificity, and accuracy in detecting PD from the gut microbiota composition of patients (Table 2). The support vector machine showed the highest sensitivity and the lowest specificity. The partial least squares and single-layer neural network classifiers had the same accuracy, with the neural network showing higher sensitivity but lower specificity.
The generalized linear model proved the least suitable for detecting PD from the composition of the gut microbiota.
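The best-performing model, a naive Bayes classifier with kernel density estimation, treats each genus as conditionally independent given the class. It scores a sample by the log prior plus the sum of per-genus log densities, each estimated by a kernel density estimate instead of a fitted normal distribution. The pure-Python sketch below illustrates the idea. It is not the R implementation used in the study: the class name `KernelNaiveBayes`, the Gaussian kernel, and the normal-reference bandwidth rule (1.06·sd·n^(-1/5)) are assumptions, and the training data are synthetic.

```python
import math
import random

def reference_bandwidth(xs):
    """Normal-reference rule of thumb for a Gaussian kernel, floored to avoid zero width."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return max(1.06 * sd * n ** (-1 / 5), 1e-3)

def kde_logpdf(x, sample, h):
    """Log of a Gaussian kernel density estimate evaluated at x."""
    dens = sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)
    dens /= len(sample) * h * math.sqrt(2 * math.pi)
    return math.log(max(dens, 1e-300))  # guard against log(0) on underflow

class KernelNaiveBayes:
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.priors, self.columns, self.bandwidths = {}, {}, {}
        for c in self.classes:
            rows = [x for x, lab in zip(X, y) if lab == c]
            self.priors[c] = len(rows) / len(X)
            cols = list(zip(*rows))  # one tuple of training values per feature
            self.columns[c] = cols
            self.bandwidths[c] = [reference_bandwidth(col) for col in cols]
        return self

    def predict_one(self, x):
        best, best_score = None, -math.inf
        for c in self.classes:
            # Naive assumption: features are independent given the class,
            # so per-feature log densities simply add up.
            score = math.log(self.priors[c])
            for j, value in enumerate(x):
                score += kde_logpdf(value, self.columns[c][j], self.bandwidths[c][j])
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, X):
        return [self.predict_one(x) for x in X]

# Synthetic two-feature data standing in for scaled genus abundances.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(40)]
y = ["control"] * 40 + ["PD"] * 40
model = KernelNaiveBayes().fit(X, y)
print(model.predict([[0.0, 0.0], [4.0, 4.0]]))  # ['control', 'PD']
```

Because each density is estimated from a single feature, the model needs far fewer parameters than a joint model. This is one plausible reason for its stability on small samples such as this one.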

DISCUSSION
We showed that the gut microbiome in PD has a fairly distinctive structure. This potentially allows information on its taxonomic composition to be used for diagnosing the disease, including differential diagnosis relative to other severe neurological (neurodegenerative and neuroinflammatory) diseases. Given that signs of neurodegeneration appear early in the enteric nervous system [25], this algorithm could potentially be effective for diagnosis at an early stage of the disease, before the first clinical signs appear.
Studies by other groups have also shown that microbiota composition data can distinguish patients with PD from healthy individuals [9,11,12]. The classification algorithms developed in those studies have relatively high specificity, up to 90%, but low sensitivity, which reached 66.7% only when additional clinical information was used [9]. Without substantial improvement, this limits the use of the approach in clinical practice.
The classification algorithms proposed in earlier studies were based on a generalized linear model and ROC analysis, which do not give good predictive performance when the classes are not linearly separable. Moreover, the algorithms were fitted and tested on the same sample, which does not reveal overfitting and therefore overstates the quality of classification: the accuracy, specificity, and sensitivity of the algorithm [14]. Because metagenomic data have a highly complex structure, validating classifiers on independent samples is especially important [26].
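The optimism of evaluating a model on the data it was fitted to can be demonstrated with a deliberately extreme sketch. It uses a 1-nearest-neighbour classifier, not any model from this or the cited studies, on hypothetical data in which the labels are coin flips. Because each training point is its own nearest neighbour, resubstitution accuracy is perfect, while accuracy on independent data stays near chance.

```python
import random

def nearest_neighbour_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def accuracy(model_X, model_y, X, y):
    """Fraction of samples in (X, y) classified correctly by 1-NN fitted on (model_X, model_y)."""
    hits = sum(nearest_neighbour_predict(model_X, model_y, x) == t for x, t in zip(X, y))
    return hits / len(y)

rng = random.Random(7)

def make(n):
    """Hypothetical data: 20 random features and labels that carry no signal."""
    X = [[rng.random() for _ in range(20)] for _ in range(n)]
    y = [rng.choice(["PD", "control"]) for _ in range(n)]
    return X, y

train_X, train_y = make(150)
test_X, test_y = make(200)
train_acc = accuracy(train_X, train_y, train_X, train_y)  # evaluated on the fitting data
test_acc = accuracy(train_X, train_y, test_X, test_y)     # evaluated on held-out data
print(train_acc)  # 1.0
```

Real classifiers overfit less blatantly, but the direction of the bias is the same, which is why held-out accuracy is the figure that matters.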
This study is the first to assess, on a validation sample, the accuracy of classifying patients with respect to PD by the taxonomic composition of the gut microbiota. The sensitivity of the classifiers ranged from 65.22% to 100%, and their specificity from 62.50% to 91.67%. Notably, the best classification performance (sensitivity 91.30%, specificity 91.67%, accuracy 91.49%) was obtained with a technically simple naive Bayes classifier with kernel density estimation. This algorithm proved the most suitable for diagnosis here because it is relatively stable on small samples, such as the one used in this study, and handles data with a complex structure fairly well [27].
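The three reported metrics follow directly from a 2 × 2 confusion matrix. In the sketch below, the counts are a hypothetical reconstruction, not stated in the text: 21 of 23 PD profiles and 22 of 24 control profiles classified correctly are counts that reproduce all three reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts (PD = positive)."""
    sensitivity = tp / (tp + fn)   # share of PD profiles classified as PD
    specificity = tn / (tn + fp)   # share of control profiles classified as control
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts consistent with the reported percentages.
sens, spec, acc = classification_metrics(tp=21, fn=2, tn=22, fp=2)
print(f"{sens:.2%} {spec:.2%} {acc:.2%}")  # 91.30% 91.67% 91.49%
```

With test sets this small, one misclassified profile shifts sensitivity or specificity by about 4 percentage points, which is worth keeping in mind when comparing the models.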
The lists of predictors used to train classification models also differed between studies. On the one hand, this can be explained by different choices of the taxonomic level used for classification and by different bioinformatic pipelines. Two studies used microbiota composition at the family level [9,11], and another used OTU abundances [12]; in this study, genus-level composition was used to build the classifier. On the other hand, differences in microbiota composition between human populations could also influence the list of predictors [13]. For this reason, information on the functional composition of the microbiota, which is relatively homogeneous across populations, may be the best basis for a classifier applicable to residents of different countries. Importantly, the control group in this study included not only healthy people but also patients with other neurological diseases, which could also affect the final list of biomarkers.

CONCLUSION
This study showed that information on the taxonomic composition of the gut microbiota could potentially be used for the differential diagnosis of Parkinson's disease. The development of new non-invasive biomarkers for diagnosing PD, including at the preclinical stage, would allow treatment to begin before the loss of dopaminergic neurons is complete, improving patients' quality of life.