
Table 3. Genome projects in GOLD with the 10 highest MCI scores

From: The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness

| GOLD ID | Organism Name | Study Group | MCI (%) |
|---|---|---|---|
| Gi05215 | Streptococcus bovis ATCC 700338 | HMP | 66.95 |
| Gi02825 | Mycobacterium parascrofulaceum ATCC BAA-614 | HMP | 66.10 |
| Gc00590 | Ensifer medicae WSM419 | RNB | 65.25 |
| Gc00870 | Rhizobium leguminosarum bv. trifolii WSM2304 | RNB | 65.25 |
| Gi02071 | Anaerofustis stercorihominis DSM 17244 | HMP | 64.41 |
| Gi02072 | Anaerotruncus colihominis DSM 17241 | HMP | 64.41 |
| Gi02680 | Clostridium hiranonis TO-931, DSM 13275 | HMP | 64.41 |
| Gi01716 | Clostridium scindens ATCC 35704 | HMP | 64.41 |
| Gc01039 | Rhizobium leguminosarum bv. trifolii WSM1325 | RNB | 64.41 |
| Gi02147 | Bacteroides stercoris ATCC 43183 | RNB | 63.56 |
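Every percentage in Table 3 is, to two decimal places, a multiple of 100/118 (for example, 79/118 = 66.95%). The minimal sketch below assumes, hypothetically, that a record's MCI % is the share of its populated metadata fields out of a fixed total, and that this total is 118; the 118 is inferred from the table values, not stated in this excerpt. The names `mci_percent` and `TOTAL_FIELDS` are illustrative only.

```python
# Hypothetical assumption: MCI % = populated fields / total fields * 100.
# TOTAL_FIELDS = 118 is inferred from the Table 3 values, not from the source text.
TOTAL_FIELDS = 118


def mci_percent(filled_fields: int, total_fields: int = TOTAL_FIELDS) -> float:
    """Return the share of populated fields as a percentage, rounded to 2 decimals."""
    if total_fields <= 0:
        raise ValueError("total_fields must be positive")
    if not 0 <= filled_fields <= total_fields:
        raise ValueError("filled_fields must be between 0 and total_fields")
    return round(100 * filled_fields / total_fields, 2)


if __name__ == "__main__":
    # Hypothetical filled-field counts that reproduce the scores listed in Table 3
    for filled in (79, 78, 77, 76, 75):
        print(filled, mci_percent(filled))
```

Under this assumption, the five distinct scores in the table correspond to records differing by a single populated field each, which is why several projects tie at 64.41.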

Accordingly, the discussion above shows that the MCI is most informative when applied to large datasets: it yields both the average score across all records and the distribution of scores across those records. To demonstrate this, we plot the distribution of MCI scores over the records of the HMP and GEBA datasets. As shown in Figure 3, the HMP dataset indeed has a larger number of records that currently have lower MCI scores than the GEBA dataset.
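Summarizing a dataset by its mean MCI and by the distribution of its scores needs only a grouping step and a histogram. The sketch below is illustrative: `summarize_mci` and its `bin_width` parameter are hypothetical names, and the example input is the ten top-scoring records of Table 3 grouped by study group, not the full HMP or GEBA datasets plotted in Figure 3.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_mci(records, bin_width=10.0):
    """Return per-group record count, mean MCI % and a histogram of scores.

    records: iterable of (study_group, mci_percent) pairs.
    bin_width: histogram bin width in percentage points (a hypothetical choice).
    """
    by_group = defaultdict(list)
    for group, score in records:
        by_group[group].append(score)

    summary = {}
    for group, scores in by_group.items():
        # Label each score by the lower edge of its bin, e.g. 64.41 -> 64.0 for width 1
        hist = Counter(int(s // bin_width) * bin_width for s in scores)
        summary[group] = {
            "n": len(scores),
            "mean": mean(scores),
            "histogram": dict(sorted(hist.items())),
        }
    return summary


if __name__ == "__main__":
    # Top-10 records from Table 3 as (study group, MCI %) pairs
    table3 = [
        ("HMP", 66.95), ("HMP", 66.10), ("RNB", 65.25), ("RNB", 65.25),
        ("HMP", 64.41), ("HMP", 64.41), ("HMP", 64.41), ("HMP", 64.41),
        ("RNB", 64.41), ("RNB", 63.56),
    ]
    for group, stats in summarize_mci(table3, bin_width=1.0).items():
        print(group, stats["n"], round(stats["mean"], 2), stats["histogram"])
```

On a full dataset, comparing the per-group histograms rather than the means alone is what exposes a long tail of low-scoring records like the one described for HMP.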