Table 1. The list of all selected metadata fields in GOLD (columns 2 and 6)¹

From: The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness

|  | GOLD Metadata Field | Records | MCI % |  | GOLD Metadata Field | Records | MCI % |
|---|---|---|---|---|---|---|---|
| 1 | GOLD STAMP ID | 13,786 | 100 | 58 | HMP FINISHING GOAL² | 2,472 | 17.93 |
| 2 | DISPLAY NAME | 13,786 | 100 | 59 | ENERGY SOURCES | 2,467 | 17.89 |
| 3 | NCBI TAXON ID | 13,786 | 100 | 60 | ASSEMBLY METHOD | 2,235 | 16.21 |
| 4 | DOMAIN | 13,786 | 100 | 61 | HMP ISOLATION BODY SITE² | 2,169 | 15.73 |
| 5 | AVAILABILITY | 13,786 | 100 | 62 | GREENGENES ID | 2,146 | 15.57 |
| 6 | GOLD GENUS | 13,785 | 99.99 | 63 | PROJECT DESCRIPTION | 2,122 | 15.39 |
| 7 | PROJECT TYPE | 13,784 | 99.99 | 64 | PUBLICATION LINK | 2,062 | 14.96 |
| 8 | PROJECT STATUS | 13,784 | 99.99 | 65 | HMP NCBI SUBMISSION STATUS² | 1,948 | 14.13 |
| 9 | NCBI SUPERKINGDOM | 13,782 | 99.97 | 66 | HMP PROJECT STATUS² | 1,948 | 14.13 |
| 10 | GOLD PHYLUM | 13,778 | 99.94 | 67 | HMP ID² | 1,946 | 14.12 |
| 11 | PROPOSAL NAME | 13,761 | 99.82 | 68 | ISOLATION SOURCE | 1,884 | 13.67 |
| 12 | GOLD SPECIES | 13,734 | 99.62 | 69 | SEQUENCING STATUS LINK | 1,849 | 13.41 |
| 13 | NCBI PHYLUM | 13,526 | 98.11 | 70 | GENE CALLING METHOD | 1,811 | 13.14 |
| 14 | NCBI GENUS | 13,506 | 97.97 | 71 | LONGITUDE | 1,631 | 11.83 |
| 15 | NCBI ORDER | 13,435 | 97.45 | 72 | LATITUDE | 1,629 | 11.82 |
| 16 | NCBI SPECIES | 13,359 | 96.90 | 73 | HMP ISOLATE SOURCE² | 1,482 | 10.75 |
| 17 | NCBI FAMILY | 13,135 | 95.28 | 74 | BEI STATUS² | 1,355 | 9.83 |
| 18 | NCBI CLASS | 13,063 | 94.76 | 75 | BODY SAMPLE SUBSITES | 1,236 | 8.97 |
| 19 | SEQUENCING STATUS | 12,498 | 90.66 | 76 | 16S ID | 1,195 | 8.67 |
| 20 | STRAIN | 12,480 | 90.53 | 77 | BIOSAFETY LEVEL | 1,154 | 8.37 |
| 21 | SEQUENCING COUNTRY | 12,326 | 89.41 | 78 | ISOLATION DATE | 1,080 | 7.83 |
| 22 | SEQUENCING CENTER | 11,837 | 85.86 | 79 | HMP ISOLATION COMMENTS² | 1,052 | 7.63 |
| 23 | NCBI PROJECT ID | 10,358 | 75.13 | 80 | NUMBER OF READS | 1,048 | 7.60 |
| 24 | UPDATE DATE | 10,247 | 74.33 | 81 | ORGANISM COMMENTS | 948 | 6.88 |
| 25 | RELEVANCE | 9,993 | 72.49 | 82 | METABOLISM | 947 | 6.87 |
| 26 | CONTACT NAME | 8,413 | 61.03 | 83 | ISOLATION COMMENTS | 874 | 6.34 |
| 27 | HABITATS | 7,979 | 57.88 | 84 | LIBRARY METHOD | 778 | 5.64 |
| 28 | TEMPERATURE RANGE | 7,673 | 55.66 | 85 | SEROVAR | 774 | 5.61 |
| 29 | GRAM STAIN | 7,341 | 53.25 | 86 | BODY PRODUCTS | 723 | 5.24 |
| 30 | BIOTIC RELATIONSHIP | 7,147 | 51.84 | 87 | HOST HEALTH | 712 | 5.16 |
| 31 | CONTACT EMAIL | 7,037 | 51.04 | 88 | STRAIN INFO ID | 691 | 5.01 |
| 32 | OXYGEN REQUIREMENT | 7,028 | 50.98 | 89 | HMP ISOLATION COMMENTS² | 690 | 5.01 |
| 33 | CELL SHAPE | 6,748 | 48.95 | 90 | HMP ISOLATION BODY SUBSITE² | 681 | 4.94 |
| 34 | DISEASES | 6,661 | 48.32 | 91 | SYMBIOTIC RELATIONSHIP | 493 | 3.58 |
| 35 | MOTILITY | 6,275 | 45.52 | 92 | SHORT READ ARCHIVE ID | 475 | 3.45 |
| 36 | HOST NAME | 5,807 | 42.12 | 93 | INFORMATION URL | 465 | 3.37 |
| 37 | SEQUENCING METHODS | 5,636 | 40.88 | 94 | PH | 441 | 3.20 |
| 38 | ISOLATION SITE | 5,388 | 39.08 | 95 | IMAGE URL | 415 | 3.01 |
| 39 | SPORULATION | 5,187 | 37.63 | 96 | VECTOR | 380 | 2.76 |
| 40 | HOST TAXON ID | 5,131 | 37.22 | 97 | SYMBIONT | 348 | 2.52 |
| 41 | GENOME SIZE | 4,706 | 34.14 | 98 | SYMBIOTIC INTERACTION | 344 | 2.50 |
| 42 | COMPLETION DATE | 4,585 | 33.26 | 99 | ISOLATION PUBMED ID | 339 | 2.46 |
| 43 | CULTURE COLLECTION | 4,212 | 30.55 | 100 | HOST GENDER | 323 | 2.34 |
| 44 | CELL ARRANGEMENTS | 4,126 | 29.93 | 101 | DEPTH | 308 | 2.23 |
| 45 | PHENOTYPES | 4,045 | 29.34 | 102 | SALINITY | 281 | 2.04 |
| 46 | GC PERC | 3,693 | 26.79 | 103 | HOST AGE | 250 | 1.81 |
| 47 | GENE COUNT | 3,556 | 25.79 | 104 | ISOLATION METHOD | 238 | 1.73 |
| 48 | IN IMG DATABASE | 3,453 | 25.05 | 105 | CELL DIAMETER | 233 | 1.69 |
| 49 | PUBLICATION JOURNAL | 3,395 | 24.63 | 106 | CELL LENGTH | 189 | 1.37 |
| 50 | SEQUENCING QUALITY | 3,286 | 23.84 | 107 | COLOR | 157 | 1.14 |
| 51 | GEO LOCATION | 3,265 | 23.68 | 108 | ALTITUDE | 94 | 0.68 |
| 52 | TYPE STRAIN | 3,248 | 23.56 | 109 | HOST RACE | 72 | 0.52 |
| 53 | COVERAGE | 3,246 | 23.55 | 110 | HOST COMMENTS | 50 | 0.36 |
| 54 | BODY SAMPLE SITES | 3,225 | 23.39 | 111 | PROJECT COMMENTS | 38 | 0.28 |
| 55 | ISOLATION COUNTRY | 3,140 | 22.78 | 112 | SYMBIONT TAXON ID | 36 | 0.26 |
| 56 | TEMPERATURE OPTIMUM | 2,712 | 19.67 | 113 | NCBI ARCHIVE ID | 10 | 0.07 |
| 57 | CONTIG COUNT | 2,472 | 17.93 |  |  |  |  |
    
  1. Shown with the number of records for each field (columns 3 and 7) and the MCI % (columns 4 and 8), ordered from the highest MCI to the lowest. Rows in gray belong to the MIGS minimum information checklist that extends what is captured by the INSDC [4] (i.e. full taxonomy is not captured since a reference to a valid NCBI taxid is expected).
  2. Fields relevant only to projects that are part of the HMP study.
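
The per-field MCI % values in the table appear to be the share of all records in which the field is populated. The following minimal Python sketch reproduces a few of them under that assumption. The total of 13,786 records is inferred from the fields listed at 100 %, and the names `TOTAL_RECORDS` and `field_coverage_percent` are illustrative, not taken from the paper.

```python
# Assumption: every field listed at 100 % is populated in all records,
# so the total number of GOLD records behind the table is 13,786.
TOTAL_RECORDS = 13_786


def field_coverage_percent(populated: int, total: int = TOTAL_RECORDS) -> float:
    """Return the percentage of records in which a field is populated.

    This is a hypothetical helper that mirrors the table's "MCI %" column
    as inferred from the published counts, rounded to two decimals.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= populated <= total:
        raise ValueError("populated must lie between 0 and total")
    return round(100.0 * populated / total, 2)


# A few (field, record count) pairs copied from the table.
sample_rows = {
    "GOLD STAMP ID": 13_786,
    "SEQUENCING STATUS": 12_498,
    "CONTIG COUNT": 2_472,
    "NCBI ARCHIVE ID": 10,
}

for field, records in sample_rows.items():
    print(f"{field:<20} {records:>6,} {field_coverage_percent(records):6.2f}")
```

For the sampled rows, the computed percentages match the published column to two decimals, which supports the inferred definition but does not confirm it.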