DeondeJager commented Nov 3, 2025
- Added section: How to fill in missing data.
- Populated the new section with information.
- Fixed a small typo.
| - it can not be shared due to data agreement restrictions; | ||
| - it is not applicable to that particular field (e.g. it is a negative control and the field does not apply) | ||
| - Missing data should only be reported for **mandatory** fields, not for **recommended** or **optional** fields. For the latter two, simply leave the field blank if the (meta)data are missing. | ||
| - There are three levels at which you can report missing data, with an increasing amount of specificity for each: _**top level**_, _**lower level**_, and _**reporting level**_. Be as specific/granular as possible when reporting missing values. The _top level_ indicates that the data are missing, while the _lower-_ and _reporting_ levels give a reason (from the [controlled vocabulary](https://www.insdc.org/technical-specifications/missing-value-reporting/)) why the data are missing. |
I would break up the three level descriptions into bullets
| - If using terms from the most granular level (_reporting level_), then exclude the _lower level_ term, as each _reporting level_ term is a "child" of the _lower level_, which can then be inferred based on the [table](https://www.insdc.org/technical-specifications/missing-value-reporting/). | ||
| ### Examples | ||
| - missing |
And let's provide context for each example:
- How to encode missing info for a negative control (not applicable)
- How to encode missing info because the information was not collected during a historical sampling event, or because the collection records burnt down
- How to encode missing information because native/indigenous groups do not permit sharing of this information
| - it can not be shared for privacy reasons; | ||
| - Missing data should only be reported for **mandatory** fields, not for **recommended** or **optional** fields. For the latter two, simply leave the field blank if the (meta)data are missing. |
I think the mandatory fields sentence should go at the beginning of this section in a preamble paragraph and say more along the lines of:
Fields in MIxS that are mandatory always require something filled into the given metadata entry. If you do not have this information, you must encode this using the specific 'missing information' categories as below.
While optional fields in MIxS can be left blank, if you have a specific reason the information will never be able to be reported (see examples below), then it is good to use these missing data categories there also.
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
DeondeJager left a comment
Made several changes based on feedback from @jfy133.
| _TODO_ | ||
| ## How to fill in missing data |
| ## How to fill in missing data | |
| ## How to fill in missing data | |
| _TODO_ | ||
| Fields in MIxS that are mandatory always require something filled into the given metadata entry. |
| Fields in MIxS that are mandatory always require something filled into the given metadata entry. | |
| Fields in MIxS that are mandatory (or 'required') always require something filled into the given metadata entry. |
| If you do not have this information, you must encode this using the specific 'missing information' categories as below. | ||
| While optional fields in MIxS can be left blank, if you have a specific reason the information will never be able to be reported (see examples below), then it is good to use these missing data categories there also. |
| While optional fields in MIxS can be left blank, if you have a specific reason the information will never be able to be reported (see examples below), then it is good to use these missing data categories there also. | |
| While optional fields in MIxS can be left blank, if you have a specific reason the information will never be able to be reported (see examples below), then it is also good to use these missing data categories there. |
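A minimal Python sketch of the mandatory/optional rule above, assuming a metadata record is a plain dict of field name to string value and that the set of mandatory fields is known in advance. The helper name `unfilled_mandatory_fields` is hypothetical, and the field names are illustrative only; this is not a statement of which fields are mandatory in any particular MIxS checklist.

```python
def unfilled_mandatory_fields(record: dict[str, str], mandatory: set[str]) -> list[str]:
    """Return mandatory fields that are absent or blank, sorted for stable output."""
    return sorted(
        field for field in mandatory
        if not record.get(field, "").strip()  # absent, empty or whitespace-only
    )

# Illustrative record; field names are examples, not a checklist definition.
record = {
    "collection_date": "missing",  # mandatory, encoded with a missing-value term
    "geo_loc_name": "",            # mandatory but blank: needs a value or a missing-value term
    "depth": "",                   # treated as optional here, so blank is acceptable
}
print(unfilled_mandatory_fields(record, {"collection_date", "geo_loc_name"}))  # ['geo_loc_name']
```

Optional fields are simply not checked, which matches the text: they may stay blank, or carry a missing-value term when there is a specific reason the information will never be reported.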
| - There are three levels at which you can report missing data, with an increasing amount of specificity for each: | ||
| - There are three levels at which you can report missing data, with an increasing amount of specificity for each: | |
| There are three levels at which you can report missing data, with an increasing amount of specificity for each: | |
| - _**top level**_ | ||
| - _**lower level**_ | ||
| - _**reporting level**_ |
| - _**top level**_ | |
| - _**lower level**_ | |
| - _**reporting level**_ | |
| - top level | |
| - lower level | |
| - reporting level |
| Be as specific/granular as possible when reporting missing values. |
| Be as specific/granular as possible when reporting missing values. | |
| Be as specific/granular as possible when reporting missing values, i.e., try to specify down to the reporting level wherever possible. |
| The _top level_ indicates that the data are missing, while the _lower-_ and _reporting_ levels give a reason (from the [controlled vocabulary](https://www.insdc.org/technical-specifications/missing-value-reporting/)) why the data are missing. |
| The _top level_ indicates that the data are missing, while the _lower-_ and _reporting_ levels give a reason (from the [controlled vocabulary](https://www.insdc.org/technical-specifications/missing-value-reporting/)) why the data are missing. | |
| The _top level_ only indicates that the data are missing. The _lower-_ and _reporting_ levels then additionally provide a reason for the missingness (from the [controlled vocabulary](https://www.insdc.org/technical-specifications/missing-value-reporting/)). |
| - Always report the _top level_ (i.e. "not applicable" or "missing") even when reporting at the more granular levels, in which case separate the _top level_ and _lower/reporting level_ terms with ": ". |
| - Always report the _top level_ (i.e. "not applicable" or "missing") even when reporting at the more granular levels, in which case separate the _top level_ and _lower/reporting level_ terms with ": ". | |
| Some additional recommendations: | |
| - Always report the _top level_ (i.e. "not applicable" or "missing") even when reporting at the more granular levels, in which case separate the _top level_ and _lower/reporting level_ terms with ": ". |
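A minimal sketch of the "top level, then `: `, then reason" convention described above, assuming the two top-level terms are `not applicable` and `missing` as stated in the text. `format_missing_value` is a hypothetical helper, not part of any INSDC or MIxS tooling, and it does not check the reason against the controlled vocabulary.

```python
from typing import Optional

# Top-level terms as named in the text above.
TOP_LEVEL_TERMS = {"not applicable", "missing"}

def format_missing_value(top: str, reason: Optional[str] = None) -> str:
    """Join a top-level term and an optional lower/reporting-level reason with ': '."""
    if top not in TOP_LEVEL_TERMS:
        raise ValueError(f"unknown top-level term: {top!r}")
    # Without a reason, only the top level is reported.
    return top if reason is None else f"{top}: {reason}"

print(format_missing_value("missing"))                           # missing
print(format_missing_value("not applicable", "control sample"))  # not applicable: control sample
```

Rejecting unknown top-level terms keeps the one rule the text states firmly (always report the top level) enforced even when the reason itself is free text.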
| - If using terms from the most granular level (_reporting level_), then exclude the _lower level_ term, as each _reporting level_ term is a "child" of the _lower level_, which can then be inferred based on the [table](https://www.insdc.org/technical-specifications/missing-value-reporting/). |
Is this a recommendation of INSDC? If not, I don't particularly like it, e.g. `missing: endangered species` doesn't tell me much, but saying `missing: restricted access, endangered species` is much clearer to me (personally).
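A sketch of the two options under discussion, assuming a small hand-copied excerpt of the INSDC table in which `endangered species` and `human-identifiable` are reporting-level children of `missing: restricted access` (verify against the linked table before relying on it). By default the helper drops the lower level, as the current text recommends; `include_lower_level=True` produces the comma-separated form suggested in the comment above, which is a reviewer preference rather than a documented INSDC format. `encode_reporting_level` and `REPORTING_LEVEL_PARENTS` are hypothetical names.

```python
# Hypothetical excerpt of the INSDC table:
# reporting-level term -> (top-level term, lower-level term).
REPORTING_LEVEL_PARENTS: dict[str, tuple[str, str]] = {
    "endangered species": ("missing", "restricted access"),
    "human-identifiable": ("missing", "restricted access"),
}

def encode_reporting_level(term: str, include_lower_level: bool = False) -> str:
    """Encode a reporting-level term, optionally spelling out its lower-level parent."""
    top, lower = REPORTING_LEVEL_PARENTS[term]  # raises KeyError for terms outside the excerpt
    if include_lower_level:
        # Reviewer-suggested form: "top: lower, reporting"
        return f"{top}: {lower}, {term}"
    # Current text: the lower level is omitted because it can be inferred from the table.
    return f"{top}: {term}"

print(encode_reporting_level("endangered species"))
# missing: endangered species
print(encode_reporting_level("endangered species", include_lower_level=True))
# missing: restricted access, endangered species
```

Because the parent is looked up from the table either way, both forms carry the same information; the choice only affects how readable the value is without the table at hand.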
| For example, in some implementations, numeric-only metadata terms may not allow non-number characters and thus will fail validation when giving e.g. `not applicable: control sample` category. | ||
| In these cases, refer to the documentation of the place you are submitting your metadata to. | ||
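To illustrate the validation caveat, a minimal sketch contrasting a strict numeric check with a more permissive one that also accepts missing-value terms. Both validators are hypothetical; real submission portals implement their own rules, which is why the text defers to their documentation.

```python
TOP_LEVEL_TERMS = ("not applicable", "missing")

def is_number(value: str) -> bool:
    """Strict check: accept only values that parse as a float.

    Note that float() also accepts strings such as 'nan' and 'inf'.
    """
    try:
        float(value)
    except ValueError:
        return False
    return True

def is_number_or_missing_term(value: str) -> bool:
    """Permissive check: also accept a bare top-level term or the 'top: reason' form."""
    if is_number(value):
        return True
    return any(value == top or value.startswith(f"{top}: ") for top in TOP_LEVEL_TERMS)

print(is_number("12.5"), is_number("not applicable: control sample"))  # True False
print(is_number_or_missing_term("not applicable: control sample"))     # True
```

A numeric field validated with something like `is_number` would reject a perfectly valid missing-value term, which is the failure mode the text warns about.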
| ### Examples |