When a tsv file is uploaded for the pairwise visualisation (with s1 and s2 fields for the text) and the text pair exists in the OpenITI corpus, the text is pulled from the OpenITI GitHub rather than taken from the submitted text fields. The result is a diff calculated on the wrong version of the text, with the uploaded offsets applied to it.
If the file is given a name that is not recognised in the metadata, e.g. text1_text2.csv, the issue does not occur: the text is taken from s1 and s2 as expected.
Suggested fix
When the data comes from an upload, the diff should always use the s1 and s2 fields from the input data.
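A minimal sketch of the suggested behaviour, in Python for illustration only (the application's actual language and code are not shown in this report). The names `load_uploaded_rows`, `texts_for_diff`, `from_upload`, `fetch_corpus_text`, and the `id1`/`id2` columns are all hypothetical; only the `s1` and `s2` fields come from the report.

```python
import csv
import io
from typing import Callable, Optional


def load_uploaded_rows(tsv_text: str) -> list[dict]:
    """Parse uploaded tab-separated text into one dict per row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


def texts_for_diff(
    row: dict,
    from_upload: bool,
    fetch_corpus_text: Optional[Callable[[str], str]] = None,
) -> tuple[str, str]:
    """Return the two strings the diff should be computed on.

    Uploaded rows always use their own s1/s2 fields, even if the file
    name matches a text pair known to the corpus metadata.
    """
    if from_upload:
        return row["s1"], row["s2"]
    if fetch_corpus_text is None:
        raise ValueError("a corpus fetcher is needed for non-uploaded data")
    # Hypothetical id columns, used only on the non-upload path.
    return fetch_corpus_text(row["id1"]), fetch_corpus_text(row["id2"])


# Usage: the corpus fetcher is never called for uploaded data.
rows = load_uploaded_rows("s1\ts2\nsubmitted text A\tsubmitted text B\n")
s1, s2 = texts_for_diff(rows[0], from_upload=True,
                        fetch_corpus_text=lambda _id: "corpus version")
```

The point of the sketch is that the decision depends on where the data came from, not on whether the file name is recognised in the metadata.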