Add TRY database formatter for meta-analysis (Issue #3717)#3720
Add TRY database formatter for meta-analysis (Issue #3717)#3720Mayanknishad9 wants to merge 3 commits intoPecanProject:developfrom
Conversation
- Implements format_try_for_ma() function for GitHub issue PecanProject#3717 - Converts TRY database exports to PEcAn meta-analysis format - Provides empty structure with all required columns (id, citation_id, mean, n, vname, etc.) - Includes example_trait_mapping() for common trait conversions - Ready for actual TRY data format specification
- Implements format_try_for_ma() function as requested - Converts TRY database exports to PEcAn meta-analysis format - Filters trait data (TraitID not NA) from covariates - Maps TRY trait names to PEcAn variables via trait_map parameter - Extracts StdValue, ErrorRisk, Replicates columns - Returns data frame with all required PEcAn MA columns - Includes try_trait_mapping() helper for common trait conversions
mdietze
left a comment
There was a problem hiding this comment.
- need to make doc to build function man page (and then commit)
- need to update log and overall pecan documentation
- need to verify that this function runs correctly on real data and that the MA function reads the resulting dataframe successfully
- should document clearly whether this data is in the format returned by the PEcAn database (which needs to be run through
jagifybefore it can go intopecan.ma) or the format that the MA itself uses. - definitely take a look at the jagify and pecan.ma functions to make sure you understand which columns are used and which are currently dropped.
| #' @param trait_map Named vector: TRY TraitName -> PEcAn vname | ||
| #' @param citation_id Default citation ID | ||
| #' @export | ||
| format_try_for_ma <- function(try_data, trait_map, citation_id = 999) { |
There was a problem hiding this comment.
The default citation_id is a valid citation in our database. Would it work to make the default -9999 or something like that?
| result <- data.frame( | ||
| id = if ("ObsDataID" %in% names(data)) data$ObsDataID else 1:nrow(data), | ||
| citation_id = citation_id, | ||
| site_id = 1, |
There was a problem hiding this comment.
you definitely don't want to set all the data to have the same site_id, that would get analyzed incorrectly by the MA code. Is there any geospatial info in the try data (e.g. lat/lon) that you can part into a site table?
| date = NA, | ||
| time = NA, | ||
| cultivar_id = NA, | ||
| specie_id = NA, |
There was a problem hiding this comment.
This is really important, don't leave as NA
| treatment_id = NA, | ||
| name = NA, | ||
| date = NA, | ||
| time = NA, |
There was a problem hiding this comment.
Does TRY really provide no info on data and time?
| time = NA, | ||
| cultivar_id = NA, | ||
| specie_id = NA, | ||
| mean = if ("StdValue" %in% names(data)) as.numeric(data$StdValue) else NA, |
There was a problem hiding this comment.
If there's no actual data in the TRY file then we're in trouble! This will need to at least throw a warning if not an error
| statname = if ("ErrorRisk" %in% names(data)) "SE" else NA, | ||
| stat = if ("ErrorRisk" %in% names(data)) as.numeric(data$ErrorRisk) else NA, | ||
| n = if ("Replicates" %in% names(data)) as.numeric(data$Replicates) else NA, | ||
| vname = if ("TraitName" %in% names(data)) { |
| month = NA, | ||
| greenhouse = FALSE, | ||
| control = FALSE, | ||
| stringsAsFactors = FALSE |
There was a problem hiding this comment.
Is this actually part of our internal format?
| }) | ||
| } else NA, | ||
| month = NA, | ||
| greenhouse = FALSE, |
There was a problem hiding this comment.
Did you confirm that no TRY data ever comes from a greenhouse or similar treatment (potted plants, growth chambers, etc)
|
|
||
| #' Example trait mapping | ||
| #' @export | ||
| try_trait_mapping <- c( |
There was a problem hiding this comment.
Any reason to not have this as the default for the function?
|
@Mayanknishad9 Is this PR still on your radar? |
|
no you can take it
…On Sun, Mar 1, 2026 at 1:46 PM Chris Black ***@***.***> wrote:
*infotroph* left a comment (PecanProject/pecan#3720)
<#3720 (comment)>
@Mayanknishad9 <https://github.com/Mayanknishad9> Is this PR still on
your radar?
—
Reply to this email directly, view it on GitHub
<#3720 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BUC7ROYLREAHL3DCZVNESML4OPWWZAVCNFSM6AAAAACPBKRRJKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTSNZZGQ3TEMRTGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Hi @infotroph , since it looks like this PR has been stalled for a bit, I went ahead and took a fresh look at the original issue. I've created a new pull request with my approach here: #3842 . I'd love to get your feedback on it!" |
AritraDey-Dev
left a comment
There was a problem hiding this comment.
closing this because of #3842
Description
fixes #3717
Could be used to fix 3707
Adds the
format_try_for_ma()function to convert trait data from the external TRY database into the tabular format required by the PEcAn meta-analysis module. The function is located atmodules/data.remote/R/format_try.R.The implementation includes:
TraitIDis notNA) and excludes covariates.TraitNameto PEcAn variable names (vname) using a user-providedtrait_mapvector.StdValue,ErrorRisk,Replicates) to the PEcAn equivalents (mean,stat,n).try_trait_mapping()function with predefined conversions for common ecological traits.Motivation and Context
This change addresses GitHub issue #3717. It creates a helper function that allows users to easily reformat data from common external trait databases (like TRY) for use in PEcAn's meta-analysis workflows. This is important because, as noted in the original issue, the internal BETYdb is no longer being actively updated, making external data sources the de facto norm for many users[citation:1].
Review Time Estimate
Types of changes
Checklist: