Setup DistABLPLoader for GraphStore mode#485
Conversation
|
/e2e_test |
|
/integration_test |
|
/unit_test_py |
GiGL Automation@ 19:58:53UTC : 🔄 @ 21:23:22UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 19:58:54UTC : 🔄 @ 21:07:19UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 19:58:59UTC : 🔄 @ 21:10:08UTC : ✅ Workflow completed successfully. |
mkolodner-sc
left a comment
There was a problem hiding this comment.
Thanks Kyle, left a few comments/questions
| # Graph Store mode inputs | ||
| dict[int, tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]], | ||
| tuple[ | ||
| NodeType, |
There was a problem hiding this comment.
Just to double check, how would this interplay if I have user as my anchor node type and item as my supervision node type?
Tbh it would feel a bit weird to specify {user: {0: (0, 1, 1)}} with the 0 corresponding to the user node type but the two 1 node types are items. Ideally, if a user is providing the positive and negative labels here, I feel like they should have to provide the node type in some way. This may be even more true once we will need to support multiple supervision edge types
There was a problem hiding this comment.
Good callout!
Hmmm, wdyt about dict[int, tuple[tuple[NodeType, Tensor], tuple[NodeType, Tensor], Optional[tuple[NodeType, Tensor]]?
Or honest that's so complicated probably worth creating a dataclass to wrap all the data?
I know we want to avoid dataclasses here but frankly this is so complicated I think it's worth it. WDYT?
There was a problem hiding this comment.
It is growing complicated enough here honestly where I think a dataclass specifically explaining this input (and potentially having the RemoteDataset.get_ablp_inputs() return this dataclass) makes sense.
Re: dict[int, tuple[tuple[NodeType, Tensor], tuple[NodeType, Tensor], Optional[tuple[NodeType, Tensor]]
Not to add complexity unfortunately to this already complex object, but for multiple supervision edge types, we may need to make it list[tuple[NodeType, Tensor]] for the positive and negatives :')
Alternatively, another option could be to expose positive_nodes, negative_nodes, or label_nodes as direct arguments. It seems like it's a tradeoff between
- adhering to GLT by only having one
input_nodesbut having a fairly complex structure to provide as input :') - having multiple arguments but now we've evolved the API beyond GLT/PyG and further added complexity to the API
I need to think more about which approach I prefer but these are my thoughts for now
|
/integration_test |
|
/unit_test_py |
GiGL Automation@ 21:37:00UTC : 🔄 @ 22:39:45UTC : ❌ Workflow failed. |
GiGL Automation@ 21:37:10UTC : 🔄 @ 21:44:13UTC : ❌ Workflow failed. |
|
/unit_test_py |
GiGL Automation@ 21:52:15UTC : 🔄 @ 23:04:55UTC : ✅ Workflow completed successfully. |
|
/integration_test |
GiGL Automation@ 23:13:23UTC : 🔄 @ 24:01:43UTC : ❌ Workflow failed. |
|
/integration_test |
GiGL Automation@ 24:06:05UTC : 🔄 @ 01:59:33UTC : ❌ Workflow failed. |
|
/integration_test |
GiGL Automation@ 02:34:39UTC : 🔄 @ 04:06:55UTC : ❌ Workflow failed. |
|
/unit_test_py |
|
/integration_test_py |
|
/e2e_Test |
GiGL Automation@ 18:11:08UTC : 🔄 @ 19:35:54UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 18:11:15UTC : 🔄 @ 19:50:46UTC : ❌ Workflow failed. |
|
/integration_test |
GiGL Automation@ 18:11:24UTC : 🔄 @ 19:33:42UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 18:11:34UTC : 🔄 @ 19:43:40UTC : ❌ Workflow failed. |
|
/unit_test_py |
|
/integration_test |
GiGL Automation@ 19:56:48UTC : 🔄 @ 21:03:45UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 19:57:03UTC : 🔄 @ 21:32:55UTC : ❌ Workflow failed. |
| "supervision_edge_type must not be provided when using Graph Store mode. " | ||
| "The supervision edge types are inferred from the ABLPInputNodes label keys in input_nodes." |
There was a problem hiding this comment.
Is there a particular reason we want to be raising an error here if it is provided? I thought it would be sufficient to just validate that this is the same as what is provided in input_nodes in the case that it is also provided here, rather than raise an error directly.
| # Validate that the negative label edge types (if present) correspond to the | ||
| # same supervision edge types as the positive labels. |
There was a problem hiding this comment.
I wonder, since we need to do this, if it'd be better to have some labeled_nodes that is a Dict[EdgeType, Tuple(torch.Tensor, Optional[torch.Tensor]). That way, they are guaranteed to have the same sueprvision edge types and consumers won't be able to make this error.
| logger = Logger() | ||
|
|
||
| DEFAULT_TIMEOUT_SECONDS: Final[float] = 60.0 * 10 # 10 minutes | ||
| DEFAULT_TIMEOUT_SECONDS: Final[float] = 60.0 * 30 # 30 minutes |
There was a problem hiding this comment.
qq: Why did we need to bump this?
Scope of work done
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO