-
Notifications
You must be signed in to change notification settings - Fork 248
Update local-execution to newest task-spec #1625
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Update local-execution to newest task-spec #1625
Conversation
… equivalent dataflow, expand.
Refactor computation graph instance as DTG. Cut out more stuff that doesn't compile yet.
…taflow_graph_from_cg.
lockshaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left a few comments, but most are minor style issues--only a few comments of real importance. I also pushed a couple changes directly before I remembered that this is still one of your earlier PRs so I should really leave comments so you're aware of things. Let me know when it's ready for another round of review--the next round should be quick for me to review and will very likely get merged without additional changes needed.
@lockshaw reviewed 80 files and all commit messages, and made 33 comments.
Reviewable status: all files reviewed, 30 unresolved discussions (waiting on @elliottslaughter).
lib/local-execution/include/local-execution/computation_graph_instance/computation_graph_instance.h line 25 at r1 (raw file):
struct ComputationGraphInstance { public:
Minor: Not strictly necessary, but explicitly deleting the default constructor can be nice for readability
Suggestion:
public:
ComputationGraphInstance() = delete;lib/local-execution/include/local-execution/computation_graph_instance/computation_graph_instance.h line 30 at r1 (raw file):
OptimizerAttrs const &, std::optional<LossAttrs> const &, std::optional<GenericTensorAccessorW>);
Minor: Generally we make all constructors explicit unless there's a clear reason not to
Suggestion:
explicit ComputationGraphInstance(std::vector<DynamicNodeInvocation> const &,
Allocator &,
OptimizerAttrs const &,
std::optional<LossAttrs> const &,
std::optional<GenericTensorAccessorW>);lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_init_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
For clarity
Suggestion:
if (!task_impl_fn.has_value()) {lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Suggestion:
if (!task_impl_fn.has_value()) {lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Suggestion:
if (!task_impl_fn.has_value()) {lib/local-execution/include/local-execution/task_execution.h line 17 at r1 (raw file):
DeviceType kernel_device_type, PCGOperatorAttrs op_attrs, std::optional<LossAttrs> const &loss_attrs,
Minor: Any reason for deleting all of the argument names?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 62 at r1 (raw file):
dg, value.tensor_guid, value.role)) .second.accessor) .get<GenericTensorAccessorW>();
Minor: In general I'd recommend splitting this up into a couple statements to improve readability, but for a small static function like this it's not a big deal
Code quote:
return assert_unwrap(assert_unwrap(find_output_tensor(
dg, value.tensor_guid, value.role))
.second.accessor)
.get<GenericTensorAccessorW>();lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
input_tensors; std::optional<DynamicValueAttrs> logit_grad_value; if (loss_attrs) {
For readability
Suggestion:
if (loss_attrs.has_value()) {lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
std::optional<DynamicValueAttrs> logit_grad_value; if (loss_attrs) { auto [dg2, label_v, logit_grad_v] = perform_loss_insertion(
For readablity
Suggestion:
auto [loss_inserted_dg, label_v, logit_grad_v] = perform_loss_insertion(lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
device_id_t device_idx) { std::vector<DynamicNodeInvocation> const &execution_order = instance.get_execution_order();
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Suggestion:
std::vector<DynamicNodeInvocation> execution_order =
instance.get_execution_order();lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
FFIterationConfig iteration_config, device_id_t device_idx) { std::vector<DynamicNodeInvocation> const &execution_order =
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
FFIterationConfig iteration_config, device_id_t device_idx) { std::vector<DynamicNodeInvocation> const &execution_order =
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
FFIterationConfig iteration_config, device_id_t device_idx) { std::vector<DynamicNodeInvocation> const &execution_order =
Here as well
lib/local-execution/src/local-execution/task_execution.cc line 17 at r1 (raw file):
namespace FlexFlow { TaskTensorParameter make_task_tensor_parameter_dynamic(
Minor Makes the name a bit clearer
Suggestion:
make_task_tensor_parameter_from_dynamic_slot(lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
TaskTensorParameter make_task_tensor_parameter_dynamic( DynamicTensorSlot slot,
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as const &.
In general, for consistency default to const & unless you have a specific reason for not doing so
Suggestion:
DynamicTensorSlot const &slot,lib/local-execution/src/local-execution/task_execution.cc line 53 at r1 (raw file):
device_id_t device_idx) { std::unordered_map<TaskTensorParameter, DynamicTensorAccessor> tensor_slots_backing;
I think this can be converted to a binary_merge_disjoint_maps of the result of a composition of a map_keys and map_values (it might actually be a good idea to make a map_keys_and_values that is simply a composition of the two, as iirc composing map_keys and map_values is a reasonable common operation in the codebase.
lib/local-execution/src/local-execution/task_execution.cc line 54 at r1 (raw file):
std::unordered_map<TaskTensorParameter, DynamicTensorAccessor> tensor_slots_backing; for (auto const &[slot, input] : invocation.inputs) {
Minor If the type isn't clear from the immediate context, I'd recommend leaning toward explicit types for slot and input over the auto expansion
Code quote:
for (auto const &[slot, input] : invocation.inputs) {lib/local-execution/src/local-execution/task_execution.cc line 89 at r1 (raw file):
std::optional<LossAttrs> const &loss_attrs, std::optional<PerDeviceOpState> const &per_device_op_state, FFIterationConfig iteration_config,
Minor pass by const ref unless you have a reason not toa
Code quote:
FFIterationConfig iteration_config,lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
break; default: PANIC("Unhandled DynamicTaskType", fmt::to_string(task_type));
I don't think the fmt::to_string is necessary
Suggestion:
PANIC("Unhandled DynamicTaskType", task_type);lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
SUBCASE("get_tensor") { SUBCASE("get_tensor(TensorSlotName, Permissions::RO, " "TrainingTensorType::FORWARD)") {
Update subcase names to match (or at least not contradict) the updated function signature of get_tensor
Code quote:
SUBCASE("get_tensor(TensorSlotName, Permissions::RO, "
"TrainingTensorType::FORWARD)") {lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
std::optional<std::pair<DynamicNodeInvocation, DynamicValueAttrs>> find_output_tensor(DynamicOpenDataflowGraph const &dg,
I don't really understand this function--if you already have a tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
lib/task-spec/include/task-spec/dynamic_graph/dynamic_tensor_accessor.dtg.toml line 16 at r1 (raw file):
[[values]] name = "accessor_r" type = "::FlexFlow::GenericTensorAccessorR"
FYI You can put key = "some_string" and then dtgen will autogenerate has_some_string and require_some_string (and a couple other) functions that you can use as a sometimes more readable version of has<GenericTensorAccessorR> and get<GenericTensorAccessorR>
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 31 at r1 (raw file):
/*accessor=*/std::nullopt, /*role=*/mk_dynamic_tensor_role_loss(), };
FYI generally we use this syntax, as it's a bit more approachable for new people. That said, not a big deal
Suggestion:
DynamicValueAttrs label_value = DynamicValueAttrs{
/*tensor_guid=*/mk_dynamic_tensor_guid_loss(),
/*parallel_tensor_shape=*/logit_value.parallel_tensor_shape,
/*shard_coord=*/logit_value.shard_coord,
/*accessor=*/std::nullopt,
/*role=*/mk_dynamic_tensor_role_loss(),
};lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
}, /*node_attrs=*/ DynamicNodeAttrs{
Shouldn't loss_attrs go somewhere in here?
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 66 at r1 (raw file):
DynamicOpenDataflowGraph result = dg; result.invocations.insert(loss_invocation); return std::tuple{result, label_value, logit_grad_value};
Especially when you're passing back tuples with multiple values of the same type, it's usually preferred to make a dtgen struct so that the fields have names, which makes code that uses this function safer to write and easier to read
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 6 at r1 (raw file):
namespace FlexFlow { dynamic_layer_guid_t mk_dynamic_layer_guid(layer_guid_t l) {
Minor: Would make the name a bit clearer
Suggestion:
mk_dynamic_layer_guid_for_cg_layerlib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
return dynamic_layer_guid_t{l}; } dynamic_layer_guid_t mk_dynamic_layer_guid_parallel(parallel_layer_guid_t l) {
Minor: Would make the name a bit clearer
Suggestion:
mk_dynamic_layer_guid_for_parallel_layerlib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 12 at r1 (raw file):
return dynamic_layer_guid_t{l}; } dynamic_layer_guid_t mk_dynamic_layer_guid_loss() {
Minor: Would make the name a bit clearer
Suggestion:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_loss() {lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
DynamicValueAttrs{ /*tensor_guid=*/dynamic_tensor_guid_t{tensor}, /*parallel_tensor_shape=*/lift_to_parallel(attrs.shape),
Do we ever need to be able to distinguish between a TensorShape and a ParallelTensorShape for this field? If not this solution is fine, just wanted to confirm since by lifting everything to a ParallelTensorShape we do lose this information
lib/task-spec/src/task-spec/dynamic_graph/dynamic_tensor_guid_t.cc line 6 at r1 (raw file):
namespace FlexFlow { dynamic_tensor_guid_t mk_dynamic_tensor_guid(tensor_guid_t t) {
Makes the name a bit less ambiguous
Suggestion:
mk_dynamic_tensor_guid_for_tensor_guidlib/task-spec/src/task-spec/dynamic_graph/dynamic_tensor_guid_t.cc line 10 at r1 (raw file):
} dynamic_tensor_guid_t mk_dynamic_tensor_guid_parallel(parallel_tensor_guid_t t) {
Makes the name a bit less ambiguous
Suggestion:
mk_dynamic_tensor_guid_for_parallel_tensor_guidlib/task-spec/src/task-spec/dynamic_graph/dynamic_tensor_guid_t.cc line 13 at r1 (raw file):
return dynamic_tensor_guid_t{t}; } dynamic_tensor_guid_t mk_dynamic_tensor_guid_loss() {
Makes the name a bit less ambiguous
Suggestion:
mk_dynamic_tensor_guid_for_loss
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in progress, left some notes on things I have questions about.
@elliottslaughter made 18 comments and resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 21 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 18 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor Generally we pass composite arguments (or at least arguments that are logically composite--some types may interally have multiple values, but are intended to be treated as opaque handles) as
const &.In general, for consistency default to
const &unless you have a specific reason for not doing so
I guess I was treating this like Copy in Rust (it's a composite type but all the fields are flat POD), but happy to do whatever you want.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
I definitely copied this from elsewhere in the codebase. What's the rule for when this is necessary vs not?
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update subcase names to match (or at least not contradict) the updated function signature of
get_tensor
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 9 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
Do you want this to be consistent with your other name? Because it seems you're asking for:
dynamic_layer_guid_t mk_dynamic_layer_guid_for_cg_layer(layer_guid_t);
dynamic_layer_guid_t mk_dynamic_layer_guid_for_parallel_layer(parallel_layer_guid_t);Personally I would either make it layer/parallel_layer or else cg_layer/pcg_layer, but that's just me.
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't really understand this function--if you already have a
tensor_guid, what is the output tensor you're trying to find? And why does it return a pair of values?
See if the new version in the updated diff makes more sense.
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Shouldn't
loss_attrsgo somewhere in here?
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Do we ever need to be able to distinguish between a
TensorShapeand aParallelTensorShapefor this field? If not this solution is fine, just wanted to confirm since by lifting everything to aParallelTensorShapewe do lose this information
Honestly, I don't know. As far as I'm aware everything downstream uses the ParallelTensorShape, but if you want this represented I can put it in.
lib/local-execution/src/local-execution/device_state_initialization.cc line 84 at r2 (raw file):
device_idx); }); ASSERT(all_nodes_are_initialized(dg));
We can't assert here because not all nodes end up initialized at the end. I made the same mistake when I was initially writing, but I had to remove it. Perhaps you could write a more principled version that only checks certain kinds of nodes for this property, but it felt like more trouble than it's worth.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elliottslaughter resolved 10 discussions.
Reviewable status: 64 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
lockshaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lockshaw reviewed 4 files and all commit messages, and made 1 comment.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @elliottslaughter).
lib/local-execution/include/local-execution/computation_graph_instance/computation_graph_instance.h line 30 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Generally we make all constructors
explicitunless there's a clear reason not to
Testing publishing a response
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elliottslaughter made 10 comments.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
lib/local-execution/src/local-execution/local_task_registry.cc line 158 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For clarity
Done.
lib/local-execution/src/local-execution/task_execution.cc line 126 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
I don't think the
fmt::to_stringis necessary
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 84 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readability
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 85 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
For readablity
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 163 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Don't assign reference variables unless you know the cost savings are significant--it just creates too many opportunities for someone to mix up the lifetimes and create really annoying memory bugs
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 185 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Really don't assign the result of nontrivial expressions to refs as it's then really easy to mess up the lifetimes--I'm actually not sure if this code is actually correct or not
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 211 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Update here as well (mostly leaving the additional comment to help me remember to check this next round--normally I don't leave comments on every occurence)
Done.
lib/local-execution/src/local-execution/computation_graph_instance/computation_graph_instance.cc line 236 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Here as well
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 175 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_fwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
lib/local-execution/src/local-execution/local_task_registry.cc line 189 at r1 (raw file):
std::optional<TaskImplFunction> task_impl_fn = get_bwd_task_impl_for_op_attrs(op_attrs); if (!task_impl_fn) {
Done.
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elliottslaughter made 1 comment.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
lib/local-execution/include/local-execution/computation_graph_instance/computation_graph_instance.h line 30 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Testing publishing a response
Test reply
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elliottslaughter made 1 comment.
Reviewable status: 68 of 81 files reviewed, 20 unresolved discussions (waiting on @lockshaw).
elliottslaughter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elliottslaughter made 6 comments and resolved 1 discussion.
Reviewable status: 68 of 81 files reviewed, 19 unresolved discussions (waiting on @lockshaw).
lib/local-execution/include/local-execution/task_execution.h line 17 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Any reason for deleting all of the argument names?
I went back and forth on this one because the rest of the code seems to be inconsistent. What do you prefer?
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 6 at r1 (raw file):
Previously, lockshaw (Colin Unger) wrote…
Minor: Would make the name a bit clearer
I'm not sure this is the most consistent, particularly considering the parallel case of tensors.
I feel like the two consistent options are:
layer/parallel_layer/loss, orcg_layer/pcg_layer/loss
We used option 1 for tensors so that feels the most consistent to me, but let me know if you disagree.
lockshaw
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lockshaw reviewed 13 files, made 7 comments, and resolved 11 discussions.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @elliottslaughter).
lib/local-execution/include/local-execution/task_execution.h line 17 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I went back and forth on this one because the rest of the code seems to be inconsistent. What do you prefer?
I have moved over to including the argument names because it enables us to use clang-tidy to check the argument name comments (i.e., the /*my_argument=*/... syntax), and because it enables adding doxygen docstrings (I don't think you can document arguments without names in doxygen, at least not easily)
lib/local-execution/test/src/local-execution/local_task_argument_accessor.cc line 73 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I'm actually not sure how you want me to do this, unless you're literally asking me to write:
SUBCASE("get_tensor(make_task_tensor_parameter_fwd(TensorSlotName::LHS_INPUT), Permissions::RO)")Is that it or do you have a shorter version you want to see here?
You can also just use a prose name, like get_tensor for read-only forward tensor. I just don't want the subcase names to explicitly contradict the signature
lib/task-spec/src/task-spec/dynamic_graph/dynamic_layer_guid_t.cc line 6 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I'm not sure this is the most consistent, particularly considering the parallel case of tensors.
I feel like the two consistent options are:
layer/parallel_layer/loss, orcg_layer/pcg_layer/lossWe used option 1 for tensors so that feels the most consistent to me, but let me know if you disagree.
Yeah, either is fine, I just don't like layer by itself because it's ambiguous (unfortunately we don't have a good name for a non-parallel layer that's shorter than "non_parallel_layer", hence "cg_layer"). Either parallel_layer or pcg_layer is fine to me
lib/task-spec/src/task-spec/dynamic_graph/dynamic_open_dataflow_graph.cc line 88 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
See if the new version in the updated diff makes more sense.
Yup, makes much more sense. Thanks!
lib/task-spec/src/task-spec/dynamic_graph/loss_insertion.cc line 49 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
I was on the fence about this. We could put loss attrs into the dynamic graph. On the other hand, optimizer attrs aren't stored in the graph, and arguably shouldn't be because they need to change on every iteration.
If you want them in the graph I'll add them.
Makes sense. I'd say either (1) include loss attrs in the graph, or (2) remove the loss_attrs argument. Either is fine with me, but having a loss_attrs argument that just isn't used feels weird.
lib/task-spec/src/task-spec/dynamic_graph/make_dynamic_open_dataflow_graph_from_cg.cc line 42 at r1 (raw file):
Previously, elliottslaughter (Elliott Slaughter) wrote…
Honestly, I don't know. As far as I'm aware everything downstream uses the
ParallelTensorShape, but if you want this represented I can put it in.
Nope, if it's working for now that's fine, just wanted to raise the issue to make sure it got considered/kept in mind as we go forward. We can always change it if we discover a case where it is necessary.
lib/local-execution/src/local-execution/device_state_initialization.cc line 85 at r5 (raw file):
}); // FIXME: this assert fails because not all kinds of nodes require // initialization ASSERT(all_nodes_are_initialized(dg));
Got it, in that case it seems like having an all_nodes_are_initialized function doesn't make a whole lot of sense as it doesn't tell us anything, so maybe the answer is just to remove that function. Another option would be to keep it, but to have it incorporate what types of nodes don't need to be initialized. Either is fine, just trying to understand (and if convenient, add a check in the code) the postcondition of this function
Runs end-to-end tests, including loss function. Do not include cost estimator, that will be a separate PR.
Changes to
local-execution:ComputationGraphInstancenow captures all the necessary state to run a computationInitializedComputationGraphInstancehas been removedComputationGraphInstanceruns all passes, allocates and initializes the graphlocal_task_registry.his now static: no dynamic map is retainedChanges to
task-spec:DynamicValueAttrsnow stores aDynamicTensorAccessor(variant ofGenericTensorAccessorRandGenericTensorAccessorW)labelled_open_kwarg_dataflow_graph_from_dynamic_open_dataflow_graphmake_dynamic_open_dataflow_graph_from_cgperform_update_insertionFollow-up PRs:
This change is