⚡️ Optimizing GPMA #53
-
BaseDataset: `Foorah_large_8`

The code below attempts to evaluate the amount of time the …

The output for the above code is …
-
**🤔 What is the need for a graph datastructure?**

**What does Naive do**
Naive keeps CSR representations of the graph for the different timestamps on the CPU. At every timestamp, the following operations are performed:

- [Move to GPU step] The row_offset, col_indices, and eids of the timestamp have to be moved to the GPU.

**What does GPMA do**
GPMA maintains a CSR structure on the GPU. At every timestamp, the following operations are performed:

- [Move to GPU step] For an update (insertion, deletion, value update) to be performed, the data has to be organized into three lists of src, dst, and value. These then have to be moved to the GPU and operated upon.

**Conclusion**
The point of using the graph datastructure is to trade the movement of large CSR arrays to the GPU (in Naive) for the much faster movement of small updates to the GPU (in GPMA). That means GPMA should ideally be faster than the Naive approach; a sketch contrasting the two transfer patterns is given below. It may be noted that in the case of Naive it is possible to pin the memory of each of those CSR arrays, but given the limitations on pinning large amounts of memory, it is probably better to pin just the updates.
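A minimal sketch of the two transfer patterns, assuming PyTorch tensors with made-up sizes; none of the names below come from the actual codebase:

```python
import torch

# Illustrative tensors with made-up sizes; purely a stand-in for the
# repo's actual Naive/GPMA structures.

# Naive: the full CSR arrays of each timestamp live on the CPU.
row_offset = torch.arange(0, 10_000_001, 10, dtype=torch.int64)  # ~1M nodes
col_indices = torch.randint(0, 1_000_000, (10_000_000,), dtype=torch.int64)
eids = torch.arange(10_000_000, dtype=torch.int64)

def naive_step(device="cuda"):
    # Three large host-to-device copies at every timestamp.
    return row_offset.to(device), col_indices.to(device), eids.to(device)

# GPMA: the CSR stays resident on the GPU; only this timestamp's
# (src, dst, value) update triples are moved across.
src = torch.randint(0, 1_000_000, (50_000,), dtype=torch.int64)
dst = torch.randint(0, 1_000_000, (50_000,), dtype=torch.int64)
val = torch.rand(50_000)

def gpma_step(device="cuda"):
    # Much smaller host-to-device copies at every timestamp.
    return src.to(device), dst.to(device), val.to(device)
```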
-
**📌 Pinned Memory**

We will store the updates in pinned memory for fast transfer to the GPU. I was able to achieve this, and it seemed to perform on par with PyTorch's own pinned-memory movement.

**Implementation**
I have implemented a …

Note: We will modify the approach to storing redundant …
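A minimal sketch of the pinned-buffer idea, assuming updates arrive as (src, dst, value) tensors; the helper names and buffer sizes are illustrative, not the actual implementation:

```python
import torch

def make_pinned_buffers(max_updates):
    # Page-locked host buffers, allocated once and reused across timestamps.
    src = torch.empty(max_updates, dtype=torch.int64, pin_memory=True)
    dst = torch.empty(max_updates, dtype=torch.int64, pin_memory=True)
    val = torch.empty(max_updates, dtype=torch.float32, pin_memory=True)
    return src, dst, val

def transfer_updates(pinned, updates, device="cuda"):
    # Stage this timestamp's updates into the pinned buffers, then issue
    # asynchronous host-to-device copies (non_blocking=True only overlaps
    # the copy when the source tensor is pinned).
    out = []
    for buf, upd in zip(pinned, updates):
        n = upd.numel()
        buf[:n].copy_(upd)
        out.append(buf[:n].to(device, non_blocking=True))
    return out
```

Reusing one set of page-locked buffers avoids re-pinning at every timestamp, which matters because pinning itself is an expensive OS-level operation.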
-
FINALLY YESS 🥳
-
**🫙 GPMA Storage**

Ran a test for GPMA on …

Some additional info is given below.

**How much data does the graph comprise of?**
I ran a separate script to verify how much GPU data is used to store the graph, and it seems that the variable that stores edges_lst takes only 71 MB (a rough way to check this is sketched below). While it does feel a bit pointless to try to save this amount of space, I'm hoping we can see gains in certain usecases, and it always helps to optimize 😉 So our GPMA needs to take less than 71 MB.
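For reference, one rough way to check such sizes; `edges_lst` here is a stand-in built from random data, not the actual variable from the run above:

```python
import torch

def size_mb(t):
    # Bytes occupied by the tensor's elements, in MB.
    return t.element_size() * t.nelement() / (1024 ** 2)

# Stand-in for the real edges_lst: one (2, E) edge tensor per timestamp.
edges_lst = [torch.randint(0, 1_000_000, (2, 100_000)) for _ in range(30)]
print(f"edges_lst takes ~{sum(size_mb(t) for t in edges_lst):.1f} MB")
```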
-
Edge labelling & Faster backward graph generationLabelling of edges and building of a labelled backward graph in an efficient manner is discussed #56. We noticed that there was a considerable speed-up with the introduction of this feature, we believe it is primarily due to the speed of generating the labelled backward graph. The results are as shown below.
-
**One last bug 🐛**
GPMA goes out of memory for …
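A small helper that can help localize where the allocation spikes; the checkpoint tags are illustrative, not functions from this codebase:

```python
import torch

def log_gpu_mem(tag):
    # Snapshot of CUDA memory at a named checkpoint.
    alloc = torch.cuda.memory_allocated() / (1024 ** 2)
    reserved = torch.cuda.memory_reserved() / (1024 ** 2)
    print(f"[{tag}] allocated={alloc:.1f} MB, reserved={reserved:.1f} MB")

# e.g. call log_gpu_mem("before insert") / log_gpu_mem("after rebalance")
# around the suspect GPMA steps.
```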







-
Here we will try to optimize the GPMA implementation for better speedup. A couple of areas to look at:

- [NEW IDEA] A possible new approach is to insert all possible edges into the GPMA up front and then only update the values at every timestamp; that way we could avoid the rebalancing and reallocation overhead of GPMA (see the sketch below).
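A sketch of this pre-insertion idea, assuming hypothetical `gpma.insert` / `gpma.set_values` kernels (the real GPMA API may differ):

```python
import torch

def preinsert_then_update(gpma, snapshots):
    # snapshots: list of (src, dst, val) tensors, one triple per timestamp.
    all_src = torch.cat([s for s, _, _ in snapshots])
    all_dst = torch.cat([d for _, d, _ in snapshots])
    edges = torch.unique(torch.stack([all_src, all_dst]), dim=1)

    # One-time structural build: every edge that will ever appear goes in,
    # with a placeholder value. (gpma.insert is a hypothetical kernel.)
    gpma.insert(edges[0], edges[1], torch.zeros(edges.shape[1]))

    for src, dst, val in snapshots:
        # Value-only updates: the keys already exist, so the PMA layout
        # stays stable; no rebalancing or reallocation in the time loop.
        # (gpma.set_values is a hypothetical kernel.)
        gpma.set_values(src, dst, val)
```

The trade-off is a larger resident structure on the GPU (every edge is stored even at timestamps where it is absent), in exchange for removing all structural work from the per-timestamp path.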