There is much more to Python than we have covered so far, so this tutorial looks at Jupyter notebooks and the Pandas library.
The goals for this tutorial are:
- Download data from LIGO and know what an HDF5 file is.
- Use the NumPy and Pandas packages.
You can run Jupyter notebooks in the Codespace.
Open the first notebook in a tab (either click on the file in the left-side file browser, or use the code command at the command line).
When the notebook opens, there should be a button at the top left to "Select kernel".
Click it and then choose the second option, the installed Python version.
At this point you can execute the code cells in the notebook by putting the cursor in a cell and typing Ctrl-Return.
Pandas is an extremely useful tool for working with CSV data. Open and work through the second notebook.
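As a rough preview of the kind of thing Pandas does with CSV data, here is a minimal sketch. The column names and values are invented for illustration and are not taken from the second notebook; the CSV text is inlined through `io.StringIO` so the cell runs without any file on disk.

```python
import io

import pandas as pd

# Made-up CSV content (hypothetical columns), inlined so no file is needed.
csv_text = """run,duration_s,events
a,10.0,3
b,20.0,5
c,30.0,4
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the table and the column types Pandas inferred.
print(df.head())
print(df.dtypes)

# Column-wise statistics and boolean-mask row selection.
total_events = int(df["events"].sum())
mean_duration = float(df["duration_s"].mean())
long_runs = df[df["duration_s"] > 15.0]

print(total_events, mean_duration)
print(long_runs["run"].tolist())
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file's path.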
Finally, let's look at some scientific data from a major facility. This is in the third notebook.
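Since one goal is to know what an HDF5 file is, the following sketch may help: an HDF5 file behaves like a small filesystem of groups (folders) and datasets (arrays), each of which can carry attributes (metadata). The group name `data`, dataset name `values`, and attribute `sample_spacing` below are hypothetical, chosen only for this example; the real file used in the third notebook may be laid out differently.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a tiny HDF5 file first so the example is self-contained.
# All names here (data, values, sample_spacing) are made up.
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")
with h5py.File(path, "w") as f:
    group = f.create_group("data")
    dset = group.create_dataset("values", data=np.zeros(4096))
    dset.attrs["sample_spacing"] = 1.0 / 4096  # seconds between samples

# Read it back: list the contents, then load the array and its metadata.
with h5py.File(path, "r") as f:
    names = []
    f.visit(names.append)             # collects every group/dataset path
    values = f["data/values"][:]      # [:] copies the dataset into a NumPy array
    spacing = f["data/values"].attrs["sample_spacing"]

duration = len(values) * spacing      # total time span in seconds
print(names)
print(values.shape, duration)
```

Walking the file with `visit` before reading anything is a handy first step with an unfamiliar HDF5 file, because the layout is not obvious from the filename.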
- Advanced HDF5 with h5py Tutorial
- HDF5 Field Guide
- Basics of Pandas
- Gravitational Wave Open Science Center
Scientific Computing
- HPC Carpentry Introduction to High-Performance Computing
- News article on the design of a new HPC system at TACC: With Vista, TACC now has three paths to its future horizon supercomputer (2024)
- BOINC distributed scientific computing
- SETI@Home is the original distributed computing project. It is no longer distributing tasks, though.
- TACC's Frontera User Guide
Networking Performance Links
- Latency and IP by Geoff Huston, September 2000.
- Examples of high latency but high throughput (i.e. putting data on a disk drive and shipping it) are Amazon Snowball and Snowmobile. Quote: "This secure data truck stores up to 100 PB of data and can help you to move exabytes to AWS in a matter of weeks."
- One can use non-TCP and non-IP protocols like Internet2, ESnet (DOE), NSF FABRIC Testbed, or custom (e.g. Facebook
- Popular press article: Connecting the South Pole
- Jupyter Notebooks – a publishing format for reproducible computational workflows (Kluyver et al.). Original paper introducing Jupyter notebooks.
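The latency versus throughput trade-off in the Snowball and Snowmobile entry above can be made concrete with some back-of-the-envelope arithmetic. Only the 100 PB capacity comes from the quote; the 10 Gbit/s link speed and the 7-day truck transit time are assumptions picked for illustration, and the time to load and unload the drives is ignored.

```python
PETABYTE = 10**15          # bytes, decimal convention
SECONDS_PER_DAY = 86_400


def transfer_days(n_bytes, link_bits_per_s):
    """Days needed to push n_bytes through a link of the given speed."""
    return n_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


def effective_gbit_per_s(n_bytes, transit_days):
    """Average throughput of physically shipping n_bytes in transit_days."""
    return n_bytes * 8 / (transit_days * SECONDS_PER_DAY) / 1e9


payload = 100 * PETABYTE                          # capacity from the quote
network_days = transfer_days(payload, 10e9)       # assumed 10 Gbit/s link
truck_rate = effective_gbit_per_s(payload, 7)     # assumed 7-day transit

print(f"Network: about {network_days:.0f} days for the last byte to arrive")
print(f"Truck: about {truck_rate:.0f} Gbit/s effective, but 7 days of latency")
```

Under these assumptions the truck delivers far more bits per second on average than the link, even though not a single byte arrives until the transit is over.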