18 changes: 9 additions & 9 deletions _episodes/02-creating-a-factory.md
@@ -49,7 +49,7 @@ There are a number of different kinds of factories available in JANA which we ha
4. It requires a deeper understanding of JANA internals to use correctly. The user is allowed to perform actions inside the factory callbacks that don't necessarily make sense. We remedied this issue by developing `JOmniFactory`, which *declares* what it needs upfront, and JANA *provides* it only when it makes sense. `JOmniFactory` supports all of the functionality developed for points (1), (2), and (3), and presents a simpler interface.


In summary, always use `JOmniFactory` if you are writing something new. All existing factories in EICrecon are in the process of being migrated right now: https://github.com/eic/EICrecon/issues/1176.
In summary, always use `JOmniFactory` if you are writing something new. The migration of all EICrecon factories to `JOmniFactory` (tracked in https://github.com/eic/EICrecon/issues/1176) is essentially complete, so you should not encounter the older base classes in current code.


## The JOmniFactory interface
@@ -102,16 +102,16 @@ public:
// The logger, parameters, and services have all been fetched before this is called
}

void ChangeRun(int64_t run_number) {
void ChangeRun(int32_t run_number) {
// This is called whenever the run number is changed.
// Use this callback to retrieve state that is keyed off of run number.
}

void Process(int64_t run_number, uint64_t event_number) {
void Process(int32_t run_number, uint64_t event_number) {
// This is called on every event.
// Use this callback to call your Algorithm using all inputs and outputs
// The inputs will have already been fetched for you at this point.
// m_algo->execute(...);
// m_algo->process({...}, {...});

logger()->debug( "Event {}: Calling Process()", event_number );
}
@@ -120,21 +120,21 @@ public:

## The JOmniFactory inputs and outputs

The user specifies the JOmniFactory's inputs by declaring `PodioInput` or `VariationalPodioInput` objects as data members. These are templated on the basic PODIO type (Not the collection type or mutable type or object type or data type), and require the user to pass `this` as a constructor argument. These objects immediately register themselves with the factory, so that the factory always knows exactly what data it needs to fetch. To access the data once it has been fetched, the user can call the object's `operator()`, which returns a constant pointer to a PODIO collection of the correct type. For instance, suppose the user declares the data member:
The user specifies the JOmniFactory's inputs by declaring `PodioInput` or `VariadicPodioInput` objects as data members. These are templated on the basic PODIO type (Not the collection type or mutable type or object type or data type), and require the user to pass `this` as a constructor argument. These objects immediately register themselves with the factory, so that the factory always knows exactly what data it needs to fetch. To access the data once it has been fetched, the user can call the object's `operator()`, which returns a constant pointer to a PODIO collection of the correct type. For instance, suppose the user declares the data member:

```c++
PodioInput<MCParticles> m_particles_in {this};
PodioInput<edm4hep::MCParticle> m_particles_in {this};
```

In this case, the user would access the input data like this:

```c++
const MCParticlesCollection* particles_in = m_particles_in();
const edm4hep::MCParticleCollection* particles_in = m_particles_in();
```

Of course, for brevity, the user could simply write this instead:
Of course, for brevity, the user could simply pass `m_particles_in()` straight into the algorithm, and write the algorithm output through `m_particles_out().get()` — which is the pattern most factories use today:
```c++
m_particles_out() = smearing_algo->execute( m_particles_in() );
m_algo->process({m_particles_in()}, {m_particles_out().get()});
```

As you have just seen, PodioOutputs are very analogous to PodioInputs.
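
The "registered member" idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyFactory`, `ToyCollection`, `Input`, and `SetInputNames` are hypothetical stand-ins invented for illustration and are not the real JANA or EICrecon classes. It shows how a data member that receives `this` can register itself with its owner, so the owner knows its inputs in declaration order.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a PODIO collection (NOT a real podio type).
struct ToyCollection { std::vector<int> values; };

// Hypothetical stand-in for a factory base class (NOT the real JOmniFactory).
class ToyFactory {
public:
    struct InputBase {
        std::string collection_name;          // assigned positionally later on
        const ToyCollection* data = nullptr;  // would be filled in before Process()
    };

    // Declared as a data member; the constructor registers it with its owner.
    struct Input : InputBase {
        explicit Input(ToyFactory* owner) { owner->m_inputs.push_back(this); }
        // Access the fetched data, mirroring the operator() usage in the text.
        const ToyCollection* operator()() const { return data; }
    };

    // Assign collection names in the same order the inputs were declared.
    void SetInputNames(const std::vector<std::string>& names) {
        assert(names.size() == m_inputs.size());
        for (std::size_t i = 0; i < names.size(); ++i) {
            m_inputs[i]->collection_name = names[i];
        }
    }

    const std::vector<InputBase*>& inputs() const { return m_inputs; }

private:
    std::vector<InputBase*> m_inputs;  // constructed before any derived-class member
};

// A concrete toy factory: each input registers itself by passing `this`.
struct MyToyFactory : ToyFactory {
    Input m_particles_in {this};
    Input m_hits_in {this};
};
```

Because the toy factory stores raw pointers to its own members, an object like `MyToyFactory` should not be copied or moved in this sketch. Whether the real classes impose the same restriction is not something this example establishes.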
28 changes: 15 additions & 13 deletions _episodes/03-calling-a-factory.md
@@ -19,7 +19,7 @@ Instead of handing over the OmniFactory to JANA directly, we create a `JOmniFact
Here is how you set up a factory generator:

```c++
app->Add(new JOmniFactoryGeneratorT<MC2SmearedParticle_factory>(
app->Add(new JOmniFactoryGeneratorT<MC2ReconstructedParticle_factory>(
"GeneratedParticles",
{"MCParticles"},
{"GeneratedParticles"},
@@ -31,7 +31,7 @@ In this example, "GeneratedParticles" is the factory instance's unique tag, `{"M

- If you are only creating one instance of this factory, feel free to use the "primary" output collection name as the factory prefix. (This has to be unique because PODIO collection names have to be unique.)

- Collection names are positional, so they need to be in the same order as the `PodioInput` and `VariationalPodioInput` declarations in the factory.
- Collection names are positional, so they need to be in the same order as the `PodioInput` and `VariadicPodioInput` declarations in the factory.

- Variadic inputs are a little bit interesting: You can have any number of variadic inputs mixed in among the non-variadic inputs, as long as there are the same number of collection names for each variadic input. If this confuses you, just restrict yourself to one variadic input and put it as the very last input, like most programming languages do.

@@ -54,15 +54,17 @@ eicrecon -Ppodio:output_collections=MyNewCollectionName1,MyNewCollectionName2 in

#### To permanently include your factory's outputs in the output file:

Add your collection name to the `output_collections` list in src/services/io/podio/JEventProcessorPODIO.cc:44
Add your collection name to the `output_collections` list in `src/services/io/podio/JEventProcessorPODIO.cc`:
```c++
std::vector<std::string> output_collections={
"EventHeader",
"MCParticles",
"CentralTrackingRecHits",
"CentralTrackSeedingResults",
"CentralTrackerMeasurements",
//...
std::vector<std::string> output_collections = {
// Header and other metadata
"EventHeader",

// Truth record
"MCParticles",
"MCBeamElectrons",
"MCBeamProtons",
// ...
```

### To temporarily use your factory's outputs as inputs to another factory
@@ -73,9 +75,9 @@ eicrecon -Ptargetfactory:InputTags=MyNewCollectionName1,MyNewCollection2 in.root

### To permanently use your factory's outputs as inputs to another factory

Change the collection name in the `OmniFactoryGeneratorT` or `JChainMultifactoryGeneratorT`:
Change the collection name in the `JOmniFactoryGeneratorT`:
```c++
app->Add(new JOmniFactoryGeneratorT<MC2SmearedParticle_factory>(
app->Add(new JOmniFactoryGeneratorT<MC2ReconstructedParticle_factory>(
"GeneratedParticles",
{"MCParticlesSmeared"}, // <== Used to be "MCParticles"
{"GeneratedParticles"},
@@ -87,7 +89,7 @@ Change the collection name in the `OmniFactoryGeneratorT` or `JChainMultifactory
> - Create a JOmniFactoryGenerator for your ElectronReconstruction factory
> - Give your factory's output collection a fun name
> - Call your factory from the command line and verify that you see its logger output.
> - Add it to the `JEventSourcePODIO::output_collections`, so that it gets called automatically.
> - Add it to the `JEventProcessorPODIO::output_collections`, so that it gets called automatically.
> - Experiment with multiple factory generators so you can have multiple instances of the same factory
{: .challenge}

8 changes: 4 additions & 4 deletions _episodes/04-parameterizing-a-factory.md
@@ -18,17 +18,17 @@ Parameters are also handled using registered members. JOmniFactory provides a `P
ParameterRef<std::string> m_energyWeight {this, "energyWeight", config().energyWeight};
```

Parameters are fetched immediately before `Init()` is called, so you may access them from any of the callbacks like so:
Parameters are fetched immediately before `Configure()` is called, so you may access them from any of the callbacks like so:

```c++
void Process(int64_t run_number, uint64_t event_number) {
void Process(int32_t run_number, uint64_t event_number) {
logger()->debug( "Event {}: samplingFraction = {}", event_number, m_samplingFraction() );
}

```
Because we are using ParameterRefs, we can also access the field the ref points to directly:
```c++
void Process(int64_t run_number, uint64_t event_number) {
void Process(int32_t run_number, uint64_t event_number) {
logger()->debug( "Event {}: samplingFraction = {}", event_number, config().sampFrac );
}
```
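
To make the "ref points to a field" idea concrete, here is a minimal toy sketch. It assumes nothing about the real implementation: `ToyConfig`, `ToyParameterRef`, and `set()` are hypothetical names made up for this illustration. It only shows why reading through the ref and reading the config field directly give the same value.

```c++
#include <string>
#include <utility>

// Hypothetical config struct (NOT a real EICrecon config).
struct ToyConfig {
    double sampFrac = 0.5;
};

// Hypothetical parameter wrapper that refers to a field living elsewhere.
template <typename T>
class ToyParameterRef {
public:
    ToyParameterRef(std::string name, T& field)
        : m_name(std::move(name)), m_field(&field) {}

    // Read the current value through the ref.
    const T& operator()() const { return *m_field; }

    // Overwrite the referenced field, e.g. to model a command-line override.
    void set(const T& value) { *m_field = value; }

    const std::string& name() const { return m_name; }

private:
    std::string m_name;
    T* m_field;  // non-owning: the config struct owns the actual value
};
```

In this sketch an override applied through the ref is immediately visible in the config struct, which is the behavior the two `Process()` snippets above rely on.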
@@ -91,4 +91,4 @@ Oftentimes we want to retrieve a resource from a Service and refresh it whenever
> - Give your factory a Config struct
> - Give your Config struct some parameters
> - Experiment with overriding parameter values in the generator and on the command line.
{: .challenge}
{: .challenge}
105 changes: 69 additions & 36 deletions _episodes/05-adding-an-algorithm.md
@@ -11,10 +11,10 @@ objectives:

## The difference between a factory and an algorithm

*Algorithms* are classes that perform one kind of calculation we need and they do so in a generic, framework-independent way. The core of an Algorithm is a method called `execute` which inputs some PODIO collections and outputs some other PODIO collections. Algorithms don't know or care where the inputs come from and where they go. Algorithms also don't know much about where their parameters come from; rather, they are passed a `Config` structure which contains the parameters' values. The nice thing about algorithms is that they are simple to design and test, and easy to reuse for different detectors, frameworks, or even entire experiments.
*Algorithms* are classes that perform one kind of calculation we need and they do so in a generic, framework-independent way. The core of an Algorithm is a method called `process` which takes a tuple of input PODIO collections and a tuple of output PODIO collections. Algorithms don't know or care where the inputs come from and where they go. Algorithms also don't know much about where their parameters come from; rather, they are passed a `Config` structure which contains the parameters' values. The nice thing about algorithms is that they are simple to design and test, and easy to reuse for different detectors, frameworks, or even entire experiments.


Most of what makes an Algorithm an Algorithm is convention. (These are largely inspired by the KISS principle in software engineering!) There is an ongoing effort to create a "framework-less framework" for formally expressing Algorithms using templates, which lives at https://github.com/eic/algorithms. Eventually, we may encourage users to have all Algorithms inherit from the `Algorithm<Input<...>, Output<...>>` templated interface. For now, however, just follow the Algorithm conventions that we will go over next.
In EICrecon, all new algorithms inherit from the templated `algorithms::Algorithm<Input<...>, Output<...>>` interface (provided by the [eic/algorithms](https://github.com/eic/algorithms) "framework-less framework"), and use the `WithPodConfig<ConfigT>` mixin to attach a configuration struct. The `algorithms::Algorithm` base provides logging facilities (`info()`, `debug()`, `trace()`, ...) and a structured way of declaring inputs and outputs by their PODIO collection types.

## Where to put the algorithm code

@@ -29,32 +29,68 @@ Here is a template for an algorithm header file:

#pragma once

// #include relevant header files here
#include <algorithms/algorithm.h>
// #include relevant edm4eic / edm4hep collection headers here

#include "MyAlgorithmNameConfig.h"
#include "algorithms/interfaces/WithPodConfig.h"

namespace eicrecon {

class MyAlgorithmName {
using MyAlgorithmNameAlgorithm =
algorithms::Algorithm<algorithms::Input<MyInputCollection>,
algorithms::Output<MyOutputCollection>>;

class MyAlgorithmName : public MyAlgorithmNameAlgorithm,
public WithPodConfig<MyAlgorithmNameConfig> {

public:

// init function contains any required initialization
void init();

// execute function contains main algorithm processes
// (e.g. manipulate existing objects to create new objects)
std::unique_ptr<MyReturnDataType> execute();

// Any additional public members go here
MyAlgorithmName(std::string_view name)
: MyAlgorithmNameAlgorithm{name,
{"inputCollectionName"},
{"outputCollectionName"},
"Short description of what this algorithm does."} {}

// init() is called once before processing starts. Most algorithms do not need it.
void init() final {};

private:
std::shared_ptr<spdlog::logger> m_log;
// any additional private members (e.g. services and calibrations) go here
// process() does the actual work for each event. The Input/Output tuples
// contain pointers to the PODIO collections.
void process(const Input&, const Output&) const final;

};
} // namespace eicrecon

~~~

A few things worth noting:

- The `algorithms::Algorithm` base class is *templated* on the list of input and output collection types; `MyAlgorithmName` itself is an ordinary class that inherits from it. The `Input` and `Output` aliases inside the class expand into `std::tuple` of pointers (`gsl::not_null<const T*>` for inputs, `T*` for outputs).
- `process()` is `const` — algorithms must not mutate their own state during event processing. Run-by-run state should be set up in `init()` instead.
- Logging is inherited from `algorithms::AlgorithmBase`, so inside `process()` you simply call `info(...)`, `debug(...)`, or `trace(...)` directly — no logger pointer needs to be passed in.
- The configuration struct is held by `WithPodConfig` and accessible as the protected member `m_cfg`.

The corresponding implementation file unpacks the input and output tuples with structured bindings:

~~~ c++

#include "MyAlgorithmName.h"

namespace eicrecon {

void MyAlgorithmName::process(const Input& input, const Output& output) const {

const auto [in_particles] = input;
auto [out_particles] = output;

// ... fill out_particles using in_particles and m_cfg ...
}

} // namespace eicrecon

~~~

## How to call an algorithm from a factory

The code to call an algorithm from a factory generally follows a specific pattern:
@@ -64,37 +100,34 @@ The code to call an algorithm from a factory generally follows a specific patter
// This is called when the factory is instantiated.
// Use this callback to make sure the algorithm is configured.
// The logger, parameters, and services have all been fetched before this is called
m_algo = std::make_unique<eicrecon::ElectronReconstruction>();

// Pass config object to algorithm
m_algo->applyConfig(config());
// Construct the algorithm with the factory's prefix as its name —
// this is what hooks the algorithm's logger up to the same prefix as the factory.
m_algo = std::make_unique<eicrecon::ElectronReconstruction>(GetPrefix());

// If we needed geometry, we'd obtain it like so
// m_algo->init(m_geoSvc().detector(), m_geoSvc().converter(), logger());
// Forward the JANA log level down to the algorithm.
m_algo->level(static_cast<algorithms::LogLevel>(logger()->level()));

// Pass the config object to the algorithm.
m_algo->applyConfig(config());

m_algo->init(logger());
// Call init() once. Note that init() takes no arguments — services
// (e.g. geometry) are accessed by the algorithms framework via the
// algorithms::ServiceSvc / algorithms::GeoSvc, not by passing pointers in.
m_algo->init();
}

void Process(int64_t run_number, uint64_t event_number) {
// This is called on every event.
// Use this callback to call your Algorithm using all inputs and outputs
// The inputs will have already been fetched for you at this point.
auto output = m_algo->execute(
m_in_mc_particles(),
m_in_rc_particles(),
m_in_rc_particles_assoc(),
m_in_clu_assoc()
);

m_out_reco_particles() = std::move(output);
// JANA will take care of publishing the outputs for you.
void Process(int32_t /* run_number */, uint64_t /* event_number */) {
// This is called on every event. The inputs will have already been fetched.
// Call process() with brace-enclosed lists of input pointers and output pointers.
m_algo->process({m_in_particles()}, {m_out_particles().get()});
}
```


> Exercise:
> - Create your own ElectronReconstruction algorithm using the code skeleton above.
> - Print some log messages from your algorithm's `execute()` method.
> - Print some log messages from your algorithm's `process()` method using `info(...)` / `debug(...)`.
> - Have your ElectronReconstruction factory call the algorithm.
> - Run this end-to-end.
{: .challenge}
{: .challenge}
4 changes: 2 additions & 2 deletions _episodes/06-working-with-podio.md
@@ -87,8 +87,8 @@ subset_clusters->push_back(cluster);
```


Note that when you write a factory, its inputs will be `const ExampleHitCollection*`, which are *immmutable*.
Its output will be `std::unique_ptr<ExampleHitCollection>`, which is still mutable but will transfer its ownership to JANA2. JANA2 will add the collection to a podio `Frame`. From that point on, the collection is immutable and owned by the `Frame`.
Note that when you write a factory, its inputs will be `const ExampleHitCollection*`, which are *immutable*.
Its output is held by a `PodioOutput<ExampleHit>` member; calling `m_output()` returns a `std::unique_ptr<ExampleHitCollection>&` that the factory can mutate, and `m_output().get()` gives a raw mutable pointer that you can hand to your algorithm's `process()` method. After `Process()` returns, JOmniFactory transfers ownership of that collection to JANA2, which adds it to a podio `Frame`. From that point on, the collection is immutable and owned by the `Frame`.

JANA2 will create and destroy `Frame`s internally.
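
The ownership hand-off described above can be sketched with plain `std::unique_ptr`. This is an illustration only: `ToyHitCollection`, `ToyFrame`, `put()`, and `fill()` are hypothetical names and do not reproduce the podio `Frame` API or JANA2 internals. The point is that the producer mutates the collection through a raw pointer, then moves ownership away and keeps at most a read-only view.

```c++
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PODIO collection (NOT a real podio type).
struct ToyHitCollection { std::vector<double> energies; };

// Hypothetical stand-in for a frame that owns collections as read-only data.
struct ToyFrame {
    std::vector<std::pair<std::string, std::unique_ptr<const ToyHitCollection>>> collections;

    // Takes ownership of the collection and returns a const view of it.
    const ToyHitCollection* put(std::unique_ptr<ToyHitCollection> coll, std::string name) {
        collections.emplace_back(std::move(name), std::move(coll));
        return collections.back().second.get();
    }
};

// What an algorithm might do with the raw mutable pointer it is handed.
inline void fill(ToyHitCollection* out) { out->energies.push_back(1.5); }
```

After `put()` is called with `std::move(output)`, the caller's `unique_ptr` is empty, so any later access has to go through the const pointer the frame hands back.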
