
Setting up a dunedaq v5.1.0 Development Area


01-Aug-2024 - Work in progress (the following steps have been verified to almost work) (Thanks to Kurt for making a wonderful "getting started" template)

Steps:

  1. Create a new software area based on the fddaq-v5.1.0 candidate release or a recent nightly build (see sub-step 4 below for the exact dbt-create command to use)

    1. The steps for this are based on the latest instructions for daq-buildtools

    2. As always, you should verify that your computer has access to /cvmfs/dunedaq.opensciencegrid.org

    3. If you are using one of the np04daq computers and need to clone packages, add the following lines to your .gitconfig file (once you do this, there is no need to activate the web proxy each time you log in, which also means you won't forget to disable it):

      [http]
        proxy = http://np04-web-proxy.cern.ch:3128
        sslVerify = false
      
    4. Here are the steps for creating the new software area:

      cd <directory_above_where_you_want_the_new_software_area>
      source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
      setup_dbt latest_v5
      dbt-create fddaq-v5.1.0-a9      # or for a nightly: dbt-create -n NFD_DEV_240725_A9
      cd fddaq-v5.1.0-a9              # or cd NFD_DEV_240725_A9
      

    Do not add -q to the dbt-create command, as we need a local Python environment to install drunc.

    1. Please note that if you are following these instructions on a computer on which the DUNE-DAQ software has never been run before, there are several system packages that may need to be installed on that computer. These are mentioned in this script. To check whether a particular one is already installed, you can use a command like yum list libzstd and check whether the package is listed under Installed Packages.
  2. Add the needed repositories to the /sourcecode area. If you want to be able to modify the test-session configuration below, or if you are updating the appmodel schema, you will need to clone the appmodel package. In order to run the unit tests mentioned below, you will need to clone the dfmodules package. To just run the integration tests or the test-session as defined in the release, you do not need to clone any packages.

    1. clone the repositories you will edit (the following block has some extra directory checking; it can all be copy/pasted into your shell window)
      # change directory to the "sourcecode" subdir, if possible and needed
      if [[ -d "sourcecode" ]]; then
          cd sourcecode
      fi
      # double-check that we're in the correct subdir
      current_subdir=`echo ${PWD} | xargs basename`
      if [[ "$current_subdir" != "sourcecode" ]]; then
          echo ""
          echo "*** Current working directory is not \"sourcecode\", skipping repo clones"
      else
          # finally, do the repo clone(s)
          # We always get appmodel so that we can look at the configurations
          # If you want to run the dfmodules unit tests, clone dfmodules as well
          # appmodel and dfmodules are used as examples
          git clone https://github.com/DUNE-DAQ/appmodel.git -b develop
          git clone https://github.com/DUNE-DAQ/dfmodules.git -b develop
          cd ..
      fi
      
      
  3. Set up the work area and build the software. NB: even if you haven't checked out any packages, running dbt-build is necessary to install the rte (runtime environment) script passed to the applications started by drunc.

    source env.sh
    dbt-build -j 20
    dbt-workarea-env
    
  4. dfmodules contains unit tests which have been updated to use OKS; they can be run with

    dbt-unittest-summary.sh
    
  5. Integration tests can be run straight from the release with

    pytest -s $DUNE_DAQ_RELEASE_SOURCE/daqsystemtest/integtest/minimal_system_quick_test.py
    
  6. If developing drunc or druncschema, clone them and run pip install in the corresponding sourcecode subdirectories, then run dbt-workarea-env in the root of the work area (see the sketch after this list).

  7. When you return to working with the software area after logging out, the steps that you'll need to redo are the following:

    cd <work_dir>
    source ./env.sh
    dbt-build  # if needed
    dbt-workarea-env  # if needed
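
As mentioned in step 6 above, a minimal sketch for installing a locally cloned drunc into the work area's Python environment (the same pattern applies to druncschema):

  cd sourcecode/drunc
  pip install .       # install the local checkout into the work area's Python environment
  cd ../..
  dbt-workarea-env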
    

Running the test-session example configuration from appmodel with drunc

Prerequisites

The connectivity service currently has statically defined ports, so you need to check whether there are any other drunc users on the physical host you are running on. If there are, you will likely get an error like the following when you boot:

drunc.utils.grpc_utils.ServerUnreachable: ('failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:3333: connection attempt timed out before receiving SETTINGS frame', 14)

To resolve this issue, the current recommendation is to use a different physical host on which there are no other drunc users.
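
One generic way to check for other drunc users on the host (standard Linux process tools, not a drunc-specific command):

  pgrep -af drunc   # list any running processes whose command line mentions drunc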

Drunc Configuration

We use a configuration coming from the installed drunc package to start up the process_manager. The choices for the test configuration include

  • ssh-standalone.json - ssh process manager with no kafka connection
  • ssh-CERN-kafka.json - ssh process manager with connection to the kafka instance running on the np04 cluster
  • ssh-pocket-kafka.json - ssh process manager with connection to the kafka instance running in pocket.

Then to run the drunc CLI, run for example

drunc-unified-shell ssh-standalone  test/config/test-session.data.xml test-session

This sets up drunc's unified shell; the arguments are:

  • ssh-standalone - a shorthand for the process manager configuration ssh-standalone.json
  • test/config/test-session.data.xml - appmodel configuration file
  • test-session - session name. Note that the session name refers to the name of the Session object in OKS configuration database and is not just an arbitrary string.

Boot

You can now boot test-session in the same terminal using the boot command. This command parses the configuration file and looks for the session name (both of which were provided to the drunc unified shell arguments).
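
Since the configuration file and session name were already given on the drunc-unified-shell command line, booting from the shell prompt is simply:

  boot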

Finite State Machine (FSM) sequences

The FSM will then take all the commands for operating the run control. An outline of the operating instructions can be found in https://github.com/DUNE-DAQ/drunc/wiki/Running-drunc.

Some examples:

conf
start run_number 101
enable-triggers

Specific FSM documentation can be found here

The test-session Example Configuration (And setting up your own)

  • The current example configuration is here: https://github.com/DUNE-DAQ/appmodel/blob/develop/test/config/test-session.data.xml.

  • It consists of several files in both appmodel/test/config (configuration-specific) and appmodel/config/appmodel (somewhat more generic)

    1. test-session.data.xml is the main entry point for this configuration; it includes the other data files and defines the high-level objects like the Session, the DetectorConfig, and some of the Service entries.
    2. hosts.data.xml defines the hosts and processing resources
    3. wiecconfs.data.xml defines the "front end" electronics configuration
    4. ru-segment.data.xml defines the readout applications (ru-01, ru-02, ru-03)
    5. df-segment.data.xml defines the dataflow applications (df-01, df-02, df-03 and dfo-01) and their module configurations (except those kept in moduleconfs.data.xml)
    6. trigger-segment.data.xml defines the trigger applications (tc-maker-1, mlt and hsi-to-tc-app) and their module configurations
    7. data-store-params.data.xml defines the output file configuration for the dataflow apps
    8. fsm.data.xml defines the state machine and the supported transition commands
    9. connections.data.xml defines the Network and Queue connection rules used to generate appropriate endpoints in the SmartDaqApplications
    10. moduleconfs.data.xml contains DAQ Module configuration objects for readout, dataflow, and trigger
  • These files are placed in the install/appmodel/share/test/config directory upon build (via the daq_install() CMake command, based on their location in appmodel/test/config), where OKS can find them

  • To create a new configuration, we are currently limited to copying files from an existing configuration, but tools are in development to generate the readout map from an existing one.

  • All OKS configurations are loaded from files relative to the current working directory and the paths listed in DUNEDAQ_DB_PATH. If you are preparing a configuration in a separate directory, you may want to prepend it to the list in DUNEDAQ_DB_PATH (see the example after this list). By default, DUNEDAQ_DB_PATH is set by dbt to include the install directory of any packages you have in sourcecode that have a config or schema directory with .xml files, followed by the packages from the release.

  • You can test the SmartDaqApplication generation of modules and connections using

    listApps test-session test/config/test-session.data.xml
    generate_modules_test test-session <app_name> test/config/test-session.data.xml # <app_name> is one of the apps listed by listApps
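
As an example of the DUNEDAQ_DB_PATH handling described above (the path is a placeholder), a configuration kept in a separate directory can be made visible to OKS by prepending it to the search list before launching drunc:

  export DUNEDAQ_DB_PATH=/path/to/my/config:${DUNEDAQ_DB_PATH}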
    

Editing database files

  • The data and schema XML files can be edited manually or with the graphical dbe editors dbe_main and schemaeditor. To enable use of the dbe editors, you must first run spack load dbe; then use dbe_main -f <path to file> for editing data files or schemaeditor -f <path to file> for editing schema files.

NB: Spack loading the dbe package updates your environment in ways that may affect the running of other commands. It is recommended to do this in a separate shell or window.
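
Putting those dbe commands together (the file paths are placeholders), ideally in a fresh shell:

  spack load dbe
  dbe_main -f /path/to/file.data.xml     # graphical editor for data files
  schemaeditor -f /path/to/schema.xml    # graphical editor for schema files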

Running with CRP4 as an example with real hardware

  1. Log into the EHN1 DAQ cluster.
  2. Set up a work area with a fresh nightly (NFD_DEV_240701_A9 or later) as above
  3. Clone the EHN1 config repository and source its setup_db_path.sh script (see the sketch after this list)
  4. Start drunc as above but boot sessions/crp4-session.data.xml crp4-oks-session
  5. You will be able to look at monitoring metrics and errors using grafana (v5 dashboards).
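
A rough sketch of steps 3 and 4 above (the EHN1 configuration repository URL is not given here, so a placeholder is used, and setup_db_path.sh is assumed to add the repository's configurations to DUNEDAQ_DB_PATH; the ssh-standalone process-manager configuration is just the earlier example):

  git clone <EHN1-config-repository-URL>    # placeholder, not the real URL
  cd <EHN1-config-repository>
  source setup_db_path.sh
  drunc-unified-shell ssh-standalone sessions/crp4-session.data.xml crp4-oks-session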

Running listrev with drunc

  1. For now, you need to check out the gcrone/drunc-config branch of listrev as the configuration files in the release only work with nanorc
    cd sourcecode
    git clone https://github.com/DUNE-DAQ/listrev.git -b gcrone/drunc-config
    cd -
    dbt-workarea-env
    dbt-build
    
  2. Launch drunc-unified-shell as in the test-session above, with the listrev configuration and session name:
    drunc-unified-shell ssh-standalone config/lrSession.data.xml lr-session
    
  3. Boot, start a run, wait a while and stop the run

Evaluating the listrev Run

  • grep Exiting log_*lr-session_listrev* will show the reported statistics.
  • The example is targeted at 100 Hz, so the expected number of messages seen by ReversedListValidator should be at least 100 times the run duration. There should be three lists in each message (from the three generators), so it should report 300 times the run duration for the number of lists.
  • Messages are round-robined to the two reversers, so each should see 50*run_duration messages and 150*run_duration lists. They should have approximately equal values for the reported counters.
  • Generators should generate 100*run_duration lists and send all (or almost all) of them.
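  • As a concrete worked example: for a 60-second run at the nominal 100 Hz, the validator should report at least 6000 messages and 18000 lists, each reverser roughly 3000 messages and 9000 lists, and each generator about 6000 lists.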

Useful DBT and Spack commands for software areas

  • dbt-info release # prints out the release type and name, and the base release name (version)
  • dbt-info package <dunedaq_package_name> # prints out the package version and commit hash used by the release
  • dbt-info sourcecode # prints out the branch names of source repos under sourcecode, and marks those with local changes with "*"
  • spack find --loaded -N <external_package_name>, e.g. spack find --loaded -N boost # prints out the version of the specified external package that is in use in the current software area
  • spack info fddaq # prints out the packages that are included in the fddaq bundle for the current software area
  • spack info dunedaq # prints out the packages that are included in the dunedaq (common) bundle for the current software area

Also see here.

Monitoring the system

When running with nanorc, metrics reports appear in the info_*.json files that are produced (e.g. info_dataflow_<portno>.json). We can collate these, grouped by metric name, using python -m opmonlib.info_file_collator info_*.json (default output file is opmon_collated.json).

It is also possible to monitor the system using a graphical interface.

Steps to enable and view TRACE debug messages

Here are suggested steps for enabling and viewing debug messages in the TRACE memory buffer:

  • set up your software area, if needed (e.g. cd <work_dir>; source ./env.sh; dbt-workarea-env)
  • export TRACE_FILE=$DBT_AREA_ROOT/log/${USER}_dunedaq.trace
    • this tells TRACE which file on disk to use for its memory buffer, and in this way, enables TRACE in your shell environment and in subsequent runs of the system with nanorc.
  • run the application using the nanorc commands described above
    • this populates the list of already-enabled TRACE levels so that you can view them in the next step
  • run tlvls
    • this command outputs a list of all the TRACE names that are currently known, and which levels are enabled for each name
    • TRACE names allow us to group related messages, and these names typically correspond to the name of the C++ source file
    • the bitmasks that are relevant for the TRACE memory buffer are the ones in the "maskM" column
  • enable levels with tonM -n <TRACE NAME> <level>
    • for example, tonM -n DataWriter DEBUG+5 (where "5" is the level that you see in the TLOG_DEBUG statement in the C++ code)
  • re-run tlvls to confirm that the expected level is now set
  • re-run the application
  • view the TRACE messages using tshow | tdelta -ct 1 | more
    • note that the messages are displayed in reverse time order
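
Putting the above steps together, using the DataWriter example (the work area path and the debug level are illustrative):

  cd <work_dir>; source ./env.sh; dbt-workarea-env
  export TRACE_FILE=$DBT_AREA_ROOT/log/${USER}_dunedaq.trace
  # run the system once so that the TRACE names are registered, then:
  tlvls                          # list known TRACE names and their enabled level masks
  tonM -n DataWriter DEBUG+5     # enable debug level 5 for the DataWriter name
  # re-run the system, then view the messages (most recent first):
  tshow | tdelta -ct 1 | more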

A couple of additional notes:

  • For debug statements in our code that look like TLOG_DEBUG(5) << "test, test";, we would enable the output of those messages using a shell command like tonM -n <TRACE_NAME> DEBUG+5. A couple of notes on this...
    • when we look at the output of the bitmasks with the tlvls command, bit #5 is going to be offset by the number of bits that TRACE and ERS reserve for ERROR, WARNING, INFO, etc. messages. At the moment, the offset appears to be 8, so the setting of bit "DEBUG+5" corresponds to setting bit #13.
    • when we view the messages with tshow, one of the columns in its output shows the level associated with the message (the column heading is abbreviated as "lvl"). Debug messages are prefaced with the letter "D", and they include the number that was specified in the C++ code. So, for our example of level 5, we would see "D05" in the tshow output for the "test, test" messages.
  • There are many other TRACE 'commands' that allow you to enable and disable messages. For example,
    • tonMg <level> enables the specified level for all TRACE names (the "g" means global in this context)
    • toffM -n <TRACE NAME> <level> disables the specified level for the specified TRACE name
    • toffMg <level> disables the specified level for all TRACE names
    • tlvlM -n <TRACE name> <mask> explicitly sets (and un-sets) the levels specified in the bitmask