
Setting up a dunedaq v5.1.0 Development Area

Gordon Crone edited this page Jun 20, 2024 · 24 revisions

20-June-2024 - Work in progress (the following steps have NOT been verified to work) (Thanks to Kurt for making a wonderful "getting started" template)

Reference links:

  1. create a new software area based on the latest nightly build (see step 1.4 for the exact dbt-create command to use)

    1. The steps for this are based on the latest instructions for daq-buildtools

    2. As always, you should verify that your computer has access to /cvmfs/dunedaq.opensciencegrid.org

    3. If you are using one of the np04daq computers, and need to clone packages, add the following lines to your .gitconfig file (once you do this, there will be no need to activate the web proxy each time you log in, and this means that you won't forget to disable it...):

      [http]
        proxy = http://np04-web-proxy.cern.ch:3128
        sslVerify = false
      
    4. Here are the steps for creating the new software area:

      cd <directory_above_where_you_want_the_new_software_area>
      source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
      setup_dbt latest
      dbt-create -q -n NFD_DEV_240620_A9 <work_dir>
      cd <work_dir>
      
    5. Please note that if you are following these instructions on a computer on which the DUNE-DAQ software has never been run before, there are several system packages that may need to be installed on that computer. These are mentioned in this script. To check whether a particular one is already installed, you can use a command like yum list libzstd and check whether the package is listed under Installed Packages.

  2. add the needed repositories to the /sourcecode area.

    1. decide if you want the very latest code, or a more stable set of packages that has been verified to work.

      Run this command to select the very latest code

      export use_very_latest_dunedaq_code=1
      
    2. clone the repositories (the following block has some extra directory checking; it can all be copy/pasted into your shell window)

      # change directory to the "sourcecode" subdir, if possible and needed
      if [[ -d "sourcecode" ]]; then
          cd sourcecode
      fi
      # double-check that we're in the correct subdir
      current_subdir=$(basename "${PWD}")
      if [[ "$current_subdir" != "sourcecode" ]]; then
          echo ""
          echo "*** Current working directory is not \"sourcecode\", skipping repo clones"
      else
          # finally, do the repo clone(s)
      
          # We always get appmodel so that we can look at the configurations
          # If you want to run the dfmodules unit tests, clone dfmodules as well
          #if [ ${use_very_latest_dunedaq_code:-0} -eq 1 ]; then
              git clone https://github.com/DUNE-DAQ/appmodel.git -b develop
              git clone https://github.com/DUNE-DAQ/dfmodules.git -b develop
          #else
          #fi
          cd ..
      fi
      
      
  3. set up the work area, possibly install the latest nanorc version, and build the software

    source env.sh
    dbt-build -j 20
    dbt-workarea-env
    
    
  4. dfmodules contains unit tests that have been updated to use OKS; they can be run with

    dbt-unittest-summary.sh
    
  5. When you return to working with the software area after logging out, the steps that you'll need to redo are the following:

    cd <work_dir>
    source ./env.sh
    dbt-build  # if needed
    dbt-workarea-env  # if needed
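
The prerequisites mentioned in step 1 (CVMFS access and required system packages) can be checked before creating the area. The following is a minimal sketch, not part of daq-buildtools: check_dir and check_pkg are hypothetical helper names, and rpm -q is used in place of yum list on the assumption that the host is RPM-based.

```shell
# Hypothetical pre-flight helpers; not part of daq-buildtools.

# Report whether a directory is visible and readable from this host.
check_dir() {
    if [ -d "$1" ] && [ -r "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Report whether a system package is installed (assumes an RPM-based host).
check_pkg() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "SKIPPED: $1 (rpm not available)"
    elif rpm -q "$1" >/dev/null 2>&1; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# libzstd is the one package named on this page; add others as needed.
check_dir /cvmfs/dunedaq.opensciencegrid.org || true
check_pkg libzstd || true
```

Both helpers return a non-zero status on a missing item, so they can also gate a longer setup script.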
    

Running the test-session example configuration from appmodel <=== To be updated for drunc

  1. We still need a configuration "boot.json" file to instantiate applications in NanoRC. Create a configuration directory (e.g. test_config) and create a link to the sample boot.json:

    mkdir test_config && cd test_config
    if [ -f ../sourcecode/appdal/test/boot.json ];then
      ln -s ../sourcecode/appdal/test/boot.json
    else
      wget https://github.com/DUNE-DAQ/appdal/raw/develop/test/boot.json
    fi
    cd ..
    

    A few notes on the sample file:

    • Since this is a pre-generated boot.json, it defaults to starting the Connectivity Service on port 15000.
    • This configuration starts the ru-01, ru-02, ru-03, df-01, df-02, df-03, dfo-01, and ta-01 apps defined in the OKS database
    • The OKS "database" file is in sourcecode/appmodel/test/config and contains the rest of the configuration information.
    • You can test the SmartDaqApplication generation of modules and connections using
      listApps test-session sourcecode/appmodel/test/config/test-session.data.xml
      generate_modules_test test-session <app_name> sourcecode/appmodel/test/config/test-session.data.xml # <app_name> is one of the apps listed by listApps
      
  2. nanorc --partition-number <num> <config name> test-session boot conf wait 10 start_run <run number> wait 60 stop_run scrap terminate

    • e.g. nanorc --partition-number 2 test_config test-session boot conf wait 10 start_run 111 wait 60 stop_run scrap terminate
    • or, you can simply invoke nanorc --partition-number 2 test_config test-session by itself and input the commands individually
    • Note that the session name "test-session" refers to the OKS configuration database https://github.com/DUNE-DAQ/appmodel/blob/develop/test/config/test-session.data.xml
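
The one-shot command in step 2 has four values that change from run to run. As a sketch, a small wrapper can assemble the command line from them; nanorc_cmd is a hypothetical helper name, and it only prints the command so that it can be inspected before it is run.

```shell
# Hypothetical helper: print the one-shot nanorc command for a test run.
# Arguments: partition number, config directory, run number, run length (s).
nanorc_cmd() {
    echo "nanorc --partition-number $1 $2 test-session" \
         "boot conf wait 10 start_run $3 wait $4 stop_run scrap terminate"
}

# Prints the same command as the example in step 2.
nanorc_cmd 2 test_config 111 60
```

Once the printed line looks right, it can be pasted into the shell as-is.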

The test-session Example Configuration (And setting up your own)

  • The current example configuration is here: https://github.com/DUNE-DAQ/appmodel/blob/develop/test/config/test-session.data.xml
  • It consists of several files in both appmodel/test/config (configuration-specific) and appmodel/config/appmodel (somewhat more generic)
    1. test-session.data.xml is the main entry point for this configuration; it includes the other data files and defines the high-level objects like the Session, the DetectorConfig, and some of the Service entries.
    2. hosts.data.xml defines the hosts and processing resources
    3. dummy-readoutmap.data.xml defines the "front ends" present in the configuration
    4. ru-segment.data.xml defines the readout applications (ru-01, ru-02, ru-03)
    5. df-segment.data.xml defines the dataflow applications (df-01, df-02, df-03 and dfo-01) and their module configurations except
    6. trigger-segment.data.xml defines the trigger applications (tc-maker-1, mlt and hsi-to-tc-app) and their module configurations
    7. data-store-params.data.xml defines the output file configuration for the dataflow apps
    8. fsm.data.xml defines the state machine and the supported transition commands
    9. connections.data.xml defines the Network and Queue connection rules used to generate appropriate endpoints in the SmartDaqApplications
    10. moduleconfs.data.xml contains DAQ Module configuration objects for readout, dataflow, and trigger
  • These files are placed in the install/appmodel/share/test/config directory upon build (via the daq_install() CMake command, based on their location in appmodel/test/config), where OKS can find them
  • To create a new configuration, we are currently limited to copying files from an existing configuration, but tools are in development to generate the readout map from an existing one.
  • XML files can be edited manually or by running spack load dbe;dbe_main -f /path/to/file
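
Since test-session.data.xml pulls in the other data files, listing its includes is a quick way to see what a configuration is built from. The sketch below assumes that includes are written as `<file path="..."/>` elements, one per line; that layout is an assumption about the OKS XML format, and list_includes is a hypothetical helper name.

```shell
# Hypothetical helper: print the path of every file included by an OKS data file.
# Assumes includes appear as <file path="..."/> elements, one per line.
list_includes() {
    sed -n 's/.*<file path="\([^"]*\)".*/\1/p' "$1"
}

# Example:
#   list_includes sourcecode/appmodel/test/config/test-session.data.xml
```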

Running with CRP4 as example with real hardware

  1. Log into the EHN1 DAQ cluster.
  2. Set up a work area with a fresh nightly (NFD_DEV_240620_A9++)
  3. Clone appmodel (develop) and build it in your work area.
  4. Copy the directory appmodel/test/test-crp4 into the directory from which you want to run.
  5. Launch nanorc using the command: nanorc ./test-crp4 crp4-oks-session
  6. You will be able to look at monitoring metrics and errors using grafana (v5 dashboards).

Useful DBT and Spack commands for software areas

  • dbt-release-info # prints out the release type and name, and the base release name (version)
  • dbt-pkg-info <dunedaq_package_name> # prints out the package version and commit hash used by the release
  • dbt-src-status # prints out the branch names of source repos under sourcecode, and marks those with local changes with "*"
  • spack find --loaded -N <external_package_name>, e.g. spack find --loaded -N boost # prints out the version of the specified external package that is in use in the current software area
  • spack info fddaq # prints out the packages that are included in the fddaq bundle for the current software area
  • spack info dunedaq # prints out the packages that are included in the dunedaq (common) bundle for the current software area
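
When reporting a problem, it can help to capture the output of several of these commands in one go. A minimal sketch, assuming the software area is already set up in the current shell; run_if_available and area_snapshot are hypothetical helper names, and boost is used only because it is the example given above.

```shell
# Hypothetical helper: run a command if it is on PATH, otherwise say so.
run_if_available() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped: $1 (not on PATH)"
    fi
}

# Record what the current software area is built from.
area_snapshot() {
    run_if_available dbt-release-info
    run_if_available dbt-src-status
    run_if_available spack find --loaded -N boost
}

# Example:
#   area_snapshot > area_snapshot.txt
```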

Also see here.

Monitoring the system

When running with nanorc, metrics reports appear in the info_*.json files that are produced (e.g. info_dataflow_<portno>.json). We can collate these, grouped by metric name, using python -m opmonlib.info_file_collator info_*.json (default output file is opmon_collated.json).
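
If no run has produced info files yet, the info_*.json glob is passed to the collator unexpanded. The sketch below guards against that; collate_info is a hypothetical helper name, and the collator invocation itself is the one given above.

```shell
# Hypothetical helper: collate the info files in a directory, if there are any.
# Argument: directory holding the info_*.json files (defaults to the current one).
collate_info() {
    dir="${1:-.}"
    # An unmatched glob stays literal, so test whether the first hit exists.
    set -- "$dir"/info_*.json
    if [ ! -e "$1" ]; then
        echo "no info_*.json files in $dir"
        return 1
    fi
    (cd "$dir" && python -m opmonlib.info_file_collator info_*.json)
}

# Example:
#   collate_info .
```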

It is also possible to monitor the system using a graphical interface.

Steps to enable and view TRACE debug messages

Here are suggested steps for enabling and viewing debug messages in the TRACE memory buffer:

  • set up your software area, if needed (e.g. cd <work_dir>; source ./env.sh ; dbt-workarea-env)
  • export TRACE_FILE=$DBT_AREA_ROOT/log/${USER}_dunedaq.trace
    • this tells TRACE which file on disk to use for its memory buffer, and in this way, enables TRACE in your shell environment and in subsequent runs of the system with nanorc.
  • run the application using the nanorc commands described above
    • this populates the list of already-enabled TRACE levels so that you can view them in the next step
  • run tlvls
    • this command outputs a list of all the TRACE names that are currently known, and which levels are enabled for each name
    • TRACE names allow us to group related messages, and these names typically correspond to the name of the C++ source file
    • the bitmasks that are relevant for the TRACE memory buffer are the ones in the "maskM" column
  • enable levels with tonM -n <TRACE NAME> <level>
    • for example, tonM -n DataWriter DEBUG+5 (where "5" is the level that you see in the TLOG_DEBUG statement in the C++ code)
  • re-run tlvls to confirm that the expected level is now set
  • re-run the application
  • view the TRACE messages using tshow | tdelta -ct 1 | more
    • note that the messages are displayed in reverse time order
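
The enable-and-check part of the steps above can be collected into a small helper. This is a sketch only: trace_file_path and enable_debug_level are hypothetical names, while tonM, tlvls, and the TRACE_FILE path come from the steps above. It assumes the software area environment is already set up.

```shell
# Path of the file that backs the TRACE memory buffer (as given in the steps above).
trace_file_path() {
    echo "${DBT_AREA_ROOT}/log/${USER}_dunedaq.trace"
}

# Enable one debug level for one TRACE name, then show the resulting masks.
# Arguments: TRACE name, debug level as used in TLOG_DEBUG(<level>).
enable_debug_level() {
    TRACE_FILE="$(trace_file_path)"
    export TRACE_FILE
    tonM -n "$1" "DEBUG+$2"
    tlvls
}

# Example:
#   enable_debug_level DataWriter 5
```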

A couple of additional notes:

  • For debug statements in our code that look like TLOG_DEBUG(5) << "test, test";, we would enable the output of those messages using a shell command like tonM -n <TRACE_NAME> DEBUG+5. A couple of notes on this...
    • when we look at the output of the bitmasks with the tlvls command, bit #5 is going to be offset by the number of bits that TRACE and ERS reserve for ERROR, WARNING, INFO, etc. messages. At the moment, the offset appears to be 8, so the setting of bit "DEBUG+5" corresponds to setting bit #13.
    • when we view the messages with tshow, one of the columns in its output shows the level associated with the message (the column heading is abbreviated as "lvl"). Debug messages are prefaced with the letter "D", and they include the number that was specified in the C++ code. So, for our example of level 5, we would see "D05" in the tshow output for the "test, test" messages.
  • There are many other TRACE 'commands' that allow you to enable and disable messages. For example,
    • tonMg <level> enables the specified level for all TRACE names (the "g" means global in this context)
    • toffM -n <TRACE NAME> <level> disables the specified level for the specified TRACE name
    • toffMg <level> disables the specified level for all TRACE names
    • tlvlM -n <TRACE name> <mask> explicitly sets (and un-sets) the levels specified in the bitmask
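
The bit arithmetic in the first note can be worked through in the shell. This sketch uses the offset of 8 reported above, which should be treated as an assumption that may change between TRACE versions; debug_bit and debug_mask are hypothetical helper names.

```shell
# Offset between TLOG_DEBUG level N and its bit position in the TRACE masks.
# 8 is the value reported in the notes above; treat it as an assumption.
DEBUG_OFFSET=8

# Bit position used by DEBUG+N.
debug_bit() {
    echo $(( $1 + DEBUG_OFFSET ))
}

# Mask with only that bit set, printed in hexadecimal.
debug_mask() {
    printf '0x%x\n' $(( 1 << ($1 + DEBUG_OFFSET) ))
}

debug_bit 5     # prints 13
debug_mask 5    # prints 0x2000
```

For the level 5 example, the bit to look for in the "maskM" column of the tlvls output is therefore bit 13.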