Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Developer Documentation Content #184

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

michaelmckinsey1
Copy link
Collaborator

@michaelmckinsey1 michaelmckinsey1 commented Jun 25, 2024

This PR adds some developer documentation pages:

  • thicket_properties - descriptions about the assumptions we make in Thicket and how they are enforced in the code.
  • concatenating_thickets - explanation of the process of concatenating Thickets for developers, focusing on the topics that are hard to understand from the code and we get the most questions on.

@michaelmckinsey1 michaelmckinsey1 changed the title Add to docs Add Explanation of Thicket Properties in Docs Jun 25, 2024
@michaelmckinsey1 michaelmckinsey1 self-assigned this Jun 25, 2024
@michaelmckinsey1 michaelmckinsey1 added area-docs Issues and PRs related to Thicket's documentation, notebooks, and examples priority-normal Normal priority issues and PRs status-work-in-progress PR is currently being worked on labels Jun 25, 2024
@michaelmckinsey1 michaelmckinsey1 changed the title Add Explanation of Thicket Properties in Docs Add Documentation of Thicket Properties Jun 25, 2024
@michaelmckinsey1 michaelmckinsey1 marked this pull request as ready for review July 2, 2024 21:08
@michaelmckinsey1 michaelmckinsey1 changed the title Add Documentation of Thicket Properties Add Developer Documentation Content Jul 2, 2024
@michaelmckinsey1 michaelmckinsey1 requested a review from ilumsden July 2, 2024 21:08
Copy link
Collaborator

@ilumsden ilumsden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a lot of this confusing, mostly due to a lack of detail and inconsistent organization.

Regarding detail, these are developer docs, so more details are almost always going to be better than fewer details. The people reading these docs want to know the technical details of how Thicket works, what it does, and what constraints they have to comply with.

Regarding organization, it's better to essentially hold the reader's hand through the pages than not. The more you can do to make the organization easy to understand and almost formulaic, the better. For example, in the "Concatenating Thickets" page, you could end the introduction by saying that concatenating Thickets consists of 3 steps:

  1. Unifying graphs
  2. Updating Node objects based on changes to the graphs
  3. Merging DataFrames by row or column

The text for these bullets would then literally become the names of the subsections.


1. :code:`utils.validate_dataframe._check_duplicate_inner_idx` - There are no duplicate indices.
2. :code:`utils.validate_dataframe._check_missing_hnid` - *Node* objects, identified by their :code:`_hatchet_nid` are in ascending order without gaps.
3. :code:`utils.validate_dataframe._validate_name_column` - The values in the "name" column match the :code:`Node.frame["name"]` attribute for that row or are :code:`None`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general format of this page is really confusing. I had to reread several times before I understood what was being conveyed.


##################
Unifying Thickets
##################
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sections don't make a ton of sense right now because they don't clearly connect back to anything in the intro text above or the figure. I'd add a sentence or two to do this connecting. Maybe something that says "step X in the figure consists of the following steps."

Unifying Thickets
##################

This section mainly refers to the :code:`Thicket.Ensemble._unify()` function.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain what "Unifying Thickets" means here. What is it? How does it differ from "Concatenating Thickets?"

Unifying Calltrees
===================

*Unifying Calltrees* is the process of performing a **set operation** (e.g. :code:`Hatchet.graph.union()`) on multiple :code:`Thicket.graph`'s. Comparing two graphs involves comparing :code:`Hatchet.Node` objects between the graphs. The :code:`Hatchet.graph.union()` function computes the union graph between two Hatchet graphs. Nodes are compared by:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no such thing as a universal "set operation." Is it creating a set? Is it performing a union on an existing set? Intersection? Some other set operation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to "graph operation", referring to operations that can be performed on a graph. This could just say "... performing a union or intersection on multiple ..."


Nodes that match in #1 and #2 are merged in the resulting union graph as a new :code:`Hatchet.Node` object (`deep copy of the first node <https://github.com/LLNL/hatchet/blob/6a6d7027056df96bd1c919ab34a9acce81f3b9a1/hatchet/graph.py#L227>`_). Deep copies of nodes that do **not** match are inserted into the union graph at the appropriate depth.

*Note:* The :code:`Thicket.intersection()` function first applies the :code:`Hatchet.graph.union()` before computing the intersection of the graphs, since their does not exist a :code:`Hatchet.graph.intersection` function.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did intersection() come from? I have absolutely no idea how this plugs into Thicket based off of the rest of the page.

Updating Node Objects
======================

Because Node objects must be identical between Thicket components (see :ref:`/thicket_properties.rst#nodes`), The resulting new nodes in the union graph must replace the old node objects in components like the :code:`Thicket.dataframe.index` (see `code <https://github.com/LLNL/thicket/blob/develop/thicket/ensemble.py#L68-L83>`_). The :code:`Hatchet.graph.union()` function provides a dictionary mapping old nodes to new nodes, however to avoid applying these updates after every union between two graphs, we `update a dictionary of all the node mappings <https://github.com/LLNL/thicket/blob/develop/thicket/ensemble.py#L53-L67>`_ and apply the updates after all of the unions have been computed. This is **only** necessary when concatenating more than **two** Thickets, as only one union will be performed when concatenating two Thickets. We `apply this idea when reading files <https://github.com/LLNL/thicket/blob/develop/thicket/thicket.py#L393-L413>`_ to avoid this cost.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My other comments about connecting back to the earlier parts of the page still apply here, but otherwise, this section looks good.

Index Concatenation
####################

*Index Concatenation* refers to the process that happens for the performance and metadata tables. We concatenate the tables, which is essentially "stacking the rows on top of each other". Because we check that the performance profiles we concatenate are unique (:ref:`/thicket_properties.rst#profiles`), we do not need to worry about duplicate indices in either table. We sort the index of both tables, which interleaves the profiles in the MultiIndex of the performance table to visually group all of the profiles in the table for each node. An example of this operation can be seen in the :ref:`/thicket_tutorial.ipynb`, when :code:`axis="columns"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index Concatenation refers to the process that happens for the performance and metadata tables.

I don't understand this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, most of the wording in this section is hard to understand. Honestly, if I didn't have pre-existing knowledge of how Thicket works, I wouldn't understand this at all.

Column Concatenation
#####################

*Column Concatenation* refers to the process that happens in the performance, metadata, and statistics tables. We create a MultiIndex out of the columns, such that for each metric, there is a higher level index label. An example of this operation can be seen in the :ref:`/thicket_tutorial.ipynb`, when :code:`axis="columns"`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column Concatenation refers to the process that happens in the performance, metadata, and statistics tables.

Again, I don't really understand the first sentence of this section.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create a MultiIndex out of the columns, such that for each metric, there is a higher level index label.

Although correct, this seems kind of "hand wavey." If these are meant to be developer docs, don't be afraid of going into detail with this stuff. The people reading this want to know all the technical stuff about how Thicket works.

@michaelmckinsey1 michaelmckinsey1 added the status-revisions-needed Revisions have been requested from a reviewer for this PR label Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-docs Issues and PRs related to Thicket's documentation, notebooks, and examples priority-normal Normal priority issues and PRs status-revisions-needed Revisions have been requested from a reviewer for this PR status-work-in-progress PR is currently being worked on
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants