
feat: Added the initial implementation of KT-split #871

Draft
wants to merge 4 commits into main

Conversation

qh681248
Contributor

PR Type

  • Feature

Description

How Has This Been Tested?

Checklist before requesting a review

  • I have made sure that my PR is not a duplicate.
  • My code follows the style guidelines of this project.
  • I have ensured my code is easy to understand, including docstrings and comments where necessary.
  • I have performed a self-review of my code.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • Any dependent changes have been merged and published in downstream modules.
  • I have updated CHANGELOG.md, if appropriate.

@qh681248 qh681248 linked an issue Nov 15, 2024 that may be closed by this pull request
@qh681248 qh681248 marked this pull request as draft November 15, 2024 13:08

Performance review

Commit 27887b6 - Merge eb68cc6 into 98b6142

No significant changes to performance.


Performance review

Commit 17ef126 - Merge e008146 into ed78130

No significant changes to performance.


Performance review

Commit cd03a13 - Merge 1475026 into ed78130

No significant changes to performance.

@qh681248 qh681248 linked an issue Nov 25, 2024 that may be closed by this pull request

Performance review

Commit 1cafdf8 - Merge e1dedb4 into ed78130

No significant changes to performance.

@gw265981 gw265981 self-requested a review November 27, 2024 16:40

@gw265981 gw265981 left a comment


I have added the comments, most of which we discussed previously. Once you have a final working implementation, add some unit tests or create an issue for them (though it might be helpful for you to check that everything is working).

probabilistic components in the algorithm.
"""

kernel: ScalarValuedKernel

Add defaults for some of the parameters, as discussed previously.

random_key: KeyArrayLike

@classmethod
def get_swap_params(

This seems to be the same as get_a_and_param in kt_half; do you need both?

final_coresets = self.kt_split(dataset)
return self.kt_refine(self.kt_choose(final_coresets, dataset)), solver_state

def kt_half_recursive(self, points, m, original_dataset):

I would rename points to something like current_subset.


Also, add type annotations.
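
For illustration, a possible annotated signature; the parameter names, types, and return type here are assumptions based on the surrounding snippets, not the final API:

```python
# Sketch only: suggested rename and type annotations; names and types are assumptions.
def kt_half_recursive(
    self,
    current_subset: _Data,     # renamed from `points`
    m: int,                    # number of remaining halving rounds
    original_dataset: _Data,   # the full dataset being compressed
) -> list[Coresubset[_Data]]:
    ...
```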

"""
n = len(points) // 2
original_array = points.data
arr1 = jnp.zeros(n, dtype=jnp.int32)

Use more descriptive variable names and, preferably, add a comment beforehand to explain what they are for. If the variables are simply temporary placeholders, explain that in a comment.
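
For example, something along these lines (the names are purely illustrative, and the second array is an assumed counterpart to the first):

```python
# Illustrative only: placeholder index arrays for the two candidate halves,
# to be filled in by the swapping logic that follows.
n = len(points) // 2
first_half_indices = jnp.zeros(n, dtype=jnp.int32)   # was `arr1`
second_half_indices = jnp.zeros(n, dtype=jnp.int32)  # assumed counterpart to `arr1`
```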

subset1 = eqx.tree_at(lambda x: x.nodes.data, subset1, subset1_indices)
subset2 = eqx.tree_at(lambda x: x.nodes.data, subset2, subset2_indices)

# Recur for both subsets and concatenate results

recur -> recurse

alpha = term1 + term2
return alpha, bool_arr_1, bool_arr_2

def final_function(

Again, just use a more descriptive name, e.g. apply_probabilistic_assignment (just an example; feel free to choose something more appropriate, of course).


return Coresubset(final_arr1, points), Coresubset(final_arr2, points)

def kt_split(self, points: _Data) -> list[Coresubset[_Data]]:

As discussed, we probably want to remove this from here for now but save it for later.


return final_coresets

def kt_choose(

This will likely have to be changed to be jit-compatible, e.g., using vmap to get a vector of MMD values and then jnp.argmin to select the best.
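
A minimal sketch of that pattern, assuming the candidate coresets can be stacked into a single index array and that an MMD function is available; compute_mmd and the shapes here are placeholders, not the existing API:

```python
import jax
import jax.numpy as jnp

# Sketch only: vectorise the MMD evaluation over all candidates, then pick the best.
# `candidate_indices` has shape (num_candidates, coreset_size); `compute_mmd` stands
# in for whichever MMD implementation ends up being used.
def choose_best_candidate(candidate_indices, dataset, compute_mmd):
    mmd_values = jax.vmap(lambda idx: compute_mmd(dataset[idx], dataset))(candidate_indices)
    return candidate_indices[jnp.argmin(mmd_values)]
```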


Also, implement the baseline coreset computation and expose the method for it as a parameter (random is probably a good default).
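
For the random baseline, a sketch along these lines might work (assuming a JAX PRNG key is available; purely illustrative):

```python
# Sketch only: draw a uniformly random baseline coreset of the requested size.
def random_baseline_indices(random_key, data_size, coreset_size):
    return jax.random.choice(random_key, data_size, shape=(coreset_size,), replace=False)
```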


return best_coreset

def kt_refine(self, candidate_coreset: Coresubset[_Data]) -> Coresubset[_Data]:

Feel free to use the Kernel Herding refine method here.
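
If the Kernel Herding refine step is reused, the delegation might look roughly like this; the exact constructor and refine signatures are assumptions here:

```python
# Sketch only: delegate refinement to Kernel Herding. The KernelHerding
# constructor and refine signature used below are assumptions.
def kt_refine(self, candidate_coreset: Coresubset[_Data]) -> Coresubset[_Data]:
    herding_solver = KernelHerding(
        coreset_size=len(candidate_coreset), kernel=self.kernel
    )
    refined_coreset, _ = herding_solver.refine(candidate_coreset)
    return refined_coreset
```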


return a, new_sigma

def reduce(

If we want to make this an ExplicitSizeSolver, this might be the place to implement the logic for discarding and padding points. It will also provide the coreset_size parameter, so you will probably want to remove m as a parameter and compute it as log2(data_size / coreset_size) (after discarding, etc.).
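
A rough sketch of the size bookkeeping described above (the truncation step and exact rounding are assumptions):

```python
import math

# Sketch only: derive the number of halving rounds m from the requested coreset size,
# assuming the data has already been truncated/padded so that
# data_size == coreset_size * 2**m for some non-negative integer m.
def _num_halving_rounds(data_size: int, coreset_size: int) -> int:
    return max(int(math.log2(data_size / coreset_size)), 0)
```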

Labels: None yet
Projects: None yet

Development

Successfully merging this pull request may close these issues:

  • Add KT-select_best algorithm
  • Add KT-split algorithm

2 participants