Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: [IMP] base: Speed-up address_get method #165010

Open
wants to merge 3 commits into
base: 15.0
Choose a base branch
from

Conversation

moylop260
Copy link
Contributor

@moylop260 moylop260 commented May 8, 2024

partners with many children need to visit all the children to check if the address match with the prefered one But it is not needed if you know the prefered address doesn't match Searching the children directly with search method it is faster

You can test using the following script

values = []
partner_parent = self.env["res.partner"].create({"name": "big family partner", "type": "contact"})
for i in range(55555):
     values.append({
        "name": "test %d" % i,
        "parent_id": partner_parent.id,
        "type": "delivery"
    })
children = self.env["res.partner"].create(values)
result = partner_parent.address_get(['delivery', 'invoice'])

(or a little more complex but faster using many threads https://gist.github.com/moylop260/c4ae432f2f8ae72b94fcc3bb1a3dc419)

Fast-tracking creating the database I uploaded a backup with the records created:

improve

Using original code the profiler shows:

Total time: 1806.75 s
File: /Users/moylop260/odoo/odoo-15.0/odoo/addons/base/models/res_partner.py
Function: address_get at line 959

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   959                                               @profile
   960                                               def address_get(self, adr_pref=None):
   961                                                   """ Find contacts/addresses of the right type(s) by doing a depth-first-search
   962                                                   through descendants within company boundaries (stop at entities flagged ``is_company``)
   963                                                   then continuing the search at the ancestors that are within the same company boundaries.
   964                                                   Defaults to partners of type ``'default'`` when the exact type is not found, or to the
   965                                                   provided partner itself if no type ``'default'`` is found either. """
   966         1         19.0     19.0      0.0          adr_pref = set(adr_pref or [])
   967         1          6.0      6.0      0.0          if 'contact' not in adr_pref:
   968         1          3.0      3.0      0.0              adr_pref.add('contact')
   969         1          2.0      2.0      0.0          result = {}
   970         1          3.0      3.0      0.0          visited = set()
   971         2        160.0     80.0      0.0          for partner in self:
   972         1          1.0      1.0      0.0              current_partner = partner
   973         1         25.0     25.0      0.0              while current_partner:
   974         1          2.0      2.0      0.0                  to_scan = [current_partner]
   975                                                           # Scan descendants, DFS
   976    555552     560396.0      1.0      0.0                  while to_scan:
   977    555551   28199392.0     50.8      1.6                      record = to_scan.pop(0)
   978    555551    2077545.0      3.7      0.1                      visited.add(record)
   979    555551   13199474.0     23.8      0.7                      if record.type in adr_pref and not result.get(record.type):
   980         2         33.0     16.5      0.0                          result[record.type] = record.id
   981    555551     529671.0      1.0      0.0                      if len(result) == len(adr_pref):
   982                                                                   return result
   983   1111102 1761899692.0   1585.7     97.5                      to_scan = [c for c in record.child_ids
   984                                                                            if c not in visited
   985    555551     283027.0      0.5      0.0                                   if not c.is_company] + to_scan
   986
   987                                                           # Continue scanning at ancestor if current_partner is not a commercial entity
   988         1         59.0     59.0      0.0                  if current_partner.is_company or not current_partner.parent_id:
   989         1          0.0      0.0      0.0                      break
   990                                                           current_partner = current_partner.parent_id
   991
   992                                                   # default to type 'contact' or the partner itself
   993         1          4.0      4.0      0.0          default = result.get('contact', self.id or False)
   994         4          2.0      0.5      0.0          for adr_type in adr_pref:
   995         3          3.0      1.0      0.0              result[adr_type] = result.get(adr_type) or default
   996         1          1.0      1.0      0.0          return result

Notice it spend 1806.75s and max number of hits 1111102

Using this improvement the profiler shows:

Wrote profile results to odoo-bin.lprof
Timer unit: 1e-06 s

Total time: 0.103456 s
File: /Users/moylop260/odoo/odoo-15.0-speedup-address-moy/odoo/addons/base/models/res_partner.py
Function: address_get at line 959

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   959                                               @profile
   960                                               def address_get(self, adr_pref=None):
   961                                                   """ Find contacts/addresses of the right type(s) by doing a depth-first-search
   962                                                   through descendants within company boundaries (stop at entities flagged ``is_company``)
   963                                                   then continuing the search at the ancestors that are within the same company boundaries.
   964                                                   Defaults to partners of type ``'default'`` when the exact type is not found, or to the
   965                                                   provided partner itself if no type ``'default'`` is found either. """
   966         1         17.0     17.0      0.0          adr_pref = set(adr_pref or [])
   967         1         10.0     10.0      0.0          if 'contact' not in adr_pref:
   968         1          3.0      3.0      0.0              adr_pref.add('contact')
   969         1          1.0      1.0      0.0          result = {}
   970         1          2.0      2.0      0.0          visited = set()
   971         2        159.0     79.5      0.2          for partner in self:
   972         1          1.0      1.0      0.0              current_partner = partner
   973         1         41.0     41.0      0.0              while current_partner:
   974         1          2.0      2.0      0.0                  to_scan = [current_partner]
   975                                                           # Scan descendants, DFS
   976         3          3.0      1.0      0.0                  while to_scan:
   977         2          5.0      2.5      0.0                      record = to_scan.pop(0)
   978         2         39.0     19.5      0.0                      visited.add(record)
   979         2      12480.0   6240.0     12.1                      if record.type in adr_pref and not result.get(record.type):
   980         2         36.0     18.0      0.0                          result[record.type] = record.id
   981         2          5.0      2.5      0.0                      if len(result) == len(adr_pref):
   982                                                                   return result
   983         4         21.0      5.2      0.0                      domain = [("id", "not in", [i.id for i in visited]),
   984         2          5.0      2.5      0.0                                ("parent_id", "=", record.id), ("is_company", "!=", True)]
   985         5         16.0      3.2      0.0                      for address_type in adr_pref - set(result):
   986         3      90526.0  30175.3     87.5                          new_address = self.search(domain + [("type", "=", address_type)], limit=1)
   987         3          8.0      2.7      0.0                          if new_address:
   988         1          1.0      1.0      0.0                              to_scan.append(new_address)
   989
   990                                                           # Continue scanning at ancestor if current_partner is not a commercial entity
   991         1         64.0     64.0      0.1                  if current_partner.is_company or not current_partner.parent_id:
   992         1          0.0      0.0      0.0                      break
   993                                                           current_partner = current_partner.parent_id
   994
   995                                                   # default to type 'contact' or the partner itself
   996         1          3.0      3.0      0.0          default = result.get('contact', self.id or False)
   997         4          4.0      1.0      0.0          for adr_type in adr_pref:
   998         3          3.0      1.0      0.0              result[adr_type] = result.get(adr_type) or default
   999         1          1.0      1.0      0.0          return result

Notice it spends 0.10s and max number of hits 5

It means, this fix is 17.5k times faster for this case

It is running extra search instead of using cache for small partner but even it is faster with small partners

Using a partner with only 3 children:

  • before: 0.04s
  • after: 0.02s

(I tried to improve even more using read_group but it is returning all the children and aggregator methods are limited in the odoo API I mean, the odoo API is not supporting the aggregator FIRST(id order by %s) % self._name):

  • odoo/odoo/models.py

    Lines 285 to 288 in d6785a1

    VALID_AGGREGATE_FUNCTIONS = {
    'array_agg', 'count', 'count_distinct',
    'bool_and', 'bool_or', 'max', 'min', 'avg', 'sum',
    }

Another way is to use directly SQL, but I prefer using search(..., limit=1) instead since it is using a max of 2 address so not a big deal here using it

It is improving many process of odoo where this method is called:

@robodoo
Copy link
Contributor

robodoo commented May 8, 2024

@C3POdoo C3POdoo requested review from a team, xmo-odoo and ryv-odoo and removed request for a team May 8, 2024 23:30
@moylop260 moylop260 force-pushed the 15.0-speedup-address-moy branch 2 times, most recently from 0c6cb23 to 9338a6a Compare May 8, 2024 23:52
@moylop260
Copy link
Contributor Author

@ryv-odoo @xmo-odoo

What do you think?

Copy link
Contributor

@luisg123v luisg123v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of suggestions, in case you haven't considered them yet.

odoo/addons/base/models/res_partner.py Outdated Show resolved Hide resolved
odoo/addons/base/models/res_partner.py Outdated Show resolved Hide resolved
@ryv-odoo
Copy link
Contributor

Hello @moylop260 , Thank you for your suggestion.

AFAIK, there are two main 'issues' with your solution:

  • It doesn't rely on the field ORM cache of child_ids. If child_ids is already fetched when you call the address_get, the performance should be worse than before.
  • IMO, you should not filter by type ([("type", "=", address_type)]), because you want to check the children recursively, which may be the correct contact type (and I suppose that's why it's much faster; you stop after the first level of the contact hierarchy).

We should know why fetching child_ids is so bad (this is why I prefer to use py-spy to get the full picture). I tested it, and I noticed the quadratic issue of the mapped done by the filtered in One2many.read(), maybe we should backport this PR #105350 in order to mitigate the issue without having the tradeoff explained above.

Note that reading a One2many should also be faster in 17.0.

We can investigate further if this is a bottleneck for you (and perhaps after backporting the mentioned PR).

@moylop260
Copy link
Contributor Author

moylop260 commented May 10, 2024

Hi @ryv-odoo

Thank you for reply!

this is why I prefer to use py-spy to get the full picture

I agree, I share the py-spy result:

odoo original py-spy

odoo_original

odoo fix PR-105350 (without pre-populate cache)

odoo_fix_pr105350

The biggest bar is related to line:

IMO, you should not filter by type ([("type", "=", address_type)]), because you want to check the children recursively

I can work to get an unittest to consider this case

But I'm not sure to understand the case of use

Could you help me to give me more details on where the result could be different using this fix, please?

@ryv-odoo
Copy link
Contributor

Hi,

Could you help me to give me more details on where the result could be different using this fix, please?

Here you go. I am not sure if it is possible to reproduce by hand, but nothing prevents it.

    def test_multiple_level_address_get(self):
        Partner = self.env['res.partner']
        main_p = Partner.create({
            'name': 'main',
            'type': 'contact',
        })
        child_p = Partner.create({
            'name': 'child',
            'type': 'private',
            'parent_id': main_p.id,
        })
        grandchild_p = Partner.create({
            'name': 'grandchild',
            'type': 'invoice',
            'parent_id': child_p.id,
        })
        self.assertEqual(main_p.address_get(['invoice'])['invoice'], grandchild_p.id)

We can see some interesting things in your pyflam:

  • The to_scan = ... + to_scan is very inefficient. In fact, we rebuild the whole to_scan list x times (O(n²) complexity). IMO, this can be easily solved (with expend(...) + pop()).
  • The _RelationalMulti.convert_to_record does a lot of work to filter out inactive childs_ids (mainly due to the problem of the quadratic mapped I suppose). But actually the childs_ids never contain inactive records because there is a static domain on them: [('active', '=', True)]. Then we can add context={'active_test': False} on child_ids, to avoid the inactive filter in convert_to_record.

Have a nice day

@moylop260
Copy link
Contributor Author

moylop260 commented May 13, 2024

@ryv-odoo

I just did a new commit with your findings

If I understood well we need to get all the descendants and get the address type to match with the child but prioritizing the first level (less depth)

So, getting a recursive-query to get all them and ordered by depth it will get the same result but faster

The query generated is:

WITH RECURSIVE descendants AS (
   ...
)
SELECT DISTINCT ON (type)
type, id
FROM descendants
ORDER BY type, path...

And it is passing your unittesting well!

@luisg123v
Copy link
Contributor

@moylop260 what do you think about using the operator child_of to consider recursive descendants?

@moylop260 moylop260 force-pushed the 15.0-speedup-address-moy branch 2 times, most recently from 5413452 to bd0969a Compare May 14, 2024 00:58
@moylop260 moylop260 changed the title [IMP] base: Speed-up address_get method Draft: [IMP] base: Speed-up address_get method May 14, 2024
@moylop260 moylop260 force-pushed the 15.0-speedup-address-moy branch 5 times, most recently from d796401 to bb6b511 Compare May 16, 2024 00:56
@C3POdoo C3POdoo requested review from a team May 16, 2024 00:59
@moylop260 moylop260 force-pushed the 15.0-speedup-address-moy branch 2 times, most recently from cc7e72e to 956d3c9 Compare May 16, 2024 01:37
@seb-odoo seb-odoo removed the request for review from a team May 16, 2024 08:37
@moylop260 moylop260 force-pushed the 15.0-speedup-address-moy branch 9 times, most recently from cf6cd1d to d38805b Compare May 18, 2024 19:33
partners with many children need to visit all the children to check if the address match with the prefered one
But it is not needed if you know the prefered address doesn't match
Searching the children directly with search method it is faster
The csv file was used with the old version of odoo
in order to get the results in order to compare with
new results of latest changes to partner.address_get method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants