Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collections issue : Trying to query across entities using collections but getting error #407

Open
apoorv-yadav opened this issue Jul 19, 2024 · 11 comments

Comments

@apoorv-yadav
Copy link

Describe the bug
Trying to use collection feature to query across 2 separate entities (employee and tasks) for fetching all tasks associated with employee, along with employee details. But Its not working since I have mapped PK and SK with attributes.

ElectroDB Version : 2.14.2

ElectroDB Playground Link
Playground link : https://electrodb.fun/?#code/PQKgBAsg9gJgpgGzARwK5wE4Es4GcA0YuccYGeqCALgUQBYCG5YA7llXWAGbZwB2MXGBDAAUKKwBbAA5QMVMAG8wAUT5V2AT0IBlTADcsAY1IBfbhiiSwAIkRwjVSzABGNgNzijUPrgVUGFwRSAF5bbxkGPk0PLx8-MDgZBChNEjAwvjgWVXUtAApRMCUi4rBJWEQALhKysv4NKk0au2TUkht8Urr9TFwsHxaARk7usuIMQxMWgNwAawZpaRsx0y66hionLBdUKjwaxTH6trS4AEkYQ+O6sqbpOBqAcj9sPgBzJ-Xbn64cBCuYCeAAUANJfG7FNaQsCoCaSOABa4-H73R5A15YD4QlG3P6IQFPHTg74-aF1cllLHwAAeB1qt2IWyx71wyNxZW8CGCjgGfBmDHmuFGHLK0jm7NFdXxAOeYJxUrqEVk-X2NQA2k8ktIUmdLk8ALqkxVgfbJTborWnEgAYgAJIptbqSJdTAqTWAjIKWc8+D44BCYRTjbj5pKTTLCcT3SblVBVejNXDMAiAoaQ1KzTqLc9UwwHcmMHm3RnRV7+h9ff6Y7jKWTjaZSpTlAEgmZRABKTyiby+fyCuYZMBZHJqRqaQrFI51CrwBDhn4NLTPWZzGu3XoYfqDIFDdd1CZTS2rgCCSz0k2MAYzdbKm22u32bIZuKd7QugOnJrRz0x2NLKKRnKJJBmAt6ogOlwLhyP4Yts-6gdK-xRiBorgbcwDAGAADyHCYGA97YI+eA3Oh1JwHSz5friLiaFBL5SlyPIaDuNirsKAE-OK0GKkBQLypxHJxgmGpWjq776kaiEolmCA5kCb5nPajrWh+JbST85Y+kCfpZPuKLoaGEoMR6fFEqhHqclYKrsImTyrpJgm4rJ8n2QOykOTA6mWZy3qVjp1ZOWUhnBTe3zNqagTBGBnbdr2CSLNIQ4jmAF5HvkyiKSQhCrmBXZxH2iSqZcQ42DAAB0dCoJofB0LEoiYbCxAEZ6UDcg4LF8KaUAoOgGCaOUcikBwURgP6iR5E0BEKAwppSHAoiJeVTEdXyuClOV7EZUV4l6jAeUbe8UAZdIDDvPSTwMNyTymPlohAA

Entity/Service Definitions

const employee = new Entity(
  {
    model: {
      entity: "employee",
      version: "1",
      service: "taskapp"
    },
    attributes: {
       employeeId: {
            type: 'string',
            field: 'PK',
        },
        usermeta: {
            type: 'string',
            field: 'SK',
        },
    },
    indexes: {
      settings: {
            collection: "tasks",
            pk: {
                field: 'PK',
                composite: ['employeeId'],
                template: 'employee#${employeeId}',
                casing: 'none',

            },
            sk: {
                field: 'SK',
                composite: ['usermeta'],
                template: 'meta${usermeta}',
                casing: 'none',
            },
        },
    }
  },
  { table }
);
const task = new Entity(
  {
    model: {
          entity: 'task',
          version: '1',
          service: 'taskAppService',
      },
      attributes: {
          employeeId: {
              type: 'string',
              field: 'PK',
          },
          taskId: {
              type: 'string',
              field: 'SK',
          },
          // Other attributes
      },
      indexes: {
          byId: {
              collection: "tasks",
              pk: {
                  field: 'PK',
                  composite: ['employeeId'],
                  template: 'employee#${employeeId}',
                  casing: 'none',
              },
              sk: {
                  field: 'SK',
                  composite: ['taskId'],
                  template: 'task#${taskId}',
                  casing: 'none',
              },
          },
      },
  },
  { table }
);

const app = new Service({ employee, task });
const employeeId = "1233321";

// use a collection to query more than one entity at a time
app.collections
  .tasks({ employeeId })
  .go({pages: 'all'});

Expected behavior
It should return all tasks assigned to employee, along with employee all details (i.e employee metadata)

Errors

Invalid use of a collection on index "(Primary Index)". The pk field "PK" shares a field name with an attribute defined on the Entity, and therefore the index is not allowed to participate in a Collection. Please either change the field name of the attribute, or remove all collection(s) from the index.
@tywalch
Copy link
Owner

tywalch commented Jul 20, 2024

It depends on what you're trying to do. Are you trying to use ElectroDB with existing data? I would make different recommendations depending on if this is a greenfield use case or trying to match an existing model.

@rdzidziguri
Copy link

rdzidziguri commented Jul 21, 2024

I have a similar situation, but in my case, it is not a greenfield project, so I am limited in schema shape; also, as my team learned, one needs to be prepared that as soon as you update even a single attribute, ElectroDB is going to add **__edb_*** attributes. However, I assume that this can not be avoided; however, some schemas already have patterns to track versions and entity types (it would be nice if those properties could be remapped to existing attributes).

I would say that for one, using the library for the first time, At the same time, it works nicely with small to medium schemas if you have table schema and 3 GSI s along with 3 LSI, and for the sake of example, let's say one LSI is a location, where it is like <country>#<State>#<city>#<zip> not only does it require some advanced getter plus filed overrides but also this type of approach which is standard for DDB becomes exceptionally challenging. It is a great library. Still, with an actual single table schema, as probably any other opinionated library, the limitations imposed by this specific one must be considered before deciding on existing projects.

@tywalch
Copy link
Owner

tywalch commented Jul 21, 2024

There are options to map to a current schema, could you maybe share an existing item and I can try to demonstrate? As for your second point, could you provide an example of the challenge you're experiencing?

@rdzidziguri
Copy link

I would instead let the original thread author respond to this (I do not want to take over the issue :) ), but in general, a considerable table (and here I am talking about millions of entries ) in a single table schema would be a prevalent practice to design the following way and use keys to store composite query details and use for example Batch get to get parent entities as well as all the child items. To demonstrate the approach, let's take an example of an Airline and flight management system where
PK could be the airline company ID, and individual flights would be

PK SK GSI1PK GSI1SK meta
AIRLN#<compani_ID> meta {...}
AIRLN#<compani_ID> FLIGHT#<date> {...}
AIRLN#<compani_ID> FLIGHT#<date> {...}
AIRLN#<compani_ID> ASSETS#<asset_id> {...}

In this case, the access pattern is to get all the upcoming flights and airline information for the given company. Again, I apologize if this is not precisely what the thread author asked. I believe it is close enough, so in this case, one would be issuing Batch get to receive data where SK = meta as well as SK BEGINS WITH FLIGHT#

@apoorv-yadav
Copy link
Author

Hey @rdzidziguri perfectly describes my issue here.
@tywalch, Its quite similar to my case. I would like to have all the tasks of an employee along with employee details (meta) and I can't change my schema (unfortunately).

BatchGet or collections either of them will work for me.... Looking forwards to your response.

@tywalch
Copy link
Owner

tywalch commented Jul 22, 2024

@rdzidziguri (and @apoorv-yadav)

Are you using the PK field as companyId or do you also have a separate companyId field? Same question for your SK. For example, which item below matches your item?:

Option 1

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22"
}

Option 2

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22",
    "compani_ID": "123",
    "date": "2024-07-22"
}

@apoorv-yadav
Copy link
Author

Hello @tywalch , in my case its option 1

Option 1

{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22"
}

@rdzidziguri
Copy link

rdzidziguri commented Jul 22, 2024

It is Option 1 for me as well. Also, let me explain some of the existing arch decisions there. As you know, DDB is not an excellent toy for cost optimization and is even crazy with Global Tables on. Hence, minimizing storage becomes essential at some point, and one has to squish data as much as possible, so, just like @apoorv-yadav, I am also firmly bound to an existing schema. minimizing the number of attributes

  • makes Global Table replication latency smaller
  • saves costs
  • It allows retrieving more data with the same command, considering that 4kb RCU is applied for eventual consistency.
{
    "PK": "AIRLN#123",
    "SK": "FLIGHT#2024-07-22"
}

@tywalch
Copy link
Owner

tywalch commented Jul 22, 2024

Ok, so the attributes involved in your keys don't exist as another item; with your existing implementation, do you "extract" the item from the key value on retrieval? Like do you turn FLIGHT#2024-07-22 into 2024-07-22 or AIRLN#123 into 123?

@rdzidziguri
Copy link

In my case, yes, this is how it is supposed to go.

FLIGHT#2024-07-22 into 2024-07-22 and AIRLN#123 into 123

@rdzidziguri
Copy link

I assume the feature is not supported, and we will have to look for alternatives, as the issue has been open for a while now. Good luck to the thread author, but we will probably move on to native SDK implementation using DDB low-level queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants