Request for Help with Data Filtering in BigTales

Dear colleagues,

I am reaching out for your assistance in configuring data filtering in the BigTales system. I have been grappling with this issue for the past two days and feel like I am at my wit’s end.

Task Description:

I need to establish dependent data filtering based on the following criteria:

  1. By operation type
  2. By counterparty
  3. By client

Requirements:

  1. Dependent Filtering: Selecting one parameter should affect the available options for the subsequent filters.
  2. Summation Post-Filtering: After completing the filtering process, I need to extract the sum of the filtered data.
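To make the two requirements concrete, here is a minimal Python sketch of the intended behavior. The record fields (`op_type`, `counterparty`, `client`, `amount`) and the sample values are illustrative only, not actual BigTales columns:

```python
# Hypothetical in-memory model of dependent filtering followed by summation.
operations = [
    {"op_type": "sale", "counterparty": "ACME", "client": "Ivanov", "amount": 100.0},
    {"op_type": "sale", "counterparty": "Globex", "client": "Petrov", "amount": 250.0},
    {"op_type": "refund", "counterparty": "ACME", "client": "Ivanov", "amount": -40.0},
]

def available(rows, field):
    """Options the next dependent filter should offer, given the rows that survived so far."""
    return sorted({r[field] for r in rows})

# 1. User picks an operation type; the next filter only sees matching rows.
step1 = [r for r in operations if r["op_type"] == "sale"]
print(available(step1, "counterparty"))  # options offered by filter 2

# 2. User picks a counterparty, then a client, narrowing further each time.
step2 = [r for r in step1 if r["counterparty"] == "ACME"]
step3 = [r for r in step2 if r["client"] == "Ivanov"]

# 3. Summation after all filters are applied.
total = sum(r["amount"] for r in step3)
print(total)
```

The key point is that each filter's option list is derived from the rows left by the previous filter, which is what makes the filters dependent.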

Approaches I Have Tried:

1. First Approach:

  • Find All Operations:
    • Initially, I attempted to retrieve all operations from the database without applying any filters. This step is necessary to gather the complete dataset for further processing.
  • Filter 1 (Operation Type):
    • For the first filter, I identified all existing operation types and implemented a condition: if the user did not select any specific type, display all types; otherwise, show only the selected type. I stored the list of types in a Join List. Subsequently, I used a query with the condition “is included in” to filter the operations based on the selected type.
  • Subsequent Filters (Counterparty and Client):
    • I applied similar logic for the other filters (counterparty and client). Each time, I updated the Join List with the relevant values and performed a query using the “is included in” condition to ensure that each filter was dependent on the previous one.
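In general-purpose terms, this first approach behaves roughly like the sketch below: an empty selection keeps every value in the list, and the "is included in" condition acts as a membership test against the Join List contents. All names here are illustrative assumptions, not BigTales identifiers:

```python
def build_allowed(selected, all_values):
    """Join-List-style contents: everything when nothing is selected, else just the choice."""
    return sorted(all_values) if selected is None else [selected]

operations = [
    {"op_type": "sale", "counterparty": "ACME"},
    {"op_type": "refund", "counterparty": "Globex"},
]
all_types = {r["op_type"] for r in operations}

# Filter 1: the user made no selection, so every type is allowed.
allowed_types = build_allowed(None, all_types)
after_f1 = [r for r in operations if r["op_type"] in allowed_types]  # "is included in"

# Filter 2 depends on filter 1: its options come only from the surviving rows.
counterparty_options = sorted({r["counterparty"] for r in after_f1})
print(counterparty_options)
```

Repeating this pattern for counterparty and client reproduces the chained dependency, but note that each `in allowed_types` check scans the whole list, which matters later for performance.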

2. Second Approach:

  • Find All Operations:
    • Similar to the first approach, I started by retrieving all operations from the database.
  • MAKE ARRAY in BigTables:
    • In BigTables, I created an array and added the word “Not Selected” (to represent all operations) along with the actual operation types. This step ensures that users can see all operations when no specific type is selected.
  • Retrieve Row IDs Based on Selected Type:
    • Next, I searched for all operations based on the selected operation type and extracted the Row IDs of these operations. This step helps in narrowing down the dataset to only those records that match the chosen criteria.
  • Query Using Retrieved Row IDs:
    • Finally, I executed a query using the retrieved Row IDs to fetch the corresponding records. This method allows for more precise filtering but requires additional steps to handle dependencies between filters.
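The second approach, collecting Row IDs first and then querying by ID, can be sketched as follows. The dictionary of operations, the integer Row IDs, and the "Not Selected" sentinel mirror the steps above but are otherwise hypothetical:

```python
# Hypothetical table: Row ID -> record.
operations = {
    1: {"op_type": "sale", "amount": 100.0},
    2: {"op_type": "refund", "amount": -40.0},
    3: {"op_type": "sale", "amount": 250.0},
}

# MAKE ARRAY step: a "Not Selected" sentinel plus the real operation types.
type_options = ["Not Selected"] + sorted({r["op_type"] for r in operations.values()})

def matching_row_ids(selected_type):
    """Row IDs of operations matching the chosen type ("Not Selected" means all)."""
    if selected_type == "Not Selected":
        return list(operations)
    return [rid for rid, r in operations.items() if r["op_type"] == selected_type]

# Query step: fetch the records for the collected Row IDs, then sum.
ids = matching_row_ids("sale")
records = [operations[rid] for rid in ids]
print(sum(r["amount"] for r in records))
```

This makes the extra bookkeeping visible: every filter change means re-collecting the ID list and re-querying, which is the added cost of this method.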

Performance Issues with Large Datasets:

While both approaches work well for smaller datasets (up to 10,000 rows), they encounter significant performance issues as the dataset size increases:

  • Up to 10,000 rows: Works perfectly fine.
  • Around 500,000 rows: Starts showing noticeable performance degradation.
  • 7 million rows: The methods fail completely due to excessive computational demands.
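One plausible contributor to this degradation (an assumption on my part about how "is included in" scales, not something I can confirm from BigTales internals) is that membership testing against a flat list is linear in the list size, so filtering n rows against a list of m allowed values costs on the order of n x m comparisons. A small Python demonstration of the difference a hashed lookup makes:

```python
import time

# Synthetic data: 20,000 rows, 2,000 allowed values (all names hypothetical).
n_rows = 20_000
rows = [str(i % 5_000) for i in range(n_rows)]
allowed_list = [str(i) for i in range(2_000)]  # Join-List-style lookup: O(m) per row
allowed_set = set(allowed_list)                # hashed lookup: O(1) average per row

t0 = time.perf_counter()
kept_list = [r for r in rows if r in allowed_list]
t1 = time.perf_counter()
kept_set = [r for r in rows if r in allowed_set]
t2 = time.perf_counter()

print(len(kept_list), len(kept_set))
print(f"list lookup: {t1 - t0:.3f}s  set lookup: {t2 - t1:.3f}s")
```

At 7 million rows the linear-scan variant becomes orders of magnitude slower, which would match the failure mode described above.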

Specific Problems Encountered:

  1. Join List Limitations:
    • I have been unable to determine whether there is a limit on the number of tokens or entries a Join List can hold. As the dataset grows, managing such large lists becomes increasingly inefficient.
  2. Split Text Performance:
    • The Split Text function also performs poorly on larger datasets, leading to delays and potential timeouts.
  3. Calculated Columns:
    • Calculated columns begin to hang or become unresponsive as the row count increases, which significantly undermines the usability and reliability of the filtering process.