paradance.evaluation.Calculator
- class paradance.evaluation.Calculator(df: DataFrame, selected_columns: List[str], overall_score_lower_bound: float | None = None, overall_score_upper_bound: float | None = None, equation_type: str = 'product', weights_for_groups: Series | None = None, equation_eval_str: str | None = None, equation_json: Dict | None = None, delimiter: str | None = '#', rerank_eval_str: str | None = None)
A calculator for processing and analyzing data within a DataFrame based on specified equations and methods.
- df
The DataFrame to perform calculations on.
- Type:
pd.DataFrame
- equation_type
The type of equation to use for calculations (“product”, “sum”, “free_style”, or “json”).
- Type:
str
- selected_values
The values of the selected columns in the DataFrame.
- Type:
np.ndarray
- value_scales
The negative average log10 magnitude of absolute values for selected columns.
- Type:
np.ndarray
- weights_for_groups
A Series containing weights for different groups within the DataFrame.
- Type:
pd.Series
- __init__(df: DataFrame, selected_columns: List[str], overall_score_lower_bound: float | None = None, overall_score_upper_bound: float | None = None, equation_type: str = 'product', weights_for_groups: Series | None = None, equation_eval_str: str | None = None, equation_json: Dict | None = None, delimiter: str | None = '#', rerank_eval_str: str | None = None) → None
Initializes the Calculator object.
- Parameters:
df (pd.DataFrame) – The DataFrame to perform calculations on.
selected_columns (List[str]) – The names of the columns to include in calculations.
equation_type (str, optional) – The type of equation to use for score calculation. Defaults to “product”.
weights_for_groups (Optional[pd.Series], optional) – A Series containing weights for different groups. Defaults to None, which sets equal weights.
equation_eval_str (Optional[str], optional) – A string representing a custom equation for free-style calculations. Defaults to None.
rerank_eval_str (Optional[str], optional) – A string representing a custom equation for reranking. Defaults to None.
Methods
__init__(df, selected_columns[, ...]) – Initializes the Calculator object.
calculate_auc_triple_parameters(grid_interval)
calculate_corrcoef(mask_column, ...)
calculate_cumulative_deviation(mask_column, ...)
calculate_distinct_count_portfolio_concentration(...)
calculate_distinct_top_coverage(mask_column, ...)
calculate_inverse_pair(target_column[, ...])
calculate_log_mse(target_column[, ...])
calculate_mean(mask_column, target_column, ...)
calculate_neg_rank_ratio([label_column])
calculate_portfolio_concentration(...)
calculate_proportion(mask_column, ...)
calculate_standard_deviation(mask_column, ...)
calculate_tau(target_column, groupby[, ...])
calculate_top_coverage(mask_column, ...)
calculate_woauc(groupby, target_column[, ...])
calculate_wuauc(mask_column, target_column, ...)
clip_max(left, right) – Clips the values in the right array or scalar to a maximum value specified by left.
clip_min(left, right) – Clips the values in the right array or scalar to a minimum value specified by left.
create_score_columns(boundary_dict[, ...]) – Creates new columns in the DataFrame to categorize rows based on score boundaries.
get_overall_score(weights_for_equation) – Calculates the overall score for each row in the DataFrame based on the specified equation type and weights.
initialize_fq_sampler(sample_size, score_column) – Initializes a frequency sampler for a given score column and applies sampling results to create new columns.
initialize_local_dict(weights_for_equation, ...) – Initializes a dictionary that can be used for additional calculations.
rerank_with_side_information() – Reranks the rows in the DataFrame based on side information.
value_scale() – Calculates the negative average log10 magnitude of absolute values for selected columns in the dataframe, storing the result in self.value_scales.
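To make the argument order of clip_max and clip_min concrete, here is a minimal sketch of the behaviour described in the table: left is the bound, and right holds the value or values being clipped. The functions `clip_max_sketch` and `clip_min_sketch` are hypothetical stand-ins written in plain Python for illustration; they are not the library's implementation, which may operate on NumPy arrays.

```python
from typing import List, Union

Number = Union[int, float]


def clip_max_sketch(left: Number, right: Union[Number, List[Number]]):
    """Cap `right` (a scalar or a list) at the maximum value `left`."""
    if isinstance(right, list):
        return [min(v, left) for v in right]
    return min(right, left)


def clip_min_sketch(left: Number, right: Union[Number, List[Number]]):
    """Raise `right` (a scalar or a list) to at least the minimum value `left`."""
    if isinstance(right, list):
        return [max(v, left) for v in right]
    return max(right, left)


values = [0.2, 1.5, 3.0]
capped = clip_max_sketch(1.0, values)    # nothing above 1.0 survives
floored = clip_min_sketch(0.5, values)   # nothing below 0.5 survives
```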
- value_scale() → None
Calculates the negative average log10 magnitude of absolute values for selected columns in the dataframe, storing the result in self.value_scales.
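As a rough illustration of the quantity described above, the following sketch computes the negative mean of log10(|x|) for each column of a small table. It uses plain Python lists rather than a DataFrame, the helper name `negative_mean_log10` is hypothetical, and skipping zeros is an assumption; how the library itself treats zeros or missing values is not stated on this page.

```python
import math
from typing import Dict, List


def negative_mean_log10(values: List[float]) -> float:
    """Return -mean(log10(|x|)) over the non-zero entries of `values`."""
    # Assumption: zeros are skipped because log10(0) is undefined.
    magnitudes = [math.log10(abs(v)) for v in values if v != 0]
    if not magnitudes:
        return 0.0
    return -sum(magnitudes) / len(magnitudes)


# Two toy columns whose values differ by several orders of magnitude.
columns: Dict[str, List[float]] = {
    "clicks": [100.0, 1000.0, 10.0],
    "ctr": [0.01, 0.1, 0.001],
}
scales = {name: negative_mean_log10(vals) for name, vals in columns.items()}
```

In this toy data the column with large values gets a negative scale and the column with small values gets a positive one.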
- get_overall_score(weights_for_equation: List[float]) → None
Calculates the overall score for each row in the DataFrame based on the specified equation type and weights.
- Parameters:
weights_for_equation (List[float]) – A list of weights to apply to each selected column for the calculation.
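The sketch below illustrates one plausible reading of the "product" and "sum" equation types for a single row of selected values. The formulas used here (a weighted power product and a weighted linear sum) are assumptions, since this page does not spell them out. The helper names are hypothetical, and nothing in the block calls paradance.

```python
import math
from typing import List


def product_score(values: List[float], weights: List[float]) -> float:
    """Assumed 'product' form: multiply x_i ** w_i over the selected columns."""
    return math.prod(x ** w for x, w in zip(values, weights))


def sum_score(values: List[float], weights: List[float]) -> float:
    """Assumed 'sum' form: add w_i * x_i over the selected columns."""
    return sum(w * x for x, w in zip(values, weights))


row = [4.0, 9.0]        # values of two selected columns for one row
weights = [0.5, 0.5]    # one weight per selected column

product_result = product_score(row, weights)
sum_result = sum_score(row, weights)
```

Under these assumed formulas the list of weights must have the same length and order as selected_columns.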
- create_score_columns(boundary_dict: dict, score_column: str = 'score') → None
Creates new columns in the DataFrame to categorize rows based on score boundaries.
- Parameters:
boundary_dict (Dict) – A dictionary with score boundaries as keys and conditions as values.
score_column (str, optional) – The name of the column to apply the boundaries to. Defaults to “score”.
- initialize_fq_sampler(sample_size: int, score_column: str, slice_from: float | None = None, slice_to: float | None = None, log_scale: bool | None = True, laplace_smoothing: bool | None = True) → None
Initializes a frequency sampler for a given score column and applies sampling results to create new columns.
- Parameters:
sample_size (int) – The size of the sample to generate.
score_column (str) – The name of the score column to sample from.
slice_from (Optional[float], optional) – The lower bound of the score range to sample. Defaults to None.
slice_to (Optional[float], optional) – The upper bound of the score range to sample. Defaults to None.
log_scale (Optional[bool], optional) – Whether to use logarithmic scaling for sampling. Defaults to True.
laplace_smoothing (Optional[bool], optional) – Whether to apply Laplace smoothing to the sampling. Defaults to True.
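For readers unfamiliar with the two options, the sketch below shows what logarithmic binning and Laplace (add-one) smoothing generally mean when estimating bin frequencies from scores. It is a generic illustration under stated assumptions, not the sampler's actual algorithm: the function `smoothed_bin_probabilities`, its `bins` argument, and the choice of base-10 integer bins are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def smoothed_bin_probabilities(
    scores: List[float], bins: List[int], laplace_smoothing: bool = True
) -> Dict[int, float]:
    """Estimate a probability for each log10 bin from the observed scores."""
    # Log-scale binning: a positive score s falls into bin floor(log10(s)).
    counts = Counter(math.floor(math.log10(s)) for s in scores if s > 0)
    # Laplace smoothing adds one pseudo-count to every bin, so empty bins
    # still receive a non-zero probability.
    alpha = 1 if laplace_smoothing else 0
    total = sum(counts[b] for b in bins) + alpha * len(bins)
    if total == 0:
        return {b: 0.0 for b in bins}
    return {b: (counts[b] + alpha) / total for b in bins}


scores = [0.5, 0.2, 3.0, 7.0, 8.0, 40.0]
probs = smoothed_bin_probabilities(scores, bins=[-1, 0, 1, 2])
raw = smoothed_bin_probabilities(scores, bins=[-1, 0, 1, 2], laplace_smoothing=False)
```

With smoothing enabled, bin 2 contains no scores yet still receives a small probability; with it disabled, that bin's probability is zero.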