paradance.evaluation.Calculator

class paradance.evaluation.Calculator(df: DataFrame, selected_columns: List[str], overall_score_lower_bound: float | None = None, overall_score_upper_bound: float | None = None, equation_type: str = 'product', weights_for_groups: Series | None = None, equation_eval_str: str | None = None, equation_json: Dict | None = None, delimiter: str | None = '#', rerank_eval_str: str | None = None)

A calculator for processing and analyzing data within a DataFrame based on specified equations and methods.

df

The DataFrame to perform calculations on.

Type: pd.DataFrame

selected_columns

The names of the columns to include in calculations.

Type: List[str]

overall_score_lower_bound

The lower bound for overall scores.

Type: Optional[float]

overall_score_upper_bound

The upper bound for overall scores.

Type: Optional[float]

equation_eval_str

A string representing a custom equation to evaluate.

Type: Optional[str]

equation_type

The type of equation to use for calculations (“product”, “sum”, “free_style”, or “json”).

Type: str

selected_values

The values of the selected columns in the DataFrame.

Type: np.ndarray

value_scales

The negative average log10 magnitude of absolute values for selected columns.

Type: np.ndarray

weights_for_groups

A Series containing weights for different groups within the DataFrame.

Type: pd.Series

__init__(df: DataFrame, selected_columns: List[str], overall_score_lower_bound: float | None = None, overall_score_upper_bound: float | None = None, equation_type: str = 'product', weights_for_groups: Series | None = None, equation_eval_str: str | None = None, equation_json: Dict | None = None, delimiter: str | None = '#', rerank_eval_str: str | None = None) → None

Initializes the Calculator object.

Parameters:

df (pd.DataFrame) – The DataFrame to perform calculations on.
selected_columns (List[str]) – The names of the columns to include in calculations.
equation_type (str, optional) – The type of equation to use for score calculation. Defaults to “product”.
weights_for_groups (Optional[pd.Series], optional) – A Series containing weights for different groups. Defaults to None, which sets equal weights.
equation_eval_str (Optional[str], optional) – A string representing a custom equation for free-style calculations. Defaults to None.
rerank_eval_str (Optional[str], optional) – A string representing a custom equation for reranking. Defaults to None.

Methods

`__init__`(df, selected_columns[, ...])	Initializes the Calculator object.
`calculate_auc_triple_parameters`(grid_interval)
`calculate_corrcoef`(mask_column, ...)
`calculate_cumulative_deviation`(mask_column, ...)
`calculate_distinct_count_portfolio_concentration`(...)
`calculate_distinct_top_coverage`(mask_column, ...)
`calculate_inverse_pair`(target_column[, ...])
`calculate_log_mse`(target_column[, ...])
`calculate_mean`(mask_column, target_column, ...)
`calculate_neg_rank_ratio`([label_column])
`calculate_portfolio_concentration`(...)
`calculate_proportion`(mask_column, ...)
`calculate_standard_deviation`(mask_column, ...)
`calculate_tau`(target_column, groupby[, ...])
`calculate_top_coverage`(mask_column, ...)
`calculate_woauc`(groupby, target_column[, ...])
`calculate_wuauc`(mask_column, target_column, ...)
`clip_max`(left, right)	Clips the values in the right array or scalar to a maximum value specified by left.
`clip_min`(left, right)	Clips the values in the right array or scalar to a minimum value specified by left.
`create_score_columns`(boundary_dict[, ...])	Creates new columns in the DataFrame to categorize rows based on score boundaries.
`get_overall_score`(weights_for_equation)	Calculates the overall score for each row in the DataFrame based on the specified equation type and weights.
`initialize_fq_sampler`(sample_size, score_column)	Initializes a frequency sampler for a given score column and applies sampling results to create new columns.
`initialize_local_dict`(weights_for_equation, ...)	Initializes a dictionary that can be used for additional calculations.
`rerank_with_side_information`()	Reranks the rows in the DataFrame based on side information.
`value_scale`()	Calculates the negative average log10 magnitude of absolute values for selected columns in the dataframe, storing the result in self.value_scales.
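
The descriptions of clip_max and clip_min in the table above amount to an elementwise minimum and maximum against a bound. A minimal sketch of that behaviour, assuming NumPy semantics; the names clip_max_sketch and clip_min_sketch are illustrative and this is not the library's implementation:

```python
import numpy as np


def clip_max_sketch(left, right):
    """Cap the values in `right` at the upper bound `left`."""
    return np.minimum(right, left)


def clip_min_sketch(left, right):
    """Raise the values in `right` to at least the lower bound `left`."""
    return np.maximum(right, left)


scores = np.array([0.2, 1.5, 3.0])
capped = clip_max_sketch(1.0, scores)   # no value above 1.0
floored = clip_min_sketch(1.0, scores)  # no value below 1.0
```

Both helpers accept a scalar or an array for right, matching the "array or scalar" wording in the table.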

value_scale() → None

Calculates the negative average log10 magnitude of absolute values for the selected columns in the DataFrame and stores the result in self.value_scales.
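
To make "negative average log10 magnitude" concrete, here is a minimal sketch of that quantity computed per column with NumPy. The function name and the choice to skip zero entries (whose log10 is undefined) are assumptions made for illustration; the library may handle zeros differently.

```python
import numpy as np
import pandas as pd


def negative_mean_log10_magnitude(df, columns):
    """Return -mean(log10(|x|)) for each requested column."""
    abs_values = np.abs(df[columns].to_numpy(dtype=float))
    # log10(0) is -inf; suppress the warning and drop those entries (assumption).
    with np.errstate(divide="ignore"):
        logs = np.log10(abs_values)
    logs = np.where(np.isfinite(logs), logs, np.nan)
    return -np.nanmean(logs, axis=0)


df = pd.DataFrame({"ctr": [0.01, 0.1], "watch_time": [100.0, 1000.0]})
scales = negative_mean_log10_magnitude(df, ["ctr", "watch_time"])
```

A column of small fractions gets a positive scale and a column of large values gets a negative one, so 10 ** scale gives a rough factor that brings each column to order one.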

get_overall_score(weights_for_equation: List[float]) → None

Calculates the overall score for each row in the DataFrame based on the specified equation type and weights.

Parameters:

weights_for_equation (List[float]) – A list of weights to apply to each selected column for the calculation.
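
The sketch below illustrates one plausible reading of the "product" and "sum" equation types: a product of columns raised to their weights, and a weighted sum of columns. Both formulas are assumptions, not taken from this page, and the "free_style" and "json" types are not covered. Unlike the method, which returns None, the hypothetical overall_score_sketch returns the scores so they are easy to inspect.

```python
import numpy as np
import pandas as pd


def overall_score_sketch(df, columns, weights, equation_type="product"):
    """Combine the selected columns into one score per row (illustrative only)."""
    values = df[columns].to_numpy(dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape != (len(columns),):
        raise ValueError("expected one weight per selected column")
    if equation_type == "product":
        # Assumed form: prod_i x_i ** w_i
        return np.prod(values ** w, axis=1)
    if equation_type == "sum":
        # Assumed form: sum_i w_i * x_i
        return values @ w
    raise ValueError(f"unsupported equation_type: {equation_type!r}")


df = pd.DataFrame({"a": [1.0, 4.0], "b": [2.0, 3.0]})
product_scores = overall_score_sketch(df, ["a", "b"], [0.5, 1.0], "product")
sum_scores = overall_score_sketch(df, ["a", "b"], [0.5, 1.0], "sum")
```

Under these assumptions the first row scores 1 ** 0.5 * 2 = 2 for "product" and 0.5 * 1 + 1.0 * 2 = 2.5 for "sum".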

create_score_columns(boundary_dict: dict, score_column: str = 'score') → None

Creates new columns in the DataFrame to categorize rows based on score boundaries.

Parameters:

boundary_dict (Dict) – A dictionary with score boundaries as keys and conditions as values.
score_column (str, optional) – The name of the column to apply the boundaries to. Defaults to “score”.

initialize_fq_sampler(sample_size: int, score_column: str, slice_from: float | None = None, slice_to: float | None = None, log_scale: bool | None = True, laplace_smoothing: bool | None = True) → None

Initializes a frequency sampler for a given score column and applies sampling results to create new columns.

Parameters:

sample_size (int) – The size of the sample to generate.
score_column (str) – The name of the score column to sample from.
slice_from (Optional[float], optional) – The lower bound of the score range to sample. Defaults to None.
slice_to (Optional[float], optional) – The upper bound of the score range to sample. Defaults to None.
log_scale (Optional[bool], optional) – Whether to use logarithmic scaling for sampling. Defaults to True.
laplace_smoothing (Optional[bool], optional) – Whether to apply Laplace smoothing to the sampling. Defaults to True.
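
The log_scale and laplace_smoothing options name two standard techniques: binning on logarithmically spaced edges, and add-one smoothing of bin counts so that empty bins keep a nonzero probability. The sketch below shows those techniques in general terms with NumPy. It is not the library's sampler; the function name and the n_bins parameter are hypothetical.

```python
import numpy as np


def smoothed_bin_probabilities(scores, n_bins=3, log_scale=True, laplace_smoothing=True):
    """Histogram the scores and turn bin counts into probabilities."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if log_scale:
        # Log-spaced edges need strictly positive scores.
        if lo <= 0:
            raise ValueError("log_scale requires positive scores")
        edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)
    else:
        edges = np.linspace(lo, hi, n_bins + 1)
    # Pin the outer edges so rounding cannot push the extremes out of range.
    edges[0], edges[-1] = lo, hi
    counts, _ = np.histogram(scores, bins=edges)
    counts = counts.astype(float)
    if laplace_smoothing:
        counts += 1.0  # add-one (Laplace) smoothing
    return edges, counts / counts.sum()


scores = [1.0, 2.0, 1000.0]
_, raw_probs = smoothed_bin_probabilities(scores, laplace_smoothing=False)
_, smooth_probs = smoothed_bin_probabilities(scores, laplace_smoothing=True)
```

With these scores the middle bin (roughly 10 to 100) is empty, so its unsmoothed probability is zero, while the smoothed version assigns it 1/6.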