Grouping and aggregation are two core pandas features for summarizing and analyzing data. This section walks through how to use them.
The primary tool for grouping data in pandas is the groupby() method, which splits the rows into groups based on one or more keys, such as the values in a column.
Example:
Consider a simple DataFrame:
import pandas as pd

data = {
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300]
}
df = pd.DataFrame(data)
Group by Department:
grouped = df.groupby('Department')
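Creating the GroupBy object does not compute anything yet; it only records how the rows are partitioned. The following sketch, which rebuilds the same DataFrame so it runs on its own, shows a few ways to inspect that partition. The variable names n_groups, sizes, and it_rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
})

grouped = df.groupby('Department')

# Number of distinct groups (one per department).
n_groups = grouped.ngroups

# Row count per group, returned as a Series indexed by Department.
sizes = grouped.size()

# Rows belonging to a single group, returned as a regular DataFrame.
it_rows = grouped.get_group('IT')

print(n_groups)
print(sizes)
print(it_rows)
```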
Once you've created a GroupBy object, you can compute aggregate values such as sum, mean, max, min, etc.
Example:
Sum of salaries in each department:
grouped['Salary'].sum()
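As a self-contained sketch of the same idea, the snippet below applies several built-in aggregates to the Salary column. The result of each call is a Series indexed by Department; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
})

grouped = df.groupby('Department')

total = grouped['Salary'].sum()     # HR 10300, IT 13200, Sales 5500
average = grouped['Salary'].mean()  # HR 5150.0, IT 6600.0, Sales 5500.0
highest = grouped['Salary'].max()   # HR 5300, IT 7000, Sales 5500

print(total)
print(average)
print(highest)
```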
You can use the agg() method to perform multiple aggregations at once:
grouped['Salary'].agg(['sum', 'mean', 'min', 'max'])
Example:
Get the highest salary in each department with a custom function:
def top_salary(s):
    # Sort salaries in descending order and take the first (largest) value.
    return s.sort_values(ascending=False).iloc[0]

grouped['Salary'].agg(top_salary)
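Note that top_salary returns only the salary value, not the employee who earns it. If the employee's row is needed, one possible approach (a sketch, not the only option) is to use idxmax() to find the row label of the maximum salary in each group and then select those rows with loc. The names top_rows and top_paid are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
})

# Row label of the maximum salary within each department.
top_rows = df.groupby('Department')['Salary'].idxmax()

# Use those labels to select the full rows from the original DataFrame.
top_paid = df.loc[top_rows, ['Department', 'Employee', 'Salary']]

print(top_paid)
```

If two employees in a department tie for the highest salary, idxmax() returns only the first occurrence.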
By passing a dictionary to the agg() method, you can specify which aggregations to apply to each column:
grouped.agg({ 'Salary': ['mean', 'sum', 'max'], 'Employee': 'count' })
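Passing a list of functions for a column produces a result with two-level (MultiIndex) column labels such as ('Salary', 'mean'). If flat column names are preferred, named aggregation (available since pandas 0.25) is one alternative. In the sketch below, the output column names mean_salary, total_salary, max_salary, and headcount are hypothetical choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
})

# Each keyword is an output column name (hypothetical); each value is a
# (source column, aggregation) pair.
summary = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    total_salary=('Salary', 'sum'),
    max_salary=('Salary', 'max'),
    headcount=('Employee', 'count'),
)

print(summary)
```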
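Custom aggregation functions can compute any value derived from a group, for example the salary range:
def range_salary(s):
    # Difference between the highest and lowest salary in the group.
    return s.max() - s.min()

grouped['Salary'].agg(range_salary)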
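Custom functions can also be mixed with built-in aggregation names in a single agg() call. In this sketch, the function's own name is used as the label of the resulting column.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
})

def range_salary(s):
    # Difference between the highest and lowest salary in the group.
    return s.max() - s.min()

# Built-in names and a custom callable in the same list.
stats = df.groupby('Department')['Salary'].agg(['min', 'max', range_salary])

print(stats)
```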
By default, the grouping keys become the index of the aggregated result. To turn them back into regular columns, use reset_index():
grouped['Salary'].sum().reset_index()
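An alternative that should give the same shape of result is to pass as_index=False to groupby(), so the grouping key stays a regular column from the start. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
})

# Department is kept as an ordinary column instead of becoming the index.
totals = df.groupby('Department', as_index=False)['Salary'].sum()

print(totals)
```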
You can group by multiple columns by passing a list of columns:
data['Year'] = [2021, 2022, 2021, 2022, 2021]
df = pd.DataFrame(data)

grouped_multiple = df.groupby(['Department', 'Year'])
grouped_multiple['Salary'].sum()
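Grouping by two columns yields a result indexed by (Department, Year) pairs. One way to make that easier to read is to pivot the inner level into columns with unstack(). The sketch below assumes that a missing department/year combination should be shown as 0; omit fill_value to get NaN instead.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'Sales', 'HR'],
    'Employee': ['John', 'Mike', 'Anna', 'Samantha', 'Chris'],
    'Salary': [5000, 7000, 6200, 5500, 5300],
    'Year': [2021, 2022, 2021, 2022, 2021],
})

# Series with a two-level index: (Department, Year).
by_dept_year = df.groupby(['Department', 'Year'])['Salary'].sum()

# Move Year into the columns; fill_value=0 is an assumption for
# combinations that have no rows.
wide = by_dept_year.unstack('Year', fill_value=0)

print(by_dept_year)
print(wide)
```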
Grouping and aggregation are essential tools for analyzing data in pandas. The techniques covered here (built-in aggregates, agg() with multiple or custom functions, reset_index(), and grouping by several columns) can be combined to suit the complexity of your dataset and the analysis you want to perform.