Power BI ETL and simple visuals with Python matplotlib

Step-by-step instructions

  1. Open Power BI Desktop and click "Get Data". Choose "More...", then select "Python script" (listed under "Other") and click "Connect".

  2. In the Python script editor, start by importing the pandas library as pd. This library will be used to handle the data.

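  3. Create a variable called "my_data" and assign it the result of reading the Excel file with pandas ("pd.read_excel"). Specify the file location as "d:\Python tutorials\Power BI advanced dashboard advanced\dashboard advanced.xlsx", written as a raw string (put "r" before the opening quote) so that the backslashes are not treated as escape characters.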

  4. Use the pandas "pivot_table" function to summarize the data. Create a variable called "summary" and assign it the result of the pivot table operation. Set the index to the "city" column and use the "sum" aggregation function so that revenue and duration are totaled for each city. Chain "reset_index()" so that the city names become a regular column of the resulting table instead of its index.

  5. Press "OK" to start the data processing. This step may take a while, as Power BI has to run the script and retrieve the data from the Excel file. Once the processing is complete, two items should appear: "my_data" and "summary". The rest of this tutorial works with the "summary" table.

  6. Load the "summary" table into Power BI by selecting it and clicking on "Load".

  7. Switch to the report view in Power BI. Here, we will create the visualizations for the report.

  8. Add a Python visual to the canvas. Adjust the size to occupy only half of the canvas, as we will be creating two charts.

  9. Add the "city" and "revenue" columns to the visual; they will feed a column chart. In the script editor, import the matplotlib library as "mb" and matplotlib.pyplot as "plt" to handle the visualization in Python.

  10. Increase the font size of the chart for better readability. Use "mb.rcParams['font.size'] = 19" to set the font size to 19.

  11. Retrieve the data from the "dataset" DataFrame, which Power BI creates from the columns added to the visual. Create a variable called "numbers" and assign it the values from the "revenue" column. Create another variable called "names" and assign it the values from the "city" column. Convert both variables into lists using the "tolist()" function.

  12. Use the "plt.bar" function to create the column chart. Provide the "names" and "numbers" lists as arguments. Set the color of the bars to red by specifying "color='red'". Adjust the width of the columns to 0.5 by setting "width=0.5".

  13. To display the column chart, call "plt.show()" at the end of the script.

  14. Copy the code of the column chart so it can be reused for the pie chart, then add another Python visual to the canvas.

  15. Add the "city" and "duration" columns to the new visual to create the pie chart. Paste the copied code and keep most of it intact, with these changes: replace "revenue" with "duration" throughout the code, including any labels in the chart, and replace the "plt.bar" call with "plt.pie", passing "numbers" as the values and "names" as the labels.

  16. Format the percentage labels on the pie chart. Set the "autopct" parameter to "%0.1f%%" to display the percentage of each slice with one decimal place.

  17. Emphasize one slice of the pie chart. Set the "explode" parameter to "[0, 0, 0, 0, 0, 0, 0.1]" to pull the last slice slightly out of the pie. The list needs one entry per slice, so these seven values correspond to seven cities.

  18. Run the code to generate the pie chart visualization.

  19. Converting the percentage labels into numbers with a lambda function is covered in another video. The link to that video is in the video description for anyone who wants to explore it.
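
The ETL part (steps 2 to 4) can be sketched as follows. This is a minimal sketch, not the exact script from the video: the sample rows are made up so the code runs without the workbook, the "read_excel" call is left as a comment, and the explicit "values" list is an assumption based on the column names "city", "revenue" and "duration" used in the steps above.

```python
import pandas as pd

# In Power BI the data would be read from the workbook named in step 3:
# my_data = pd.read_excel(r"d:\Python tutorials\Power BI advanced dashboard advanced\dashboard advanced.xlsx")
# A small made-up frame stands in for it here (hypothetical values).
my_data = pd.DataFrame({
    "city": ["Paris", "Rome", "Paris", "Oslo", "Rome"],
    "revenue": [100, 250, 50, 75, 25],
    "duration": [10, 20, 5, 8, 2],
})

# Total revenue and duration per city; reset_index() turns the
# "city" index back into an ordinary column that Power BI can load.
summary = my_data.pivot_table(
    index="city",
    values=["revenue", "duration"],
    aggfunc="sum",
).reset_index()

print(summary)
```

Every DataFrame left in the script ("my_data" and "summary" here) is offered as a separate table after pressing "OK", which is why both names show up in step 5.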

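The two Python visuals (steps 9 to 17) might look roughly like the sketch below. For brevity both charts are shown in one listing; in Power BI each part would go into its own Python visual, where Power BI supplies the "dataset" DataFrame itself. The stand-in frame, its values, the "Agg" backend line and the "plt.figure()" calls are assumptions added only so the sketch runs outside Power BI.

```python
import matplotlib as mb
mb.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the DataFrame that Power BI passes to a Python visual
# (hypothetical cities and values; seven rows to match the explode list).
dataset = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F", "G"],
    "revenue": [120, 90, 150, 60, 200, 80, 110],
    "duration": [10, 20, 30, 5, 15, 12, 8],
})

# Larger font for readability (step 10).
mb.rcParams["font.size"] = 19

names = dataset["city"].tolist()

# First visual: column chart of revenue per city (steps 11 to 13).
numbers = dataset["revenue"].tolist()
bar_fig = plt.figure()
bars = plt.bar(names, numbers, color="red", width=0.5)
plt.show()

# Second visual: pie chart of duration per city (steps 15 to 17).
numbers = dataset["duration"].tolist()
pie_fig = plt.figure()
plt.pie(
    numbers,
    labels=names,
    autopct="%0.1f%%",                 # one decimal place per slice
    explode=[0, 0, 0, 0, 0, 0, 0.1],   # pull the last slice outwards
)
plt.show()
```

The "explode" list must have exactly as many entries as there are slices, so with a different number of cities its length has to change accordingly.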