This is Part 3/4
The first step of any process should be Get and Transform / Power Query. Even if you have no transformation steps, this can be valuable. In Excel 2010 and Excel 2013, Power Query was an add-in. In Excel 2016, Power Query was built in and renamed Get and Transform. Power Query / Get and Transform lets us easily perform tasks that used to require Excel VBA. Learn the hidden tweak that makes Excel 2016 Power Query more flexible and gets it back to the way it used to work in Excel 2010 and Excel 2013. By linking our data through Power Query, we can refresh our Excel dashboards with the click of a button and focus more on our Excel analytics.
Power Query (aka Get and Transform) can merge data using database-style joins, all without requiring a database. In most cases, a Power Query merge / join is a better alternative to VLOOKUPs and temporary / desktop database solutions. VLOOKUP calculations are the number one cause of slow performance when developing our Excel dashboards or Excel analytics. No Excel VBA required.
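To show why a merge tends to beat a column of VLOOKUPs, here is a rough sketch in Python (used only as a neutral stand-in; Power Query itself uses its own M language). It assumes both tables are lists of dictionaries, and the function names vlookup_style and merge_style are hypothetical. The first mimics an exact-match VLOOKUP by scanning the lookup table for every row; the second builds an index of the lookup table once and then does one dictionary lookup per row. This is not a claim about how Power Query implements merges internally.

```python
from typing import Any

Row = dict[str, Any]


def vlookup_style(left: list[Row], right: list[Row], key: str, column: str) -> list[Any]:
    """Mimic an exact-match VLOOKUP: scan the lookup table once per row."""
    results = []
    for row in left:
        found = None  # None stands in for #N/A
        for candidate in right:  # linear scan, repeated for every row
            if candidate[key] == row[key]:
                found = candidate[column]
                break  # VLOOKUP returns the first match only
        results.append(found)
    return results


def merge_style(left: list[Row], right: list[Row], key: str, column: str) -> list[Any]:
    """Index the lookup table once, then do one dictionary lookup per row."""
    index: dict[Any, Any] = {}
    for candidate in right:
        # setdefault keeps the first value seen for each key, matching VLOOKUP
        index.setdefault(candidate[key], candidate[column])
    return [index.get(row[key]) for row in left]


# Made-up sample data: orders to enrich with a price from a product table.
orders = [{"id": 1, "product": "A"}, {"id": 2, "product": "C"}, {"id": 3, "product": "B"}]
products = [{"product": "A", "price": 2.5}, {"product": "B", "price": 4.0}]

slow = vlookup_style(orders, products, "product", "price")
fast = merge_style(orders, products, "product", "price")
```

For n rows and a lookup table of m rows, the scanning version can do up to n × m comparisons, while the indexed version does roughly n + m operations. Note that a real merge returns every matching row (Power Query places them in a nested table column that you then expand), whereas this sketch keeps only the first match to mirror VLOOKUP.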
Use Power Query (aka Get and Transform) to combine a whole folder of files into a single table. Learn what to look out for when using Power Query for this feature. This supports our Excel dashboard development and Excel analytics when working with lots of files. No Excel VBA required.
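Power Query's folder combine is point-and-click, but the logic behind it is easy to sketch. The Python below is a minimal stand-in using only the standard library, under a few assumptions: the files are UTF-8 CSVs, every file should share the same header row, and the function name combine_folder is hypothetical. It handles three things we would suggest watching for (our own short list, not an exhaustive one): stray temporary files such as Excel's "~$" lock files, files whose columns don't match, and losing track of which file each row came from.

```python
import csv
from pathlib import Path


def combine_folder(folder, pattern="*.csv"):
    """Stack every matching CSV in `folder` into one list of row dictionaries.

    A rough analogue of Power Query's From Folder + Combine Files.
    """
    rows = []
    expected_header = None
    for path in sorted(Path(folder).glob(pattern)):
        # Skip Excel-style lock/temp files such as "~$sales.csv".
        if path.name.startswith("~$"):
            continue
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            header = list(reader.fieldnames or [])
            # The first file sets the expected columns; any file that differs
            # is reported instead of being silently stacked.
            if expected_header is None:
                expected_header = header
            elif header != expected_header:
                raise ValueError(f"{path.name}: columns {header} differ from {expected_header}")
            for row in reader:
                row["source_file"] = path.name  # remember which file each row came from
                rows.append(row)
    return rows
```

Failing loudly on a header mismatch is a deliberate choice here: quietly stacking files with different columns is an easy way to end up with misaligned or missing data.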
Data resulting from pivot tables often does not conform to a tidy data format. We can correct for this using Power Query (aka Get and Transform). Power Query Fill and Power Query Unpivot are powerful features that can save you a ton of time and let you do complex, VBA-like transformations without having to write any VBA code. Fill Down repeats a row label into the blank cells beneath it, and Unpivot turns columns such as months into rows, so that each row holds a single observation. Easily transform your data so it is ready for your Excel dashboard or Excel data analytics.
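Fill Down and Unpivot are simple enough to sketch by hand. The Python below assumes the pivot table has been exported as a list of dictionaries in which blank row labels are empty strings; fill_down and unpivot are hypothetical helper names that loosely mirror Power Query's Fill Down and Unpivot Other Columns commands, not a reproduction of them.

```python
def fill_down(rows, column):
    """Replace blank cells in `column` with the nearest non-blank value above."""
    last = None
    out = []
    for row in rows:
        value = row[column]
        if value in ("", None):
            value = last  # blank: repeat the label from the row above
        else:
            last = value  # new label: remember it for the rows below
        out.append({**row, column: value})
    return out


def unpivot(rows, id_columns, attribute="Attribute", value="Value"):
    """Turn every non-id column into its own (attribute, value) row."""
    out = []
    for row in rows:
        ids = {c: row[c] for c in id_columns}
        for col, val in row.items():
            if col not in id_columns:
                out.append({**ids, attribute: col, value: val})
    return out


# Made-up pivot-table export: Region is only shown on the first row of each
# group, and months run across the columns.
pivot_export = [
    {"Region": "North", "Product": "A", "Jan": 10, "Feb": 12},
    {"Region": "", "Product": "B", "Jan": 4, "Feb": 6},
    {"Region": "South", "Product": "A", "Jan": 7, "Feb": 9},
]

tidy = unpivot(fill_down(pivot_export, "Region"), ["Region", "Product"], "Month", "Sales")
```

The result has one row per region, product and month, a tidy shape that is easy to filter, join or re-pivot later.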
If you're looking to learn Advanced Excel, Excel VBA or databases, then you need to check out this video series. In this video series, I will show you how to use Microsoft Excel in different ways that will make you far more effective at working with data. I'm also going to expand your knowledge beyond Excel and show you tips, tricks, and tools from other top data analytics tools such as R Tidyverse and Python, and data visualisation tools such as Tableau, QlikView, Qlik Sense, Plotly, AWS QuickSight and others. We'll start to touch on areas such as big data, machine learning, and cloud computing, and see how you can develop your data skills to get involved in these exciting areas.
Excel formulas such as VLOOKUP and SUMIFS are some of the top reasons for slow spreadsheets. Alternatives to VLOOKUP include Power Query (an add-in for Excel 2010 and Excel 2013), which was renamed Get and Transform in Excel 2016. Large and complex VLOOKUP formulas can also be replaced very efficiently in R. With the R Tidyverse libraries, you can use the join functions (such as left_join) to merge millions of records effortlessly. Unlike an Excel VLOOKUP, which normally returns one column per formula, a Tidyverse join can match on multiple key columns and bring back multiple columns at the same time. Microsoft Excel Power Query and R Tidyverse joins are similar to the joins you write in databases / SQL. Their benefit over relational databases such as Microsoft Access, Microsoft SQL Server, MySQL, etc. is that they work in memory, so they can be much faster than a database. Also, because they are part of an analytics tool rather than a database, it is much faster and easier to build your analysis and queries all in the same tool.
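In R, a join on two key columns that brings back several columns is a single call such as left_join(sales, targets, by = c("region", "month")). To keep one language across these examples, the sketch below shows the same idea in plain Python: rows are dictionaries, the composite key is a tuple of the key columns, and the function name left_join, the tables and their values are all made up for illustration.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def left_join(left: list[Row], right: list[Row], by: list[str], columns: list[str]) -> list[Row]:
    """Left join on several key columns, bringing back several columns at once."""
    # Key each lookup row by a tuple of the `by` columns (a composite key),
    # something a single VLOOKUP cannot match on directly.
    index: defaultdict[tuple, list[Row]] = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in by)].append(r)
    out: list[Row] = []
    for row in left:
        matches = index.get(tuple(row[k] for k in by), [])
        if not matches:
            # Unmatched rows are kept; the new columns are None (NA in R).
            out.append({**row, **{c: None for c in columns}})
        for m in matches:
            # All requested columns come across in one step; duplicate keys on
            # the right give one output row per match, as in SQL.
            out.append({**row, **{c: m[c] for c in columns}})
    return out


# Made-up sample tables.
sales = [
    {"region": "North", "month": "Jan", "sales": 100},
    {"region": "North", "month": "Feb", "sales": 90},
    {"region": "South", "month": "Jan", "sales": 80},
]
targets = [
    {"region": "North", "month": "Jan", "target": 95, "manager": "Ana"},
    {"region": "South", "month": "Jan", "target": 85, "manager": "Ben"},
]

joined = left_join(sales, targets, by=["region", "month"], columns=["target", "manager"])
```

If the lookup table had two rows for the same region and month, the matching sales row would appear twice. That is standard join behaviour, and worth checking for before trusting any totals.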
My very first R Tidyverse program was written to replace a Microsoft Access VBA solution which was becoming complicated and slow. Note that Microsoft Access is very limited in analytics functions and is missing things as simple as a median. Even though I had to learn R programming from scratch and completely rewrite the Microsoft Access VBA solution, it was so much easier and faster. It blew my mind how much easier R programming with R Tidyverse was than Microsoft Access VBA or Microsoft Excel VBA. If you have any VBA skills or are looking to learn VBA, you should definitely check out my videos on R Tidyverse to understand why it is so much easier to work with than VBA: R Tidyverse is designed to work directly with your data.
Note that the most efficient path is to reduce the amount of data pulled down from the database in the first place, that is, from your data warehouse or data lake. It makes no sense to pull data from a data warehouse / data lake into another database just to add joins / lookups, and then pull it into Excel or another analysis tool. Analysts often build these intermediate databases because they either don’t have control of the data warehouse or need to join additional information. All of these operations are done significantly faster in a tool such as R Tidyverse or Microsoft Excel Power Query.
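The saving from reducing data at the source can be sketched with Python's built-in sqlite3 module, used here as a small in-memory stand-in for a data warehouse (the table and its values are invented). Both approaches below compute the same total, but the first drags every row across before filtering, while the second lets the database filter and aggregate so that only one row comes back.

```python
import sqlite3

# Throwaway in-memory table standing in for a warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Jan", 100.0), ("North", "Feb", 90.0),
     ("South", "Jan", 80.0), ("South", "Feb", 70.0)],
)

# Wasteful: pull every row down, then filter and total on the client side.
all_rows = conn.execute("SELECT region, month, amount FROM sales").fetchall()
north_total_client = sum(amount for region, _, amount in all_rows if region == "North")

# Better: let the source filter and aggregate, so only the answer is pulled down.
north_total_source = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("North",)
).fetchone()[0]

conn.close()
```

With four rows the difference is trivial; with millions of rows in a warehouse, the rows you never pull down are the cheapest ones to process.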