You can try it out by running this query: SELECT AVG(depdelay) The syntax and the logic are the same as it was for the previous two SQL functions. Well, like it or not, in SQL we have *AVG* for mean. In Python/pandas, for example, the function to calculate the mean is actually called mean - and then there is another one called median to calculate median. And they are right! Median and mode are also averages. Note: well, I have to add that many data scientists find it a bit lazy and ambiguous that in SQL the general word of “average” ( AVG) is used for one specific average type: mean. In SQL, the function called AVG (which of course stands for “average”) returns the mean… so the average type is what we expect from it. (A quick reminder: mean is calculated by calculating the sum of all values in a dataset, then dividing it by the number of values.) But we usually refer to the average type called mean - when we say “average” in everyday life. It’s important to know that there are many types of statistical averages in mathematics. Our next challenge is to calculate the average arrival delay value and the average departure delay value. The total airtime is a massive 748015545 minutes. In this case, it’ll be the airtime column. The only difference is that in the case of SUM you can’t use * - you’ll have to specify a column. The SUM function works with a similar logic as COUNT does. In other words: get the sum of all values in the airtime column. ![]() Now we want to get the airtime for all flights – added up. But I figured that sometimes it might be better to keep it there, so even if you mistype something, your SQL Workbench won’t freeze by accidentally trying to return 7M+ lines of data. ![]() Note 2: in fact, you won’t need the LIMIT clause in this SQL query, as you will have only one line of data on your screen. Note 1: This is true only when you don’t have NULL values (empty cells) in your table! (We don’t have NULL values in the flight_delays data set at all.) I’ll get back to the importance of NULL later. So yes, this means that we have 7275288 lines in our flight_delays table. Copy this query into SQL Workbench and run it: SELECT COUNT(*) Now, let’s expand this list with functions. Remember? It can be everything ( *), or it can be specific columns ( arrdelay, depdelay, etc). The only thing you have to change – compared to the above base query – is what you SELECT from your table. And this is what the COUNT function is for. The easiest aggregation function is to count lines in your SQL table. Okay, let’s see this! SQL COUNT function. So please spare no effort at this point: type in everything you see here into your SQL manager, too, and build a solid foundation of knowledge! Coding is the easiest to learn by doing it. But again: make sure you are doing the coding part with me. Getting the answers to all these questions is going to be very easy, I promise.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |