Summarizing Data

VisiData makes it pleasantly easy to summarize your data. There are three main types of summaries: Frequency tables, one-off calculations, and the Describe Sheet.

Frequency tables

Frequency tables are dead-simple, but also quite powerful. For the dead-simple usage: Navigate to any column, and then press Shift-F. If you did that on the first column (“OPERATOR”) of the FAA dataset, you should get something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 OPERATOR           ↓count♯ percent% histogram                             ~                   
 UNKNOWN            ║ 23076 │   31.42 │ ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ║
 SOUTHWEST AIRLINES   7752    10.55  ■■■■■■■■■■■■                            BUSINESS             5868     7.99  ■■■■■■■■■                               AMERICAN AIRLINES    4337     5.90  ■■■■■■■                                 DELTA AIR LINES      2817     3.84  ■■■■                                    FEDEX EXPRESS        2709     3.69  ■■■■                                    UNITED AIRLINES      2194     2.99  ■■■                                     US AIRWAYS           1885     2.57  ■■■                                     UPS AIRLINES         1773     2.41  ■■                                      SKYWEST AIRLINES     1769     2.41  ■■                                      JETBLUE AIRWAYS      1740     2.37  ■■                                      EXPRESSJET AIRLINES  1347     1.83  ■■                                      AMERICAN EAGLE AIR…  1041     1.42                                        ENVOY AIR             883     1.20                                      ALASKA AIRLINES       835     1.14                                        REPUBLIC AIRLINES     804     1.09                                        MESA AIRLINES         693     0.94                                        AIR WISCONSIN AIRL…   623     0.85                                        PSA AIRLINES          577     0.79                                          PRIVATELY OWNED       516     0.70                                          PHI INC               491     0.67                                         2› faa-wildlife-strikes_OPERATOR_freq|                                  Shift+F           282 bins 

With just one keystroke, VisiData has already told us something useful about the dataset: That the “operators” associated with 31% of the wildlife strikes are, according to the FAA, “unknown.” We have also learned, from the “bins” mini-report at the bottom-right of the screen, that there are 282 distinct values in the “OPERATOR” column.

Multi-column frequencies

Sometimes you want to count how often combinations of columns occur. VisiData also makes this easy. First, turn the columns you want to count into “key” columns, using the ! button. Then, type gF.

For instance, if we wanted to count the combinations of the “OPERATOR” and “PERSON” fields, we’d hit ! on each of those columns — either from the main data sheet or in the Columns Sheet. Once you’ve done that, you should see something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 OPERATOR            PERSON ║<BIRDS_STRUCK  EFFECT              DAMAGE  COST_REPAIRS  REMAINS>
 BUSINESS           │ Tower  ║ 1            │ NONE               │ M      │              │ 0        
 BUSINESS            Tower   1             NONE                M                     0        
 BUSINESS            Tower   1             NONE                M                     0        
 DELTA AIR LINES     Tower   1             NONE                M                     0        
 BUSINESS            Tower   1                                 M                     0        
 DELTA AIR LINES     Tower   1             Other               M?                    0        
 DELTA AIR LINES     Tower   1             NONE                M?                    0        
 BUSINESS            Tower   1             PRECAUTIONARY LAND… M?                    0        
 ALLEGIANT AIR       Tower   1             NONE                M?                    0        
 TRANS STATES AIRLI… Tower   1             NONE                M?                    0        
 BUSINESS            Tower   1             NONE                M?                    0        
 GOVERNMENT          Tower   1             NONE                M?                    0        
 AMERICAN AIRLINES   Tower   1             NONE                N                     0        
 EXPRESSJET AIRLINES Tower   1             NONE                N                     0        
 MESA AIRLINES       Tower   1             NONE                N                     0        
 BUSINESS            Tower   1             PRECAUTIONARY LAND… N                     0        
 DELTA AIR LINES     Tower   1             NONE                N                     0        
 DELTA AIR LINES     Tower   1             NONE                N                     0        
 LUFTHANSA           Tower   1             NONE                N                     0        
 BUSINESS            Tower   1             NONE                N                     0        
 SPIRIT AIRLINES     Tower   1             PRECAUTIONARY LAND… N                     0        
1› faa-wildlife-strikes|                                               !   key-col      73448 rows 

Then, type gF, which should result in something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 OPERATOR            PERSON ↓count♯ percent% histogram                             ~          
 UNKNOWN            │ Carcas…║ 22842 │   31.10 │ ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ║
 SOUTHWEST AIRLINES  Pilot    5481     7.46  ■■■■■■■■■                               AMERICAN AIRLINES   Pilot    2240     3.05  ■■■                                     BUSINESS            Pilot    2061     2.81  ■■■                                     DELTA AIR LINES     Airpor…  1793     2.44  ■■                                      BUSINESS            Airpor…  1642     2.24  ■■                                      AMERICAN AIRLINES   Airpor…  1481     2.02  ■■                                      FEDEX EXPRESS       Air Tr…  1434     1.95  ■■                                      UNITED AIRLINES     Airpor…  1351     1.84  ■■                                      UPS AIRLINES        Air Tr…  1343     1.83  ■■                                      SOUTHWEST AIRLINES  Airpor…  1293     1.76  ■■                                      BUSINESS            Tower    1094     1.49                                        SKYWEST AIRLINES    Airpor…  1024     1.39                                        FEDEX EXPRESS       Pilot     963     1.31                                        EXPRESSJET AIRLINES Airpor…   785     1.07                                        JETBLUE AIRWAYS     Airpor…   749     1.02                                        US AIRWAYS          Air Tr…   672     0.91                                        US AIRWAYS          Airpor…   656     0.89                                        ALASKA AIRLINES     Airpor…   599     0.82                                          DELTA AIR LINES     Tower     572     0.78                                          AMERICAN EAGLE AIR… Pilot     535     0.73                                         2› faa-wildlife-strikes_OPERATOR-PERSON_freq|              processing… gShift+F           879 bins 

To make it easier to read, type g_, which will expand the column widths:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 OPERATOR                 PERSON                   ↓count♯ percent% histogram                  
 UNKNOWN                 │ Carcass Found            ║ 22842 │   31.10 │ ■■■■■■■■■■■■■■■■■■■■■■■■■■…
 SOUTHWEST AIRLINES       Pilot                      5481     7.46  ■■■■■■■■■        
 AMERICAN AIRLINES        Pilot                      2240     3.05  ■■■              
 BUSINESS                 Pilot                      2061     2.81  ■■■              
 DELTA AIR LINES          Airport Operations         1793     2.44  ■■               
 BUSINESS                 Airport Operations         1642     2.24  ■■               
 AMERICAN AIRLINES        Airport Operations         1481     2.02  ■■               
 FEDEX EXPRESS            Air Transport Operations   1434     1.95  ■■               
 UNITED AIRLINES          Airport Operations         1351     1.84  ■■               
 UPS AIRLINES             Air Transport Operations   1343     1.83  ■■               
 SOUTHWEST AIRLINES       Airport Operations         1293     1.76  ■■               
 BUSINESS                 Tower                      1094     1.49                 
 SKYWEST AIRLINES         Airport Operations         1024     1.39                 
 FEDEX EXPRESS            Pilot                       963     1.31                 
 EXPRESSJET AIRLINES      Airport Operations          785     1.07                 
 JETBLUE AIRWAYS          Airport Operations          749     1.02                 
 US AIRWAYS               Air Transport Operations    672     0.91                 
 US AIRWAYS               Airport Operations          656     0.89                 
 ALASKA AIRLINES          Airport Operations          599     0.82                   
 DELTA AIR LINES          Tower                       572     0.78                   
 AMERICAN EAGLE AIRLINES  Pilot                       535     0.73                   
2› faa-wildlife-strikes_OPERATOR-PERSON_freq|                 g_   resize-cols-max        879 bins 

With just a few more keystrokes, we’ve learned something else: Virtually all wildlife strikes with “unknown” operators were identified based on carcasses found at the airport (rather than reports from pilots, for example).

Adding “aggregators”

By default, frequency tables just count the number of times each value appears. But in VisiData, you can specify additional calculations by setting the column’s “aggregators”. (You might remember this field from the Columns Sheet.) In VisiData’s aggregators include min, max, mean, median, sum, distinct, and others.

To add an aggregator to a column, navigate to that column and press +. VisiData will then prompt you to specify which aggregator you would like to add. You can type out your desired aggregator, or use Control-x to open the interactive chooser and select from a list of options.

For example, let’s go back to the original FAA data sheet. Let’s navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column
  • Press + to tell VisiData you want to add an aggregator
  • At the prompt, type sum and then hit Enter to add the summation aggregator
  • Navigate to the “AIRPORT” column, and press Shift-F

You should see something like this, with the sum calculation now appearing in your frequency table:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 AIRPORT            ↓count♯ COST_REPAIRS_sum#                                                   
 UNKNOWN            ║  8424 │         17931422 ║                                                   
 DENVER INTL AIRPORT  2756           9033148                                                    
 DALLAS/FORT WORTH …  2392            927920                                                    
 CHICAGO O'HARE INT…  1583           1146738                                                    
 JOHN F KENNEDY INTL  1302            310917                                                    
 MEMPHIS INTL         1217           4507487                                                    
 SALT LAKE CITY INTL  1179           6031324                                                    
 PHILADELPHIA INTL    1131           2213030                                                    
 ORLANDO INTL         1026             71500                                                    
 SACRAMENTO INTL      1021           3634816                                                    
 LA GUARDIA ARPT       974            381504                                                    
 CHARLOTTE/DOUGLAS …   960             86960                                                    
 NEWARK LIBERTY INT…   917            506489                                                    
 LOUISVILLE INTL AR…   841           1366299                                                    
 AUSTIN-BERGSTROM I…   817            545500                                                    
 LOUIE ARMSTRONG NE…   809            406000                                                    
 KANSAS CITY INTL      807             75660                                                    
 HARTSFIELD - JACKS…   775            284232                                                    
 GEORGE BUSH INTERC…   746             52000                                                    
 DETROIT METRO WAYN…   734             48983                                                    
 BALTIMORE/WASH INT…   691            339300                                                    
3› faa-wildlife-strikes_AIRPORT_freq|                                   Shift+F          1512 bins 

By default, frequency tables are sorted by the “count” column, but you can sort them by any other column.

Note

When using aggregators, make sure that you’ve assigned the proper type (# for integer columns, etc.) to the columns of interest, so that VisiData knows how to calculate the aggregations correctly.

One-off calculations

From any data sheet, you can also run a single calculation on all rows — or all selected rows — in a column. To do that, navigate to the column and type z+, which will bring up the same aggregator-choice prompt as above. Type the aggregator you want, and press Enter. At the bottom of the screen, you’ll see the result of the calculation.

You can try this on the FAA data sheet we’ve been working with. Navigate to the “COST_REPAIRS” column, and then do the following:

  • Press # to tell VisiData this is an integer column (if you haven’t already)
  • Type z+ to see the result of an aggregator
  • Type sum and then hit Enter

At the bottom of the screen, you should see something like this, indicating that the total reported cost of repairs is $161,868,071:

 DELTA AIR LINES     Tower   1             NONE                N                   ! 0       
 LUFTHANSA           Tower   1             NONE     ┌─────────────────────────────| statuses |─┐
 BUSINESS            Tower   1             NONE     │ COST_REPAIRS_sum=161868071                SPIRIT AIRLINES     Tower   1             PRECAUTIO└──────────────────────────────────────────┘
1› faa-wildlife-strikes|                                       z+   memo-aggregate      73448 rows 

Summarizing all columns

To get a bird’s-eye view of your entire dataset, press Shift-I, which will provide summary statistics for each of your columns:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 column             errors  ♯ nulls   ♯ distinct♯ mode    ~ min     ~ max     ~ sum       >
 OPERATOR          ║        0 │        0 │      282 │ UNKNOWN  │          │          │          │  
 ATYPE                     0         0       400  UNKNOWN                                  
 INCIDENT_DATE             0         0      2343  10/30/14…                                
 STATE                     0         0        63                                           
 AIRPORT                   0         0      1512  UNKNOWN                                  
 PHASE_OF_FLT              0         0        21                                           
 HEIGHT                    0         0       319                                           
 SPEED                     0         0       225                                           
 SPECIES                   0         0       641  Unknown …                                
 BIRDS_STRUCK              0         0         5  1                                        
 EFFECT                    0         0         8  None                                     
 DAMAGE                    0         0         6  N                                        
 COST_REPAIRS      ║    72145 │        0 │      478 │ 5000     │ 10       │ 6500000  │ 1618680…#│ 1
 PERSON            ║        0 │        0 │        7 │ Carcass …│          │          │          │  
 REMAINS_COLLECTED         0         0         2  1                                        
 REMARKS                   0         0     61240                                           





4› faa-wildlife-strikes_describe|                                    Shift+I            16 columns 

In VisiData, this is called the “Describe Sheet”. You’ll notice that there are only min/max/median/etc. calculations for the columns we’ve given types — just the COST_REPAIRS column so far. If we go back to the data sheet and tell VisiData that the HEIGHT, SPEED, and BIRDS_STRUCK fields are numbers, too, then pressing Shift-I will result in something like this:

  File  Edit  View  Column  Row  Data  Plot  System  Help     | VisiData 3.0.2 | Alt+H for help men
 column             errors  ♯ nulls   ♯ distinct♯ mode    ~ min     ~ max     ~ sum       >
 OPERATOR          ║        0 │        0 │      282 │ UNKNOWN  │          │          │          │  
 ATYPE                     0         0       400  UNKNOWN                                  
 INCIDENT_DATE             0         0      2343  10/30/14…                                
 STATE                     0         0        63                                           
 AIRPORT                   0         0      1512  UNKNOWN                                  
 PHASE_OF_FLT              0         0        21                                           
 HEIGHT                35629         0       318  0         0         31300     33937719# 5
 SPEED                 48384         0       224  140       0         374       3614411 # 1
 SPECIES                   0         0       641  Unknown …                                
 BIRDS_STRUCK      ║     7966 │        0 │        1 │ 1        │ 1        │ 1        │ 65482   #│ 1
 EFFECT                    0         0         8  None                                     
 DAMAGE                    0         0         6  N                                        
 COST_REPAIRS          72145         0       478  5000      10        6500000   1618680…# 1
 PERSON            ║        0 │        0 │        7 │ Carcass …│          │          │          │  
 REMAINS_COLLECTED         0         0         2  1                                        
 REMARKS                   0         0     61240                                           





5› faa-wildlife-strikes_describe|                                    Shift+I            16 columns