# 1. Intro to basics

## 1.1 Arithmetic with R

• Addition: `+`
• Subtraction: `-`
• Multiplication: `*`
• Division: `/`
• Exponentiation: `^`
• Modulo (remainder): `%%`, e.g. `5 %% 3` is 2.
• Integer division: `%/%`, e.g. `7 %/% 3` is 2.
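
As a quick check of the operators listed above, the following minimal sketch evaluates each one. The operand values and variable names are chosen here purely for illustration.

```r
# Arithmetic operators from the list above (operands are illustrative)
add_result <- 5 + 3     # addition
sub_result <- 5 - 3     # subtraction
mul_result <- 5 * 3     # multiplication
div_result <- 5 / 2     # division returns a numeric, not a truncated integer
pow_result <- 2 ^ 5     # exponentiation
mod_result <- 5 %% 3    # remainder after division
int_div    <- 7 %/% 3   # integer (floor) division

print(c(add_result, sub_result, mul_result, div_result,
        pow_result, mod_result, int_div))
```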

## 1.3 Basic data types in R

• Decimal values like 4.5 are called `numerics`.
• Natural numbers like 4 are called integers. Integers are also `numerics`.
• Boolean values (TRUE or FALSE) are called `logical`.
• Text (or string) values are called `characters`.
``````> # Declare variables of different types
> my_numeric <- 42
> my_character <- "universe"
> my_logical <- FALSE
>
> # Check class of my_numeric
> class(my_numeric)
[1] "numeric"

> # Check class of my_character
> class(my_character)
[1] "character"

> # Check class of my_logical
> class(my_logical)
[1] "logical"

``````
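
One detail the output above hints at: `42` is reported as `"numeric"`, not `"integer"`. The sketch below (variable names are illustrative) shows that a plain number literal is stored as a double, and that the `L` suffix is needed to get an integer.

```r
# A literal such as 4 is stored as a double ("numeric") by default;
# the L suffix requests an integer.
plain_class   <- class(4)         # "numeric"
integer_class <- class(4L)        # "integer"
also_numeric  <- is.numeric(4L)   # TRUE: integers also count as numeric

print(c(plain_class, integer_class))
print(also_numeric)
```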

# 2. Vectors

## 2.1 Create a vector

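Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. In other words, a vector is a simple tool to store data.

Vectors are created with the combine function `c()`, as in the examples below. All elements of a vector share one data type: if you mix types, R silently coerces every element to a common type (in the last example, everything becomes character):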

``````> numeric_vector <- c(1, 10, 49)
> numeric_vector
[1]  1 10 49
>
> character_vector <- c("a", "b", "c")
> character_vector
[1] "a" "b" "c"
>

> boolean_vector <- c(TRUE,"a",TRUE)
> boolean_vector
[1] "TRUE" "a"    "TRUE"
``````
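
The coercion seen in `boolean_vector` follows a fixed direction: logical values become numeric when mixed with numbers, and everything becomes character when mixed with strings. A minimal sketch, with variable names chosen here for illustration:

```r
# Mixing logical and numeric values: TRUE/FALSE become 1/0
mixed_logical_numeric <- c(TRUE, 2, FALSE)

# Mixing in a string: every element becomes character
mixed_with_character <- c(1, "a", TRUE)

class(mixed_logical_numeric)   # "numeric"
class(mixed_with_character)    # "character"
```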

## 2.2 Naming a vector

``````> # Poker winnings from Monday to Friday
> poker_vector <- c(140, -50, 20, -120, 240)
>
> # Roulette winnings from Monday to Friday
> roulette_vector <- c(-24, -50, 100, -350, 10)
>
> # Assign days as names of poker_vector
> names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
> poker_vector
Monday   Tuesday Wednesday  Thursday    Friday
140       -50        20      -120       240
>
> # Assign days as names of roulette_vectors
> names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
> roulette_vector
Monday   Tuesday Wednesday  Thursday    Friday
-24       -50       100      -350        10
``````

``````> # The variable days_vector
> days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
>
> # Assign the names of the day to roulette_vector and poker_vector
> names(poker_vector) <-   days_vector
> names(roulette_vector) <- days_vector
``````

## 2.3 Vector arithmetic

• Arithmetic on vectors is element-wise:
``````> A_vector <- c(1, 2, 3)
> B_vector <- c(4, 5, 6)
>
> # Take the sum of A_vector and B_vector
> total_vector <- A_vector + B_vector
>
> # Print out total_vector
> total_vector
[1] 5 7 9
``````
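• If the lengths differ, R recycles the shorter vector. When the longer length is not a multiple of the shorter one, R issues a warning but still computes the result: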
``````> A_vector <- c(1, 2, 3,4,5)
> B_vector <- c(4, 5, 6)

> # Take the sum of A_vector and B_vector
> total_vector <- A_vector + B_vector
Warning message:
In A_vector + B_vector :
longer object length is not a multiple of shorter object length

> # Print out total_vector
> total_vector
[1]  5  7  9  8 10
``````
• If the vectors used in the calculation have names, the result inherits those names:
``````> # Poker and roulette winnings from Monday to Friday:
> poker_vector <- c(140, -50, 20, -120, 240)
> roulette_vector <- c(-24, -50, 100, -350, 10)
> days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
> names(poker_vector) <- days_vector
> names(roulette_vector) <- days_vector

> # Assign to total_daily how much you won/lost on each day
> total_daily <- poker_vector + roulette_vector
> total_daily
Monday   Tuesday Wednesday  Thursday    Friday
116      -100       120      -470       250
>
``````
• `sum()` computes the sum of all elements of a vector:
``````> # Poker and roulette winnings from Monday to Friday:
> poker_vector <- c(140, -50, 20, -120, 240)
> roulette_vector <- c(-24, -50, 100, -350, 10)
> days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
> names(poker_vector) <- days_vector
> names(roulette_vector) <- days_vector

> # Total winnings with poker
> total_poker <- sum(poker_vector)

> # Total winnings with roulette
> total_roulette <-  sum(roulette_vector)

> # Total winnings overall
> total_week <- total_poker + total_roulette

> # Print out total_week
> total_week
[1] -84

``````
• `mean()` computes the average of a vector's elements, e.g. `mean(poker_vector)`.
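
To complement the recycling example above, here is a small sketch of the case where the longer length is an exact multiple of the shorter one. R then recycles without any warning. The vectors are made up for illustration.

```r
long_vector  <- c(1, 2, 3, 4, 5, 6)
short_vector <- c(10, 20)

# short_vector is reused three times: 10 20 10 20 10 20
recycled_sum <- long_vector + short_vector
recycled_sum

# mean() works on any numeric vector
average_value <- mean(long_vector)
average_value
```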

## 2.4 Selecting vector elements by index

• To select the first element of the vector, type `poker_vector[1]`.
• To select the second element of the vector, type `poker_vector[2]`.
• To select the first and fifth days, index with the vector `c(1, 5)`: `poker_vector[c(1, 5)]`.
• To select the first through fifth days, use a range: `poker_vector[1:5]`.
• You can also select by name: `poker_vector[c("Monday","Tuesday")]`.
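
The selection forms above can be tried together in one short sketch. It rebuilds `poker_vector` from section 2.2; the result variable names are illustrative. Note that R indices start at 1.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

first_day       <- poker_vector[1]                       # single position
first_and_fifth <- poker_vector[c(1, 5)]                 # several positions
midweek         <- poker_vector[2:4]                     # a range of positions
by_name         <- poker_vector[c("Monday", "Tuesday")]  # selection by name

# Selections are ordinary vectors, so they can be passed to mean()
midweek_average <- mean(midweek)
midweek_average
```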

## 2.5 Selecting by comparison

• `<` for less than
• `>` for greater than
• `<=` for less than or equal to
• `>=` for greater than or equal to
• `==` for equal to each other
• `!=` for not equal to each other
``````> c(4, 5, 6) > 5
[1] FALSE FALSE TRUE
``````

``````> poker_vector <- c(140, -50, 20, -120, 240)
> roulette_vector <- c(-24, -50, 100, -350, 10)
> days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
> names(poker_vector) <- days_vector
> names(roulette_vector) <- days_vector
>
> selection_vector <- poker_vector > 0
>
> poker_winning_days <- poker_vector[selection_vector]
> poker_winning_days
Monday Wednesday    Friday
140        20       240
``````
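
The intermediate `selection_vector` is optional: the comparison can be written directly inside the brackets. The sketch below also counts matches with `sum()`, which works because `TRUE` is treated as 1. Variable names other than `poker_vector` are illustrative.

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Comparison and selection in one step
winning_days <- poker_vector[poker_vector > 0]

# Number of winning days: each TRUE counts as 1
n_winning_days <- sum(poker_vector > 0)

# Days with a win of at least 100
big_wins <- poker_vector[poker_vector >= 100]

winning_days
n_winning_days
```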

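# 3. Matrices

In the `matrix()` calls below:

1. The first argument is the data used to fill the matrix; `1:9` is equivalent to `c(1,2,3,4,5,6,7,8,9)`.
2. The `byrow` argument controls the fill direction: with `TRUE` the matrix is filled row by row, otherwise column by column.
3. The `nrow` argument gives the number of rows.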
``````> # Construct a matrix with 3 rows that contain the numbers 1 up to 9
> matrix(1:9, byrow = TRUE, nrow = 3)
[,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

> # The same numbers, filled column by column
> matrix(1:9, byrow = FALSE, nrow = 3)
[,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
``````
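
The effect of `byrow` is easier to see on a non-square matrix. This is a minimal sketch with made-up data; `dim()` reports the number of rows and columns.

```r
# Six values in two rows; R infers three columns
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, byrow = FALSE)

dim(by_row)    # 2 rows, 3 columns

# First row under each fill direction
by_row[1, ]    # 1 2 3
by_col[1, ]    # 1 3 5
```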

## 3.1 Analyzing a matrix

``````> # Box office Star Wars (in millions!)
> new_hope <- c(460.998, 314.4)
> empire_strikes <- c(290.475, 247.900)
> return_jedi <- c(309.306, 165.8)

> # Create box_office
> box_office <- c(new_hope,empire_strikes,return_jedi)

> # Construct star_wars_matrix
> star_wars_matrix <- matrix(box_office,byrow=TRUE,nrow=3)
> star_wars_matrix
[,1]  [,2]
[1,] 460.998 314.4
[2,] 290.475 247.9
[3,] 309.306 165.8
``````

## 3.2 Naming a matrix

``````# Box office Star Wars (in millions!)
> new_hope <- c(460.998, 314.4)
> empire_strikes <- c(290.475, 247.900)
> return_jedi <- c(309.306, 165.8)
>
> # Construct matrix
> star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3, byrow = TRUE)
>
> # Vectors region and titles, used for naming
> region <- c("US", "non-US")
> titles <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi")
>
> # Name the columns with region
> colnames(star_wars_matrix) <- region
>
> # Name the rows with titles
> rownames(star_wars_matrix) <- titles
>
> # Print out star_wars_matrix
> star_wars_matrix
US non-US
A New Hope              460.998  314.4
The Empire Strikes Back 290.475  247.9
Return of the Jedi      309.306  165.8

``````

## 3.3 Calculations

• `rowSums()` sums each row. Note that the `matrix()` call below names the rows and columns directly when the matrix is defined (via `dimnames`); `rowSums()` then computes the total of each row, i.e. the combined box office of each movie.
``````> # Construct star_wars_matrix
> box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)

> star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
c("US", "non-US")))

> star_wars_matrix
US non-US
A New Hope              460.998  314.4
The Empire Strikes Back 290.475  247.9
Return of the Jedi      309.306  165.8

> # Calculate worldwide box office figures
> worldwide_vector <- rowSums(star_wars_matrix)
> worldwide_vector
A New Hope The Empire Strikes Back      Return of the Jedi
775.398                 538.375                 475.106
>
``````

## 3.4 Combining matrices

• `cbind()` adds columns:

`````` # Construct star_wars_matrix
> box_office <- c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8)
> star_wars_matrix <- matrix(box_office, nrow = 3, byrow = TRUE,
dimnames = list(c("A New Hope", "The Empire Strikes Back", "Return of the Jedi"),
c("US", "non-US")))
>
> # The worldwide box office figures
> worldwide_vector <- rowSums(star_wars_matrix)
> worldwide_vector
A New Hope The Empire Strikes Back      Return of the Jedi
775.398                 538.375                 475.106
>
> # Bind the new variable worldwide_vector as a column to star_wars_matrix
> all_wars_matrix <- cbind(star_wars_matrix,worldwide_vector)
> all_wars_matrix
US non-US worldwide_vector
A New Hope              460.998  314.4          775.398
The Empire Strikes Back 290.475  247.9          538.375
Return of the Jedi      309.306  165.8          475.106
``````
• `rbind()` adds rows:
``````> # star_wars_matrix and star_wars_matrix2 are available in your workspace
> star_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
> star_wars_matrix2
US non-US
The Phantom Menace   474.5  552.5
Attack of the Clones 310.7  338.7
Revenge of the Sith  380.3  468.5
>
> # cbind() would place the two trilogies side by side as extra columns
> all_wars_matrix <- cbind(star_wars_matrix,star_wars_matrix2)
> all_wars_matrix
US non-US    US non-US
A New Hope              461.0  314.4 474.5  552.5
The Empire Strikes Back 290.5  247.9 310.7  338.7
Return of the Jedi      309.3  165.8 380.3  468.5

> # rbind() stacks both Star Wars trilogies as rows of one matrix
> all_wars_matrix <- rbind(star_wars_matrix,star_wars_matrix2)
> all_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
``````
• `colSums()` sums each column:
`````` # all_wars_matrix is available in your workspace
> all_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
>
> # Total revenue for US and non-US
> total_revenue_vector <- colSums(all_wars_matrix)
>
> # Print out total_revenue_vector
> total_revenue_vector
US non-US
2226.3 2087.8
``````
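
The three functions of this section can be combined on a small made-up example, which makes the resulting shapes easy to verify by hand. All names and values below are illustrative.

```r
first_matrix  <- matrix(1:4, nrow = 2,
                        dimnames = list(c("a", "b"), c("x", "y")))
second_matrix <- matrix(5:8, nrow = 2,
                        dimnames = list(c("c", "d"), c("x", "y")))

# rbind() stacks rows: the result has 4 rows and 2 columns
stacked <- rbind(first_matrix, second_matrix)

# cbind() appends a column; a named argument sets the column name
with_totals <- cbind(stacked, total = rowSums(stacked))

# colSums() gives one total per column
column_totals <- colSums(stacked)

with_totals
column_totals
```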

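## 3.5 Selecting matrix elements

• `my_matrix[1,2]` selects the element in the first row and second column.
• `my_matrix[1:3,2:4]` selects rows 1, 2, 3 and columns 2, 3, 4.
• `my_matrix[,1]` selects the first column (all rows).
• `my_matrix[1,]` selects the first row (all columns).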
``````> # all_wars_matrix is available in your workspace
> all_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
>
> # Select the non-US revenue for all movies
> non_us_all <- all_wars_matrix[,2]
>
> # Average non-US revenue
> mean(non_us_all)
[1] 347.9667
>
> # Select the non-US revenue for first two movies
> non_us_some <- all_wars_matrix[1:2,2]
>
> # Average non-US revenue for first two movies
> mean(non_us_some)
[1] 281.15
``````
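
The four selection forms listed at the start of this section can be checked on a small matrix. Here `my_matrix` is a made-up 3 x 4 example.

```r
# Rows are 1 2 3 4 / 5 6 7 8 / 9 10 11 12
my_matrix <- matrix(1:12, nrow = 3, byrow = TRUE)

single_element <- my_matrix[1, 2]      # row 1, column 2
sub_block      <- my_matrix[1:3, 2:4]  # rows 1-3 of columns 2-4
first_column   <- my_matrix[, 1]       # all rows of column 1
first_row      <- my_matrix[1, ]       # all columns of row 1

single_element
dim(sub_block)
```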

## 3.6 Matrix arithmetic

The standard arithmetic operators `+`, `-`, `/`, and `*` also work on matrices and are applied element by element.

`````` # all_wars_matrix and ticket_prices_matrix are available in your workspace
> all_wars_matrix
US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace      474.5  552.5
Attack of the Clones    310.7  338.7
Revenge of the Sith     380.3  468.5
> ticket_prices_matrix
US non-US
A New Hope              5.0    5.0
The Empire Strikes Back 6.0    6.0
Return of the Jedi      7.0    7.0
The Phantom Menace      4.0    4.0
Attack of the Clones    4.5    4.5
Revenge of the Sith     4.9    4.9
>
> # Estimated number of visitors
> visitors <- all_wars_matrix / ticket_prices_matrix
>
> # US visitors
> us_visitors <- all_wars_matrix[,1]/ticket_prices_matrix[,1]
>
> # Average number of US visitors
> mean(us_visitors)
[1] 75.01401
``````
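
Because `*` is element-wise, it is not matrix multiplication in the linear-algebra sense; base R provides the separate operator `%*%` for that. The sketch below contrasts the two on a made-up 2 x 2 matrix.

```r
# Filled by column: rows are (1, 3) and (2, 4)
m <- matrix(1:4, nrow = 2)

# Element-wise product: each entry is squared
elementwise <- m * m

# Matrix product: rows of m times columns of m
matrix_product <- m %*% m

elementwise
matrix_product
```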

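# 4. Factors

A factor is a statistical data type used to store categorical variables. A categorical variable can take only a limited number of distinct categories (for example, gender), whereas a continuous variable can take infinitely many values.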

## 4.1 factor()

``````> # Gender vector
> gender_vector <- c("Male", "Female", "Female", "Male", "Male")
>
> # Convert gender_vector to a factor
> factor_gender_vector <-factor(gender_vector)
>
> # Print out factor_gender_vector
> factor_gender_vector
[1] Male   Female Female Male   Male
Levels: Female Male
``````

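• A nominal categorical variable has categories with no implied order or ranking, e.g. `orangutan, elephant, crocodile`.
• An ordinal categorical variable has categories with a natural order, e.g. `large, medium, small`.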
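
A minimal sketch of the two kinds of categorical variable, using made-up vectors: a nominal factor is built with plain `factor()`, an ordinal one with `ordered = TRUE` and an explicit level order.

```r
# Nominal: no order among the categories
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)

# Ordinal: the levels argument defines the order Low < Medium < High
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector,
                                    ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))

is.ordered(factor_animals_vector)      # FALSE
is.ordered(factor_temperature_vector)  # TRUE
```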

## 4.2 Factor levels

``````> # Code to build factor_survey_vector
> survey_vector <- c("M", "F", "F", "M", "M")
> factor_survey_vector <- factor(survey_vector)
> factor_survey_vector
[1] M F F M M
Levels: F M
>
> # Specify the levels of factor_survey_vector
> levels(factor_survey_vector) <-c("Female", "Male")
>
> factor_survey_vector
[1] Male   Female Female Male   Male
Levels: Female Male
``````
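
Assigning to `levels()` relies on position: the new names must be given in the same order as the existing levels (`F`, then `M`), otherwise the labels are silently swapped. One way to make the mapping explicit, sketched here as an alternative rather than as the approach used above, is to pass `levels` and `labels` to `factor()` together.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# levels lists the raw values, labels gives the replacement for each one
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))

factor_survey_vector
table(factor_survey_vector)
```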

## 4.3 The `summary()` function

``````> # Build factor_survey_vector with clean levels
> survey_vector <- c("M", "F", "F", "M", "M")
> factor_survey_vector <- factor(survey_vector)
> levels(factor_survey_vector) <- c("Female", "Male")
> factor_survey_vector
[1] Male   Female Female Male   Male
Levels: Female Male
>
> # Generate summary for survey_vector
> summary(survey_vector)
Length     Class      Mode
5 character character
>
> # Generate summary for factor_survey_vector
> summary(factor_survey_vector)
Female   Male
2      3

``````

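## 4.4 Comparing factors

Subsetting a factor by index returns a factor with the same levels. Unordered factors cannot be compared with ordering operators such as `>`: the comparison returns `NA` with a warning.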

`````` survey_vector <- c("M", "F", "F", "M", "M","N")
> factor_survey_vector <- factor(survey_vector)
> levels(factor_survey_vector) <- c("Female", "Male","Newhalf")
>
> # Male
> male <- factor_survey_vector[1]
> male
[1] Male
Levels: Female Male Newhalf
>
> # Female
> female <- factor_survey_vector[2]
> female
[1] Female
Levels: Female Male Newhalf
>
> # Battle of the sexes: Male 'larger' than female?
> male > female
Warning message: '>' not meaningful for factors
[1] NA
``````
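
The restriction applies to ordering operators only. Equality tests with `==` still work on unordered factors that share the same levels, as this small sketch shows (it uses a two-level version of the survey factor; `suppressWarnings()` is only there to keep the output quiet).

```r
factor_survey_vector <- factor(c("M", "F", "F", "M", "M"),
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))
male   <- factor_survey_vector[1]
female <- factor_survey_vector[2]

# Equality is meaningful for unordered factors
same_category <- male == female

# Ordering is not: the result is NA (a warning is raised)
ordering_result <- suppressWarnings(male > female)

# The single-element subset keeps every level
levels(male)
```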

## 4.5 Ordered factors

``````factor(some_vector,
ordered = TRUE,
levels = c("lev1", "lev2" ...))
``````
``````> # Create speed_vector
> speed_vector <- c("fast", "slow", "slow", "fast", "insane")
>
> # Convert speed_vector to ordered factor vector
> factor_speed_vector <- factor(speed_vector,ordered = TRUE,levels=c("slow","fast","insane"))
>
> # Print factor_speed_vector
> factor_speed_vector
[1] fast   slow   slow   fast   insane
Levels: slow < fast < insane
> summary(factor_speed_vector)
slow   fast insane
2      2      1
``````

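## 4.6 Comparing ordered factors

The order of the levels is fixed when an ordered factor is defined, so ordered factors can be compared with the comparison operators.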

``````> # Create factor_speed_vector
> speed_vector <- c("fast", "slow", "slow", "fast", "insane")
> factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "fast", "insane"))
>
> # Factor value for second data analyst
> da2 <- factor_speed_vector[2]
>
> # Factor value for fifth data analyst
> da5 <- factor_speed_vector[5]
>
> # Is data analyst 2 faster than data analyst 5?
> da2 > da5
[1] FALSE
``````
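
Comparisons on ordered factors are vectorized like any other comparison, and an ordered factor can also be compared against a level given as a string. A short sketch using the same `factor_speed_vector` (the result names are illustrative):

```r
speed_vector <- c("fast", "slow", "slow", "fast", "insane")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "fast", "insane"))

# Which analysts are faster than "slow"?
faster_than_slow <- factor_speed_vector > "slow"
faster_than_slow

# max() is defined for ordered factors and returns the highest level present
fastest <- max(factor_speed_vector)
fastest
```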

# 5. Data frames

``````
> mtcars
mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
> head(mtcars)
mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> tail(mtcars)
mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
>
``````
• `str()` also gives a compact overview of a data frame's structure:
``````> # Investigate the structure of mtcars
> str(mtcars)
'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
``````

## 5.1 Creating a data frame with `data.frame()`

``````> # Definition of vectors
> name <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune")
> type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet",
"Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
> diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
> rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67)
> rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

> # Create a data frame from the vectors
> planets_df <- data.frame(name,type,diameter,rotation,rings)
> planets_df
name               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
3   Earth Terrestrial planet    1.000     1.00 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
5 Jupiter          Gas giant   11.209     0.41  TRUE
6  Saturn          Gas giant    9.449     0.43  TRUE
7  Uranus          Gas giant    4.007    -0.72  TRUE
8 Neptune          Gas giant    3.883     0.67  TRUE

> # Check the structure of planets_df (output from R < 4.0, where strings are converted to factors by default)
> str(planets_df)
'data.frame':	8 obs. of  5 variables:
 $ name    : Factor w/ 8 levels "Earth","Jupiter",..: 4 8 1 3 2 6 7 5
 $ type    : Factor w/ 2 levels "Gas giant","Terrestrial planet": 2 2 2 2 1 1 1 1
 $ diameter: num  0.382 0.949 1 0.532 11.209 ...
 $ rotation: num  58.64 -243.02 1 1.03 0.41 ...
 $ rings   : logi  FALSE FALSE FALSE FALSE TRUE TRUE ...
``````

``````> # Select all columns for planets with rings
> rings_vector <- planets_df$rings
> rings_vector
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> planets_df[rings_vector, ]
name      type diameter rotation rings
5 Jupiter Gas giant   11.209     0.41  TRUE
6  Saturn Gas giant    9.449     0.43  TRUE
7  Uranus Gas giant    4.007    -0.72  TRUE
8 Neptune Gas giant    3.883     0.67  TRUE
>
``````

## 5.2 Subsetting a data frame with `subset()`

``````> subset(planets_df, subset = diameter < 1)
name               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
>
``````
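
`subset()` is a shortcut for the logical row indexing shown in section 5.1. The sketch below compares the two on a reduced, made-up copy of `planets_df` that keeps only the columns needed. One difference worth knowing: `subset()` drops rows where the condition is `NA`, while plain logical indexing returns them as `NA` rows.

```r
# Reduced copy of planets_df (illustrative; only two columns)
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars"),
  diameter = c(0.382, 0.949, 1, 0.532),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition inside the data frame
small_with_subset <- subset(planets_df, subset = diameter < 1)

# Equivalent selection with a logical row index
small_with_index <- planets_df[planets_df$diameter < 1, ]

small_with_subset$name
```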

## 5.3 Sorting

``````> a <- c(100, 10, 1000)
> order(a)
[1] 2 1 3
> a[order(a)]
[1]   10  100 1000
``````

``````# planets_df is pre-loaded in your workspace
>
> # Use order() to create positions
> positions <- order(planets_df$diameter)
> positions
[1] 1 4 2 3 8 7 6 5
>
> # Use positions to sort planets_df
> planets_df[positions,]
name               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
3   Earth Terrestrial planet    1.000     1.00 FALSE
8 Neptune          Gas giant    3.883     0.67  TRUE
7  Uranus          Gas giant    4.007    -0.72  TRUE
6  Saturn          Gas giant    9.449     0.43  TRUE
5 Jupiter          Gas giant   11.209     0.41  TRUE
>
``````
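
`order()` sorts ascending by default. To sort from largest to smallest, it accepts `decreasing = TRUE`. A minimal sketch on a two-column version of `planets_df`, rebuilt here so the block runs on its own:

```r
name <- c("Mercury", "Venus", "Earth", "Mars",
          "Jupiter", "Saturn", "Uranus", "Neptune")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)
planets_df <- data.frame(name, diameter, stringsAsFactors = FALSE)

# Row positions from largest to smallest diameter
positions <- order(planets_df$diameter, decreasing = TRUE)

largest_first <- planets_df[positions, ]
largest_first$name
```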

# 6. Lists

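• Vectors are one-dimensional and can hold numeric, character, or logical values; all elements of a vector share the same data type.
• Matrices are two-dimensional and likewise hold elements of a single data type.
• Data frames are two-dimensional; values within a column share a data type, but different columns can have different types.

## 6.1 Creating a list with `list()`

A list can store entirely different data structures as its components.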

`````` # Vector with numerics from 1 up to 10
> my_vector <- 1:10
>
> # Matrix with numerics from 1 up to 9
> my_matrix <- matrix(1:9, ncol = 3)
>
> # First 10 elements of the built-in data frame mtcars
> my_df <- mtcars[1:10,]
>
> # Construct list with these different elements:
> my_list <- list(my_vector,my_matrix,my_df)
> my_list
[[1]]
[1]  1  2  3  4  5  6  7  8  9 10

[[2]]
[,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

[[3]]
mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
``````

## 6.2 Naming list components

``````# Name the components when creating the list
my_list <- list(name1 = your_comp1,
name2 = your_comp2)

# Name the components after creating the list
my_list <- list(your_comp1, your_comp2)
names(my_list) <- c("name1", "name2")
``````
``````> # Build shining_list (mov, act, and rev are assumed to be available in the workspace)
> shining_list <- list(moviename = mov,actors=act,reviews=rev)
> shining_list
$moviename
[1] "The Shining"

$actors
[1] "Jack Nicholson"   "Shelley Duvall"   "Danny Lloyd"      "Scatman Crothers"
[5] "Barry Nelson"

$reviews
  scores sources                                              comments
1    4.5   IMDb1                     Best Horror Film I Have Ever Seen
2    4.0   IMDb2 A truly brilliant and scary film from Stanley Kubrick
3    5.0   IMDb3                 A masterpiece of psychological horror
``````
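
The two naming approaches from the template at the start of this section give the same result. A small self-contained sketch with made-up components:

```r
# Names given when the list is created
named_at_creation <- list(title = "The Shining", year = 1980)

# Names assigned afterwards
named_afterwards <- list("The Shining", 1980)
names(named_afterwards) <- c("title", "year")

# Both lists have the same components and the same names
identical(named_at_creation, named_afterwards)
```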

## 6.4 Adding movie information to a list

``````ext_list <- c(my_list, my_name = my_val)
``````

``````> shining_list_full <- c(shining_list,year=1980)
>
> # Have a look at shining_list_full
> str(shining_list_full)
List of 4
 $ moviename: chr "The Shining"
 $ actors   : chr [1:5] "Jack Nicholson" "Shelley Duvall" "Danny Lloyd" "Scatman Crothers" ...
 $ reviews  :'data.frame':	3 obs. of  3 variables:
  ..$ scores  : num [1:3] 4.5 4 5
  ..$ sources : Factor w/ 3 levels "IMDb1","IMDb2",..: 1 2 3
  ..$ comments: Factor w/ 3 levels "A masterpiece of psychological horror",..: 3 2 1
 $ year     : num 1980
>
>
``````
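
`c()` works well for adding a single value, as above. One caveat: when the added value is a vector with several elements, `c()` splices each element in as its own component. Wrapping the value in `list()`, or assigning with `[[<-`, keeps it as one component. The sketch below uses a cut-down `shining_list`; the other names are illustrative.

```r
shining_list <- list(moviename = "The Shining")

# Adding a single value: both forms give the same list
extended_with_c <- c(shining_list, year = 1980)
extended_by_name <- shining_list
extended_by_name[["year"]] <- 1980
identical(extended_with_c, extended_by_name)

# Adding a multi-element vector with c() splits it into separate components
spliced <- c(shining_list, actors = c("Jack Nicholson", "Shelley Duvall"))
names(spliced)

# Wrapping the vector in list() keeps it as a single component
wrapped <- c(shining_list, list(actors = c("Jack Nicholson", "Shelley Duvall")))
names(wrapped)
```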