Using the data from the kaggle.com competition House Prices: Advanced Regression Techniques, build the following models:
Logistic Regression
?glm
glm(Survived ~ ., family = binomial(link = "logit"), data = train)
K-Means Clustering
?kmeans
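A minimal sketch of a kmeans call, using only base R. The built-in iris data is a stand-in chosen for illustration (an assumption, not part of the assignment); for the competition data, the numeric House Prices columns would take its place.

```r
# Sketch on the built-in iris data (an assumption; swap in the House Prices features).
set.seed(42)                               # k-means starts from random centers
x <- scale(iris[, 1:4])                    # standardize the numeric columns
km <- kmeans(x, centers = 3, nstart = 25)  # 3 clusters, best of 25 random starts
table(km$cluster, iris$Species)            # compare clusters with the known species
km$tot.withinss                            # total within-cluster sum of squares
```

Scaling first keeps a single wide-ranged column from dominating the Euclidean distances, and nstart > 1 reduces the chance of settling on a poor local optimum.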
Hierarchical clustering
d <- dist(mydata, method = "euclidean") # distance matrix
fit <- hclust(d, method = "ward.D") # "ward" was renamed to "ward.D" in R 3.1.0
groups <- cutree(fit, k=5) # cut tree into 5 clusters
Classification Tree
library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
Random Forest
library(randomForest)
rf <- randomForest(label ~ ., train)
predictions <- predict(rf, test)
Support Vector Machines
library(MASS)   # provides the cats data set
library(e1071)  # provides svm()
data(cats)
model <- svm(Sex ~ ., data = cats)
L1, L2 regularization
library(glmnet)
?glmnet
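A minimal sketch of how L1 and L2 penalties might be fitted with glmnet. The mtcars data and the choice of 5 folds are assumptions made for illustration; the House Prices predictors would first need to be converted to a numeric matrix, for example with model.matrix.

```r
library(glmnet)

# mtcars is a stand-in (assumption); glmnet needs a numeric matrix, not a data frame.
x <- as.matrix(mtcars[, -1])   # predictors: every column except mpg
y <- mtcars$mpg                # response

set.seed(1)                                      # cross-validation folds are random
lasso <- cv.glmnet(x, y, alpha = 1, nfolds = 5)  # alpha = 1: L1 penalty (lasso)
ridge <- cv.glmnet(x, y, alpha = 0, nfolds = 5)  # alpha = 0: L2 penalty (ridge)

coef(lasso, s = "lambda.min")  # lasso can set coefficients exactly to zero
coef(ridge, s = "lambda.min")  # ridge shrinks coefficients toward zero
pred <- predict(lasso, newx = x, s = "lambda.min")
```

Values of alpha strictly between 0 and 1 give the elastic net, a mix of the two penalties.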
The caret package
?train
More information: http://topepo.github.io/caret/
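A minimal sketch of caret's train function with cross-validation. The mtcars data, the "lm" method and the 5 folds are assumptions chosen to keep the example small; with the House Prices data, the formula would typically be SalePrice ~ . and any method caret supports could be substituted.

```r
library(caret)

set.seed(1)                                      # fold assignment is random
ctrl <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation
fit <- train(mpg ~ ., data = mtcars,             # mtcars is a stand-in (assumption)
             method = "lm",                      # plain linear regression
             trControl = ctrl)
fit$results                                      # resampled RMSE, Rsquared, MAE
pred <- predict(fit, newdata = mtcars)           # predictions from the final model
```

Changing only the method argument is usually enough to compare different models under the same resampling scheme.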