Titanic (the Titanic dataset)

Posted: 2020-05-20 11:59:02

Original text:

Overview

The data has been split into two groups:

training set (train.csv)

test set (test.csv)

The training set should be used to build your machine learning models. For the training set, we provide the outcome (also known as the “ground truth”) for each passenger. Your model will be based on “features” like passengers’ gender and class. You can also use feature engineering to create new features.

The test set should be used to see how well your model performs on unseen data. For the test set, we do not provide the ground truth for each passenger. It is your job to predict these outcomes. For each passenger in the test set, use the model you trained to predict whether or not they survived the sinking of the Titanic.
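The train-then-predict workflow described above can be sketched with a deliberately simple model: learn the survival rate per group from labelled rows, then predict for unlabelled rows. This is only an illustration under stated assumptions. The rows below are made up and are not records from train.csv or test.csv, the column names (PassengerId, Sex, Pclass, Survived) are assumed to match the CSV headers, and fit_group_rates and predict are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical rows for illustration only; not real records from train.csv.
train_rows = [
    {"Sex": "female", "Pclass": "1", "Survived": "1"},
    {"Sex": "female", "Pclass": "3", "Survived": "1"},
    {"Sex": "female", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "1", "Survived": "1"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
    {"Sex": "male", "Pclass": "3", "Survived": "0"},
]
# Hypothetical unlabelled rows standing in for test.csv (no Survived column).
test_rows = [
    {"PassengerId": "1001", "Sex": "female", "Pclass": "2"},
    {"PassengerId": "1002", "Sex": "male", "Pclass": "3"},
]


def fit_group_rates(rows, feature):
    """Return the observed survival rate for each value of `feature`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, total]
    for row in rows:
        counts[row[feature]][0] += int(row["Survived"])
        counts[row[feature]][1] += 1
    return {value: survived / total for value, (survived, total) in counts.items()}


def predict(rows, rates, feature, default=0):
    """Predict 1 where the group's training survival rate exceeds 0.5."""
    preds = {}
    for row in rows:
        rate = rates.get(row[feature])
        # Groups never seen in training fall back to `default`.
        preds[row["PassengerId"]] = default if rate is None else int(rate > 0.5)
    return preds


rates = fit_group_rates(train_rows, "Sex")
predictions = predict(test_rows, rates, "Sex")
print(predictions)
```

Passing "Pclass" instead of "Sex" as the feature reuses the same two functions; a real model would combine several features rather than rely on one.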

We also include gender_submission.csv, a set of predictions that assume all and only female passengers survive, as an example of what a submission file should look like.
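The rule behind gender_submission.csv (female passengers survive, everyone else does not) is simple enough to reproduce as a rough sketch. The rows below are hypothetical stand-ins for test.csv, the two-column PassengerId,Survived layout is assumed to be the submission format, and gender_baseline and to_submission_csv are illustrative names rather than anything provided with the dataset.

```python
import csv
import io

# Hypothetical test rows; in practice these would be read from test.csv.
test_rows = [
    {"PassengerId": "1001", "Sex": "male"},
    {"PassengerId": "1002", "Sex": "female"},
    {"PassengerId": "1003", "Sex": "male"},
]


def gender_baseline(rows):
    """Predict 1 (survived) for female passengers and 0 for everyone else."""
    return [
        {"PassengerId": row["PassengerId"], "Survived": int(row["Sex"] == "female")}
        for row in rows
    ]


def to_submission_csv(predictions):
    """Serialize predictions as CSV text with a PassengerId,Survived header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["PassengerId", "Survived"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(predictions)
    return buffer.getvalue()


submission_text = to_submission_csv(gender_baseline(test_rows))
print(submission_text)
```

To produce a file instead of a string, the same writer can be pointed at a file opened with newline="".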

Data Dictionary

Variable Notes

pclass: A proxy for socio-economic status (SES)

1st = Upper

2nd = Middle

3rd = Lower
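The class-to-status mapping above can be written as a small lookup. This is a minimal sketch that assumes pclass is stored as 1, 2 or 3 (possibly as text when read from a CSV); SES_BY_PCLASS and ses_label are hypothetical names.

```python
# Mapping taken from the variable notes: pclass is a proxy for socio-economic status.
SES_BY_PCLASS = {1: "Upper", 2: "Middle", 3: "Lower"}


def ses_label(pclass):
    """Return the SES label for a pclass value given as an int or a numeric string."""
    return SES_BY_PCLASS[int(pclass)]


print(ses_label("1"), ses_label(3))
```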

age: Age is fractional if less than 1. If the age is estimated, it is in the form of xx.5
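The two age conventions can be turned into flags when parsing the column. The sketch below is one possible reading, not an official rule: describe_age is a hypothetical helper, treating a blank value as missing is an added assumption, and a value of exactly 0.5 is ambiguous (it is counted here as an infant, not as an estimate).

```python
def describe_age(raw):
    """Classify a raw age value using the two conventions in the variable notes."""
    # Assumption of this sketch: a blank or None value means the age is missing.
    if raw is None or str(raw).strip() == "":
        return {"age": None, "is_infant": False, "maybe_estimated": False}
    age = float(raw)
    is_infant = age < 1  # fractional ages are used for passengers under 1 year
    # Ages of 1 or more that end in .5 follow the xx.5 pattern used for estimates.
    maybe_estimated = age >= 1 and age % 1 == 0.5
    return {"age": age, "is_infant": is_infant, "maybe_estimated": maybe_estimated}


for value in ["0.42", "28.5", "30", ""]:
    print(repr(value), describe_age(value))
```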

sibsp: The dataset defines family relations in this way...

Sibling = brother, sister, stepbrother, stepsister

Spouse = husband, wife (mistresses and fiancés were ignored)

parch: The dataset defines family relations in this way...

Parent = mother, father

Child = daughter, son, stepdaughter, stepson

Some children travelled only with a nanny, therefore parch=0 for them.
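As a small feature-engineering example built on these two columns, sibsp and parch can be combined into a family-size feature. This is a common illustrative choice, not something the dataset description prescribes; family_features and the feature names family_size and is_alone are hypothetical.

```python
def family_features(sibsp, parch):
    """Derive family-size features from the sibsp and parch counts."""
    sibsp, parch = int(sibsp), int(parch)
    # Count the passenger plus siblings/spouses plus parents/children aboard.
    family_size = sibsp + parch + 1
    return {"family_size": family_size, "is_alone": family_size == 1}


print(family_features("1", "2"))
print(family_features(0, 0))
```

Because children who travelled only with a nanny have parch=0, is_alone can be true for a passenger who was in fact accompanied, so the flag is best read as "no recorded relatives aboard".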

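The dataset can be downloaded from the official website; I have also shared a copy on Baidu Netdisk. Follow my official account and reply “203001” to get the download link.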
