In [ ]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [ ]:
%cd /content/drive/MyDrive/Study
/content/drive/MyDrive/Study
Load the data
In [ ]:
import pandas as pd
import os
In [ ]:
dataset = pd.read_excel("Data.xlsx", sheet_name="Edited_Data")
dataset.head()
Out[ ]:
| | Respondent ID | Start Date | End Date | Email Address | First Name | Last Name | Custom Data 1 | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | ... | Question 29-Response 8 | Question 29-Response 9 | Question 29-Response 10 | Question 29-Response 11 | Question 29-Response 12 | Question 29-Response 13 | Question 29-Response 14 | Question 30-Response 1 | Question 30-Response 2 | Question 30-Response 3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | 2021-01-22 12:01:17 | 2021-01-22 12:40:34 | NaN | NaN | NaN | NaN | Infrastructure | NaN | Staff | ... | NaN | Answer 8 | Answer 8 | Answer 4 | NaN | NaN | Answer 5 | NaN | NaN | NaN |
1 | 2658722536 | 2021-01-22 06:56:37 | 2021-01-22 07:34:10 | NaN | NaN | NaN | NaN | Finance | NaN | Staff | ... | NaN | Answer 5 | NaN | NaN | Answer 2 | NaN | Answer 5 | NaN | NaN | Answer 1 |
2 | 4044163394 | 2021-01-22 06:35:18 | 2021-01-22 06:47:32 | NaN | NaN | NaN | NaN | Infrastructure | NaN | Department Lead | ... | NaN | NaN | Answer 4 | Answer 4 | Answer 6 | NaN | Answer 6 | NaN | Answer 1 | NaN |
3 | 5535865599 | 2021-01-21 21:29:32 | 2021-01-21 21:40:24 | NaN | NaN | NaN | NaN | Infrastructure | NaN | Manager | ... | Answer 2 | Answer 5 | Answer 7 | NaN | Answer 6 | NaN | Answer 7 | Answer 7 | Answer 1 | Answer 6 |
4 | 3356802928 | 2021-01-21 17:26:39 | 2021-01-21 17:44:40 | NaN | NaN | NaN | NaN | Port Operations | NaN | Manager | ... | NaN | Answer 5 | Answer 4 | Answer 4 | Answer 7 | Answer 7 | NaN | Answer 7 | NaN | Answer 8 |
5 rows × 100 columns
Copy the data
In [ ]:
from google.colab.output import eval_js
eval_js('google.colab.output.setIframeHeight("500")')
In [ ]:
# Code used to limit the height of table output in Colab?
from IPython.display import Javascript
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 500})'''))
In [ ]:
from IPython.display import Javascript
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 500})'''))
# Work on a copy so the original DataFrame stays untouched
dataset_modified=dataset.copy()
dataset_modified.head()
Out[ ]:
| | Respondent ID | Start Date | End Date | Email Address | First Name | Last Name | Custom Data 1 | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | ... | Question 29-Response 8 | Question 29-Response 9 | Question 29-Response 10 | Question 29-Response 11 | Question 29-Response 12 | Question 29-Response 13 | Question 29-Response 14 | Question 30-Response 1 | Question 30-Response 2 | Question 30-Response 3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | 2021-01-22 12:01:17 | 2021-01-22 12:40:34 | NaN | NaN | NaN | NaN | Infrastructure | NaN | Staff | ... | NaN | Answer 8 | Answer 8 | Answer 4 | NaN | NaN | Answer 5 | NaN | NaN | NaN |
1 | 2658722536 | 2021-01-22 06:56:37 | 2021-01-22 07:34:10 | NaN | NaN | NaN | NaN | Finance | NaN | Staff | ... | NaN | Answer 5 | NaN | NaN | Answer 2 | NaN | Answer 5 | NaN | NaN | Answer 1 |
2 | 4044163394 | 2021-01-22 06:35:18 | 2021-01-22 06:47:32 | NaN | NaN | NaN | NaN | Infrastructure | NaN | Department Lead | ... | NaN | NaN | Answer 4 | Answer 4 | Answer 6 | NaN | Answer 6 | NaN | Answer 1 | NaN |
3 | 5535865599 | 2021-01-21 21:29:32 | 2021-01-21 21:40:24 | NaN | NaN | NaN | NaN | Infrastructure | NaN | Manager | ... | Answer 2 | Answer 5 | Answer 7 | NaN | Answer 6 | NaN | Answer 7 | Answer 7 | Answer 1 | Answer 6 |
4 | 3356802928 | 2021-01-21 17:26:39 | 2021-01-21 17:44:40 | NaN | NaN | NaN | NaN | Port Operations | NaN | Manager | ... | NaN | Answer 5 | Answer 4 | Answer 4 | Answer 7 | Answer 7 | NaN | Answer 7 | NaN | Answer 8 |
5 rows × 100 columns
In [ ]:
dataset.columns
columns_to_drop = ['Start Date', 'End Date', 'Email Address', 'First Name', 'Last Name', 'Custom Data 1']
In [ ]:
dataset_modified = dataset_modified.drop(columns=columns_to_drop)
In [ ]:
from IPython.display import Javascript
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 500})'''))
dataset_modified.head()
Out[ ]:
| | Respondent ID | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | Which generation are you apart of?-Response | Please select the gender in which you identify.-Response | Which duration range best aligns with your tenure at your company?-Response | Which of the following best describes your employment type?-Response | Question 1-Response | Question 2-Response | ... | Question 29-Response 8 | Question 29-Response 9 | Question 29-Response 10 | Question 29-Response 11 | Question 29-Response 12 | Question 29-Response 13 | Question 29-Response 14 | Question 30-Response 1 | Question 30-Response 2 | Question 30-Response 3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 0-2 years | Full time Employee | NaN | Answer 6 | ... | NaN | Answer 8 | Answer 8 | Answer 4 | NaN | NaN | Answer 5 | NaN | NaN | NaN |
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Answer 4 | Answer 2 | ... | NaN | Answer 5 | NaN | NaN | Answer 2 | NaN | Answer 5 | NaN | NaN | Answer 1 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Answer 5 | Answer 7 | ... | NaN | NaN | Answer 4 | Answer 4 | Answer 6 | NaN | Answer 6 | NaN | Answer 1 | NaN |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Answer 1 | Answer 1 | ... | Answer 2 | Answer 5 | Answer 7 | NaN | Answer 6 | NaN | Answer 7 | Answer 7 | Answer 1 | Answer 6 |
4 | 3356802928 | Port Operations | NaN | Manager | Generation X (born between 1965-1980) | Female | 10+ years | Full time Employee | NaN | Answer 3 | ... | NaN | Answer 5 | Answer 4 | Answer 4 | Answer 7 | Answer 7 | NaN | Answer 7 | NaN | Answer 8 |
5 rows × 94 columns
In [ ]:
# The first 8 columns identify the respondent; the rest hold answers
id_vars = list(dataset_modified.columns[:8])
value_vars = list(dataset_modified.columns[8:])
# value_vars
In [ ]:
from IPython.display import Javascript
display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 500})'''))
dataset_melted = dataset_modified.melt(id_vars=id_vars, value_vars=value_vars, var_name="Question + Subquestion", value_name="Answer")
dataset_melted
Out[ ]:
| | Respondent ID | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | Which generation are you apart of?-Response | Please select the gender in which you identify.-Response | Which duration range best aligns with your tenure at your company?-Response | Which of the following best describes your employment type?-Response | Question + Subquestion | Answer |
---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 0-2 years | Full time Employee | Question 1-Response | NaN |
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Question 1-Response | Answer 4 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 5 |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Question 1-Response | Answer 1 |
4 | 3356802928 | Port Operations | NaN | Manager | Generation X (born between 1965-1980) | Female | 10+ years | Full time Employee | Question 1-Response | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17023 | 7940065082 | Infrastructure | NaN | Department Lead | Baby Boomer (born between 1946-1964) | Male | 10+ years | Full time Employee | Question 30-Response 3 | Answer 8 |
17024 | 5157705612 | Finance | NaN | Staff | Millennial (born between 1981-2000) | Female | 5-10 years | Full time Employee | Question 30-Response 3 | Answer 6 |
17025 | 9920755555 | Port Operations | NaN | Staff | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN |
17026 | 6638341389 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN |
17027 | 8114622230 | Information Technology | NaN | Staff | Prefer not to answer | Male | 5-10 years | Full time Employee | Question 30-Response 3 | NaN |
17028 rows × 10 columns
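The melt step above turns one column per question into one row per (respondent, question) pair, which is why the row count jumps to 17028: 94 columns minus the 8 identifier columns leaves 86 answer columns, and 17028 / 86 = 198 rows in the wide table. Here is a minimal sketch of the same reshaping on a made-up two-respondent table; the toy column values and the name `long_df` are placeholders, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy survey: one row per respondent, one column per question
wide = pd.DataFrame({
    "Respondent ID": [1, 2],
    "Division": ["Finance", "Infrastructure"],
    "Question 1-Response": ["Answer 4", None],
    "Question 2-Response": ["Answer 2", "Answer 7"],
})

# Columns not listed in id_vars are unpivoted into (variable, value) pairs
long_df = wide.melt(
    id_vars=["Respondent ID", "Division"],
    var_name="Question + Subquestion",
    value_name="Answer",
)

# 2 respondents x 2 answer columns = 4 rows
print(long_df)
```

Unanswered questions stay in the long table as rows with a missing `Answer`, just like the `NaN` rows in the output above.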
In [ ]:
questions_import = pd.read_excel("/content/drive/MyDrive/Study/Data.xlsx", sheet_name="Question")
questions_import
Out[ ]:
| | Raw Question | Raw Subquestion | Question | Subquestion | Question + Subquestion |
---|---|---|---|---|---|
0 | Respondent ID | NaN | Respondent ID | NaN | Respondent ID |
1 | Start Date | NaN | Start Date | NaN | Start Date |
2 | End Date | NaN | End Date | NaN | End Date |
3 | Email Address | NaN | Email Address | NaN | Email Address |
4 | First Name | NaN | First Name | NaN | First Name |
... | ... | ... | ... | ... | ... |
95 | NaN | Response 13 | Question 29 | Response 13 | Question 29-Response 13 |
96 | NaN | Response 14 | Question 29 | Response 14 | Question 29-Response 14 |
97 | Question 30 | Response 1 | Question 30 | Response 1 | Question 30-Response 1 |
98 | NaN | Response 2 | Question 30 | Response 2 | Question 30-Response 2 |
99 | NaN | Response 3 | Question 30 | Response 3 | Question 30-Response 3 |
100 rows × 5 columns
In [ ]:
questions = questions_import.copy()
questions.columns
Out[ ]:
Index(['Raw Question', 'Raw Subquestion', 'Question', 'Subquestion',
'Question + Subquestion'],
dtype='object')
In [ ]:
questions.drop(columns=['Raw Question', 'Raw Subquestion', 'Subquestion'], inplace=True)
In [ ]:
questions
Out[ ]:
| | Question | Question + Subquestion |
---|---|---|
0 | Respondent ID | Respondent ID |
1 | Start Date | Start Date |
2 | End Date | End Date |
3 | Email Address | Email Address |
4 | First Name | First Name |
... | ... | ... |
95 | Question 29 | Question 29-Response 13 |
96 | Question 29 | Question 29-Response 14 |
97 | Question 30 | Question 30-Response 1 |
98 | Question 30 | Question 30-Response 2 |
99 | Question 30 | Question 30-Response 3 |
100 rows × 2 columns
In [ ]:
#questions.dropna(inplace=True)
questions
Out[ ]:
| | Question | Question + Subquestion |
---|---|---|
0 | Respondent ID | Respondent ID |
1 | Start Date | Start Date |
2 | End Date | End Date |
3 | Email Address | Email Address |
4 | First Name | First Name |
... | ... | ... |
95 | Question 29 | Question 29-Response 13 |
96 | Question 29 | Question 29-Response 14 |
97 | Question 30 | Question 30-Response 1 |
98 | Question 30 | Question 30-Response 2 |
99 | Question 30 | Question 30-Response 3 |
100 rows × 2 columns
In [ ]:
dataset_merged = pd.merge(left=dataset_melted, right=questions, how="left", left_on="Question + Subquestion", right_on="Question + Subquestion")
print("Original Data", len(dataset_melted))
print("Merged Data", len(dataset_merged))
Original Data 17028
Merged Data 17028
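Comparing the row counts before and after the merge checks that the lookup table did not duplicate any rows. As an alternative sketch, `merge` also accepts a `validate` argument that raises an error when the right-hand keys are not unique. The toy frames below (`answers`, `lookup`, `bad_lookup`) are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format answers and a question lookup table
answers = pd.DataFrame({
    "Question + Subquestion": ["Q1-Response", "Q2-Response 1", "Q2-Response 2"],
    "Answer": ["Answer 4", "Answer 2", None],
})
lookup = pd.DataFrame({
    "Question + Subquestion": ["Q1-Response", "Q2-Response 1", "Q2-Response 2"],
    "Question": ["Q1", "Q2", "Q2"],
})

# validate="many_to_one" requires the keys in the right table to be unique
merged = answers.merge(
    lookup, how="left", on="Question + Subquestion", validate="many_to_one"
)

# A lookup with a duplicated key raises instead of silently adding rows
bad_lookup = pd.concat([lookup, lookup.iloc[[0]]], ignore_index=True)
try:
    answers.merge(
        bad_lookup, how="left", on="Question + Subquestion", validate="many_to_one"
    )
    raised = False
except pd.errors.MergeError:
    raised = True

print(len(answers), len(merged), raised)
```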
In [ ]:
dataset_merged.head()
Out[ ]:
| | Respondent ID | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | Which generation are you apart of?-Response | Please select the gender in which you identify.-Response | Which duration range best aligns with your tenure at your company?-Response | Which of the following best describes your employment type?-Response | Question + Subquestion | Answer | Question |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 0-2 years | Full time Employee | Question 1-Response | NaN | Question 1 |
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Question 1-Response | Answer 4 | Question 1 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 5 | Question 1 |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Question 1-Response | Answer 1 | Question 1 |
4 | 3356802928 | Port Operations | NaN | Manager | Generation X (born between 1965-1980) | Female | 10+ years | Full time Employee | Question 1-Response | NaN | Question 1 |
In [ ]:
dataset_merged.groupby("Question")["Answer"].nunique()
Out[ ]:
Question
Question 1 8
Question 10 8
Question 11 8
Question 12 8
Question 13 8
Question 14 8
Question 15 8
Question 16 8
Question 17 8
Question 18 8
Question 19 8
Question 2 8
Question 20 8
Question 21 8
Question 22 8
Question 23 8
Question 24 8
Question 25 8
Question 26 8
Question 27 8
Question 28 8
Question 29 8
Question 3 8
Question 30 8
Question 4 8
Question 5 8
Question 6 8
Question 7 8
Question 8 8
Question 9 8
Name: Answer, dtype: int64
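The result above is sorted as text, so Question 10 comes right after Question 1 and before Question 2. If numeric order is preferred, one possible sketch is to sort the index by the number extracted from each label. The small `counts` Series here is a stand-in for the real result, and the `key` argument of `sort_index` assumes pandas 1.1 or later.

```python
import pandas as pd

# Stand-in for the groupby result, with labels in text order
counts = pd.Series(
    [8, 8, 8],
    index=pd.Index(["Question 1", "Question 10", "Question 2"], name="Question"),
    name="Answer",
)

# Sort by the integer inside each label instead of by the label text
ordered = counts.sort_index(
    key=lambda idx: idx.str.extract(r"(\d+)", expand=False).astype(int)
)
print(ordered)
```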
In [ ]:
dataset_merged[dataset_merged["Answer"].notna()]
Out[ ]:
| | Respondent ID | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | Which generation are you apart of?-Response | Please select the gender in which you identify.-Response | Which duration range best aligns with your tenure at your company?-Response | Which of the following best describes your employment type?-Response | Question + Subquestion | Answer | Question |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Question 1-Response | Answer 4 | Question 1 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 5 | Question 1 |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Question 1-Response | Answer 1 | Question 1 |
5 | 3399511781 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 8 | Question 1 |
6 | 9860597462 | Infrastructure | NaN | Department Lead | Baby Boomer (born between 1946-1964) | Male | 10+ years | Full time Employee | Question 1-Response | Answer 8 | Question 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17017 | 9735550076 | People | NaN | Staff | Generation X (born between 1965-1980) | Male | 5-10 years | Full time Employee | Question 30-Response 3 | Answer 5 | Question 30 |
17020 | 7325851635 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Prefer not to answer | 3-5 years | Full time Employee | Question 30-Response 3 | Answer 5 | Question 30 |
17021 | 3370365802 | Port Security & Emergency Operations | NaN | Staff | Generation X (born between 1965-1980) | Female | 0-2 years | Full time Employee | Question 30-Response 3 | Answer 6 | Question 30 |
17023 | 7940065082 | Infrastructure | NaN | Department Lead | Baby Boomer (born between 1946-1964) | Male | 10+ years | Full time Employee | Question 30-Response 3 | Answer 8 | Question 30 |
17024 | 5157705612 | Finance | NaN | Staff | Millennial (born between 1981-2000) | Female | 5-10 years | Full time Employee | Question 30-Response 3 | Answer 6 | Question 30 |
9664 rows × 11 columns
In [ ]:
respondents = dataset_merged[dataset_merged["Answer"].notna()]
respondents = respondents.groupby("Question")["Respondent ID"].nunique().reset_index()
respondents.rename(columns={"Respondent ID":"Respondents"}, inplace=True)
respondents
Out[ ]:
| | Question | Respondents |
---|---|---|
0 | Question 1 | 119 |
1 | Question 10 | 198 |
2 | Question 11 | 164 |
3 | Question 12 | 114 |
4 | Question 13 | 108 |
5 | Question 14 | 105 |
6 | Question 15 | 114 |
7 | Question 16 | 117 |
8 | Question 17 | 135 |
9 | Question 18 | 109 |
10 | Question 19 | 157 |
11 | Question 2 | 118 |
12 | Question 20 | 105 |
13 | Question 21 | 127 |
14 | Question 22 | 160 |
15 | Question 23 | 120 |
16 | Question 24 | 195 |
17 | Question 25 | 198 |
18 | Question 26 | 193 |
19 | Question 27 | 153 |
20 | Question 28 | 119 |
21 | Question 29 | 198 |
22 | Question 3 | 117 |
23 | Question 30 | 182 |
24 | Question 4 | 161 |
25 | Question 5 | 194 |
26 | Question 6 | 196 |
27 | Question 7 | 162 |
28 | Question 8 | 190 |
29 | Question 9 | 188 |
In [ ]:
dataset_merged_two = pd.merge(left=dataset_merged, right=respondents, how="left", left_on="Question", right_on="Question")
print("Original Data", len(dataset_merged))
print("Merged Data", len(dataset_merged_two))
dataset_merged_two
Original Data 17028
Merged Data 17028
Out[ ]:
| | Respondent ID | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | Which generation are you apart of?-Response | Please select the gender in which you identify.-Response | Which duration range best aligns with your tenure at your company?-Response | Which of the following best describes your employment type?-Response | Question + Subquestion | Answer | Question | Respondents |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 0-2 years | Full time Employee | Question 1-Response | NaN | Question 1 | 119 |
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Question 1-Response | Answer 4 | Question 1 | 119 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 5 | Question 1 | 119 |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Question 1-Response | Answer 1 | Question 1 | 119 |
4 | 3356802928 | Port Operations | NaN | Manager | Generation X (born between 1965-1980) | Female | 10+ years | Full time Employee | Question 1-Response | NaN | Question 1 | 119 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17023 | 7940065082 | Infrastructure | NaN | Department Lead | Baby Boomer (born between 1946-1964) | Male | 10+ years | Full time Employee | Question 30-Response 3 | Answer 8 | Question 30 | 182 |
17024 | 5157705612 | Finance | NaN | Staff | Millennial (born between 1981-2000) | Female | 5-10 years | Full time Employee | Question 30-Response 3 | Answer 6 | Question 30 | 182 |
17025 | 9920755555 | Port Operations | NaN | Staff | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 |
17026 | 6638341389 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 |
17027 | 8114622230 | Information Technology | NaN | Staff | Prefer not to answer | Male | 5-10 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 |
17028 rows × 12 columns
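The `Respondents` column was built in two steps: count the unique respondents per question, then merge the counts back. As a rough alternative, `groupby(...).transform` can write the per-question count straight onto each row. This is only a sketch on a made-up frame; `answered_id` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical long-format data: two respondents, two questions
df = pd.DataFrame({
    "Respondent ID": [1, 2, 1, 2],
    "Question": ["Q1", "Q1", "Q2", "Q2"],
    "Answer": ["Answer 1", None, "Answer 2", "Answer 2"],
})

# Keep the ID only where an answer exists, so unanswered rows are not counted
answered_id = df["Respondent ID"].where(df["Answer"].notna())

# nunique ignores missing values; transform broadcasts the count to every row
df["Respondents"] = answered_id.groupby(df["Question"]).transform("nunique")
print(df)
```

The merge-based version keeps the intermediate `respondents` table available for inspection, which can be handy while exploring the data.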
In [ ]:
same_answer = dataset_merged #dataset_merged[dataset_merged["Answer"].notna()]
same_answer = same_answer.groupby(["Question + Subquestion", "Answer"])["Respondent ID"].nunique().reset_index()
same_answer.rename(columns={"Respondent ID":"Same Answer"}, inplace=True)
same_answer
Out[ ]:
| | Question + Subquestion | Answer | Same Answer |
---|---|---|---|
0 | Question 1-Response | Answer 1 | 14 |
1 | Question 1-Response | Answer 2 | 10 |
2 | Question 1-Response | Answer 3 | 13 |
3 | Question 1-Response | Answer 4 | 17 |
4 | Question 1-Response | Answer 5 | 22 |
... | ... | ... | ... |
683 | Question 9-Response 4 | Answer 4 | 16 |
684 | Question 9-Response 4 | Answer 5 | 13 |
685 | Question 9-Response 4 | Answer 6 | 14 |
686 | Question 9-Response 4 | Answer 7 | 12 |
687 | Question 9-Response 4 | Answer 8 | 19 |
688 rows × 3 columns
In [ ]:
dataset_merged_three = pd.merge(left=dataset_merged_two, right=same_answer, how="left", left_on=["Question + Subquestion", "Answer"], right_on=["Question + Subquestion", "Answer"])
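# Assign the result back; chained inplace fillna on a column triggers warnings in newer pandas versions
dataset_merged_three["Same Answer"] = dataset_merged_three["Same Answer"].fillna(0)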
print("Original Data", len(dataset_merged_two))
print("Merged Data", len(dataset_merged_three))
dataset_merged_three
Original Data 17028
Merged Data 17028
Out[ ]:
| | Respondent ID | Identify which division you work in.-Response | Identify which division you work in.-Other (please specify) | Which of the following best describes your position level?-Response | Which generation are you apart of?-Response | Please select the gender in which you identify.-Response | Which duration range best aligns with your tenure at your company?-Response | Which of the following best describes your employment type?-Response | Question + Subquestion | Answer | Question | Respondents | Same Answer |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 0-2 years | Full time Employee | Question 1-Response | NaN | Question 1 | 119 | 0.0 |
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Question 1-Response | Answer 4 | Question 1 | 119 | 17.0 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 5 | Question 1 | 119 | 22.0 |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Question 1-Response | Answer 1 | Question 1 | 119 | 14.0 |
4 | 3356802928 | Port Operations | NaN | Manager | Generation X (born between 1965-1980) | Female | 10+ years | Full time Employee | Question 1-Response | NaN | Question 1 | 119 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17023 | 7940065082 | Infrastructure | NaN | Department Lead | Baby Boomer (born between 1946-1964) | Male | 10+ years | Full time Employee | Question 30-Response 3 | Answer 8 | Question 30 | 182 | 14.0 |
17024 | 5157705612 | Finance | NaN | Staff | Millennial (born between 1981-2000) | Female | 5-10 years | Full time Employee | Question 30-Response 3 | Answer 6 | Question 30 | 182 | 20.0 |
17025 | 9920755555 | Port Operations | NaN | Staff | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 | 0.0 |
17026 | 6638341389 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 | 0.0 |
17027 | 8114622230 | Information Technology | NaN | Staff | Prefer not to answer | Male | 5-10 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 | 0.0 |
17028 rows × 13 columns
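The unanswered rows get a `Same Answer` of 0 because `groupby` drops missing group keys by default. The counts table therefore has no row for a missing `Answer`, the left merge leaves those rows empty, and `fillna(0)` fills them in. A small sketch on a made-up frame shows the same chain; the final `astype(int)` is an optional extra that the notebook does not use.

```python
import pandas as pd

# Hypothetical long-format data: respondent 3 did not answer
df = pd.DataFrame({
    "Respondent ID": [1, 2, 3],
    "Question + Subquestion": ["Q1-Response"] * 3,
    "Answer": ["Answer 1", "Answer 1", None],
})
keys = ["Question + Subquestion", "Answer"]

# Missing answers are dropped from the group keys (dropna=True is the default)
same = df.groupby(keys)["Respondent ID"].nunique().reset_index(name="Same Answer")

# Rows without a matching key get NaN from the left merge, then 0 from fillna
out = df.merge(same, how="left", on=keys)
out["Same Answer"] = out["Same Answer"].fillna(0).astype(int)
print(out)
```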
In [ ]:
output = dataset_merged_three.copy()
output.rename(columns={"Identify which division you work in.-Response":"Division Primary", "Identify which division you work in.-Other (please specify)":"Division Secondary",
"Which of the following best describes your position level?-Response":"Position", "Which generation are you apart of?-Response":"Generation",
"Please select the gender in which you identify.-Response":"Gender", "Which duration range best aligns with your tenure at your company?-Response":"Tenure",
"Which of the following best describes your employment type?-Response":"Employment Type"}, inplace=True)
output
Out[ ]:
| | Respondent ID | Division Primary | Division Secondary | Position | Generation | Gender | Tenure | Employment Type | Question + Subquestion | Answer | Question | Respondents | Same Answer |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5379192392 | Infrastructure | NaN | Staff | Generation X (born between 1965-1980) | Male | 0-2 years | Full time Employee | Question 1-Response | NaN | Question 1 | 119 | 0.0 |
1 | 2658722536 | Finance | NaN | Staff | NaN | NaN | 10+ years | Full time Employee | Question 1-Response | Answer 4 | Question 1 | 119 | 17.0 |
2 | 4044163394 | Infrastructure | NaN | Department Lead | Generation X (born between 1965-1980) | Male | 3-5 years | Full time Employee | Question 1-Response | Answer 5 | Question 1 | 119 | 22.0 |
3 | 5535865599 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Non-Binary | 5-10 years | Full time Employee | Question 1-Response | Answer 1 | Question 1 | 119 | 14.0 |
4 | 3356802928 | Port Operations | NaN | Manager | Generation X (born between 1965-1980) | Female | 10+ years | Full time Employee | Question 1-Response | NaN | Question 1 | 119 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
17023 | 7940065082 | Infrastructure | NaN | Department Lead | Baby Boomer (born between 1946-1964) | Male | 10+ years | Full time Employee | Question 30-Response 3 | Answer 8 | Question 30 | 182 | 14.0 |
17024 | 5157705612 | Finance | NaN | Staff | Millennial (born between 1981-2000) | Female | 5-10 years | Full time Employee | Question 30-Response 3 | Answer 6 | Question 30 | 182 | 20.0 |
17025 | 9920755555 | Port Operations | NaN | Staff | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 | 0.0 |
17026 | 6638341389 | Infrastructure | NaN | Manager | Millennial (born between 1981-2000) | Female | 3-5 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 | 0.0 |
17027 | 8114622230 | Information Technology | NaN | Staff | Prefer not to answer | Male | 5-10 years | Full time Employee | Question 30-Response 3 | NaN | Question 30 | 182 | 0.0 |
17028 rows × 13 columns
In [ ]:
output.columns
Out[ ]:
Index(['Respondent ID', 'Division Primary', 'Division Secondary', 'Position',
'Generation', 'Gender', 'Tenure', 'Employment Type',
'Question + Subquestion', 'Answer', 'Question', 'Respondents',
'Same Answer'],
dtype='object')
In [ ]:
output.to_excel("Final_Output.xlsx", index=False)