In [1]:
import pandas as pd
titanic_df = pd.read_csv('./titanic_train.csv')
titanic_df
Out[1]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
In [3]:
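print('Single column extraction: \n', titanic_df['Pclass'].head(3))  # a column name string
print('\n Multiple column extraction: \n', titanic_df[ ['Survived', 'Pclass'] ].head(3))  # a list of column names
print('An integer inside [] raises a KeyError: \n', titanic_df[0])  # intentionally fails: 0 is not a column name
Single column extraction: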
0 3
1 1
2 3
Name: Pclass, dtype: int64
Multiple column extraction:
Survived Pclass
0 0 3
1 1 1
2 1 3
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\envs\test\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3802 try:
-> 3803 return self._engine.get_loc(casted_key)
3804 except KeyError as err:
~\anaconda3\envs\test\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
~\anaconda3\envs\test\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_32016\1187408095.py in <module>
1 print('Single column extraction: \n', titanic_df['Pclass'].head(3))
2 print('\n Multiple column extraction: \n', titanic_df[ ['Survived', 'Pclass'] ].head(3))
----> 3 print('An integer inside [] raises a KeyError: \n', titanic_df[0])
~\anaconda3\envs\test\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
3803 if self.columns.nlevels > 1:
3804 return self._getitem_multilevel(key)
-> 3805 indexer = self.columns.get_loc(key)
3806 if is_integer(indexer):
3807 indexer = [indexer]
~\anaconda3\envs\test\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3803 return self._engine.get_loc(casted_key)
3804 except KeyError as err:
-> 3805 raise KeyError(key) from err
3806 except TypeError:
3807 # If we have a listlike key, _check_indexing_error will raise
KeyError: 0
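The three cases above can be reproduced on a tiny hypothetical DataFrame (the values are made up, not read from titanic_train.csv). A minimal sketch: a string returns a Series, a list returns a DataFrame, and an integer is looked up as a column *label*, so it raises KeyError unless a column is literally named 0.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df (made-up values)
df = pd.DataFrame({'Survived': [0, 1, 1], 'Pclass': [3, 1, 3]})

single = df['Pclass']                # column name string -> Series
multi = df[['Survived', 'Pclass']]   # list of column names -> DataFrame
print(type(single).__name__)  # Series
print(type(multi).__name__)   # DataFrame

# An integer inside [] is treated as a column label; no column is named 0
try:
    df[0]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# To select a column by position, use iloc instead
first_col = df.iloc[:, 0]
print(first_col.name)  # Survived
```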
In [4]:
# An expression that can be converted to a row index, such as a slice, selects rows
titanic_df[0:2]
Out[4]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
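Because `titanic_df` uses the default 0..n-1 index, it is hard to tell from `titanic_df[0:2]` whether the slice is positional or label-based. A minimal sketch with a hypothetical string index makes it visible: an integer slice inside `[]` is applied by position, end excluded, the same as `iloc`.

```python
import pandas as pd

# Hypothetical data with string index labels
df = pd.DataFrame({'Pclass': [3, 1, 3, 1]}, index=['a', 'b', 'c', 'd'])

# Integer slice inside [] -> rows by position, end position excluded
rows = df[0:2]
print(list(rows.index))  # ['a', 'b']

# Same rows through the explicit position-based indexer
print(rows.equals(df.iloc[0:2]))  # True
```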
In [5]:
# Boolean indexing
titanic_df[ titanic_df['Pclass'] == 3].head(3)
Out[5]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.250 | NaN | S |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.050 | NaN | S |
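Boolean indexing works in two steps: the comparison builds a boolean Series aligned with the DataFrame's index, and `[]` keeps the rows where it is True. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data (not from the Titanic file)
df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Pclass': [3, 1, 3, 2]})

# Step 1: the comparison yields one True/False per row
mask = df['Pclass'] == 3
print(mask.tolist())  # [True, False, True, False]

# Step 2: [] keeps only the rows where the mask is True
third_class = df[mask]
print(third_class['Name'].tolist())  # ['A', 'C']
print(third_class.index.tolist())    # [0, 2]; the original index labels are kept
```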
iloc[ ] operator >> position-based indexing
- Takes integer values, integer slices, or a fancy-indexing list of integers (not labels)
- A column position of -1 is commonly used to fetch the last column
In [6]:
data = { 'Name': ['Chulmin', 'Eunkyung', 'JInwoong', 'Soobeom'],
'Year': [2011, 2016, 2015, 2015],
'Gender': ['Male', 'Female', 'Male', 'Male']}
data_df = pd.DataFrame(data, index=['one', 'two', 'three', 'four'])
data_df
Out[6]:
| | Name | Year | Gender |
---|---|---|---|
one | Chulmin | 2011 | Male |
two | Eunkyung | 2016 | Female |
three | JInwoong | 2015 | Male |
four | Soobeom | 2015 | Male |
In [8]:
data_df.iloc[0, 0]
# iloc takes integer positions only; passing a column name (label) raises an error
Out[8]:
'Chulmin'
In [10]:
data_df.iloc[ 0:2, [0,1]]
Out[10]:
| | Name | Year |
---|---|---|
one | Chulmin | 2011 |
two | Eunkyung | 2016 |
In [11]:
data_df.iloc[ 1:3, [0,1]]
Out[11]:
| | Name | Year |
---|---|---|
two | Eunkyung | 2016 |
three | JInwoong | 2015 |
In [12]:
print('\n Last column [:, -1]\n', data_df.iloc[:, -1])
print('\n All columns except the last [:, :-1]\n', data_df.iloc[:, :-1])
Last column [:, -1]
one Male
two Female
three Male
four Male
Name: Gender, dtype: object
All columns except the last [:, :-1]
Name Year
one Chulmin 2011
two Eunkyung 2016
three JInwoong 2015
four Soobeom 2015
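The iloc rules above can be checked in one place. A minimal sketch that rebuilds `data_df` and exercises the accepted key types (integer, integer slice, list of positions, -1), plus a label that iloc rejects. The exact exception type for a label has varied across pandas versions, so the sketch catches both TypeError and ValueError rather than asserting one.

```python
import pandas as pd

data_df = pd.DataFrame(
    {'Name': ['Chulmin', 'Eunkyung', 'JInwoong', 'Soobeom'],
     'Year': [2011, 2016, 2015, 2015],
     'Gender': ['Male', 'Female', 'Male', 'Male']},
    index=['one', 'two', 'three', 'four'])

scalar = data_df.iloc[0, 0]          # two integers -> a single value
block = data_df.iloc[0:2, [0, 1]]    # integer slice + list of column positions
last_col = data_df.iloc[:, -1]       # -1 -> last column

# A column label is not a position, so iloc refuses it
try:
    data_df.iloc[0, 'Name']
    rejected = False
except (TypeError, ValueError):
    rejected = True

print(scalar, block.shape, last_col.name, rejected)  # Chulmin (2, 2) Gender True
```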
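loc[ ] operator >> label-based indexing
- Select with loc[index value, column name] >> think of the index value as a unique 'label' that identifies a DataFrame row, not as its position
- Slicing with loc includes the end value instead of stopping at 'end value - 1': labels are not necessarily numeric, so 'minus 1' cannot be applied to them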
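The difference is clearest when the index labels are integers that do not match the positions. A minimal sketch with a hypothetical index of 10, 20, 30, 40: loc includes the end label, iloc excludes the end position, and loc looks up labels, so 0 is not found even though position 0 exists.

```python
import pandas as pd

# Hypothetical data: integer labels that differ from positions 0..3
df = pd.DataFrame({'Name': ['a', 'b', 'c', 'd']}, index=[10, 20, 30, 40])

# loc: end label 30 IS included
print(df.loc[10:30, 'Name'].tolist())  # ['a', 'b', 'c']

# iloc: end position 3 is NOT included
print(df.iloc[0:3, 0].tolist())        # ['a', 'b', 'c']

# loc searches labels, and there is no label 0
try:
    df.loc[0]
    found = True
except KeyError:
    found = False
print(found)  # False
```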
In [13]:
data_df.loc['one', 'Name']
Out[13]:
'Chulmin'
In [14]:
print('Position-based iloc slicing \n', data_df.iloc[0:1, 0], '\n')
print('Label-based loc slicing \n', data_df.loc['one':'two', 'Name'])
Position-based iloc slicing
one Chulmin
Name: Name, dtype: object
Label-based loc slicing
one Chulmin
two Eunkyung
Name: Name, dtype: object
In [15]:
data_df.loc['one':'three', ['Name', 'Gender']]
Out[15]:
| | Name | Gender |
---|---|---|
one | Chulmin | Male |
two | Eunkyung | Female |
three | JInwoong | Male |
In [23]:
data_df.loc[data_df['Year'] >= 2015]
Out[23]:
| | Name | Year | Gender |
---|---|---|---|
two | Eunkyung | 2016 | Female |
three | JInwoong | 2015 | Male |
four | Soobeom | 2015 | Male |
In [24]:
data_df.loc['three', 'Gender']
Out[24]:
'Male'
In [26]:
titanic_boolean = titanic_df[titanic_df['Age'] > 60 ]
print(type(titanic_boolean))
titanic_boolean
<class 'pandas.core.frame.DataFrame'>
Out[26]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
33 | 34 | 0 | 2 | Wheadon, Mr. Edward H | male | 66.0 | 0 | 0 | C.A. 24579 | 10.5000 | NaN | S |
54 | 55 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.0 | 0 | 1 | 113509 | 61.9792 | B30 | C |
96 | 97 | 0 | 1 | Goldschmidt, Mr. George B | male | 71.0 | 0 | 0 | PC 17754 | 34.6542 | A5 | C |
116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.7500 | NaN | Q |
170 | 171 | 0 | 1 | Van der hoef, Mr. Wyckoff | male | 61.0 | 0 | 0 | 111240 | 33.5000 | B19 | S |
252 | 253 | 0 | 1 | Stead, Mr. William Thomas | male | 62.0 | 0 | 0 | 113514 | 26.5500 | C87 | S |
275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q |
326 | 327 | 0 | 3 | Nysveen, Mr. Johan Hansen | male | 61.0 | 0 | 0 | 345364 | 6.2375 | NaN | S |
438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S |
456 | 457 | 0 | 1 | Millet, Mr. Francis Davis | male | 65.0 | 0 | 0 | 13509 | 26.5500 | E38 | S |
483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
493 | 494 | 0 | 1 | Artagaveytia, Mr. Ramon | male | 71.0 | 0 | 0 | PC 17609 | 49.5042 | NaN | C |
545 | 546 | 0 | 1 | Nicholson, Mr. Arthur Ernest | male | 64.0 | 0 | 0 | 693 | 26.0000 | NaN | S |
555 | 556 | 0 | 1 | Wright, Mr. George | male | 62.0 | 0 | 0 | 113807 | 26.5500 | NaN | S |
570 | 571 | 1 | 2 | Harris, Mr. George | male | 62.0 | 0 | 0 | S.W./PP 752 | 10.5000 | NaN | S |
625 | 626 | 0 | 1 | Sutton, Mr. Frederick | male | 61.0 | 0 | 0 | 36963 | 32.3208 | D50 | S |
630 | 631 | 1 | 1 | Barkworth, Mr. Algernon Henry Wilson | male | 80.0 | 0 | 0 | 27042 | 30.0000 | A23 | S |
672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5000 | NaN | S |
745 | 746 | 0 | 1 | Crosby, Capt. Edward Gifford | male | 70.0 | 1 | 1 | WE/P 5735 | 71.0000 | B22 | S |
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.7750 | NaN | S |
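Two details of the result above are worth checking: boolean indexing returns a DataFrame, and rows with a missing Age (NaN, as in row 888 of the Titanic data) compare as False, so they are silently excluded. A minimal sketch on hypothetical data, using the mask's sum to count matches (True counts as 1):

```python
import pandas as pd

# Hypothetical data; None becomes NaN in a float column
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [66.0, 30.0, None]})

mask = df['Age'] > 60
result = df[mask]

print(type(result).__name__)         # DataFrame
print(int(mask.sum()), len(result))  # 1 1
print(mask.tolist())                 # [True, False, False]; NaN > 60 is False
```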
In [28]:
titanic_df[titanic_df['Age'] > 60][['Name', 'Age']]
Out[28]:
| | Name | Age |
---|---|---|
33 | Wheadon, Mr. Edward H | 66.0 |
54 | Ostby, Mr. Engelhart Cornelius | 65.0 |
96 | Goldschmidt, Mr. George B | 71.0 |
116 | Connors, Mr. Patrick | 70.5 |
170 | Van der hoef, Mr. Wyckoff | 61.0 |
252 | Stead, Mr. William Thomas | 62.0 |
275 | Andrews, Miss. Kornelia Theodosia | 63.0 |
280 | Duane, Mr. Frank | 65.0 |
326 | Nysveen, Mr. Johan Hansen | 61.0 |
438 | Fortune, Mr. Mark | 64.0 |
456 | Millet, Mr. Francis Davis | 65.0 |
483 | Turkula, Mrs. (Hedwig) | 63.0 |
493 | Artagaveytia, Mr. Ramon | 71.0 |
545 | Nicholson, Mr. Arthur Ernest | 64.0 |
555 | Wright, Mr. George | 62.0 |
570 | Harris, Mr. George | 62.0 |
625 | Sutton, Mr. Frederick | 61.0 |
630 | Barkworth, Mr. Algernon Henry Wilson | 80.0 |
672 | Mitchell, Mr. Henry Michael | 70.0 |
745 | Crosby, Capt. Edward Gifford | 70.0 |
829 | Stone, Mrs. George Nelson (Martha Evelyn) | 62.0 |
851 | Svensson, Mr. Johan | 74.0 |
In [30]:
titanic_df[titanic_df['Pclass'] == 2][['Name', 'Pclass']]
Out[30]:
| | Name | Pclass |
---|---|---|
9 | Nasser, Mrs. Nicholas (Adele Achem) | 2 |
15 | Hewlett, Mrs. (Mary D Kingcome) | 2 |
17 | Williams, Mr. Charles Eugene | 2 |
20 | Fynney, Mr. Joseph J | 2 |
21 | Beesley, Mr. Lawrence | 2 |
... | ... | ... |
866 | Duran y More, Miss. Asuncion | 2 |
874 | Abelson, Mrs. Samuel (Hannah Wizosky) | 2 |
880 | Shelley, Mrs. William (Imanita Parrish Hall) | 2 |
883 | Banfield, Mr. Frederick James | 2 |
886 | Montvila, Rev. Juozas | 2 |
184 rows × 2 columns
In [31]:
titanic_df.loc[titanic_df['Age'] > 60][['Name', 'Age']].head(3)
Out[31]:
| | Name | Age |
---|---|---|
33 | Wheadon, Mr. Edward H | 66.0 |
54 | Ostby, Mr. Engelhart Cornelius | 65.0 |
96 | Goldschmidt, Mr. George B | 71.0 |
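The cell above chains two selections (`loc[...]` then `[['Name', 'Age']]`). loc can also take the row condition and the column list in a single call. A minimal sketch on hypothetical data; for reading, both forms give the same result, while for assignment the single `loc` call is the form that writes into `df` itself (a chained selection may operate on a copy instead).

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Age': [66.0, 30.0, 71.0],
                   'Pclass': [2, 3, 1]})

# Row condition and column list in one loc call
old = df.loc[df['Age'] > 60, ['Name', 'Age']]

# Chained form, as in the cell above
chained = df[df['Age'] > 60][['Name', 'Age']]
print(old.equals(chained))   # True
print(old['Name'].tolist())  # ['A', 'C']

# Single-call loc assignment modifies df directly
df.loc[df['Age'] > 60, 'Pclass'] = 0
print(df['Pclass'].tolist())  # [0, 3, 0]
```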
In [33]:
titanic_df[ (titanic_df['Age'] > 60) & (titanic_df['Pclass'] > 2) ]
Out[33]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.7500 | NaN | Q |
280 | 281 | 0 | 3 | Duane, Mr. Frank | male | 65.0 | 0 | 0 | 336439 | 7.7500 | NaN | Q |
326 | 327 | 0 | 3 | Nysveen, Mr. Johan Hansen | male | 61.0 | 0 | 0 | 345364 | 6.2375 | NaN | S |
483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
851 | 852 | 0 | 3 | Svensson, Mr. Johan | male | 74.0 | 0 | 0 | 347060 | 7.7750 | NaN | S |
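Each condition above is wrapped in parentheses because `&` binds more tightly than `>` and `==` in Python. Element-wise logic uses `&` (and) and `|` (or); Python's `and` tries to turn a whole Series into one bool and raises ValueError. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'Age': [70.0, 65.0, 30.0], 'Pclass': [3, 1, 3]})

# & : both conditions must hold (parentheses required)
both = df[(df['Age'] > 60) & (df['Pclass'] > 2)]
print(both.index.tolist())  # [0]

# | : either condition may hold
either = df[(df['Age'] > 60) | (df['Pclass'] > 2)]
print(either.index.tolist())  # [0, 1, 2]

# `and` asks for the truth value of a whole Series, which is ambiguous
try:
    df[(df['Age'] > 60) and (df['Pclass'] > 2)]
    ok = True
except ValueError:
    ok = False
print(ok)  # False
```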
In [39]:
titanic_df[ (titanic_df['Age'] > 60) & (titanic_df['Pclass'] == 1) & (titanic_df['Sex'] == 'female') ]
Out[39]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
In [40]:
# Each condition can also be assigned to a variable and combined afterwards
cond1 = titanic_df['Age'] > 60
cond2 = titanic_df['Pclass'] == 1
cond3 = titanic_df['Sex'] == 'female'
titanic_df[cond1 & cond2 & cond3]
Out[40]:
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
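Stored conditions are ordinary boolean Series, so they can be reused and also negated with `~`. A minimal sketch on hypothetical data, reusing the names `cond1` to `cond3` from the cell above:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'Age': [63.0, 62.0, 70.0],
                   'Pclass': [1, 1, 3],
                   'Sex': ['female', 'female', 'male']})

cond1 = df['Age'] > 60
cond2 = df['Pclass'] == 1
cond3 = df['Sex'] == 'female'

# Combine stored conditions with &
print(df[cond1 & cond2 & cond3].index.tolist())  # [0, 1]

# ~ flips each value: passengers NOT in first class
print(df[~cond2].index.tolist())  # [2]
```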