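Most medical data is organized as tables in Excel files. Sometimes, however, a table laid out in rows and columns has to be converted into key-value format.

For simple restructuring it can be faster to rearrange the raw data directly in Excel, but key-value pairing in particular is hard to do in Excel.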
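Here is a minimal sketch of the idea before turning to iris. It uses a small hypothetical lab-results table: the `labs` data frame, its columns and its values are made up for illustration. Each patient row is turned into one (test, result) pair per lab column with the same `gather()` function used below.

```r
library(tidyr)

# Hypothetical lab-results table (illustration only): one row per patient, one column per test
labs <- data.frame(
  patient = c("P1", "P2"),
  WBC = c(6.1, 7.4),
  Hb = c(13.2, 11.8),
  stringsAsFactors = FALSE
)

# Row-column -> key-value: every column except patient becomes a (test, result) pair
labs_kv <- gather(labs, key = "test", value = "result", -patient)
labs_kv
```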

Here is a short worked example.

ipynb file [link]

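The examples use the well-known iris dataset.
They rely on the tidyr and dplyr libraries.
They cover the key functions gather, group_by and spread.
Note that gather and spread are inverses. gather turns columns into key-value pairs, and spread turns key-value pairs back into columns.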

Written: 2021.1.11 by Junn

Gather function example:

In [20]:

library(datasets)
data(iris)
head(iris)

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa

In [21]:

library(tidyr)
library(dplyr)

In [31]:

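head(gather(iris, key="attr", value="measure", -Species))
# -Species: gather every column except Species (all the others are implicitly included)

head(gather(iris, key="attr", value="measure", Sepal.Length, Sepal.Width))
# Gather only the listed columns; all other columns are implicitly excluded and kept as they are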

Species	attr	measure
setosa	Sepal.Length	5.1
setosa	Sepal.Length	4.9
setosa	Sepal.Length	4.7
setosa	Sepal.Length	4.6
setosa	Sepal.Length	5.0
setosa	Sepal.Length	5.4

Petal.Length	Petal.Width	Species	attr	measure
1.4	0.2	setosa	Sepal.Length	5.1
1.4	0.2	setosa	Sepal.Length	4.9
1.3	0.2	setosa	Sepal.Length	4.7
1.5	0.2	setosa	Sepal.Length	4.6
1.4	0.2	setosa	Sepal.Length	5.0
1.7	0.4	setosa	Sepal.Length	5.4
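Since tidyr 1.0.0, `gather()` has been superseded by `pivot_longer()`. The sketch below shows the same reshaping with the newer function. It is an addition and was not part of the original notebook.

Expect one difference in row order. `pivot_longer()` walks through the table one flower at a time, while `gather()` walks one column at a time, so `head()` shows different rows.

```r
library(datasets)
library(tidyr)

# Same reshaping as gather(iris, key = "attr", value = "measure", -Species),
# written with pivot_longer() (tidyr >= 1.0.0)
long <- pivot_longer(iris, cols = -Species,
                     names_to = "attr", values_to = "measure")

# Rows are ordered observation by observation: the first four rows are
# the four measurements of the first flower
head(long, 4)
```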

Group_by function example:

In [24]:

df.new = iris %>% gather(key="attr", value="measure", -Species)

In [25]:

head(df.new)

Species	attr	measure
setosa	Sepal.Length	5.1
setosa	Sepal.Length	4.9
setosa	Sepal.Length	4.7
setosa	Sepal.Length	4.6
setosa	Sepal.Length	5.0
setosa	Sepal.Length	5.4

In [27]:

df.new %>% summarise(
    mean_measure = mean(measure)
)

mean_measure
3.4645

In [29]:

df.new %>% group_by(Species) %>% summarise (
    mean_measure = mean(measure)
)

`summarise()` ungrouping output (override with `.groups` argument)

Species	mean_measure
setosa	2.5355
versicolor	3.5730
virginica	4.2850

In [30]:

df.new %>% group_by(Species, attr) %>% summarise (
    mean_measure = mean(measure)
)

`summarise()` regrouping output by 'Species' (override with `.groups` argument)

Species	attr	mean_measure
setosa	Petal.Length	1.462
setosa	Petal.Width	0.246
setosa	Sepal.Length	5.006
setosa	Sepal.Width	3.428
versicolor	Petal.Length	4.260
versicolor	Petal.Width	1.326
versicolor	Sepal.Length	5.936
versicolor	Sepal.Width	2.770
virginica	Petal.Length	5.552
virginica	Petal.Width	2.026
virginica	Sepal.Length	6.588
virginica	Sepal.Width	2.974
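The messages printed by `summarise()` can be silenced with its `.groups` argument (dplyr 1.0.0 and later). The sketch below is an addition, not part of the original notebook. It drops the grouping and then uses `spread()` to lay the per-species means out as a species-by-attribute table.

```r
library(datasets)
library(dplyr)
library(tidyr)

df.new <- iris %>% gather(key = "attr", value = "measure", -Species)

# .groups = "drop" returns an ungrouped tibble and suppresses the regrouping message
means <- df.new %>%
  group_by(Species, attr) %>%
  summarise(mean_measure = mean(measure), .groups = "drop")

# Spread the means back out: one row per species, one column per attribute
means_wide <- means %>% spread(key = "attr", value = "mean_measure")
means_wide
```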

Table -> Key-value pair -> Table

In [42]:

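head(iris)
head(iris %>% mutate(idx=row_number()) %>% relocate(idx))
# mutate(idx=row_number()): add a row index so spread() can later tell which values belong to the same original row
# relocate(): move the given column(s) to the far left
kv_table = iris %>% mutate(idx=row_number()) %>% relocate(idx) %>% 
            gather(key="key", value="value", -idx)
# Species is a factor and the other columns are numeric, so gather() drops the
# factor attributes (hence the warning) and stores every value as character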

Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
5.1	3.5	1.4	0.2	setosa
4.9	3.0	1.4	0.2	setosa
4.7	3.2	1.3	0.2	setosa
4.6	3.1	1.5	0.2	setosa
5.0	3.6	1.4	0.2	setosa
5.4	3.9	1.7	0.4	setosa

idx	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
1	5.1	3.5	1.4	0.2	setosa
2	4.9	3.0	1.4	0.2	setosa
3	4.7	3.2	1.3	0.2	setosa
4	4.6	3.1	1.5	0.2	setosa
5	5.0	3.6	1.4	0.2	setosa
6	5.4	3.9	1.7	0.4	setosa

Warning message:
"attributes are not identical across measure variables;
they will be dropped"

In [43]:

head(kv_table)

idx	key	value
1	Sepal.Length	5.1
2	Sepal.Length	4.9
3	Sepal.Length	4.7
4	Sepal.Length	4.6
5	Sepal.Length	5
6	Sepal.Length	5.4

In [45]:

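# spread() turns the key-value pairs back into columns (sorted by name); relocate() restores the original column order
# The columns are still character here, which is why 5.0 and 3.0 print as 5 and 3
df = kv_table %>% spread(key="key",value="value") %>% relocate(idx,Sepal.Length,Sepal.Width)
head(df)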

idx	Sepal.Length	Sepal.Width	Petal.Length	Petal.Width	Species
1	5.1	3.5	1.4	0.2	setosa
2	4.9	3	1.4	0.2	setosa
3	4.7	3.2	1.3	0.2	setosa
4	4.6	3.1	1.5	0.2	setosa
5	5	3.6	1.4	0.2	setosa
6	5.4	3.9	1.7	0.4	setosa
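`spread()` returns character columns here because `gather()` coerced every value to character. The sketch below relies on the `convert = TRUE` argument of tidyr's `spread()`, which runs `type.convert()` on the new columns, to get the numeric types back. It then turns Species back into a factor with the original iris levels, so the rebuilt table can be checked against iris. This is an extension of the notebook, not the author's code.

```r
library(datasets)
library(dplyr)
library(tidyr)

kv_table <- iris %>%
  mutate(idx = row_number()) %>%
  relocate(idx) %>%
  gather(key = "key", value = "value", -idx)   # gather() warns that factor attributes are dropped

# convert = TRUE turns the character values back into numbers
df <- kv_table %>%
  spread(key = "key", value = "value", convert = TRUE) %>%
  relocate(idx, Sepal.Length, Sepal.Width) %>%
  mutate(Species = factor(Species, levels = levels(iris$Species)))

# Drop the helper index and compare with the original table
all.equal(select(df, -idx), iris, check.attributes = FALSE)
```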

Other functions: separate_rows, separate, unite
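The original notebook has no example under this heading. Here is a minimal sketch of all three functions on a small hypothetical table; the `visits` data frame, its columns and its values are made up for illustration.

- `separate()` splits one column into several.
- `unite()` does the reverse.
- `separate_rows()` gives each value in a delimited cell its own row.

```r
library(tidyr)

# Hypothetical visit table (illustration only)
visits <- data.frame(
  id = c(1, 2),
  visit_date = c("2021-01-11", "2021-02-03"),
  drugs = c("A;B", "C"),
  stringsAsFactors = FALSE
)

# separate(): split visit_date into year / month / day (kept as character)
dated <- separate(visits, visit_date, into = c("year", "month", "day"), sep = "-")

# unite(): paste the three columns back into one
united <- unite(dated, "visit_date", year, month, day, sep = "-")

# separate_rows(): one row per drug instead of "A;B" in a single cell
per_drug <- separate_rows(visits, drugs, sep = ";")
per_drug
```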

Handling missing values: drop_na, fill, replace_na
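This heading is also empty in the original notebook. Here is a minimal sketch on a hypothetical follow-up table; the `fu` data frame and its values are made up for illustration.

- `drop_na()` removes incomplete rows.
- `fill()` carries the last observed value down.
- `replace_na()` substitutes a fixed value per column.

```r
library(tidyr)

# Hypothetical follow-up table with missing values (illustration only)
fu <- data.frame(
  patient = c("P1", NA, NA, "P2"),
  month = c(0, 3, 6, 0),
  size_mm = c(12, NA, 9, 20),
  note = c(NA, "stable", NA, NA),
  stringsAsFactors = FALSE
)

# drop_na(): drop rows where size_mm is NA (with no columns given, any NA drops the row)
complete_size <- drop_na(fu, size_mm)

# fill(): carry the last non-missing patient ID downwards (.direction = "down" is the default)
filled <- fill(fu, patient)

# replace_na(): replace NA in the note column with a fixed string
noted <- replace_na(fu, list(note = "none"))
filled
```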


J Seok


[R] Table <> Key value pair, table conversion
