How to select all columns whose names start with X in a pandas DataFrame


manysource 2023. 6. 29. 20:13

I have a DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'foo.fighters': [0, 1, np.nan, 0, 0, 0],
                   'foo.bars': [0, 0, 0, 0, 0, 1],
                   'bar.baz': [5, 5, 6, 5, 5.6, 6.8],
                   'foo.fox': [2, 4, 1, 0, 0, 5],
                   'nas.foo': ['NA', 0, 1, 0, 0, 0],
                   'foo.manchu': ['NA', 0, 0, 0, 0, 0],})

I want to select the rows that have a value of 1 in any column whose name starts with 'foo.'. Is there a better way to do it than the following?

df2 = df[(df['foo.aa'] == 1)|
(df['foo.fighters'] == 1)|
(df['foo.bars'] == 1)|
(df['foo.fox'] == 1)|
(df['foo.manchu'] == 1)
]

Something similar to writing:

df2= df[df.STARTS_WITH_FOO == 1]

The answer should print out a DataFrame like this:

   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      5.0     1.0         0             0        2         NA      NA
1      5.0     2.1         0             1        4          0       0
2      6.0     NaN         0           NaN        1          0       1
5      6.8     6.8         1             0        5          0       0

[4 rows x 7 columns]

Just use a list comprehension to build the list of columns:

In [28]:

filter_col = [col for col in df if col.startswith('foo')]
filter_col
Out[28]:
['foo.aa', 'foo.bars', 'foo.fighters', 'foo.fox', 'foo.manchu']
In [29]:

df[filter_col]
Out[29]:
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
3     4.7         0             0        0          0
4     5.6         0             0        0          0
5     6.8         1             0        5          0

Another method is to create a Series from the columns and use the vectorised str method startswith:

In [33]:

df[df.columns[pd.Series(df.columns).str.startswith('foo')]]
Out[33]:
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
3     4.7         0             0        0          0
4     5.6         0             0        0          0
5     6.8         1             0        5          0

To achieve what you want, you then need to filter out the values that do not meet your ==1 criterion:

In [36]:

df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]]==1]
Out[36]:
   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      NaN       1       NaN           NaN      NaN        NaN     NaN
1      NaN     NaN       NaN             1      NaN        NaN     NaN
2      NaN     NaN       NaN           NaN        1        NaN     NaN
3      NaN     NaN       NaN           NaN      NaN        NaN     NaN
4      NaN     NaN       NaN           NaN      NaN        NaN     NaN
5      NaN     NaN         1           NaN      NaN        NaN     NaN

EDIT

OK, after seeing what you actually want, the convoluted answer is this:

In [72]:

df.loc[df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]] == 1].dropna(how='all', axis=0).index]
Out[72]:
   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      5.0     1.0         0             0        2         NA      NA
1      5.0     2.1         0             1        4          0       0
2      6.0     NaN         0           NaN        1          0       1
5      6.8     6.8         1             0        5          0       0
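
The same row selection can be written more compactly. The following is a minimal sketch rather than part of the original answer: it builds the list of 'foo' columns with the list comprehension shown earlier, compares only those columns with 1, and keeps the rows where any comparison is true. The names foo_cols and row_mask are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo.aa': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
                   'foo.fighters': [0, 1, np.nan, 0, 0, 0],
                   'foo.bars': [0, 0, 0, 0, 0, 1],
                   'bar.baz': [5, 5, 6, 5, 5.6, 6.8],
                   'foo.fox': [2, 4, 1, 0, 0, 5],
                   'nas.foo': ['NA', 0, 1, 0, 0, 0],
                   'foo.manchu': ['NA', 0, 0, 0, 0, 0]})

# Columns whose name starts with 'foo' (illustrative variable name).
foo_cols = [col for col in df.columns if col.startswith('foo')]

# True for each row where at least one 'foo' column equals 1.
row_mask = (df[foo_cols] == 1).any(axis=1)

# Keep all columns, but only the matching rows.
df2 = df[row_mask]
print(df2)
```

Because the mask is computed from the 'foo' columns only, a 1 in bar.baz or nas.foo does not cause a row to be selected.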

Now that pandas indexes support string operations, arguably the simplest and best way to select columns beginning with 'foo' is just:

df.loc[:, df.columns.str.startswith('foo')]
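
To make the mechanics visible, here is a small sketch on a made-up four-column frame (the data and the names mask and selected are assumptions for illustration). The str accessor on the column Index produces one boolean per column, and loc uses that boolean mask to pick columns.

```python
import pandas as pd

# Small illustrative frame; values are arbitrary.
df = pd.DataFrame({'foo.aa': [1.0, 2.1],
                   'bar.baz': [5, 5],
                   'foo.fox': [2, 4],
                   'nas.foo': [0, 1]})

# One boolean per column: True where the name starts with 'foo'.
mask = df.columns.str.startswith('foo')

# Boolean mask on the column axis; ':' keeps every row.
selected = df.loc[:, mask]
print(list(selected.columns))
```

Note that nas.foo is not selected, because startswith only looks at the beginning of the name.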

Alternatively, you can filter column (or row) labels with df.filter(). To specify a regular expression that matches the names beginning with 'foo.':

>>> df.filter(regex=r'^foo\.', axis=1)
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
3     4.7         0             0        0          0
4     5.6         0             0        0          0
5     6.8         1             0        5          0

To select only the required rows (those containing a 1) and the required columns, you can use loc, selecting the columns with filter (or any other method) and the rows with any:

>>> df.loc[(df == 1).any(axis=1), df.filter(regex=r'^foo\.', axis=1).columns]
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
5     6.8         1             0        5          0
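
One caveat worth illustrating: (df == 1).any(axis=1) tests every column, not only the 'foo.' ones. With the sample data the result is the same, but on other data a 1 in an unrelated column would also select the row. The sketch below uses a small made-up frame (all names and values are assumptions) to show the difference and how to restrict the test to the filtered columns.

```python
import pandas as pd

# Illustrative data: row 1 has a 1 only in a non-'foo.' column.
df = pd.DataFrame({'foo.a': [1, 0, 0],
                   'foo.b': [0, 0, 2],
                   'bar.c': [0, 1, 0]})

foo = df.filter(regex=r'^foo\.')

# Row test over ALL columns: row 1 is kept because of bar.c.
any_col = df.loc[(df == 1).any(axis=1), foo.columns]

# Row test over the 'foo.' columns only: row 1 is dropped.
foo_only = df.loc[(foo == 1).any(axis=1), foo.columns]

print(list(any_col.index), list(foo_only.index))
```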

The simplest way is to use str directly on the column names; there is no need to wrap them in pd.Series:

df.loc[:,df.columns.str.startswith("foo")]

In my case I needed a list of prefixes:

colsToScale=["production", "test", "development"]
dc[dc.columns[dc.columns.str.startswith(tuple(colsToScale))]]
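
Whether the pandas str accessor accepts a tuple of prefixes has varied between pandas releases, so as a version-independent sketch the same idea can be expressed with Python's built-in str.startswith, which has always accepted a tuple. The frame dc and its column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; only the column names matter here.
dc = pd.DataFrame({'production.cpu': [1, 2],
                   'test.cpu': [3, 4],
                   'staging.cpu': [5, 6],
                   'development.cpu': [7, 8]})

colsToScale = ["production", "test", "development"]

# Built-in str.startswith takes a tuple and returns True if any prefix matches.
cols = [c for c in dc.columns if c.startswith(tuple(colsToScale))]

subset = dc[cols]
print(cols)
```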

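You can use the filter method with the like parameter. Note that like matches the substring anywhere in the label, so with the example data this also selects nas.foo: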

df.filter(like='foo')
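
The difference between a substring match and a prefix match is easy to check on a tiny frame. This is an illustrative sketch with made-up data; by_like and by_regex are names introduced here.

```python
import pandas as pd

df = pd.DataFrame({'foo.aa': [1], 'bar.baz': [5], 'nas.foo': [0]})

# like='foo' keeps every label that contains 'foo' anywhere.
by_like = df.filter(like='foo')

# regex='^foo' keeps only labels that begin with 'foo'.
by_regex = df.filter(regex=r'^foo')

print(list(by_like.columns))
print(list(by_regex.columns))
```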

Here you can use a regex to filter the columns that start with "foo":

df.filter(regex='^foo')

If you need the string foo anywhere in your column name, then

df.filter(regex='foo')

would be appropriate.

For the next step, you can use

df[(df.filter(regex='^foo') == 1).any(axis=1)]

to keep the rows where at least one of the columns starting with 'foo' has the value 1.
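
A note on the pattern itself: in a regular expression, * means "zero or more of the preceding character", not "anything". A pattern such as '^foo*' therefore also matches names that begin with just 'fo'. The sketch below uses a made-up frame to show the difference; loose and strict are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'foo.aa': [1], 'fo.x': [2], 'bar.baz': [3]})

# '^foo*' means 'fo' followed by zero or more 'o', so 'fo.x' matches too.
loose = df.filter(regex='^foo*')

# '^foo' requires the full prefix 'foo'.
strict = df.filter(regex='^foo')

print(list(loose.columns))
print(list(strict.columns))
```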

Based on @EdChum's answer, you can try the following solution:

df[df.columns[pd.Series(df.columns).str.contains("foo")]]

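This is really helpful when not all of the columns you want to select start with foo. The method selects every column whose name contains the substring foo, and the substring may appear at any position in the column name.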

In essence, I replaced .startswith() with .contains().
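
One thing to keep in mind when switching to .contains(): by default it treats the pattern as a regular expression, so a dot in the pattern matches any character. The sketch below, on made-up column names, compares the default behaviour with regex=False, which matches the text literally.

```python
import pandas as pd

df = pd.DataFrame({'foo.aa': [1], 'nas.foo': [2], 'fooxbar': [3], 'bar.baz': [4]})

# Default: 'foo.' is a regex, '.' matches any single character after 'foo'.
regex_mask = df.columns.str.contains('foo.')

# regex=False: 'foo.' must appear literally, including the dot.
literal_mask = df.columns.str.contains('foo.', regex=False)

print(list(df.columns[regex_mask]))
print(list(df.columns[literal_mask]))
```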

Another option for selecting the desired entries is to use map:

df.loc[(df == 1).any(axis=1), df.columns.map(lambda x: x.startswith('foo'))]

which gives you the foo columns for the rows that contain a 1:

   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
5     6.8         1             0        5          0

The row selection is done by

(df == 1).any(axis=1)

as in @ajcr's answer, which gives you:

0     True
1     True
2     True
3    False
4    False
5     True
dtype: bool

meaning that rows 3 and 4 do not contain a 1 and will not be selected.

The column selection is done using Boolean indexing like this:

df.columns.map(lambda x: x.startswith('foo'))

In the example above this returns

array([False,  True,  True,  True,  True,  True, False], dtype=bool)

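So if a column does not start with foo, False is returned and the column is therefore not selected.

If you just want to return all rows that contain a 1, as your desired output suggests, you can simply do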

df.loc[(df == 1).any(axis=1)]

which returns

   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      5.0     1.0         0             0        2         NA      NA
1      5.0     2.1         0             1        4          0       0
2      6.0     NaN         0           NaN        1          0       1
5      6.8     6.8         1             0        5          0       0

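I do not like that the other answers require typing the DataFrame name more than once. That may be fine if you only have one frame named df, but that is often not the case (and real names can be much longer). Let's use pandas' indexing capabilities to cut down on typing and make the code more readable. Nothing stops us from using something like this (columns is the helper object defined in the code further down):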

df.loc[:, columns.startswith('foo')]

This works because the indexer can be a Callable. We can then assign this pseudo-indexer to a variable and use it for multiple frames:

foo_columns = columns.startswith('foo')
df_1.loc[:, foo_columns]
df_2.loc[:, foo_columns]

It also prints nicely:

> foo_columns
<function __main__.PandasIndexer:columns.str.startswith(pat='foo')()>

And we can use the other methods of the str accessor (e.g. columns.contains(r'bar\d', regex=True)), while getting useful signatures:

> columns.contains
<function __main__.PandasIndexer:columns.str.contains(pat, case=True, flags=0, na=None, regex=True)>

All with this short piece of magic code:

from pandas import Series
from inspect import signature, Signature


class PandasIndexer:
    def __init__(self, axis_name, accessor='str'):
        """
        Args:
            - axis_name: `columns` or `index`
            - accessor: e.g. `str`, or `dt`
        """
        self._axis_name = axis_name
        self._accessor = accessor
        self._dummy_series = Series(dtype=object)

    def _create_indexer(self, attribute):
        dummy_accessor = getattr(self._dummy_series, self._accessor)
        dummy_attr = getattr(dummy_accessor, attribute)
        name = f'PandasIndexer:{self._axis_name}.{self._accessor}.{attribute}'

        def indexer_factory(*args, **kwargs):
            def indexer(df):
                axis = getattr(df, self._axis_name)
                accessor = getattr(axis, self._accessor)
                method = getattr(accessor, attribute)
                return method(*args, **kwargs)

            bound_arguments = signature(dummy_attr).bind(*args, **kwargs)
            indexer.__qualname__ = (
                name + str(bound_arguments).replace('<BoundArguments ', '')[:-1]
            )
            indexer.__signature__ = Signature()
            return indexer

        indexer_factory.__name__ = name
        indexer_factory.__qualname__ = name
        indexer_factory.__signature__ = signature(dummy_attr)
        return indexer_factory

    def __getattr__(self, attribute):
        return self._create_indexer(attribute)

    def __dir__(self):
        """Make it work with auto-complete in IPython"""
        return dir(getattr(self._dummy_series, self._accessor))


columns = PandasIndexer('columns')
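
If only the callable-indexer idea is needed, without the introspection and pretty printing, a plain closure is enough. The following is a simplified sketch, not the class above: starts_with is a hypothetical helper name, and the two frames are made up. It relies on loc accepting a callable, which is called with the frame being indexed.

```python
import pandas as pd


def starts_with(prefix):
    """Return a callable that loc can use to pick columns by prefix."""
    def indexer(frame):
        # loc calls this with the DataFrame being indexed.
        return frame.columns.str.startswith(prefix)
    return indexer


foo_columns = starts_with('foo')

# Two unrelated, illustrative frames reuse the same indexer.
df_1 = pd.DataFrame({'foo.a': [1], 'bar.b': [2]})
df_2 = pd.DataFrame({'x': [3], 'foo.y': [4], 'foo.z': [5]})

sel_1 = df_1.loc[:, foo_columns]
sel_2 = df_2.loc[:, foo_columns]
print(list(sel_1.columns), list(sel_2.columns))
```

The trade-off is that the closure prints as an ordinary function object and offers no auto-completion, which is what the longer class adds.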

For multiple prefixes you can also try:

temp = df.loc[:, df.columns.str.startswith(('prefix1','prefix2','prefix3'))]
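
As an alternative sketch for several prefixes, the same selection can be expressed as a regex alternation passed to filter. The frame, the prefix list and the variable pattern below are assumptions for illustration; re.escape guards against prefixes that contain regex metacharacters such as a dot.

```python
import re

import pandas as pd

df = pd.DataFrame({'prefix1.a': [1], 'other.b': [2],
                   'prefix2.c': [3], 'prefix3.d': [4]})

prefixes = ['prefix1', 'prefix2', 'prefix3']

# Build '^(?:prefix1|prefix2|prefix3)' with each prefix escaped.
pattern = '^(?:' + '|'.join(re.escape(p) for p in prefixes) + ')'

temp = df.filter(regex=pattern)
print(list(temp.columns))
```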

My solution. It may be slower, and a row that has a 1 in more than one foo column will appear once per matching column:

a = pd.concat(df[df[c] == 1] for c in df.columns if c.startswith('foo'))
a.sort_index()


   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      5.0     1.0         0             0        2         NA      NA
1      5.0     2.1         0             1        4          0       0
2      6.0     NaN         0           NaN        1          0       1
5      6.8     6.8         1             0        5          0       0

Reference URL: https://stackoverflow.com/questions/27275236/how-to-select-all-columns-whose-names-start-with-x-in-a-pandas-dataframe
