Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Counting elements for different string sets in Pandas

Suppose I have the following dataframe:

import pandas as pd

d = {'col1': ['apple; kiwi; banana', 'orange; apple', 'apple', 'apple, orange, melon']}
df = pd.DataFrame(d)

which gives:

                   col1
0   apple; kiwi; banana
1         orange; apple
2                 apple
3  apple, orange, melon

I want to count how many times apple appears together with other fruits. If I do df.value_counts(), each distinct cell is counted only once. What I would like to know instead is how many times apple appears in cells of each length: here apple appears in two cells that contain 3 strings, one cell that contains 2 strings, and one cell that contains 1 string. So the outcome would be:

   len of string  number for apple
0              1                 1
1              2                 1
2              3                 2


1 Answer


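First filter the rows whose cell contains apple. Then replace the , separators with ; so that every cell uses the same separator, count the ; characters, and add 1 to get the number of values in each cell. Finally, use Series.value_counts to count how many cells have each length: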

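# Keep only the rows whose cell mentions apple
df = df[df['col1'].str.contains('apple')]
# Normalize ',' to ';', count the separators, and add 1 to get the number of values per cell;
# value_counts tallies each length and sort_index gives a stable 1, 2, 3 order
df1 = (df['col1'].str.replace(',', ';', regex=False)
                 .str.count(';')
                 .add(1)
                 .value_counts()
                 .sort_index()
                 .rename_axis('vals')
                 .reset_index(name='count'))
print(df1)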
   vals  count
0     1      1
1     2      1
2     3      2
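
One caveat with the substring filter above: str.contains('apple') would also match a cell containing, say, pineapple. The following is a minimal sketch of a variant that splits each cell into items first and tests for an exact match. It is an alternative, not part of the original answer; the names items, has_apple and result are chosen here for illustration, and it assumes the only separators are ; and , with optional spaces around them.

```python
import pandas as pd

d = {'col1': ['apple; kiwi; banana', 'orange; apple', 'apple', 'apple, orange, melon']}
df = pd.DataFrame(d)

# Split each cell on ';' or ',' (plus any surrounding spaces) into a list of fruits.
# A multi-character pattern is treated as a regular expression by Series.str.split.
items = df['col1'].str.split(r'\s*[;,]\s*')

# True only where 'apple' is an exact item of the list, so 'pineapple' would not match.
has_apple = items.apply(lambda fruits: 'apple' in fruits)

# Length of each matching list, then how many cells have each length.
result = (items[has_apple]
          .str.len()
          .value_counts()
          .sort_index()
          .rename_axis('vals')
          .reset_index(name='count'))
print(result)
```

On the sample data this gives the same three rows as the table above (lengths 1, 2, 3 with counts 1, 1, 2).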
