Win a copy of Secure Financial Transactions with Ansible, Terraform, and OpenSCAP this week in the Cloud/Virtualization forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Ron McLeod
  • Paul Clapham
  • Jeanne Boyarsky
  • Bear Bibeault
Sheriffs:
  • Rob Spoor
  • Henry Wong
  • Liutauras Vilda
Saloon Keepers:
  • Tim Moores
  • Carey Brown
  • Stephan van Hulst
  • Tim Holloway
  • Piet Souris
Bartenders:
  • Frits Walraven
  • Himai Minh
  • Jj Roberts

Compare rows within a group in a dataframe

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello everyone, and thank you for seeing my post.
What I need seems kind of basic, but I'm struggling with it, so I would appreciate your help.

This code:

   data = pd.DataFrame({'id' : ['a100', 'a100', 'a100', 'a200', 'a200', 'a200','a300','a300', 'a300', 'a400', 'a400', 'a400', 'a500', 'a500', 'a500', 'a600', 'a600', 'a600', 'a700', 'a700', 'a700', 'a800', 'a800', 'a800', 'a900', 'a900', 'a900'],
                    'type': ['euro', 'dollar', 'yen', 'euro', 'dollar', 'yen','euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen', 'euro', 'dollar', 'yen'],
                   'model': ['EQ', 'EQ', 'EQ', 'MC', 'MC', 'MC', 'EQ','EQ', 'EQ', 'MC', 'MC', 'MC', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ', 'EQ'],
                    'status_ant': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
                    'status': ['B', 'C', 'A', 'B', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'C', 'B', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'A', 'B', 'A', 'C', 'C']
                    })

results in this dataframe:

           id     type   model   status_ant status
    0   a100    euro    EQ           A        B
    1   a100  dollar     EQ          A        C
    2   a100     yen    EQ           A        A
    3   a200    euro    MC          A        B
    4   a200  dollar    MC           A        C
    5   a200     yen    MC           A        A
    6   a300    euro    EQ           B        A
    7   a300  dollar    EQ           C        A
    8   a300     yen    EQ           A        A
    9   a400    euro    MC          B        A
   10  a400  dollar    MC           C        A
   11  a400     yen    MC           A        A
   12  a500    euro    EQ           A        B
   13  a500  dollar    EQ           A        C
   14  a500     yen    EQ           A        B
   15  a600    euro    EQ           B        A
   16  a600  dollar    EQ           C        A
   17  a600     yen    EQ           B        A
   18  a700    euro    EQ           A        A
   19  a700  dollar    EQ           A        A
   20  a700     yen    EQ           A        B
   21  a800    euro    EQ           A        B
   22  a800  dollar    EQ           A        A
   23  a800     yen    EQ           A        B
   24  a900    euro    EQ           A        A
   25  a900  dollar    EQ           A        C
   26  a900     yen    EQ           A        C



I need to filter the rows I want using some conditions. The logic I need is this one:


* First of all, these conditions I will mention should only apply to rows where model == 'EQ'.

* If there's a row where:

status_ant == 'A' and status == 'B' and type == 'euro'

AND there's another row WITHIN the same group (by id) where
 
status_ant == 'A' and status == 'C' and type == 'dollar'

drop those two rows (else, keep them). The same logic applies for rows were:

status_ant == 'B' and status == 'A' and type == 'euro'

AND there's another row WITHIN the same group (by id) where

status_ant == 'C' and status == 'A' and type == 'dollar'

The row with type == 'yen' shouldn't be dropped whatever the values in status_ant and status are.


Note: I know it might be confusing, but the resulting dataframe I want is this one (it exemplifies very well all the conditions I mentioned):


         id     type   model   status_ant status
    2   a100     yen    EQ          A       A
    3   a200    euro    MC          A       B
    4   a200  dollar    MC          A       C
    5   a200     yen    MC          A       A
    8   a300     yen    EQ           A       A
    9   a400    euro    MC          B       A
    10  a400  dollar    MC          C       A
    11  a400     yen    MC          A       A
    14  a500     yen    EQ          A       B
    17  a600     yen    EQ          B       A
    18  a700    euro    EQ          A       A
    19  a700  dollar    EQ          A       A
    20  a700     yen    EQ          A       B
    21  a800    euro    EQ          A       B
    22  a800  dollar    EQ          A       A
    23  a800     yen    EQ          A       B
    24  a900    euro    EQ          A       A
    25  a900  dollar    EQ          A       C
    26  a900     yen    EQ          A       C

I was looking for the answer and found something similar here: [Pandas: Comparing rows within groups](https://stackoverflow.com/questions/48819644/pandas-comparing-rows-within-groups)
but couldn't make it work for my code. I do think an apply and groupby is what I need.

Really appreciate your help and your time. Thank you

 
Ruth Stout was famous for gardening naked. Just like this tiny ad:
SKIP - a book about connecting industrious people with elderly land owners
https://coderanch.com/t/skip-book
reply
    Bookmark Topic Watch Topic
  • New Topic