# How can I identify and summarize sets of data from matching groups in a dataframe?

Here is an example dataframe:

```r
set.seed(0)
x1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
x2 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1)
x3 <- c(1, 1, 2, 2, 4, 1, 1, 2, 1)
n <- c(1, 1, 1, 5, 5, 1, 1, 1, 1)
y <- rnorm(9)
mydf <- data.frame(x1, x2, x3, n, y)
```

What I would like to do is

1. identify rows with n = 1 that share identical values of (x1, x2, x3)
2. return a single row for each subset with y = mean(y) and n = length(y)
3. keep other rows the same.

For example, the new dataframe would be:

```r
x1 <- c(1, 1, 1, 1, 2, 2)
x2 <- c(1, 0, 0, 0, 1, 1)
x3 <- c(1, 2, 2, 4, 1, 2)
n <- c(2, 1, 5, 5, 3, 1)
y <- c(mean(y[1:2]), y[3], y[4], y[5], mean(y[c(6:7, 9)]), y[8])
newdf <- data.frame(x1, x2, x3, n, y)
```

I can figure this out with conditionals and loops, but I would prefer to learn a more elegant way to do this.

-------------Problems Reply------------

By "identical values in other columns", I take it you mean that each subset is defined by the same value of `x1` in each of the rows of the subset (and likewise for `x2` and `x3`), not that `x1` is equal to `x2`. Thanks for the example; it made clear what you meant.

```r
library("plyr")
```

To get parts one and two:

```r
# Keep only the n == 1 rows, then collapse each (x1, x2, x3) group to one row
ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise,
      n = length(y), y = mean(y))
```

This can be `rbind`-ed with the part of `mydf` where `n != 1` to get what you asked for:

```r
rbind(
  ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise,
        n = length(y), y = mean(y)),
  mydf[mydf$n != 1, ]
)
```

This doesn't have the same row order as the one you listed. If that is really important, you can add an auxiliary sorting variable:

```r
# Record the original row position so it can be restored after rbind()
mydf$order <- seq_len(nrow(mydf))
newdf <- rbind(
  ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise,
        n = length(y), y = mean(y), order = min(order)),
  mydf[mydf$n != 1, ]
)
newdf <- newdf[order(newdf$order), ]
newdf$order <- NULL
```
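If you would rather not depend on plyr, the same idea can be sketched in base R with `ave()`. This is an alternative to the `ddply` approach, not a restatement of it. The helper column `pos` and the result name `newdf2` are made up for illustration, and the block rebuilds the example data so that it runs on its own.

```r
# Rebuild the example data from the question
set.seed(0)
mydf <- data.frame(
  x1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c(1, 1, 0, 0, 0, 1, 1, 1, 1),
  x3 = c(1, 1, 2, 2, 4, 1, 1, 2, 1),
  n  = c(1, 1, 1, 5, 5, 1, 1, 1, 1),
  y  = rnorm(9)
)

# Hypothetical helper column: original row position, used only for sorting
mydf$pos <- seq_len(nrow(mydf))

# Work on the n == 1 rows; build one grouping key from (x1, x2, x3)
singles <- mydf[mydf$n == 1, ]
key <- interaction(singles$x1, singles$x2, singles$x3, drop = TRUE)

# Overwrite each row with its group-level summary
singles$y   <- ave(singles$y,   key, FUN = mean)    # group mean of y
singles$n   <- ave(singles$n,   key, FUN = length)  # group size
singles$pos <- ave(singles$pos, key, FUN = min)     # first position in group

# Rows within a group are now identical, so keep the first of each
collapsed <- singles[!duplicated(key), ]

# Put the untouched rows back and restore the original order
newdf2 <- rbind(collapsed, mydf[mydf$n != 1, ])
newdf2 <- newdf2[order(newdf2$pos), ]
newdf2$pos <- NULL
rownames(newdf2) <- NULL
newdf2
```

The design choice here is to summarise in place and then drop duplicates, which avoids having to apply a different function to each column in a single `aggregate()` call.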

Category:r Views:0 Time:2011-08-29
Tags: data.frame


Copyright (C) dskims.com, All Rights Reserved.
