How can I identify and summarize sets of data from matching groups in a dataframe?

Here is an example dataframe:

set.seed(0)
x1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
x2 <- c(1, 1, 0, 0, 0, 1, 1, 1, 1)
x3 <- c(1, 1, 2, 2, 4, 1, 1, 2, 1)
n <- c(1, 1, 1, 5, 5, 1, 1, 1, 1)
y <- rnorm(9)
mydf <- data.frame(x1, x2, x3, n, y)

What I would like to do is

  1. identify rows with n = 1 that share identical values of (x1, x2, x3)
  2. return a single row for each such subset, with y = mean(y) and n = length(y)
  3. keep the other rows unchanged.

For example, the new dataframe would be

x1 <- c(1, 1, 1, 1, 2, 2)
x2 <- c(1, 0, 0, 0, 1, 1)
x3 <- c(1, 2, 2, 4, 1, 2)
n <- c(2, 1, 5, 5, 3, 1)
y <- c(mean(y[1:2]), y[3], y[4], y[5], mean(y[c(6:7, 9)]), y[8])
newdf <- data.frame(x1, x2, x3, n, y)

I can figure this out with conditionals and loops, but I would prefer to learn a more elegant way to do this.

-------------Problems Reply------------

By "identical values in other columns", I take it you mean that each subset is defined by all of its rows sharing the same value of x1 (and likewise of x2 and x3), not that x1 is equal to x2. Thanks for the example, which made clear what you meant.
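To see which rows end up in each subset before summarising anything, here is a minimal base R sketch. It rebuilds the grouping columns from the question; the names `cand`, `key` and `groups` are only illustrative and do not appear in the original post.

```r
# Grouping columns from the question (y is not needed to identify the subsets).
mydf <- data.frame(
  x1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c(1, 1, 0, 0, 0, 1, 1, 1, 1),
  x3 = c(1, 1, 2, 2, 4, 1, 1, 2, 1),
  n  = c(1, 1, 1, 5, 5, 1, 1, 1, 1)
)

# Only rows with n == 1 are candidates for merging.
cand <- mydf[mydf$n == 1, ]

# Hypothetical helper: one label per (x1, x2, x3) combination.
key <- paste(cand$x1, cand$x2, cand$x3, sep = "-")

# Original row numbers belonging to each subset.
groups <- split(as.integer(rownames(cand)), key)
print(groups)
```

Each element of `groups` lists the original row numbers that would be collapsed into a single row.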


To get parts one and two, using the plyr package:

library(plyr)  # provides ddply(), .() and summarise()
ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise, n = length(y), y = mean(y))

This can be rbind-ed with the part of mydf where n != 1 to get what you described:

rbind(
  ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise, n = length(y), y = mean(y)),
  mydf[mydf$n != 1, ]
)

This doesn't have the same order as you listed. If that is really important, you can add some auxiliary sorting variables.

mydf$order <- seq_len(nrow(mydf))  # remember the original row positions
newdf <- rbind(
  ddply(mydf[mydf$n == 1, ], .(x1, x2, x3), summarise,
        n = length(y), y = mean(y), order = min(order)),
  mydf[mydf$n != 1, ]
)
newdf <- newdf[order(newdf$order), ]  # restore the original ordering
newdf$order <- NULL
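
If you would rather not depend on plyr, the same steps can be sketched in base R with split(), lapply() and rbind(). This is only one possible way to do it, under the same assumptions as above; the names `singles`, `pieces` and `merged` are illustrative and not from the original post.

```r
set.seed(0)
mydf <- data.frame(
  x1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  x2 = c(1, 1, 0, 0, 0, 1, 1, 1, 1),
  x3 = c(1, 1, 2, 2, 4, 1, 1, 2, 1),
  n  = c(1, 1, 1, 5, 5, 1, 1, 1, 1),
  y  = rnorm(9)
)
mydf$order <- seq_len(nrow(mydf))  # auxiliary sorting variable

# Rows eligible for merging, split into one piece per (x1, x2, x3) combination.
singles <- mydf[mydf$n == 1, ]
pieces <- split(singles, list(singles$x1, singles$x2, singles$x3), drop = TRUE)

# Collapse each piece to a single row.
merged <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(x1 = d$x1[1], x2 = d$x2[1], x3 = d$x3[1],
             n = nrow(d), y = mean(d$y), order = min(d$order))
}))

# Recombine with the untouched rows and restore the original ordering.
newdf <- rbind(merged, mydf[mydf$n != 1, ])
newdf <- newdf[order(newdf$order), ]
newdf$order <- NULL
rownames(newdf) <- NULL
print(newdf)
```

The result should have the same rows, in the same order, as the newdf you listed in the question.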

Category:r Views:0 Time:2011-08-29
Tags: data.frame
