I am trying to generate a dummy variable that equals 1 if at least two (out of seven) dummy variables also equal 1. Could anybody tell me an efficient way of doing this?

--------------Solutions-------------

Let's suppose that the indicator variables…
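One compact way to express the "at least two of seven" flag, sketched here in pandas with made-up column names d1..d7: sum the indicators row-wise and compare against 2.

```python
import pandas as pd

# Three example rows over seven 0/1 indicators d1..d7 (hypothetical names).
data = [
    [1, 1, 0, 0, 0, 0, 0],  # two indicators set   -> flag = 1
    [1, 0, 0, 0, 0, 0, 0],  # one indicator set    -> flag = 0
    [1, 1, 1, 0, 0, 0, 0],  # three indicators set -> flag = 1
]
cols = [f"d{i}" for i in range(1, 8)]
df = pd.DataFrame(data, columns=cols)

# Row-wise sum of the indicators; the flag is 1 when at least two are 1.
df["flag"] = (df[cols].sum(axis=1) >= 2).astype(int)
print(df["flag"].tolist())  # [1, 0, 1]
```

The same idea works in R or Stata: add the seven dummies and test the sum against 2.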
I'm new to SQL and I'm struggling with the following problem. Let's suppose I have a table like this:

Date        Value
2014-01-01  1248.56
2014-01-02  1247.24
2014-01-03  1245.82
2014-01-04  1252.07
...

All I want to do is compute the semivariance of the variable Value. Semivariance…
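Downside semivariance is usually defined as the average squared deviation of only the below-mean observations (some sources divide by the count of below-mean points instead of n). To see the computation outside SQL, a plain-Python sketch over the four rows shown:

```python
# The Value column from the four rows in the question.
values = [1248.56, 1247.24, 1245.82, 1252.07]
mean = sum(values) / len(values)

# Downside semivariance: squared deviations of below-mean points, divided
# by n (one common convention; adjust the denominator to taste).
semivariance = sum((v - mean) ** 2 for v in values if v < mean) / len(values)
print(round(semivariance, 4))  # → 2.0428
```

In SQL the same shape is an AVG over the table's mean (via a subquery or window function) combined with a CASE/WHERE restricting to below-mean rows.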
There is a variable M with normal distribution N(μ, σ), where μ = 100 and σ = 10. Find the probability P{|M - 80| ≥ 11}.

What I did using R was:

P{|M - 80| ≥ 11} = P{|M| ≥ 11 + 80} = P{|M| ≥ 91}
pnorm(91, mean = 100, sd = 10, lower.tail = FALSE)

But it's incorrect…
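The single pnorm call covers only the upper tail, but the event |M - 80| ≥ 11 has two tails: M ≥ 91 or M ≤ 69. A stdlib-only Python check of the correct decomposition (R's pnorm would do the same with two calls):

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    # Normal CDF via the error function.
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# |M - 80| >= 11 means M >= 91 OR M <= 69; the step
# P{|M - 80| >= 11} = P{|M| >= 91} silently drops the lower tail.
p = (1.0 - norm_cdf(91, 100, 10)) + norm_cdf(69, 100, 10)
print(round(p, 4))  # → 0.8169
```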
I got an RDD similar to this:

color   category
green   a
green   b
red     a
orange  a
green   b
red     d
green   c
red     d
green   e

And I'm trying to find the most frequent category for each color. Something like this:

[green, b]  : 2
[red, d]    : 2
[orange, a] : 1

I'm alre…
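Since the grouping logic is independent of Spark, here is the same computation in plain Python with collections.Counter; in Spark this maps to a map to ((color, category), 1) pairs, a reduceByKey to count them, and a second reduction keyed by color to keep the maximum.

```python
from collections import Counter

# The (color, category) pairs from the question's RDD.
pairs = [("green", "a"), ("green", "b"), ("red", "a"), ("orange", "a"),
         ("green", "b"), ("red", "d"), ("green", "c"), ("red", "d"),
         ("green", "e")]

# Count each (color, category) pair, then keep the max count per color.
counts = Counter(pairs)
best = {}
for (color, cat), n in counts.items():
    if color not in best or n > best[color][1]:
        best[color] = (cat, n)
print(best)  # {'green': ('b', 2), 'red': ('d', 2), 'orange': ('a', 1)}
```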
I would like to generalize an existing univariate probability distribution framework to the analysis of 3-tuples. In the existing framework, normal or beta distributions are fitted to a distribution over real numbers. Now I would like…
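Since the question is cut off, here is one hedged sketch of the most common generalization: replacing a univariate normal fit with a multivariate normal over 3-tuples, fitted by the sample mean vector and covariance matrix (the data here is synthetic, and this is only one of several reasonable extensions, e.g. copulas would be another).

```python
import numpy as np

# 500 synthetic observed 3-tuples (stand-in for the real data).
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))

# MLE of a multivariate normal: sample mean and sample covariance.
mean = data.mean(axis=0)            # length-3 mean vector
cov = np.cov(data, rowvar=False)    # 3x3 covariance matrix
print(mean.shape, cov.shape)        # (3,) (3, 3)
```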
I have a big continuous array of values that range over (-100, 100). For this array I want to calculate the weighted average described here. Since the data are continuous, I also want to set breaks for the values every 20, i.e. the values should be discrete…
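One way to sketch this in NumPy: discretize the values into width-20 bins, then take a mean of the bin centers weighted by the bin counts (the exact weighting scheme should follow whatever "described here" refers to; this is only the count-weighted variant).

```python
import numpy as np

# Small synthetic stand-in for the continuous array.
values = np.array([-95.0, -50.0, -45.0, 10.0, 15.0, 18.0, 70.0])
edges = np.arange(-100, 101, 20)          # breaks: -100, -80, ..., 100
idx = np.digitize(values, edges) - 1      # bin index for each value
centers = (edges[:-1] + edges[1:]) / 2    # bin midpoints
counts = np.bincount(idx, minlength=len(centers))

# Mean of bin centers, weighted by how many values fell in each bin.
weighted_avg = np.average(centers, weights=counts)
print(round(weighted_avg, 2))  # → -12.86
```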
I wrote a computer simulation program (the details are irrelevant) that writes some results to two files in a two-column format; then I used gnuplot from another terminal to plot these results on a graph, successfully. Now I want to integrate the graph…
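A common pattern for this is to drive gnuplot from the program itself rather than a second terminal, by piping a command script to a gnuplot subprocess. A hedged Python sketch (file names are hypothetical, and gnuplot must be on PATH for the render step to do anything):

```python
import subprocess

# Gnuplot commands the program would otherwise type interactively.
script = "\n".join([
    'set terminal png size 800,600',
    'set output "results.png"',
    'plot "results1.dat" with lines, "results2.dat" with lines',
])

def render(script_text):
    # Pipe the script to gnuplot; degrade cleanly if gnuplot is missing
    # or the data files do not exist.
    try:
        subprocess.run(["gnuplot"], input=script_text.encode(), check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

ok = render(script)  # True only when gnuplot ran and found the data files
```

In C the same idea is `popen("gnuplot", "w")` and `fprintf`ing the commands.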
I have two data sets and I want to map the second data set onto the first one:

library(data.table)

n <- c(2, 3, 5, 6, 7, 8)
s <- c("aa", "bb", "cc", "aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
df <- data.table(n, s, b)

rs <- c("aa", "bb", "c…
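Since the second data set is cut off, here is the general shape of such a mapping as a pandas left join (the `lookup` table and its `score` column are hypothetical stand-ins for whatever `rs` carries; data.table's `df[rs, on = "s"]` join is the R analogue):

```python
import pandas as pd

# Mirror of the R data.table df.
df = pd.DataFrame({
    "n": [2, 3, 5, 6, 7, 8],
    "s": ["aa", "bb", "cc", "aa", "bb", "cc"],
    "b": [True, False, True, False, True, False],
})
# Hypothetical second data set keyed by s.
lookup = pd.DataFrame({"s": ["aa", "bb", "cc"], "score": [1, 2, 3]})

# Left join: every row of df keeps its mapped value from lookup.
merged = df.merge(lookup, on="s", how="left")
print(merged["score"].tolist())  # [1, 2, 3, 1, 2, 3]
```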
The following script computes the R-squared value between two NumPy arrays (x and y). The R-squared value is very low due to outliers in the data. How can I extract the indices of those outliers?

import numpy as np, matplotlib.pyplot as plt, scipy.stats a…
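One standard approach: fit the line, compute residuals, and report the indices whose absolute residual exceeds k standard deviations. The k = 2 cutoff below is a judgment call, not a fixed rule, and the data here is synthetic with one planted outlier.

```python
import numpy as np
from scipy import stats

# Synthetic linear data with one obvious outlier at index 3.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[3] += 25.0

# Fit, then flag points more than 2 residual-std-devs off the line.
res = stats.linregress(x, y)
residuals = y - (res.intercept + res.slope * x)
outliers = np.where(np.abs(residuals) > 2.0 * residuals.std())[0]
print(outliers.tolist())  # [3]
```

A more robust variant would iterate: drop the flagged points, refit, and recheck, since a large outlier also drags the fit toward itself.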
I want to use R's aggregate function to aggregate a Price over several fields. However, I also have NAs in my data, which I would like to keep. I tried:

> dput(df)
structure(list(ID = c(1L, 2L, 3L, 4L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 3L, 2L…
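The same NA-dropping trap exists in pandas: groupby silently discards NaN keys unless told otherwise. A sketch with made-up data showing the `dropna=False` escape hatch (in R, the usual fix is wrapping the grouping factors in `addNA()`):

```python
import pandas as pd
import numpy as np

# Hypothetical data with NaN in a grouping column.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Cat": ["a", np.nan, "a", np.nan],
    "Price": [10.0, 20.0, 30.0, 40.0],
})

# dropna=False keeps the NaN-keyed groups instead of discarding them.
out = df.groupby(["ID", "Cat"], dropna=False)["Price"].sum()
print(len(out))  # 4 groups, including the two NaN-keyed ones
```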
I want to calculate the partial expectation of a log-normal distribution via:

m = 1; v = 2;
mu = log((m^2)/sqrt(v+m^2));
sigma = sqrt(log(v/(m^2)+1));
syms x;
d = x*lognpdf(x,mu,sigma);
int(d, x, 0, 10);

However, MATLAB says: Error using symfun>validat…
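The error arises because lognpdf is a numeric function and cannot take the symbolic x; numeric quadrature (MATLAB's `integral(@(x) x.*lognpdf(x,mu,sigma), 0, 10)`) is the usual fix. The same computation in SciPy, with the identical mu/sigma parameterization:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import lognorm

# Lognormal with mean m = 1 and variance v = 2, as in the question.
m, v = 1.0, 2.0
mu = np.log(m**2 / np.sqrt(v + m**2))
sigma = np.sqrt(np.log(v / m**2 + 1))

# Partial expectation E[X; X <= 10] by numeric quadrature.
# (scipy parameterizes lognorm by s = sigma and scale = exp(mu).)
val, _ = quad(lambda x: x * lognorm.pdf(x, s=sigma, scale=np.exp(mu)), 0, 10)
print(round(val, 4))
```

As a cross-check, the closed form E[X; X ≤ k] = exp(mu + sigma²/2)·Φ((ln k - mu - sigma²)/sigma) gives ≈ 0.9528 for k = 10.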
I have just started working with CCA in MATLAB. I have two matrices X and Y of dimensions 60x1920 and 60x1536, with the number of samples being 60 and the numbers of variables in the two sets being 1920 and 1536 respectively. I want to know how to do CCA for…
I'm trying to analyze a data set in order to find trends and trend changes in the data. My data is a list of answers to a particular quiz, ordered by date. My aim is to find out whether users can be answering an option based on previous answers more than a…
I'm searching for a statistics website where I can get the percentage of screen resolutions greater than / lower than 1200px or 1024px width (worldwide). The problem with statcounter.com is that they give resolutions only one by one. Thanks

--------------Solutions-------------
I want to determine the probability that a data point belongs to a population of data. I read that sklearn's GMM can do this. I tried the following:

import numpy as np
from sklearn.mixture import GMM

training_data = np.hstack((
    np.random.normal(500, …
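Note that the old sklearn GMM class has since been replaced by GaussianMixture. Its score_samples method returns the log-density of each point under the fitted mixture, which can serve as a "belongs to this population" score (where to threshold it is up to you). A sketch with synthetic two-cluster data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic population: two 1-D clusters around 500 and 600.
rng = np.random.default_rng(0)
training_data = np.concatenate([
    rng.normal(500, 10, 1000),
    rng.normal(600, 5, 1000),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(training_data)

# Log-density of a point near a cluster vs. one far from both.
near, far = gmm.score_samples(np.array([[505.0], [800.0]]))
print(near > far)  # True: the in-population point scores higher
```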
I'd like to find the local maxima in a set of data. I have a log of flight data from a sounding rocket payload, and I'd like to find the approximate staging times based on accelerometer data. I should be able to get the times I want based on…
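scipy.signal.find_peaks is the usual tool for this: it returns the indices of local maxima, and its height/distance/prominence filters help ignore sensor noise. A sketch on a synthetic signal (not real accelerometer data):

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic "acceleration" trace: sin(pi*t) over 0..10 s has five
# positive lobes, hence five local maxima above 0.5.
t = np.linspace(0, 10, 500)
accel = np.sin(np.pi * t)

# height filters out sub-threshold wiggles; see also the
# prominence and distance parameters for noisy data.
peaks, _ = find_peaks(accel, height=0.5)
print(t[peaks])  # times near 0.5, 2.5, 4.5, 6.5, 8.5 s
```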
I'm using the descr package so I can use crosstab() to make a table and return a chi-square for two variables. However, I also need the phi coefficient, which vcd's assocstats() returns (I know it's a simple formula, but eh, lazy). However, the chi-sq…
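The "simple formula" in question: for a 2x2 table, phi = sqrt(chi² / n), so it can be computed next to any chi-square routine. Illustrated with SciPy on a made-up table (continuity correction disabled so the formula is exact):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table.
table = np.array([[20, 10],
                  [10, 20]])

chi2, p, dof, _ = chi2_contingency(table, correction=False)
phi = np.sqrt(chi2 / table.sum())   # phi = sqrt(chi2 / n)
print(round(phi, 3))  # → 0.333
```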
I have a file with the following data:

12341231
1231312
1233123
1231313
523454
6567
73525

I would like to read the file into an R object and calculate the standard deviation of the data.

--------------Solutions-------------

I'd probably use scan for that file. You don'…
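For comparison, the scan-then-sd pipeline in Python: read one number per line and take the sample standard deviation (statistics.stdev divides by n-1, matching R's sd). The file contents are inlined here to keep the sketch self-contained; in practice you would read from the file path.

```python
import statistics

# Inlined stand-in for the file's contents (one number per line).
raw = """12341231
1231312
1233123
1231313
523454
6567
73525"""

values = [float(line) for line in raw.splitlines()]
sd = statistics.stdev(values)   # sample SD, n-1 denominator like R's sd
print(sd)
```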
I am going to ask my question through an example, because I don't know the best way to phrase it in general. Using the ChickWeight dataset built into R:

> head(ChickWeight)
  weight Time Chick Diet
1     42    0     1    1
2     51    2     1    1
3     59    4     1    1
4     64    6     1    1
5     …
I have a set of database tables of known but variable dimensions. Some of them are gigantic (100 million rows or more), and table optimization is needed to speed these processes up. However, a full optimization is expensive for extremely large tables…