# statistics

• ## How to gen variable = 1 if at least two dummy variables == 1 in Stata? 2019-03-13

I am trying to generate a dummy variable that = 1 if at least two or more (out of seven) dummy variables also == 1. Could anybody tell me an efficient way of doing this? --------------Solutions------------- Let's suppose that the indicator variables
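
A minimal sketch of the idea, written in Python rather than Stata: because each indicator is 0 or 1, the row sum counts how many of them equal 1, so the new dummy is simply "row sum >= 2". The sample rows and the helper name `at_least_two` are hypothetical, and missing values are not handled.

```python
# Hypothetical data: each row holds the seven 0/1 indicator values.
rows = [
    [1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
]


def at_least_two(indicators):
    """Return 1 if two or more of the 0/1 indicators equal 1, else 0."""
    return int(sum(indicators) >= 2)


# One flag per row, in the same order as the input rows.
flags = [at_least_two(row) for row in rows]
print(flags)
```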

• ## Semivariance of variable 2017-12-19

I'm new to SQL and I'm struggling with the following problem. Let's suppose I have a table like this: Date Value 2014-01-01 1248.56 2014-01-02 1247.24 2014-01-03 1245.82 2014-01-04 1252.07 ... All I want to do is compute the semivariance of the variable 'Value'. Semiva

Tags: sql, sql server, statistics
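
A small Python sketch of one common definition of (downside) semivariance, assuming it means the average squared deviation of the values that fall below the mean, divided by the total number of observations. The question is cut off before it states its own definition, so this convention is an assumption; only the four rows visible in the question are used.

```python
# The four rows shown in the question; the remaining rows are elided there.
values = [1248.56, 1247.24, 1245.82, 1252.07]


def semivariance(xs):
    """Downside semivariance: squared deviations below the mean, averaged over all n.

    Assumed convention: divide by the total count n, not by the number of
    below-mean observations.
    """
    mean = sum(xs) / len(xs)
    downside = [(x - mean) ** 2 for x in xs if x < mean]
    return sum(downside) / len(xs)


semivar = semivariance(values)
print(semivar)
```

In SQL the same two steps apply: compute the mean once, then average the squared deviations of the rows below it.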
• ## How to calculate probability in normal distribution with R 2017-11-28

There is a variable M with normal distribution N(μ, σ), where μ = 100 and σ = 10. Find the probability P{|M-80| ≥ 11}? What I did using R was: P{|M-80| ≥ 11} = P{|M| ≥ 11 + 80} = P{|M| ≥ 91}, computed as `pnorm(91, mean=100, sd=10, lower.tail = FALSE)`. But it's incorrect
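
One way to see why the attempt fails: |M-80| ≥ 11 holds when M ≥ 91 or when M ≤ 69, so the lower tail has to be added to the upper one. The sketch below checks this in Python with the standard library's `statistics.NormalDist`, assuming σ = 10 is the standard deviation, as the `sd=10` argument in the question suggests.

```python
from statistics import NormalDist

# M ~ N(mean=100, sd=10), as stated in the question.
m = NormalDist(mu=100, sigma=10)

# |M - 80| >= 11  <=>  M >= 91  or  M <= 69
p_upper = 1 - m.cdf(91)  # P(M >= 91)
p_lower = m.cdf(69)      # P(M <= 69)
p = p_upper + p_lower

print(p_upper, p_lower, p)
```

The upper-tail term is the quantity the `pnorm` call in the question computes; the lower-tail term is small here but is the piece that was dropped.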

• ## Find most frequent value (mode) for each variable 2017-10-20

I have an RDD similar to this: color category green a green b red a orange a green b red d green c red d green e And I'm trying to find the most frequent category for each color. Something like this: [green, b] : 2 [red, d ] : 2 [orange, a] : 1 I'm alre

Tags: python, statistics, apache spark
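
A plain-Python sketch of the counting logic (not Spark code), using the sample rows from the question: count categories per color, then keep the most frequent one. On an RDD the same idea would be expressed as a count per (color, category) pair followed by a per-color maximum; the variable names here are illustrative.

```python
from collections import Counter, defaultdict

# (color, category) pairs copied from the sample in the question.
pairs = [
    ("green", "a"), ("green", "b"), ("red", "a"),
    ("orange", "a"), ("green", "b"), ("red", "d"),
    ("green", "c"), ("red", "d"), ("green", "e"),
]

# Count how often each category occurs within each color.
counts = defaultdict(Counter)
for color, category in pairs:
    counts[color][category] += 1

# For each color keep the (category, count) pair with the highest count.
modes = {color: c.most_common(1)[0] for color, c in counts.items()}
print(modes)
```

Ties are resolved by `Counter.most_common`, which returns equally frequent items in first-seen order.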
• ## Univariate probability distribution over tuples 2017-06-26

I would like to generalize an existing univariate probability distribution framework to the analysis of 3-tuples. In the existing framework, normal or beta distributions, respectively, are fitted to a distribution over real numbers. Now I would like

Tags: statistics, tuples
• ## Weighted mean in numpy/python 2016-08-10

I have a big continuous array of values that ranges from -100 to 100. For this array I want to calculate the weighted average described here. Since it's continuous, I also want to set breaks for the values every 20, i.e. the values should be discrete

Tags: python, statistics, numpy, mean, weighted
• ## Plot a graph from C program using GNUPLOT 2015-02-09

I wrote a computer simulation program (the details are irrelevant) that writes some results to two files in a two-column format. I then used gnuplot from another terminal to plot these results on a graph, successfully. Now I want to integrate the graph

Tags: statistics, graph
• ## Merge two `data.table` objects 2015-02-09

I have two data sets and I want to map the second data set to the first one: n <- c(2, 3, 5,6,7,8) s <- c("aa", "bb", "cc","aa", "bb", "cc") b <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE) df <- data.table(n, s, b) rs <- c("aa", "bb", "c

Tags: statistics, data.table
• ## Extracting the indices of outliers in Linear Regression 2015-02-04

The following script computes the R-squared value between two NumPy arrays (x and y). The R-squared value is very low due to outliers in the data. How can I extract the indices of those outliers? import numpy as np, matplotlib.pyplot as plt, scipy.stats a

Tags: statistics, numpy, matplotlib, scipy
• ## Aggregate Function - Keep NAs in data.frame 2015-02-02

I want to use the aggregation function of R to aggregate a Price on several fields. However, I also have NAs in my data, which I would like to keep. Tried: > dput(df) structure(list(ID = c(1L, 2L, 3L, 4L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 3L, 2L

Tags: statistics
• ## How to calculate the integral of log-normal distribution with MATLAB 2015-01-26

I want to calculate the partial expectation of a log-normal distribution via: m = 1; v = 2; mu = log((m^2)/sqrt(v+m^2)); sigma = sqrt(log(v/(m^2)+1)); syms x; d = x*lognpdf(x,mu,sigma); int(d, x, 0, 10); However, MATLAB says: Error using symfun>validat

• ## Canonical Correlation Analysis 2015-01-26

I have just started working with CCA in MATLAB. I have two matrices X and Y of dimension 60x1920 and 60x1536, with the number of samples being 60 and the number of variables in the two sets being 1920 and 1536 respectively. I want to know do CCA for

Tags: matlab, statistics, correlation
• ## How to find trend changes in a set of data? 2015-01-14

I'm trying to analyze a set of data in order to find trends and trend changes in the data. My data is a list of answers to a concrete quiz ordered by date. My aim is to find if the users can be answering an option based on previous answers more than a

Tags: algorithm, statistics, trend
• ## Get stats from World Screen Resolution, in percentage greater than a value? 2014-12-29

I am looking for a statistics website where I can get the percentage of screen resolutions wider or narrower than 1200px or 1024px (worldwide). The problem with statcounter.com is that they give the resolutions, but one by one. Thanks --------------Solutions----

• ## Calculating probability with sklearn GMM 2014-12-15

I want to determine the probability that a data point belongs to a population of data. I read that sklearn GMM can do this. I tried the following.... import numpy as np from sklearn.mixture import GMM training_data = np.hstack(( np.random.normal(500,

• ## Finding local maxima 2014-12-04

I'd like to find the local maxima for a set of data. I have a log of flight data from a sounding rocket payload, and I'd like to find the approximate times for the staging based on accelerometer data. I should be able to get the times I want based on

Tags: javascript, statistics
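
A rough sketch of one simple approach, in Python rather than JavaScript: mark a sample as a local maximum when it is strictly larger than both neighbours and above a minimum height. The readings and the `min_height` threshold are made up for illustration; real accelerometer logs would likely need smoothing first, and flat plateaus are ignored by this strict comparison.

```python
def local_maxima(values, min_height=float("-inf")):
    """Return indices of samples strictly greater than both neighbours.

    Samples below min_height are skipped so small noise bumps are not reported.
    """
    peaks = []
    for i in range(1, len(values) - 1):
        is_peak = values[i] > values[i - 1] and values[i] > values[i + 1]
        if is_peak and values[i] >= min_height:
            peaks.append(i)
    return peaks


# Hypothetical acceleration readings, one per time step.
accel = [0.1, 0.4, 3.2, 1.0, 0.8, 0.9, 5.1, 2.0, 0.3]
peak_indices = local_maxima(accel, min_height=2.0)
print(peak_indices)
```

The returned indices can be mapped back to timestamps in the log to get approximate event times.
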
• ## Why is the chi-square I get for VCD's assocstats() function different from descr's crosstab() function? 2014-10-11

I'm using the descr package so I can use crosstab() to make a table and return a chi square for two variables. However, I also need the phi coefficient, which VCD's assocstats() returns (I know it's a simple formula, but eh lazy). However, the chi-sq

• ## How to calculate standard deviation with R for a file with a single numeric column? 2014-10-04

I have a file with the following data: 12341231 1231312 1233123 1231313 523454 6567 73525 I would like to read the file into an R object and calculate STD on the data. --------------Solutions------------- I'd probably use scan for that file. You don'

Tags: statistics
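
For comparison, a short Python sketch of the same task using the numbers from the question: read one number per line and take the sample standard deviation. `statistics.stdev` uses the n-1 denominator, the same convention as R's `sd()`. An in-memory `io.StringIO` stands in for the file here; the real file path is not given in the question.

```python
import io
import statistics

# Stand-in for the data file: one number per line, values from the question.
data_file = io.StringIO(
    "12341231\n1231312\n1233123\n1231313\n523454\n6567\n73525\n"
)

# Parse every non-empty line as a number.
values = [float(line) for line in data_file if line.strip()]

# Sample standard deviation (n - 1 denominator).
sd = statistics.stdev(values)
print(len(values), sd)
```
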
• ## R: Modifying Subsets of Dataframe using Calculations on that Subset 2014-09-26

I am going to ask my question through example, because I don't know what the best way to phrase it in general is. Using the ChickWeight dataset built into R: > head(ChickWeight) weight Time Chick Diet 1 42 0 1 1 2 51 2 1 1 3 59 4 1 1 4 64 6 1 1 5

Tags: statistics
• ## Basic statistical analysis of database optimization strategy 2014-09-19

I have a set of database tables of known but variable dimensions. Some of them are gigantic (100 million rows or more) and table optimization is needed to speed these processes up. However, a full optimization is expensive for extremely large tables