Why malloc+memset is slower than calloc?

It's known that calloc is different than malloc in that it initializes the memory allocated. With calloc, the memory is set to zero. With malloc, the memory is not cleared.

So in everyday work, I regard calloc as malloc+memset. Incidentally, for fun, I wrote the following code for a benchmark.

The result is confusing.

Code 1:

#include<stdio.h> #include<stdlib.h> #define BLOCK_SIZE 1024*1024*256 int main() { int i=0; char *buf[10]; while(i<10) { buf[i] = (char*)calloc(1,BLOCK_SIZE); i++; } }

Output of Code 1:

time ./a.out **real 0m0.287s** user 0m0.095s sys 0m0.192s

Code 2:

#include<stdio.h> #include<stdlib.h> #include<string.h> #define BLOCK_SIZE 1024*1024*256 int main() { int i=0; char *buf[10]; while(i<10) { buf[i] = (char*)malloc(BLOCK_SIZE); memset(buf[i],'',BLOCK_SIZE); i++; } }

Output of Code 2:

time ./a.out **real 0m2.693s** user 0m0.973s sys 0m1.721s

Replacing memset with bzero(buf[i],BLOCK_SIZE) in Code 2 produces the same result.

My question is: Why is malloc+memset so much slower than calloc? How can calloc do that?

-------------Problems Reply------------

This is probably due to your large allocation size. You might want to read up on how virtual memory works and OS theory.

When you allocate a large enough region of memory (the threshold is often 1 MiB if memory serves), most allocators will get a new region of memory from the kernel using "mmap" just for that region. However, when "mmap" gives you new pages of memory, they have to be initialized to zero (when using MAP_ANONYMOUS). If they weren't, they'd be filled with all sorts of junk from other applications — and this is a serious security vulnerability. What if root was editing /etc/shadow with those pages? The same also applies if "malloc" runs out of memory for small allocations and calls "sbrk" to get more.

But it would take too long to zero all that memory. The kernel cheats. There is a page of memory already zeroed set aside. All pages in the new allocation point at this one page of physical ram, which is shared among all processes on the system, so it doesn't actually use any memory. It's marked as read-only. As soon as you write to it, the processor raises an exception. The kernel's exception handler then grabs a page of RAM (possibly swapping something else out), fills it with zeroes, and maps it into your process's address space. The "calloc" function can exploit this.

(Actually, the kernel can go a step further, and have "mmap" do nothing to your process's memory until you read from it.)

The "memset" implementation touches every page in the allocation, resulting in much higher memory usage — it forces the kernel to allocate those pages now, instead of waiting until you actually use them.

The "calloc" implementation just changes a few page tables, consumes very little actual memory, writes to very little memory, and returns. On most systems you can even allocate more memory than your system can support (more than RAM + swap) without problems, as long as you don't write to all of it. (This feature is slightly controversial on the operating systems that allow this.)

Some systems do not support virtual memory: very old ones (think 80286) and some embedded systems. On these systems, the speeds might be much closer.



There are a few guesses in other answers that hypothesize that "memset" is slower than "calloc" because "memset" can't assume the memory is aligned. Here is how a typical "memset" implementation works:

function memset(dest, c, len)
// one byte at a time, until the dest is aligned...
while (len > 0 && ((unsigned int)dest & 15))
*dest++ = c
len -= 1
// now write big chunks at a time (processor-specific)...
// block size might not be 16, it's just pseudocode
while (len >= 16)
// some optimized vector code goes here
// glibc uses SSE2 when available
dest += 16
len -= 16
// the end is not aligned, so one byte at a time
while (len > 0)
*dest++ = c
len -= 1

In a 256 MiB chunk, that first and last loop are going to be negligible, and the middle loop is the same as the hypothetical calloc loop. Some compilers inline "memset", and can even infer that the result of "malloc" is an aligned block. And a typical "calloc" implementation just calls "memset" anyway — "calloc" is usually written in C, and it's often portable across operating systems.

The other guess I saw was that "malloc" already initializes the memory, so "memset" initializes it twice. This is technically true in this case. However, it would only account for about a factor of two in speed. The "calloc" version is ten to fifteen hundred times faster. The numbers do not support the conclusion.



Footnote: Just for giggles, I timed the two programs on two of my computers. On my OS X / PowerPC box, the memset version was over 1500x slower (8 s versus 5 ms). On my Linux / x86 box, the memset version ran for 35x as long before segfaulting (expected, that computer has less RAM — note though that the calloc version didn't crash).

Because on many systems, in spare processing time, the OS goes around setting free memory to zero on its own and marking it safe for calloc(), so when you call calloc(), it may already have free, zeroed memory to give you.

On some platforms in some modes malloc initialises the memory to some typically non-zero value before returning it, so the second version could well initialize the memory twice

Category:c# Views:0 Time:2010-04-22
Tags: c# malloc

Related post

  • Difference between malloc and calloc? 2009-10-08

    What is the difference between doing: ptr = (char **) malloc (MAXELEMS * sizeof(char *)); or: ptr = (char **) calloc (MAXELEMS, sizeof(char*)); When is it a good idea to use calloc over malloc or vice versa? --------------Solutions------------- callo

  • C - calloc() v. malloc() 2010-08-10

    Possible Duplicate: c difference between malloc and calloc Please explain the significance of this statement, Another difference between the malloc() and calloc() functions is that the memory allocated by malloc( ) function contains garbage values, w

  • calloc v/s malloc and time efficiency 2010-04-09

    I've read with interest the post C difference between malloc and calloc. I'm using malloc in my code and would like to know what difference I'll have using calloc instead. My present (pseudo)code with malloc: Scenario 1 int main() { allocate large ar

  • I'm very confused about malloc() and calloc() on C 2010-11-21

    I've always programmed in Java, which is probably why I'm so confused about this: In Java I declare a pointer: int[] array and initialize it or assign it some memory: int[] array = {0,1,0} int[] array = new int[3] Now, in C, it's all so confusing. At

  • How to free() a malloc()'d structured correctly? 2010-02-02

    i have a structure malloc()'d, and after using them, i want to free() it, but my program freezes out here. Can anyone tell me, what i am doing wrong? Here is my code: struct data { char *filename; char *size; }; //primarypcs is a long type variable s

  • Two arguments to calloc 2010-11-03

    Why does calloc take two arguments instead of one like malloc? Specifically, since there is no difference between (or is there?) between the following expressions: calloc (a, b); calloc (b, a); calloc (a * b, 1); calloc (1, a * b); why not just accep

  • Why does malloc initialize the values to 0 in gcc? 2011-11-06

    Maybe it is different from platform to platform, but when I compile using gcc and run the code below, I get 0 every time in my ubuntu 11.10. #include <stdio.h> #include <stdlib.h> int main() { double *a = (double*) malloc(sizeof(double)*1

  • Malloc has junk for C string? 2012-01-13

    I'm new to C, so feel free to correct mistakes. I have some code that somewhat goes like this: // some variables declared here like int array_size char* cmd = (char*)malloc(array_size*sizeof(char)); for(;;){ // code here sets cmd to some string free(

  • Use OpenBSD's malloc, realloc and free in my program 2010-05-20

    I would like to use OpenBSD's implementation of malloc, realloc and free on my Debian lenny desktop rather than glibc's. Are they simply drop in replacements: will they work on my Linux desktop ? Which are the file(s) that I need and which OpenBSD pa

  • Is cudamalloc slower than cudamemcpy? 2011-07-13

    i am working on a code which needs to be time efficient and thus using Cufft for this purpose but when i try to compute fft of a very large data in parallel it is slower than cpu fftw and the reason i find after finding the time for every line of cod

  • XCode Analyzer Warning "result of calloc is converted-" 2013-06-14

    I have this lines: char* data = (char *) mmap(NULL, size, PROT_READ, MAP_SHARED, fd, offset); ... size_t lineLength = strlen(data); char *out = calloc(lineLength, sizeof(data)); at the last line I get with the new XCode the following warning Result o

  • dynamic allocating array of arrays in C 2009-01-18

    I don't truly understand some basic things in C like dynamically allocating array of arrays. I know you can do: int **m; in order to declare a 2 dimensional array (which subsequently would be allocated using some *alloc function). Also it can be "eas

  • Best way to handle memory allocation in C? 2009-04-06

    I think I've got a good grasp on how to handle memory in C++ but doing it in C is different I'm a bit off. In C++ I've got constructors and destructors, I've got the pretty straightforward new and delete and I know how to encapsulate it using RAII, u

  • Overwriting vs allocation/deallocation - efficiency 2009-06-02

    I am writing a C++ application which will require a block of memory (about 1000 bytes) as a temporary buffer for some text processing. The operation may be repeated up to 10,000 times a second. Could anybody confirm that it would be more expensive to

  • At what point is it worth reusing arrays in Java? 2009-12-23

    How big does a buffer need to be in Java before it's worth reusing? Or, put another way: I can repeatedly allocate, use, and discard byte[] objects OR run a pool to keep and reuse them. I might allocate a lot of small buffers that get discarded often

  • Is it bad practice to declare an array mid-function 2010-06-07

    in an effort to only ask what I'm really looking for here... I'm really only concerned if it's considered bad practice or not to declare an array like below where the size could vary. If it is... I would generally malloc() instead. void MyFunction()

  • Objective-C sparse array redux 2011-09-22

    First off, I've seen this, but it doesn't quite seem to suit my needs. I've got a situation where I need a sparse array. Some situations where I could have, say 3000 potential entries with only 20 allocated, other situations where I could have most o

  • How to lazy allocate zeroed memory? 2011-11-11

    From what I understand, I have to choose between calloc, which will allocate zeroed memory, and malloc, which can allocate memory on demand. Is there a function that combines both those properties? Maybe direct call to mmap? If it's possible, why cal

  • Dynamically allocate C struct? 2009-12-30

    I want to dynamically allocate a C struct: typedef struct { short *offset; char *values; } swc; Both 'offset' and 'values' are supposed to be arrays, but their size is unknown until runtime. How can I dynamically allocate memory for my struct and the

Copyright (C) dskims.com, All Rights Reserved.

processed in 0.254 (s). 11 q(s)