Fixing GNU bash associative array insert speed

Bash uses linear search to insert values in to associative arrays. This is all well and good for small numbers of keys. I was adding millions1. I went poking around the bash source code today (2020-04-18) to confirm my suspicion and gauge the difficulty of adding an option to do something more sensible.

In less than a day after I reported it, there is a patch My timing code and pre and post patch timings are here:

Here the steps I took and where I might go if I get serious about fixing the problem:

Get the source code

Find it

find the homepage
A quick bit of googling lead to the homepage
use git
For a minute it looked like GNU was still stuck in the bad old days of having to download a tarball and then apply a series of patches, but fortunately, it there is a git repo

Download it

git clone

Build it

Bash follows a time honored build convention


Analyze it

/* Create an entry for STRING, in TABLE.  If the entry already
   exists, then return it (unless the HASH_NOSRCH flag is set). */
hash_insert (string, table, flags)
     char *string;
     HASH_TABLE *table;
     int flags;
  int bucket;
  unsigned int hv;

  if (table == 0)
    table = hash_create (0);

  item = (flags & HASH_NOSRCH) ? (BUCKET_CONTENTS *)NULL
                               : hash_search (string, table, 0);
/* Return a pointer to the hashed item.  If the HASH_CREATE flag is passed,
   create a new hash table entry for STRING, otherwise return NULL. */
hash_search (string, table, flags)
     const char *string;
     HASH_TABLE *table;
     int flags;
  int bucket;
  unsigned int hv;

  if (table == 0 || ((flags & HASH_CREATE) == 0 && HASH_ENTRIES (table) == 0))

  bucket = HASH_BUCKET (string, table, hv);

  for (list = table->bucket_array ? table->bucket_array[bucket] : 0; list; list = list->next)
      /* This is the comparison function */
      if (hv == list->khash && STREQ (list->key, string))
          return (list);

Next steps

DONE Reach out to the maintainers

see if they would even entertain the idea of a patch

CANCELED Look for appropriate in-memory hash insert/lookup functions


CANCELED test it

CANCELED submit patch


  1. yes, there are many better tools for this job, but not in the constrained environment where this had to run. ↩︎