An exercise in large datasets, Web Workers, and rendering large lists
A list of ~500k English words searchable with regex. The project explores the feasibility of filtering and displaying large datasets without hindering UI performance.
Note that this project was for experimental purposes and is not meant to be used as a definitive list of words found in the English language. Further, the list contains many words and phrases which some may find inappropriate. The author is neither the creator nor the host of the word list. For issues regarding words found in the word list, please refer to this repository.
Ensure Node.js is installed. Clone the repository and navigate to project directory.
git clone https://github.com/ashtonmeuser/regex-dictionary.git && cd regex-dictionary
Install dependencies and run the development server.
npm install && npm run serve
See the scripts
portion of package.json for options to build and deploy the project.
The project is built using Vue.js and Vuex.
The words used in this project are downloaded from this repository upon the created
lifecycle event of the App component. See lifecycle documentation of Vue components for more information.
This experiment highlighted three interesting problems:
Rendering a large list was made trivial by implementing vue-virtual-scroller. A RecycleScroller
component is introduced which renders only elements that are visible. When a user scrolls, the existing list items’ DOM is reused.
Simple elements are used within the RecycleScroller
to avoid the overhead involved with rendering functional Vue components.
A DynamicScroller
should be used if list items are to have a dynamic height.
The word list is filtered using a regex pattern. The two main ways to match strings against a regex pattern are String.match() and RegExp.test(). In practice, the RegExp.test()
method proved to be far more efficient. See the Regex Performance section of this document for a deeper look into both methods.
The word list is filtered in the filterWords
action of the Vuex store.
JavaScript is a single-threaded language. There is, however, a relatively new means of offloading work to a background thread known as Web Workers.
A Web Worker is instantiated by providing a file containing the JavaScript it is to execute. The author created the WebWorker wrapper class to handle inline JavaScript. The run()
method of the class can be used to execute the provided function with supplied arguments. The wrapper class was largely inspired by simple-web-worker.
The WebWorker
class returns a standard Worker
instance. As such, the created workers can be cancelled via the Worker.terminate()
method. Each time a new regex pattern is entered, the current background worker, if it exists, is terminated and a new worker is created with the updated pattern. If the worker successfully filters the word list before a new pattern is entered, the filtered list is returned to the main thread via the onmessage()
event handler and the store is updated. This allows the UI to remain responsive.
The RegExp.test()
method proved to be roughly 60 to 350 times faster than String.match()
. See the table below for tests run againt the word list (466,551 words at the time of writing, n=10 for each test).
Regex Pattern | RegExp.test (ms) | String.match (ms) |
---|---|---|
^aa.*$ | 31.10 (s=5.07) | 9686.20 (s=46.36) |
(.{2,})\1\1 | 189.90 (s=1.64) | 12843.70 (s=134.82) |
(.)\1{3,} | 135.50 (s=8.22) | 13525.50 (s=123.14) |
^\d.*\d$ | 29.30 (s=1.10) | 10394.20 (s=185.55) |
Based on the fact that the execution times of the two methods are non-linear, it is the author’s conclusion that there is an overhead involved with the String.match()
method. This may involve creating the array which is to store match groups returned by the String.match()
method. This has been discussed here and here.