Hypergol grew out of our experience building industrial-strength machine learning products at a London startup. We realised that success is not only about writing and training deep learning models but also about storing and processing data efficiently. To avoid reinventing the wheel each time we faced the same problems, we created Hypergol.

Situation

When you set out to solve a production-level problem, you do some initial analysis to justify the productionisation effort. This usually happens in an ad-hoc way: you open a Jupyter notebook, download some data, do some cleaning, train a model and plot some charts. No big deal.

You don’t want to invest too much energy into this phase because:

  1. It might not work, and then any extra effort is wasted.
  2. This is just a temporary phase, and you want to conserve your energy for the “real” part.

Complication

But your stakeholders keep asking for further proof of value, and each round requires additional cleaning in the same ad-hoc setup, mostly a notebook with dozens of cells. It is time-consuming and inefficient.

Why does this keep happening?

When you start on a new problem, you try to focus on the problem itself rather than dull technical details. You go for the low-hanging fruit and leave the small fish alone. But this soon comes to an end: you need to dig into the details to handle the special cases, and that usually requires large-scale evaluation to find as many of these edge cases as possible. There are tools for this, but they are hard to work with and difficult to set up. Now you have a choice:

  • Invest time and energy into moving the project to a scalable platform and hope that it helps you catch up later.
  • Keep iterating in the notebook, inefficiently.

All of this while you are still not sure about the feasibility of the project.

Enter Hypergol

What if there were a tool that needs no setup and generates a tailored project for you in no time, sparing you all of those chores?

  • No infrastructure: just rent a very large instance from AWS/GCP/Azure (see other posts on this blog for instructions).
  • No setup: all you need is pip install hypergol.
  • No settings: your project is autogenerated with a virtual environment, directories and shell scripts (see the commands below), so there is no need to search StackOverflow for rarely used shell commands.
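For reference, the whole setup boils down to a couple of shell commands. The project-creation command below is written from memory and should be treated as an assumption rather than the definitive invocation, so verify it against Hypergol's README:

    pip install hypergol
    # Generate a new project skeleton (virtual environment, directories, scripts).
    # Command name assumed from the Hypergol docs; check the README for the current form.
    python3 -m hypergol.cli.create_project MyProject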

Pipelines

Once the setup is done, the next stage is processing data. Hypergol provides a “mini-MapReduce” system that combines a parallelisable, versioned storage format with very simple multithreaded pipeline execution. All you need to do is define a simple function that streams through your input data and outputs the right objects; everything else is taken care of, even if your data is larger than the machine’s memory. Is your data in a custom format? No problem: just override the right functions. Which are the right functions, you ask? No need to worry: just like the project setup, pipelines are generated from template code with all the necessary scripts and abstract methods included.
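To make that concrete, here is a minimal sketch of the kind of class you end up filling in: one method streams the raw input, the other turns each record into a domain object. The class and method names are illustrative assumptions, not Hypergol's exact generated interface; the real templates come with the proper abstract methods to override.

    from typing import Dict, Iterator

    class ArticleSource:
        """Hypothetical source class: names are illustrative, not Hypergol's exact API."""

        def __init__(self, inputFile: str):
            self.inputFile = inputFile

        def source_iterator(self) -> Iterator[str]:
            # Stream the raw input line by line, so data larger than memory is fine
            with open(self.inputFile, encoding='utf-8') as inputStream:
                for line in inputStream:
                    yield line.rstrip('\n')

        def run(self, line: str) -> Dict:
            # Turn one raw record into the domain object the pipeline stores and versions
            return {'text': line, 'tokenCount': len(line.split())}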

Modelling

When you write a model, you only want to concentrate on the actual deep learning code. Hypergol provides a framework that streams training data from a pipeline’s output and streams the model’s outputs back out for future evaluation; it also prepares the model for deployment. When the model’s input/output interface changes, all the affected code is in one place, so it can be edited in one go.
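As a rough illustration of the streaming idea (not Hypergol's actual classes), training can read batches directly from whatever the pipeline wrote to disk, one record at a time, instead of loading the whole dataset into memory:

    import json
    from typing import Dict, Iterator, List

    def stream_batches(datasetPath: str, batchSize: int = 32) -> Iterator[List[Dict]]:
        # Assumes the pipeline wrote one JSON object per line; adapt to your actual format
        batch = []
        with open(datasetPath, encoding='utf-8') as inputStream:
            for line in inputStream:
                batch.append(json.loads(line))
                if len(batch) == batchSize:
                    yield batch
                    batch = []
        if batch:
            yield batch

    # Usage (hypothetical path and training call):
    # for batch in stream_batches('datasets/articles/train.jsonl'):
    #     model.train_on_batch(batch)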

Deployment

Because the model’s input/output is taken care of in the previous step, deployment is easy. In fact, all the serving code is automatically generated, and you can exploit the benefits of packages like pydantic, FastAPI, uvicorn and Swagger. The API documents itself, even for complex hierarchical inputs, which makes it easy to communicate your solution to the engineering team.
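To give a flavour of what that stack looks like (an illustrative sketch, not the exact file Hypergol generates), a pydantic model defines the request/response schema and FastAPI turns it into a self-documenting endpoint served by uvicorn:

    from fastapi import FastAPI
    from pydantic import BaseModel

    class Article(BaseModel):        # request schema; nested models document the same way
        title: str
        text: str

    class Prediction(BaseModel):     # response schema shown in the interactive docs
        label: str
        score: float

    app = FastAPI(title='ArticleClassifier')

    @app.post('/predict', response_model=Prediction)
    def predict(article: Article) -> Prediction:
        # The real endpoint would call the trained model here; this is just a placeholder
        return Prediction(label='relevant', score=0.97)

    # Run with: uvicorn serve:app --port 8000  then open /docs for the Swagger UI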

Hypergol is designed to put the relevant information in front of you throughout a project. Many useful shell commands are collected in the generated README.md in a copy-pasteable format to speed you up and let you concentrate on the problem itself rather than searching StackOverflow for syntax.

Please join our Discord server for any feedback and success stories!