Hypergol: Arm yourself with the power of an entire Data Science team

  • Run prototypes at scale and draw real conclusions
  • Generate your entire project repository with a single command
  • Use parallel processing without any infrastructure burden
  • Track your models with simplified MLOps procedures
  • Eliminate chores and focus on what you really want to do

Features

Hypergol achieves compounding acceleration by providing opinionated tools in all areas of the Data Science stack.

$ python -m hypergol.cli.create_data_model Article articleId:int:id publishDate:datetime "sentences:List[Sentence]"
$ cat data_models/article.py
from typing import List
from datetime import datetime
from hypergol import BaseData
from data_models.sentence import Sentence


class Article(BaseData):

    def __init__(self, articleId: int, publishDate: datetime, sentences: List[Sentence]):
        self.articleId = articleId
        self.publishDate = publishDate
        self.sentences = sentences

    def get_id(self):
        return (self.articleId, )

    def to_data(self):
        data = self.__dict__.copy()
        data['publishDate'] = data['publishDate'].isoformat()
        data['sentences'] = [v.to_data() for v in data['sentences']]
        return data

    @classmethod
    def from_data(cls, data):
        data['publishDate'] = datetime.fromisoformat(data['publishDate'])
        data['sentences'] = [Sentence.from_data(v) for v in data['sentences']]
        return cls(**data)

Consistent Data Modeling

Treat your data as a first-class citizen. Handle everything in Python classes. Autogenerate the class code from the command line instead of writing it by hand, which is tedious and error-prone.

This simplifies:

  • Data pipeline I/O
  • Data storage
  • Model training and evaluation
  • Model deployment

Your data is the most utilised part of your project. Rather than defining it as columns in pandas dataframes or dictionaries that lack validation, create standardised and composable structures everywhere.
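As a minimal sketch of what such a composable structure buys you, the example below round-trips an article through a JSON-compatible dictionary. It follows the pattern of the generated `Article` class but deliberately does not import Hypergol: the classes are plain-Python stand-ins, and the fields of `Sentence` (`sentenceId`, `text`) are assumptions made for illustration.

```python
import json
from datetime import datetime
from typing import List


class Sentence:
    """Hypothetical nested class; its fields are assumed for illustration."""

    def __init__(self, sentenceId: int, text: str):
        self.sentenceId = sentenceId
        self.text = text

    def to_data(self):
        return self.__dict__.copy()

    @classmethod
    def from_data(cls, data):
        return cls(**data)


class Article:
    """Plain-Python stand-in for the generated class (no BaseData parent)."""

    def __init__(self, articleId: int, publishDate: datetime, sentences: List[Sentence]):
        self.articleId = articleId
        self.publishDate = publishDate
        self.sentences = sentences

    def to_data(self):
        # Convert non-JSON types (datetime, nested objects) to plain values.
        data = self.__dict__.copy()
        data['publishDate'] = data['publishDate'].isoformat()
        data['sentences'] = [v.to_data() for v in data['sentences']]
        return data

    @classmethod
    def from_data(cls, data):
        data = dict(data)  # copy so the caller's dictionary is not modified
        data['publishDate'] = datetime.fromisoformat(data['publishDate'])
        data['sentences'] = [Sentence.from_data(v) for v in data['sentences']]
        return cls(**data)


article = Article(
    articleId=1,
    publishDate=datetime(2020, 9, 26, 13, 15),
    sentences=[Sentence(0, "Hello."), Sentence(1, "World.")],
)
data = article.to_data()             # nested object -> JSON-compatible dict
line = json.dumps(data)              # one line of a JSON-lines file
restored = Article.from_data(json.loads(line))
```

Because every class knows how to serialise itself, a composite such as `Article` only delegates to its parts, and the same two methods serve pipelines, storage and deployment.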

$ ls -al master/articles
-rw-r--r-- 1 user staff    1202 26 Sep 13:24 articles.chk
-rw-r--r-- 1 user staff    1924 26 Sep 13:15 articles.def
-rw-r--r-- 1 user staff 2612430 26 Sep 13:17 articles_0.jsonl.gz
-rw-r--r-- 1 user staff 2302538 26 Sep 13:17 articles_1.jsonl.gz
-rw-r--r-- 1 user staff 2396773 26 Sep 13:16 articles_2.jsonl.gz
...
$ cat master/articles/articles.chk
{
    "articles.def": "5307badc7cbc8eb1e08a371882c637b1a6c8657a",
    "articles_0.jsonl.gz": "81c6d8508422e5a1c4e71d98c7dc0f6f3596928b",
    "articles_1.jsonl.gz": "7e9ed9de5a89204162b169553cd1aa3fb63250a7",
    "articles_2.jsonl.gz": "c015204d9a411236e4f8fca68908d2891a8b9b6a",
...
$ cat master/articles/articles.def
{
    "branch": "master",
    "chunkCount": 16,
    "creationTime": "2020-09-26T13:15:28.221910",
    "dataType": "Article",
    "dependencies": [
        {
            "branch": "master",
            "chkFileChecksum": "2b1cc2c7b5b061dd315b0bc06d305d24f1c0c9a3",
            "chunkCount": 16,
            "creationTime": "2020-09-26T13:12:49.851657",
            "dataType": "ArticleText",
            "name": "article_texts",
...
    ],
    "name": "articles",
    "project": "example",
    "repo": {
        "branchName": "master",
        "comitterEmail": "user@gmail.com",
        "comitterName": "User User",
        "commitHash": "91dc9e6a12e0ca9961499e9fc9d226d9ce57489d",
        "commitMessage": "First commit\n"
    }

Standardized Storage

Store all your data in the same way with the data model defined above by using Hypergol’s own storage format.

This enables:

  • Parallel processing
  • Data lineage
  • Simplified serialisation of composite structures

Because the storage format shards each dataset into chunks, you can process data larger than machine memory and run the processing in parallel. To create data lineage, Hypergol uses the same SHA-1 checksums and compression that git uses.

You will always know how your data was created, and you can process it as efficiently as possible.
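The idea of sharded, checksummed storage can be sketched with the standard library alone. This is not Hypergol's implementation: the function names, the rule that assigns a record to a chunk, and the choice to hash the raw bytes of each file are all assumptions; only the file naming (`articles_0.jsonl.gz`, `articles.chk`) mirrors the listing above.

```python
import gzip
import hashlib
import json
import tempfile
from pathlib import Path


def chunk_index(item_id, chunk_count):
    """Assumed sharding rule: SHA-1 of the id, modulo the number of chunks."""
    digest = hashlib.sha1(str(item_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % chunk_count


def write_chunks(records, directory, name, chunk_count):
    """Write records into gzip-compressed JSON-lines files, one per chunk."""
    buckets = {k: [] for k in range(chunk_count)}
    for record in records:
        buckets[chunk_index(record["articleId"], chunk_count)].append(record)
    paths = []
    for k, bucket in buckets.items():
        path = Path(directory) / f"{name}_{k}.jsonl.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            for record in bucket:
                f.write(json.dumps(record, sort_keys=True) + "\n")
        paths.append(path)
    return paths


def file_checksum(path):
    """SHA-1 hex digest of the file's bytes (40 hex characters)."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def write_checksums(paths, directory, name):
    """Record one checksum per chunk file in a .chk file."""
    checksums = {p.name: file_checksum(p) for p in paths}
    chk_path = Path(directory) / f"{name}.chk"
    chk_path.write_text(json.dumps(checksums, indent=4, sort_keys=True))
    return checksums


with tempfile.TemporaryDirectory() as tmp:
    records = [{"articleId": i, "text": f"article {i}"} for i in range(100)]
    paths = write_chunks(records, tmp, "articles", chunk_count=4)
    checksums = write_checksums(paths, tmp, "articles")

    # Any later change to a chunk file would show up as a checksum mismatch.
    verified = all(file_checksum(p) == checksums[p.name] for p in paths)

    # Chunks can be read back one at a time, so memory use stays bounded.
    total = 0
    for p in paths:
        with gzip.open(p, "rt", encoding="utf-8") as f:
            total += sum(1 for _ in f)
```

Storing the checksum of an input dataset alongside each output, as the `dependencies` entry in the `.def` file above does, is what turns per-file checksums into lineage.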

Effortless Pipeline Parallelisation

Parallelise your data processing with Hypergol’s own no-infra task scheduler. All you need is a sufficiently large cloud instance. Thanks to its custom data format, you can process huge datasets multithreaded just as easily as on a single thread.

No schedulers, no DAGs, no containers, no clusters. Just organised computing power.
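To illustrate why chunked data makes this possible, here is a minimal sketch in which a thread pool from the Python standard library stands in for Hypergol's scheduler. The helper names and the word-count task are hypothetical; the point is only that a task written for one chunk runs unchanged whether the chunks are processed one by one or concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_count):
    """Distribute records over a fixed number of independent chunks."""
    chunks = [[] for _ in range(chunk_count)]
    for record in records:
        chunks[record["articleId"] % chunk_count].append(record)
    return chunks


def process_chunk(chunk):
    """Per-chunk task with no shared state: count the words of each article."""
    return [
        {"articleId": r["articleId"], "wordCount": len(r["text"].split())}
        for r in chunk
    ]


def run_parallel(chunks, workers):
    """Run the same task on every chunk; map() keeps the chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, chunks))


records = [{"articleId": i, "text": "word " * (i % 7 + 1)} for i in range(1000)]
chunks = split_into_chunks(records, chunk_count=16)

serial = [process_chunk(chunk) for chunk in chunks]   # single thread
parallel = run_parallel(chunks, workers=4)            # four workers
```

Both runs produce the same result because no chunk depends on another. For CPU-bound work in CPython, a process pool such as `ProcessPoolExecutor` is the usual substitute, since threads share the interpreter lock.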

Straightforward Deep Learning

Hypergol’s Deep Learning Framework enables you to create models that are easy to maintain. Treat machine learning as programming by adopting a well-structured system.

This enables:

  • Extending the model architecture
  • Adding new features to the input
  • Retraining and deploying with ease

Hypergol’s batch processing classes enable seamless integration with Hypergol’s storage format to simplify training and evaluation.

Simple Model Deployment

Hypergol’s integrated framework enables deployment code generation with FastAPI and uvicorn.

Using your data model, you can automatically generate typed APIs for your models through pydantic.

Because FastAPI uses Swagger, your API is immediately self-documented. Deploy your models in minutes!

Even More Features

  • Simple use of stored data with context managers in Jupyter Notebooks.
  • Convenience tools for data discovery in interactive environments.
  • Experiment versioning with git branches.
  • Data model conversion between different versions for schema evolution.
  • Generate code for everything:
    • virtual environment generating scripts
    • shell scripts to run pipelines with parameters
    • scripts to run unit tests
    • the unit tests themselves
    • scripts to run pylint
    • stubs for all Hypergol components

Accelerate yourself with Hypergol right now!

All you need to do is `pip install hypergol`

Instructions

$ pip install hypergol
$ python3 -m hypergol.cli.create_project ProjectName
Creating directory project_name.
Creating directory project_name/data_models.
...
Creating file project_name/README.md.
Project ProjectName was created in directory project_name.
$ deactivate
$ cd project_name
$ git init
Initialized empty Git repository in ~/project_name/.git/
$ git add .
$ git commit -m "first commit"
On branch master
Initial commit
...
$ git remote add origin git@github.com:user_name/project_name.git
...
$ git push -u origin master
...
$ ./make_venv.sh
$ source .venv/bin/activate
$ ls
README.md       models      requirements.txt    tasks
data_models     pipelines   run_pylint.sh       tests
make_venv.sh    pylintrc    run_tests.sh

Hypergol project ready. Create your first data model class with:
python3 -m hypergol.cli.create_data_model ClassName ...

Latest Posts

How to get notified if your instance is preempted on GCP?

Preemptible instances are cheaper than normal ones but can automatically disappear. To deal with this, you need not only to make sure your data pipeline saves your progress at regular checkpoints but also to get notified when a preemption happens.

by Laszlo Sragner

Starting on GCP from scratch

Step-by-step instructions on how to create an instance, easily install the right tools and start using Hypergol.

by Laszlo Sragner

How to start on a new machine with a "Settings repo"

How do you get your usual environment on a new VM in no time at all? This post includes scripts to start a Jupyter notebook server, create virtual environments and enable git autocomplete in the command line.

by Laszlo Sragner