Storage options
In any distributed system, processes must communicate with each other. In the multinode framework, examples include functions returning outputs directly to their parent processes, and jobs storing the outputs of individual executions.
But sometimes it is more appropriate to store these outputs in a data structure that is not tied to any single function call or job execution. In these situations, you can use either multinode's built-in dict class or an external managed data store.
Multinode's dict
If you need a key-value store that can be accessed by any process in the application, then the multinode dict is the path of least resistance.
Here is an example, taken from the section on scheduled tasks.
from datetime import timedelta

import multinode as mn
from fastapi import FastAPI
import uvicorn

# This dict is accessible from the scheduled task AND the service.
summaries_dict = mn.get_dict(name="tweet_summaries")

@mn.scheduled_task(period=timedelta(days=1))
def extract_summaries_from_last_day():
    # scrape_tweets() and summarise() are application helpers defined elsewhere.
    tweets = scrape_tweets()
    for tweet in tweets:
        summary = summarise(tweet)
        summaries_dict[summary.subject] = summary.content

app = FastAPI()

@app.get("/tweet_summaries")
def get_summary_of_most_recent_tweet(subject):
    return summaries_dict[subject]

@mn.service(port=80)
def api():
    uvicorn.run(app, host="0.0.0.0", port=80)
This dict behaves very much like a Python dict.
my_dict = mn.get_dict(name="name")
my_dict["a"] = 1
my_dict["b"] = 2
my_dict["b"] = 3
my_dict["c"] = 4
print(my_dict["c"]) # 4
del my_dict["a"]
print("a" in my_dict) # False
print("b" in my_dict) # True
for key in my_dict.keys():
    print(key) # "b", "c"
print(my_dict.get("a")) # None
print(my_dict["a"]) # raises KeyError
It also supports certain atomic update operations.
my_dict["x"] = 4
my_dict["x"] += 2 # increments atomically
# Only updates if the current value is 6
my_dict.update("x", new_value=9, expected_current_value=6)
# Only sets a value if the key "y" does not currently exist
my_dict.set_if_absent("y", value=10)
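The conditional update above is a compare-and-set, which is the usual building block for read-modify-write logic that has to stay correct when several processes write to the same key. The sketch below illustrates the retry pattern using a local, in-memory stand-in. `LocalAtomicDict`, its boolean return values, and the `apply_atomically` helper are assumptions made for illustration and are not part of multinode; check how the real `update` reports a failed comparison before adapting this.

```python
import threading


class LocalAtomicDict:
    """In-memory stand-in for illustration only (NOT the multinode dict)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set_if_absent(self, key, value):
        # Assumption: returns True only when the key was newly created.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def update(self, key, new_value, expected_current_value):
        # Assumption: returns True only when the comparison succeeded.
        with self._lock:
            if key not in self._data or self._data[key] != expected_current_value:
                return False
            self._data[key] = new_value
            return True


def apply_atomically(store, key, fn, max_attempts=100):
    """Retry a read-modify-write until no other writer interferes."""
    for _ in range(max_attempts):
        current = store.get(key)
        new_value = fn(current)
        if store.update(key, new_value=new_value, expected_current_value=current):
            return new_value
    raise RuntimeError(f"gave up updating {key!r} after {max_attempts} attempts")


store = LocalAtomicDict()
store.set_if_absent("x", 4)
store.set_if_absent("x", 100)  # no effect: "x" already exists
doubled = apply_atomically(store, "x", lambda v: v * 2)
```

If another writer changes the value between the read and the conditional update, the comparison fails and the loop simply re-reads and tries again, so no update is silently lost.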
External data stores
If the multinode dict is insufficient for your use case, then consider using a managed storage solution from an external vendor.
- Postgres: Supabase, Amazon Aurora
- Mongo: MongoDB Atlas
- Kafka: Confluent
- Object storage: Amazon S3
When working with external data stores, remember to include the access credentials in the environment variables for your multinode deployment.
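As a minimal sketch of the application side, the code below reads credentials from the environment and fails fast with a clear message if any are missing. The variable names (`POSTGRES_HOST` and so on) and the `load_store_credentials` helper are hypothetical; use whatever names your provider and deployment define. How the variables are set on the deployment itself is not covered here.

```python
import os


def load_store_credentials(required, env=None):
    """Collect the named settings from the environment, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}


# Hypothetical variable names, chosen for illustration only.
REQUIRED = ["POSTGRES_HOST", "POSTGRES_USER", "POSTGRES_PASSWORD"]

# A plain dict stands in for os.environ so the sketch runs anywhere.
example_env = {
    "POSTGRES_HOST": "db.example.com",
    "POSTGRES_USER": "app",
    "POSTGRES_PASSWORD": "not-a-real-secret",
}
credentials = load_store_credentials(REQUIRED, env=example_env)
```

In a real process you would call `load_store_credentials(REQUIRED)` with no `env` argument so that the values come from `os.environ`, and pass the result to your database client.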
Coming soon
In future, we may provide utilities to help you deploy some of the more popular managed data stores more seamlessly.