Core API#

In this section, we demonstrate the simplified version with sqlalchemy_mate to manipulate data using core API.

First, let’s define a table to get start, everything looks normal so far.

[1]:

import sqlalchemy as sa

metadata = sa.MetaData()

t_users = sa.Table(
    "users", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String, nullable=True)
)

# For syntax testing, you could use sqlite
# But you could see significant performance improvement in main stream
# sql database for bulk inserts
engine = sa.create_engine("sqlite:///:memory:")

metadata.create_all(engine)

Bulk insert and Count rows#

We want to insert 3 random user data into the database and do some basic query.

[2]:

import random

three_user_data_list = [
    dict(id=random.randint(1, 1000))
    for _ in range(3)
]

With sqlalchemy_mate

[3]:

import sqlalchemy_mate.api as sam

# do bulk insert
sam.inserting.smart_insert(engine, t_users, three_user_data_list)

# returns number of row in a table.
row_counts = sam.selecting.count_row(engine, t_users)
print(f"{row_counts = }")

# return one row by primary key values
user = sam.selecting.by_pk(engine=engine, table=t_users, id_=three_user_data_list[0]["id"])
print(f"{user = }")

row_counts = 3
user = (996, None)

Without sqlalchemy_mate

[4]:

with engine.connect() as connection:
    # let's have a fresh start first
    connection.execute(t_users.delete())
    connection.commit()

    # do bulk insert
    connection.execute(t_users.insert(), three_user_data_list)
    connection.commit()

    # returns number of row in a table.
    stmt = sa.select(sa.func.count()).select_from(t_users)
    row_counts = connection.execute(stmt).one()[0]
    print(f"{row_counts = }")
    # return one row by primary key values
    stmt = sa.select(t_users).where(t_users.c.id==three_user_data_list[0]["id"])
    user = connection.execute(stmt).one()
    print(f"{user = }")

row_counts = 3
user = (996, None)

Smart Single / Bulk Insert#

Now we already have 3 items in database, let’s try to insert 1,000 users to the table.

[5]:

thousands_user_data_list = [
    dict(id=id_)
    for id_ in range(1, 1000+1)
]

With sqlalchemy_mate

[6]:

import time

start_time = time.process_time()
# this is the smart insert API, only one line
op_count, ins_count = sam.inserting.smart_insert(engine=engine, table=t_users, data=thousands_user_data_list)
elapsed = time.process_time() - start_time
print(f"{op_count = }") # 60 bulk INSERT sql command fired to database
print(f"{ins_count = }") # 997 data inserted
print(f"{elapsed = :.6f}") # 0.020140 in local postgres database
row_counts = sam.selecting.count_row(engine, t_users)
print(f"{row_counts = }") # now we have 1000 rows

op_count = 57
ins_count = 997
elapsed = 0.019205
row_counts = 1000

Without sqlalchemy_mate

[7]:

# Core insert logic = 7 line
from sqlalchemy.exc import IntegrityError

with engine.connect() as connection:
    connection.execute(t_users.delete())
    connection.commit()

    ins = t_users.insert()
    connection.execute(ins, three_user_data_list)

    start_time = time.process_time()
    ins = t_users.insert()
    for user_data in thousands_user_data_list:
        try:
            connection.execute(ins, user_data)
            connection.commit()
        except IntegrityError:
            connection.rollback()
    elapsed = time.process_time() - start_time
    print(f"{elapsed = :.6f}") # 0.181163
    row_counts = connection.execute(sa.select(sa.func.count()).select_from(t_users)).one()[0]
    print(f"{row_counts = }")

elapsed = 0.181163
row_counts = 1000

sqlachemy_mate is significantly faster than native sqlalchemy. Because it smartly split big dataset into smaller pack, hence the total number of INSERT sql actually fired to database is greatly reduced. In this test case, sqlclhemy_mate is 10x faster with a Postgres DB on local, in real use case it could save more times because they are remote user.

Smart Single / Bulk Update#

A common update use case is to locate a row by primary key, and update non primary key fields.

With sqlalchemy_mate

[8]:

# update
# before, it is {"id": 1, "name": None}
user = sam.selecting.by_pk(engine, t_users, 1)
print(f"before: {user = }")

# do single update
user_data = dict(id=1, name="Alice")
sam.updating.update_all(engine=engine, table=t_users, data=user_data)

# after, it is {"id": 1, "name": "Alice"}
user = sam.selecting.by_pk(engine=engine, table=t_users, id_=1)
print(f"after: {user = }")

# do multiple update
user_data_list = [
    dict(id=1, name="Alice"),
    dict(id=2, name="Bob"),
    dict(id=3, name="Cathy"),
]
sam.updating.update_all(engine=engine, table=t_users, data=user_data_list)

before: user = (1, None)
after: user = (1, 'Alice')

[8]:

(3, 0)

Without sqlalchemy_mate

[9]:

# do single update
with engine.connect() as connection:
    connection.execute(t_users.update().where(t_users.c.id==1).values(name="Alice"))
    connection.commit()

# do multiple update
user_data_list = [
    dict(id=1, name="Alice"),
    dict(id=2, name="Bob"),
    dict(id=3, name="Cathy"),
]
with engine.connect() as connection:
    for user in user_data_list:
        stmt = t_users.update().where(t_users.c.id==user["id"]).values(**user)
        connection.execute(stmt)
        connection.commit()

Smart Single Bulk Upsert#

Upsert is a database dependent feature that not available in all sql system. upsert_all`() function made upsert generally available to all SQL system and super easy to use. Internally there’s an optimization that collect “to insert” items and bulk insert them fast.

With sqlalchemy_mate

[10]:

# prepare your data
user_data_list = [
    dict(id=999, name="Alice"),
    dict(id=1000, name="Bob"),
    dict(id=1001, name="Cathy"),
    dict(id=1002, name="David"),
]

# use ``upsert_all`` method
update_counter, insert_counter = sam.updating.upsert_all(engine=engine, table=t_users, data=user_data_list)
print(f"{update_counter = }")
print(f"{insert_counter = }")

update_counter = 2
insert_counter = 2

Selecting Shortcuts#

See selecting

Deleteing Short cuts#

See deleting

[ ]: