Spark DDL

CREATE TABLE

To create an MixedFormat table under an Amoro Catalog, you can use using mixed_iceberg or using mixed_hive to specify the provider in the CREATE TABLE statement. If the Catalog type is Hive, the created table will be a Hive-compatible table.

CREATE TABLE mixed_catalog.db.sample (
    id bigint  COMMENT "unique id",
    data string
) USING mixed_iceberg

PRIMARY KEY

You can use PRIMARY KEY in the CREATE TABLE statement to specify the primary key column. MixedFormat ensures the uniqueness of the primary key column through MOR (Merge on Read) and Self-Optimizing.

CREATE TABLE mixed_catalog.db.sample (
    id bigint  COMMENT "unique id",
    data string ,
    PRIMARY KEY (id)
) USING mixed_iceberg

PARTITIONED BY

Using PARTITIONED BY in the CREATE TABLE statement to create a table with partition spec.

CREATE TABLE mixed_catalog.db.sample (
    id bigint,
    data string,
    category string)
USING mixed_iceberg
PARTITIONED BY (category)

In the PARTITIONED BY clause, you can define partition expressions, and Mixed-Iceberg format supports all partition expressions in Iceberg.

CREATE TABLE mixed_catalog.db.sample (
    id bigint,
    data string,
    category string,
    ts timestamp, 
    PRIMARY KEY (id) )
USING mixed_iceberg
PARTITIONED BY (bucket(16, id), days(ts), category)

Supported transformations are:

years(ts): partition by year
months(ts): partition by month
days(ts) or date(ts): equivalent to dating partitioning
hours(ts) or date_hour(ts): equivalent to dating and hour partitioning
bucket(N, col): partition by hashed value mod N buckets

truncate(L, col): partition by value truncated to L

Strings are truncated to the given length

Integers and longs truncate to bins: truncate(10, i) produces partitions 0, 10, 20, 30, …

Mixed-Hive format doesn’t support transform.

CREATE TABLE … AS SELECT

CREATE TABLE mixed_catalog.db.sample
USING mixed_iceberg
AS SELECT ...

The CREATE TABLE ... AS SELECT syntax is used to create a table and write the query results to the table. Primary keys, partitions, and properties are not inherited from the source table and need to be configured separately.

You can enable uniqueness check for the primary key in the source table by setting set spark.sql.mixed-format.check-source-data-uniqueness.enabled = true in Spark SQL. If there are duplicate primary keys, an error will be raised during the write operation.

You can use the following syntax to create a table with primary keys, partitions, and properties:

CREATE TABLE mixed_catalog.db.sample
PRIMARY KEY(id) USING mixed_iceberg 
PARTITIONED BY (pt)  
TBLPROPERTIES (''prop1''=''val1'', ''prop2''=''val2'')
AS SELECT ...

In the current version, CREATE TABLE ... AS SELECT does not provide atomicity guarantees.

CREATE TABLE … LIKE

The CREATE TABLE ... LIKE syntax copies the structure of a table, including primary keys and partitions, to a new table, but it does not copy the data.

USE mixed_catalog;
CREATE TABLE db.sample
LIKE db.sample2
USING mixed_iceberg
TBLPROPERTIES ('owner'='xxxx');

Since PRIMARY KEY is not a standard Spark syntax, if the source table is a MixedFormat table with primary keys, the new table can copy the schema information with the primary keys. Otherwise, only schema could be copied.

Create Table Like only supports the binary form of db.table and in the same catalog

REPLACE TABLE … AS SELECT

The REPLACE TABLE ... AS SELECT syntax only supports tables without primary keys in the current version.

REPLACE TABLE mixed_catalog.db.sample
USING mixed_iceberg
AS SELECT ...

In the current version, REPLACE TABLE ... AS SELECT does not provide atomicity guarantees.

DROP TABLE

DROP TABLE mixed_catalog.db.sample;

TRUNCATE TABLE

The TRUNCATE TABLE statement could delete all data in the table.

TRUNCATE TABLE mixed_catalog.db.sample;

ALTER TABLE

The ALTER TABLE syntax supported by Mixed-Format includes:

ALTER TABLE … SET TBLPROPERTIES
ALTER TABLE … ADD COLUMN
ALTER TABLE … RENAME COLUMN
ALTER TABLE … ALTER COLUMN
ALTER TABLE … DROP COLUMN
ALTER TABLE … DROP PARTITION

ALTER TABLE … SET TBLPROPERTIES

ALTER TABLE mixed_catalog.db.sample SET TBLPROPERTIES (
    'read.split.target-size'='268435456'
);

Using UNSET to remove properties:

ALTER TABLE mixed_catalog.db.sample UNSET TBLPROPERTIES ('read.split.target-size');

ALTER TABLE … ADD COLUMN

ALTER TABLE mixed_catalog.db.sample
ADD COLUMNS (
    new_column string comment 'new_column docs'
  );

You can add multiple columns at once, separated by commas.

-- create a struct column
ALTER TABLE mixed_catalog.db.sample
ADD COLUMN point struct<x: double, y: double>;

-- add a field to the struct
ALTER TABLE mixed_catalog.db.sample
ADD COLUMN point.z double;

-- create a nested array column of struct
ALTER TABLE mixed_catalog.db.sample
ADD COLUMN points array<struct<x: double, y: double>>;

-- add a field to the struct within an array. Using keyword 'element' to access the array's element column.
ALTER TABLE mixed_catalog.db.sample
ADD COLUMN points.element.z double;

-- create a map column of struct key and struct value
ALTER TABLE mixed_catalog.db.sample
ADD COLUMN points map<struct<x: int>, struct<a: int>>;

-- add a field to the value struct in a map. Using keyword 'value' to access the map's value column.
ALTER TABLE mixed_catalog.db.sample
ADD COLUMN points.value.b int;

You can add columns at any position by using the FIRST or AFTER clause.

ALTER TABLE mixed_catalog.db.sample
ADD COLUMN new_column bigint AFTER other_column;

ALTER TABLEmixed_catalog.db.sample
ADD COLUMN nested.new_column bigint FIRST;

ALTER TABLE … RENAME COLUMN

ALTER TABLE mixed_catalog.db.sample RENAME COLUMN data TO payload;

ALTER TABLE … ALTER COLUMN

"ALTER COLUMN` can be used to widen types, make fields nullable, set comments, and reorder fields.

ALTER TABLE mixed_catalog.db.sample ALTER COLUMN measurement TYPE double;

To add or remove columns from a structure, use ADD COLUMN or DROP COLUMN with nested column names.

Column comments can also be updated using ALTER COLUMN.

ALTER TABLE mixed_catalog.db.sample ALTER COLUMN measurement TYPE double COMMENT 'unit is bytes per second';
ALTER TABLE mixed_catalog.db.sample ALTER COLUMN measurement COMMENT 'unit is kilobytes per second';

You can use the FIRST and AFTER clauses to reorder top-level or nested columns within a structure.

ALTER TABLE mixed_catalog.db.sample ALTER COLUMN col FIRST;

ALTER TABLE mixed_catalog.db.sample ALTER COLUMN nested.col AFTER other_col;

ALTER TABLE … DROP COLUMN

ALTER TABLE mixed_catalog.db.sample DROP COLUMN id;
ALTER TABLE mixed_catalog.db.sample DROP COLUMN point.z;

ALTER TABLE … DROP PARTITION

ALTER TABLE mixed_catalog.db.sample DROP IF EXISTS PARTITION (dt=2022);

DESC TABLE

DESCRIBE TABLE returns basic metadata information about a table, including the primary key information for tables that have a primary key

 { DESC | DESCRIBE }  TABLE  mixed_catalog.db.sample;