QuickStart Two
The previous QuickStart introduced configuration and mapping. This tutorial extends that information while working with a sharded data set. This tutorial also extends the mapping functionality to include list and object properties on the model class.
Sharded data introduces two complexities:
- How do I uniquely identify and locate a record, which might be on any shard?
- How do I manage data on one shard and related data on a foreign shard?
This walkthrough illustrates how both challenges can be met.
Create the Project
If you are following along at home with a new project, in Visual Studio create a new “ASP.NET Core Web Application” project. When prompted, select the “API” project type. Once the solution is created, open your dependencies and add the following NuGet packages:
- ArgentSea.Sql or ArgentSea.Pg - for SQL Sever or PostgreSQL databases respectively
- Swashbuckle.Aspnetcore - for Swagger and for invoking the API without creating a client
To follow a standard convention for an MVC project, create folders for Models, InputModels, and Stores (or “Repositories” if you prefer).
The Sample Data
Whether you simply downloaded the walkthrough or a creating a new project, you will need to create some sample shards.
Our sample application is going to track Customers. The data set is completely made-up and not especially realistic. Customers can have multiple Locations (1:∞). Customers can also have Contacts, but the Contacts can belong to more than one Customer (∞:∞). The Contact may not exist in the same shard as the Customer.
Note
The data is set up this way to illustrate one of the difficulties with sharded data: managing relationships between records that exist on different shards. In this case, a Customer may be associated with a Contact on any shard. Managing this — an managing this efficiently — adds substantial complexity.
In our sample model, the ∞:∞ relationship between a Contact and a Customer is managed by a linking table, CustomerContacts. Given the id of a local Customer, the table lists the keys of the associated Contacts. What happens if instead I have the id of a Contact and want to find all of the Customers? With only that table, it would be necessary to query every shard to determine whether they have a related Customer!
To better optimize this, the data model creates a second ContactCustomers linking table. With this approach, if I have a Contact and want to find Customers (including those in foreign shards), or Customers and want to find Contacts (again, even those not local), I can use the appropriate linking table and query only the shards with relevant records data. The downside is that when the Customer and Contact are on different shards, this relationship must be managed in different tables on different databases.
In a real implementation, it might be beneficial to include additional information in the linking tables, like including the contact name within the CustomerContact table for example, so that common lookups do not require secondary lookups from related shards just to get the contact name. The QuickStart example doesn’t do this so that it can illustrate performing these secondary lookups.
Naturally, deleting a Customer means removing the ContactCustomers link from each Contact also, which could be on any shard. This shard list must be retrieved from the CustomerContacts list before it is deleted. Updating a Customer’s Contacts could also impact multiple shards in two ways: a removed foreign-shard Contact must be deleted and a additional foreign-shard Contacts must be added.
The SQL for the sample data is found in the GitHub source repository, at https://github.com/argentsea/quickstarts/tree/master/QuickStart2.Sql/QuickStart2.Sql/SqlSetup.
The first SQL script to execute is ServerSetup.sql, which creates four databases and two logins.
Caution
The logins contain weak passwords (that are published on the Internet), so you might consider changing them; on the other hand, these will only have permission to execute procedures or functions in a specific namespace, so it’s not a big risk.
Note
Conceptually, these four databases would correspond to regional data stores in the United States, Brazil, Europe, and China. In my real global application I would replicate the data from each region to each other region. Therefore, each region would have one writable data store and three readable ones. In this way, the “local” shard is writeable, the “foreign” shards are read-only. Writes to a foreign shard must be done across the WAN. Your implementation may vary.
Our walkthrough, however, only needs four simple databases on one server. We’ll only imagine the rest.
Next, after the databases have been created, connect to each database in turn and run the ShardSetup.sql SQL script. This will created the schemas, tables, stored procedures, reference data, etc. for our sample databases.
Finally, run the shard-specific SQL scripts — ShardUS.sql, ShardBR.sql, ShardUS.sql, ShardUS.sql — within their respective databases. This will load the shard-specific sample data.
At this point, we should have four database with identical table structures. SQL Server instances should also have a identical set of stored procedures.
The Shard Set Record Key
Each of the four databases needs an identifier. ArgentSea uses a short
(Int32/tinyint) to identify each shard, and a generic type as a record id. The combination of shard id and record id becomes a “virtual compound key”, which is called a ShardKey. A thorough discussion of the options and impact is here.
Some tables may require a compound key themselves. For these, ArgentSea offers ShardKeys with generic overloads, which may consist of a shard id, record id, child id, grandchild id, and even great grandchild id! Because all the the record and child id types are generic, they can accommodate most data column types.
Configuring Connections
Because sharded data may require a large number of data connections, ArgentSea offers a more flexible way of managing this than by using connection strings. ArgentSea offers the “Hereditary Configuration Hierarchy”. This allows you to set an attribute at the parent level and all children will inherit this value, unless overwritten by a child. A more thorough discussion is here.
In our sample application, we can use the same server or host for all connections; each shard connection only changes the database name. (In a production deployment, the configuration might be exactly backwards: the databases have identical names, but each is on a different host). We also want to use the webWriter user for write connections and webReader for read connections.
So the configuration settings looks like this (with annotations):
{
"SqlShardSets": [¹
{
"ShardSetName": "Customers",²
"DataSource": ".",³
"DefaultShardId": 1,⁴
"Write": {⁵
"UserName": "webWriter",
"Password": "Pwd567890",
},
"Read": {⁵
"ApplicationIntent": "ReadOnly",
"UserName": "webReader",
"Password": "Pwd123456"
},
"Shards": [⁶
{
"ShardId": 1,⁷
"InitialCatalog": "CustomerShardUS"⁸
},
{
"ShardId": 2,⁷
"InitialCatalog": "CustomerShardEU"⁸
},
{
"ShardId": 3,⁷
"InitialCatalog": "CustomerShardBR"⁸
},
{
"ShardId": 4,⁷
"InitialCatalog": "CustomerShardZH"⁸
}
]
}
]
}
Annotations
¹ SqlShardSets
is the root JSON section for all the shard configuration metadata. It contains an array of shard sets.
² ShardSetName
is a required key for this specific shard set. Multiple shard sets are possible and each will be identified by this key. This value must exactly match the value used in your code to invoke this shard set.
³ DataSourceName
is a connection attribute. Connection attributes can appear anywhere in the hierarchy. Because it appears at the “shard set” level, all shards in the shard set will inherit this server name.
⁴ DefaultShardId
this setting determins which shard is presented as the ShardSet.DefaultShard
. This is useful for determining which shard should be actively accepting new records for this client.
⁵ Read
and Write
are peculiar, and optional, members of shard set’s “inheritance” chain, as their children are indirect. Any attributes defined in the shard set’s Write
section apply only to write connections. Likewise, for Read
connections. These values can be overwritten by shard or connection attributes.
⁶ Shards
is an array of shard connections, one for each shard in the shard set.
⁷ ShardId
is a required identifier for the shard. This value is essential for finding and identifying a sharded record. It cannot be duplicated within a shard set. The value must be a number.
⁸ InitialCatalog
is a connection attribute. Because it appears at the shard level, both read connections and write connections for this shard will inherit this value.
This hierarchy, then, defines a server name once, to be used for the entire shard set. The read and write logins are also defined once, to be used by all read or write connections in the shard set. Each shard has a distinct database name. ArgentSea can build read and write connections to each data store without the need to configure any of this data redundantly — the login, server name, and database names are each managed only once.
When you save this configuration to project’s appsettings file, be sure to update the JSON to the appropriate server references.
You might consider moving the login password information to the UserSecrets store, which is a best practice. Simply remove the password entries from the appsettings.json hierarchy and add them to the usersecrets.json file. Ideally, the password should also be changed to a different value.
User Secrets Entry
{
"SqlShardSets": [
{
"Write": {
"Password": "Pwd567890",
},
"Read": {
"Password": "Pwd123456"
}
}
]
}
Warning
The configuration arrays in appsettings.json and usersecrets.json will not match if they do not appear in exactly the same order. In this sample, we have only one shard set and the passwords are not in the Shards
array, so this is not a concern.
Creating the Models
The process of creating a model class was introduced in Quickstart 1. Essentially, it simply requires adding attributes to properties, which the Mapper can then use. This QuickStart adds four new wrinkles: shard keys, object properties, list properties, and inheritance.
The complete code is on GitHub, at https://github.com/argentsea/quickstarts/tree/master/QuickStart2.Sql/QuickStart2.Sql for SQL Server, or https://github.com/argentsea/quickstarts/tree/master/QuickStart2.Pg/QuickStart2.Pg for PostgreSQL. The classes and SQL you need are located there; it is not fully reproduced here.
To use the attributes, each Model class should include a using ArgentSea.Sql
statement.
using ArgentSea.Sql;
Advanced Model Mapping
The previous walthrough demonstrated mapping to standard .NET types like strings, numbers, and dates. This walkthrough illustrates mapping to objects, lists, and child classes.
Properties with Object Types
Our data contains Location data with latitude and longitude values. Generally, these are usually managed as a value pair. Geographic functions would likely expect a single geographic coordinates argument, rather than the two separate values. It would be handy to map data directly to/from a coordinates class, which would be a property of the Location class.
That is exactly what the MapToModel
attribute does. This attribute tells the mapper that the property is a child object that also has properties to be included in the mapping.
// The coordinates class:
public class CoordinatesModel
{
[MapToSqlFloat("Latitude")]
public double Latitude { get; set; }
[MapToSqlFloat("Longitude")]
public double Longitude { get; set; }
}
// The location, which contains the coordinates class as a property:
public class LocationModel
{
//include other properties here...
[MapToModel]
public CoordinatesModel Coordinates { get; set; }
}
If the Coordinates property is null, the Mapper will instantiate an instance before setting the properties. Of course, the CoordinatesModel must have a default constructor and the property must be settable. If you want to make the property read-only, just make sure that the Coordinates object exists:
[MapToModel]
public CoordinatesModel Coordinates { get; } = new CoordinatesModel();
Properties with List Types
One of the most expensive activities an application can do is reach out to another server. Our high-performance application should do everything possible to minimize database server round-trips. This means getting all the data necessary to populate our Customer model in a single request. The ArgentSea Mapper can automatically handle multiple results from a single request.
Our Customer can have any number of Locations. Our Customer can also have any number of Contacts. Our query returns the Customer, Location and Contact information in a single round-trip.
Tip
The base Customer record could be returned in either output parameters or in a single-row SELECT result. The first would use the Mapper’s MapOutput
method, the other requires the MapReader
method; both would use the data reader to handle list properties.
To map the multiple data reader results to the Model, we tell the Mapper the order of the results when we fetch:
var prms = new QueryParameterCollection()
.AddSqlIntInputParameter("@CustomerId", customerKey.RecordId)
.CreateOutputParameters<CustomerModel>(_logger);
return await _shardSet[customerKey].Read.MapOutputAsync<CustomerModel, LocationModel, ContactModel>(Queries.CustomerGet, prms, cancellation);
The CustomerModel
type in the first generic position tells the Mapper that that is the base object type. The Mapper will automatically create a new instance of the CustomerModel type and populate its properties from the query’s output parameters.
The LocationModel
type in the next generic position indicates that the first data reader result contains this type. The Mapper will build a list of locations, find a property of type List<LocationModel>
or IList<LocationModel>
, and set the property to the list object.
Likewise, the third generic argument tells the Mapper that the next data reader result is a list of Contacts, which the Mapper will use to populate the Contacts property.
Only a few lines of code are all that is required to manage this complex result.
Model Inheritance
The sample application has a CustomerListItem
Model, which contains a record key and a customer name property. The CustomerModel
include those same values, plus some others. By inheriting from first Model, the CustomerModel
inherits the key and an customer name, including their mapping attributes.
Because database queries often return subsets of entity columns, this object inheritance technique is useful in allowing the mapping attributes to be defined only once.
The ShardKey
The final object type which may combine multiple data records is the ShardKey type. This type is described in detail here.
[MapShardKey('c', "@CustomerId")]
[MapToSqlInt("@CustomerId")]
public ShardKey<int> CustomerKey { get; set; }
Note
The ShardKey in this example does not specify a ShardId data mapping. Because the client knows the ShardId, ArgentSea will populate the ShardId value from this configuration data. If you provide a ShardId mapping (and include the shardid argument in the MapShardKey
attribute), ArgentSea will understand that you want to use the data value instead.
The ShardKey type supports table compound keys in your sharded data. In this sample application, the Customer Location records are identified by a compound key including both a CustomerId and LocationId.
[MapShardKey('L', "CustomerId", "LocationId")]
[MapToSqlInt("CustomerId")]
[MapToSqlSmallInt("LocationId")]
public ShardKey<int, int> CustomerLocationKey { get; set; }
It would be possible for ArgentSea to include a “ShardGrandChild” struct, for three-column compound keys, (or even a “ShardGreatGrandChild”) but the need for this hasn’t arisen.
Loading the Shard Service
Loading and injecting the ArgentSea ShardSets service is identical to the Databases services explained in the last tutorial.
Calling the ArgentSea AddSqlService
extension method will load both the ArgentSea sharding and database services.
using ArgentSea.Sql;
...
public void ConfigureServices(IServiceCollection services)
{
...
services.AddSqlServices(this.Configuration);
...
}
Note
The ShardSets
services implicitly loads the Databases
service also. Of course, if no databases are configured, the collection will be empty.
Queries and Data
Our implementation adds two static classes which help describe our data. The first is a simple list of constants, which correspond to the “origin” parameter of ShardKey
object. the “Origin” helps prevent accidental use of, say, an Inventory key to delete an Order record. By using constants, you cam more explicitly distinguish the “c” used for Contact data from the “C” used for Customer data, which might otherwise be confusing.
The second static class consolodates query definitions, as described in the Creating SQL Queries tutorial. This serves two purposes: first, it becomes easy to determine which queries are actually used by the code (on large projects, this can be difficult). Also, the optional parameter list can limit the parameters that are set when Model attributes have more parameters than the query requires.
The code for these are in the DataOrigins
and Queries
classes respectively. The code is self explainitory.
The Repository Pattern
Our sample code uses a CustomerStore
class, which implements all of the actual data access logic. The theory of the repository pattern is help contain the coupling between the data layer and application logic. We could theorectically replace the data store by only changing the CustomerStore
implementation. Because this is a web service that provides data access, the virtue of this approach is not compelling in our sample. Your milage may vary.
Because our web service does very little except read and write data, there is actually very little for the controller to do. You will find nearly all of the ArgentSea implemetation in the CustomerStore
class.
Note
The code in the CustomerStore
would have been even simpler if there was no relationship between Customers and Contacts. The sample is intended to illustrate handling a challenging sharding scenario, but without this relationship the code would have been half as long.
This class is injected into the controller, so it need to be a registered as a service at startup.
services.AddTransient<Stores.CustomerStore>();
The injectable CustomerStore
class in turn uses the injected ShardSets
service. Obtaining the injectible ShardSets
service in your repository class is straightforward:
public class CustomerStore
{
private readonly ShardSets.ShardSet _shardSet;
private readonly ILogger<CustomerStore> _logger;
public CustomerStore(ShardSets shardSets, ILogger<CustomerStore> logger)
{
_shardSet = shardSets["Customers"];
_logger = logger;
}
....
If you elected to use the ArgentSea shard sets collection instead, things are just a little more verbose:
public class CustomerStore
{
private readonly SqlShardSets<byte>.ShardSet _shardSet;
private readonly ILogger<CustomerStore> _logger;
public CustomerStore(SqlShardSets<byte> shardSets, ILogger<CustomerStore> logger)
{
_shardSet = shardSets["Customers"];
_logger = logger;
}
....
Using a ShardSet
instance, you generally simply provide a query, set parameters, and invoke a method. You can query across all shards, some shards, or within a single shard instance. Querying across all or some shards can return all results in a unified list or the first valid result.
The code necessary to query all shards is very simple:
public async Task<CustomerModel> FindByLogonName(string loginName, CancellationToken cancellation)
{
var prms = new ParameterCollection()
.AddSqlNVarCharInputParameter("@LoginName", loginName, 255)
.CreateOutputParameters<CustomerModel>(_logger);
return await _shardSet.ReadFirst.MapOutputAsync<CustomerModel, LocationModel, ContactListItem>(Queries.CustomerFind, prms, cancellation);
}
Optimizing Multi-Record Saves
The principal means of optimizing data access is to limit the number of round-trips to the database server. If the need to save ten records generates ten distinct calls to the database, the solution will not be very efficient. Unfortuantely, there is no standard way of handling multi-record saves; fortunately, there are platform-specific ways of managing it.
SQL Server uses Table Valued Parameters to save multiple records. Our sample data saves a Customer with multiple Locations and multiple Contacts. These are both passed to Table Valued Parameters. This also allows the related records to be managed in a single internally-managed (low oeverhead) transaction.
Table Valued Parameters require a “User Defined Type”, which defines the column names and types for each row. ArgentSea offers a SQL Server-specific Mapper which converts the metadata attrributes to the correct format for this paramteter. (Like its siblings, it also uses expression trees to to compile a high-performance solution when it is initially run). In our sample implementation, the combined mapping attributes created more columns than the User Defined Type required. To solve this, we can simply provide a list or array of names; then, only those are used.
var prms = new ParameterCollection()
.CreateInputParameters(customer, _logger)
.AddSqlTableValuedParameter("@Locations", customer.Locations, customerLocationTypeColumns, _logger)
.AddSqlTableValuedParameter<ContactListItem, byte, int>("@Contacts", customer.Contacts, "ShardId", System.Data.SqlDbType.TinyInt, "RecordId", System.Data.SqlDbType.Int);
In this example, there are two AddSqlTableValuedParameter
overloads, the first uses a Model and maps to the Unser Defined Type using mapping attributes; the second example maps only the key values, using the column names supplied.
You can learn more in the section on Multi-Record Saves.
Using Swagger
when you launch the web API project, it will open to thw Swagger UI by default. If you are creating a new project, open project properties, go to the Debug tab, then change the Launch browser: text value to “swagger”.
The first GET method returns all customers, across all shards (you don't want to do this in the real world).
You can selected any ShardKey in the resulting list and provide that to the other GET method, which takes a ShardKey string argument. This methods returns a complex JSON result with extended customer detail.
You can edit the customer detail and provide that to the PATCH method to update the database values.
The POST method allows you to create a new customer. If successful, it returns the ShardKey of the created record. To create a new customer you can provide the following JSON:
{
"name": "New Customer",
"type": "WalkIn",
"locations": [
{
"type": "RetailStore",
"streetAddress": "123 Main Street",
"locality": "Chicago",
"region": "IL",
"postalCode": "60612",
"iso3166": "us",
"coordinates": {
"latitude": 41.867789,
"longitude": -87.675839
}
},
{
"type": "RetailStore",
"streetAddress": "456 Oak Avenue",
"locality": "Dallas",
"region": "TX",
"postalCode": "75211",
"iso3166": "us",
"coordinates": {
"latitude": 32.730430,
"longitude": -87.675839
}
}
],
"contacts": [
{
"origin": "c",
"shardId": 1,
"recordId": 7
},
{
"origin": "c",
"shardId": 2,
"recordId": 4
},
{
"origin": "c",
"shardId": 1,
"recordId": 8
}
]
}
Finally, you can DELETE a customer by providing a ShardKey.