Seeding data in EF Core using SQL scripts
Image from Pexels by Pixabay

Seeding data in EF Core using SQL scripts

Another way of seeding data in EF Core

Not so while ago I wrote an article on Seeding data with Entity Framework Core using migrations. This approach relies on EF Core migrations to ensure seeding of specific data is done only once using __EFMigrationsHistory table to track structural migrations as well as data seeding migrations.

In large number of cases this approach works just fine and you do not need to do any adjustments or add any additional tables, just use migrations mechanism out of the box. However in some corner case, when you are changing the table structure of the tables which are also involved in seeding process you may end up in compilation errors because you are referencing table columns in your seeding which may not exist anymore.

Another type of error that may occur due to table structure changes is when you try to execute migrations from the scratch, to an empty database. This does not cause compilation error, but rather runtime because your migration that seeds may try to set some columns that may be created in the table in the migration that executes later. Unfortunately, you cannot skip some columns when mapping entities to tables and this is where this approach becomes a bit hard to maintain.

Another approach which I also used to overcome these issues is using good old plain SQL script files to seed the data. This allows you to separate completely data structure setup (migrations) and data seeding. This will ensure that all structural changes are performed before any data is seeded.

Separating migrations and seeding

The biggest problem with approach described in Seeding data with Entity Framework Core using migrations article is that data structure migrations and seeding migrations are executed at the same time simultaneously. This causes some of the seeding migrations to fail due to structure changes either during compilation or runtime. The biggest problem is that EF Core does not let you target only specific columns, but rather tries to map all columns to the entity model class.

On a good side you have EF Core migrations mechanism making sure your seeding migration executes only once by adding a record to __EFMigrationsHistory table.

I'll use a simple solution which you can find on Github to demonstrate SQL scripts approach.

Note

To avoid unnecessary complexity I did not include application layer (with services) and data layer (repositories), but you should consider adding these projects to your real world application solution

If you check the solution, you will see that comparing to the previously mentioned approach, migration classes and SQL scripts are separated completely into separate folders under data infrastructure project.

Solution Explorer

What we are loosing with this approach is out of the box mechanism to take care of single seeding script execution which we have using migrations. Luckily this is not hard to replicate and we'll go through it in the rest of the article.

Tracking of executed seeding scripts

EF Core migration keeps tracking of executed migrations in __EFMigrationsHistory by adding a new record for every migration file that was successfully executed. This is quite straight forward and it can be easily replicated.

I choose to name the table for seeding history track __SeedingHistory. It is created by simply using a migration in EF Core like any other table, but I added some small differences in dbContext to differentiate it from other entities that are used for application logic, but let's start with simple things first.

The sample application which is attached to this article as a Github repository is keeping records of the employees, Nothing complicated, just basic things.

namespace Sample.Seeding.Domain
{
    public  class Employee
    {
        public Guid Id { get; set; }
        public String FirstName { get; set; }
        public String LastName { get; set; }
        public String Email { get; set; }
        public String PhoneNumber { get; set; }
    }
}
    

To have things clean and separated, instead of having configuration in the dbContext, I use a separate configuration class to define keys, indexes and restrictions for this entity model.

namespace Sample.Seeding.Data.Infrastructure.Configurations
{
    class EmployeeConfiguration: IEntityTypeConfiguration<Employee>
    {

        public void Configure(EntityTypeBuilder<Employee> builder)
        {
            builder.ToTable("Employees");
            builder.HasKey(e => e.Id);
            builder.Property(e => e.FirstName).IsRequired();
            builder.Property(e => e.LastName).IsRequired();
            builder.Property(e => e.Email).IsRequired();
        }
    }
}
    

Now that we have our application data that is taking care of the actual domain application is attempting to solve, we can add entities that are not directly related to domain and that entity is the one representing the record of seeding SQL script executed during seeding.

namespace Sample.Seeding.Data.Infrastructure.Entities
{
    class SeedingEntry
    {
        public String Name { get; set; }
    }
}
    
Note

SeedingEntry entity is not sitting alongside Employee entity model. This is because seeding has nothing to do with application logic, but rather with bootstrap of the underlying data infrastructure

Same way we have separate configuration for our domain entity model, we have separate configuration for our data seeding record entity.

namespace Sample.Seeding.Data.Infrastructure.Configurations
{
    class SeedingEntryConfiguration : IEntityTypeConfiguration<SeedingEntry>
    {
        public void Configure(EntityTypeBuilder<SeedingEntry> builder)
        {
            builder.ToTable("__SeedingHistory");
            builder.HasKey(s => s.Name);
        }
    }
}
    

We only need to store the executed script name, but you can also add additional info like date time stamp of auto increment id. For the simplicity I only store name of the script file which is the only essential info to make sure script is executed only once.

Entities need to be added to our DbConext class, but there is a slight difference how we are adding properties for these two entity collections.

namespace Sample.Seeding.Data.Infrastructure
{
    public class EmployeesDatabaseContext : DbContext
    {
        public EmployeesDatabaseContext() : base()
        {

        }

        public EmployeesDatabaseContext(DbContextOptions<EmployeesDatabaseContext> options) : base(options)
        {
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.ApplyConfiguration(new SeedingEntryConfiguration());
            modelBuilder.ApplyConfiguration(new EmployeeConfiguration());
        }

        internal virtual DbSet<SeedingEntry> SeedingEntries { get; set; }
        public virtual DbSet<Employee> Employees { get; set; }
    }
}
    

Since we do not care about the seeding outside of the infrastructure layers, no real reason to expose it outside and therefore it is internal property unlike public for domain entity models.

Before we create our tables with migrations we need to implement IDesignTimeDbContextFactory<EmployeesDatabaseContext> to be able to run migrations from the EF Core CLI global tool.

namespace Sample.Seeding.Data.Infrastructure
{
    public class EmployeesDatabaseContextFactory : IDesignTimeDbContextFactory<EmployeesDatabaseContext>
    {

        public EmployeesDatabaseContext CreateDbContext(string[] args)
        {
            var dbContext = new EmployeesDatabaseContext(new DbContextOptionsBuilder<EmployeesDatabaseContext>().UseSqlServer(
                 new ConfigurationBuilder()
                     .AddJsonFile(Path.Combine(Directory.GetCurrentDirectory(), $"appsettings.json"))
                     .AddEnvironmentVariables()
                     .Build()
                     .GetConnectionString(DbContextConfigConstants.DB_CONNECTION_CONFIG_NAME)
                 ).Options);
            dbContext.Database.Migrate();
            return dbContext;
        }
    }
}
    

We also need to include appsettings.json to the infrastructure level to supply the connection string which will be used when migrations are run from the CLI command line

{
  "ConnectionStrings": {
    "EmployeesDatabase": "data source=DEJAN-PC;initial catalog=EMP;user id=sa;password=sa;MultipleActiveResultSets=True;PersistSecurityInfo=true;"
  }
}

    
Note

EF Core 3.1 migrations are moved to CLI global tool which you need to install first. You can easily do it from Windows cmd/PowerShell or on Linux from the bash using the following command "dotnet tool install dotnet-ef --global --ignore-failed-sources"

We are now all set to have our tables created from EF Core command line tool

# Add seeding table migration
dotnet ef migrations add Add_Seeding_Tracking -o Migrations -c EmployeesDatabaseContext

# Add domain model table migration
dotnet ef migrations add Add_Employees -o Migrations -c EmployeesDatabaseContext

# Update database
dotnet ef database update -c EmployeesDatabaseContext

    

Executing the seeding scripts

Now the fun part. We need to execute our scripts which will seed the data.

The scripts can be easily supplied as embedded files which means they will be bundled inside our complied dll file. In .NET Core this can be simply done by adding the file to the project and then registering it as embedded resource in .csproj file.

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <None Remove="Seedings\20200906220610_Add_Initial_Employees.sql" />
    <None Remove="Seedings\20200906220913_Add_Employees.sql" />
  </ItemGroup>

  <ItemGroup>
    <EmbeddedResource Include="Seedings\20200906220913_Add_Employees.sql" />
    <EmbeddedResource Include="Seedings\20200906220610_Add_Initial_Employees.sql" />
  </ItemGroup>

  <ItemGroup>
    <Folder Include="Migrations\" />
  </ItemGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.AspNetCore.Http.Abstractions" Version="2.2.0" />
    <PackageReference Include="Microsoft.Extensions.Configuration" Version="3.1.5" />
    <PackageReference Include="Microsoft.Extensions.Configuration.EnvironmentVariables" Version="3.1.5" />
    <PackageReference Include="Microsoft.Extensions.Configuration.FileExtensions" Version="3.1.5" />
    <PackageReference Include="Microsoft.Extensions.Configuration.Json" Version="3.1.5" />
    <PackageReference Include="Microsoft.EntityFrameworkCore" Version="3.1.5" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.AutoHistory" Version="3.1.1" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="3.1.5" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Design" Version="3.1.5">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
    </PackageReference>
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include="..\Sample.Seeding.Domain\Sample.Seeding.Domain.csproj" />
  </ItemGroup>

</Project>

    

I added two migration SQL script files for the demonstration purposes.

You can note that filenames start with date and time in yyyyMMddHHmmss format. This is the convention that I used to know the order in which files need to execute.

To make it easy to use I made seeding as an extension method IApplicationBuilder so it can be invoked from the ASP.NET Core pipeline easily.

namespace Sample.Seeding.Data.Infrastructure.Extensions
{
public static class DbContextExtensions
{
public static void AddEmployeesDbContext(this IServiceCollection services, IConfiguration configuration)
{
services.AddDbContext<EmployeesDatabaseContext>(options =>
{
options.UseSqlServer(configuration.GetConnectionString(DbContextConfigConstants.DB_CONNECTION_CONFIG_NAME),
x =>
{
x.MigrationsHistoryTable("__EFMigrationsHistory");
x.MigrationsAssembly(typeof(DbContextExtensions).Assembly.GetName().Name);
}
);
});
}
public static void SeedEmployeesData(this IApplicationBuilder app, IConfiguration configuration)
{
using (var serviceScope = app.ApplicationServices
.GetRequiredService<IServiceScopeFactory>()
.CreateScope())
{
using (var context = serviceScope.ServiceProvider.GetService<EmployeesDatabaseContext>())
{
context.Database.Migrate();
var assembly = typeof(DbContextExtensions).Assembly;
var files = assembly.GetManifestResourceNames();
var executedSeedings = context.SeedingEntries.ToArray();
var filePrefix = $"{assembly.GetName().Name}.Seedings.";
foreach (var file in files.Where(f => f.StartsWith(filePrefix) && f.EndsWith(".sql"))
.Select(f => new
{
PhysicalFile = f,
LogicalFile = f.Replace(filePrefix, String.Empty)
})
.OrderBy(f => f.LogicalFile))
{
if (executedSeedings.Any(e => e.Name == file.LogicalFile))
continue;
string command = string.Empty;
using (Stream stream = assembly.GetManifestResourceStream(file.PhysicalFile))
{
using (StreamReader reader = new StreamReader(stream))
{
command = reader.ReadToEnd();
}
}
if (String.IsNullOrWhiteSpace(command))
continue;
using (var transaction = context.Database.BeginTransaction())
{
try
{
context.Database.ExecuteSqlRaw(command);
context.SeedingEntries.Add(new Entities.SeedingEntry() { Name = file.LogicalFile });
                                context.SaveChanges();
                                transaction.Commit();
                            }
                            catch 
                            {
                                transaction.Rollback();
                                throw;
                            }
                        }

                    }
                }
            }
        }
    }
}


    

To ensure only executed scripts are tracked as records in __SeedingHistory table, both script execution and storing of the seeding execution record need to run inside the same transaction. This ensures that if script or seed tracking fails, both are rolled back and seeding is stopped.

Note

In case you need to stop the seeding from the SQL script itself you can use RAISERROR (Transact-SQL) https://docs.microsoft.com/en-us/sql/t-sql/language-elements/raiserror-transact-sql?view=sql-server-ver15 if you are using Microsoft SQL Server

Wiring it to ASP.NET Core project

Since both DbContext registration and seeding have their extension methods, it is pretty straight forward to add them to the ASP.NET Core dependency injection and pipeline. Now this should not be invoked directly from the API level, but for the sake of simplicity we will application and data/repository layer.

All the changes need to be done in the Startup.cs. Basically we just need to invoke our extension methods

namespace Sample.Seeding.Api
{
    public class Startup
    {
        public Startup(IConfiguration configuration)
        {
            Configuration = configuration;
        }

        public IConfiguration Configuration { get; }

        // This method gets called by the runtime. Use this method to add services to the container.
        public void ConfigureServices(IServiceCollection services)
        {
            //REGISTER EMPLOYEES DB CONTEXT
            services.AddEmployeesDbContext(Configuration);

            services.AddControllers();
        }

        // This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
        public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
        {
            if (env.IsDevelopment())
            {
                app.UseDeveloperExceptionPage();
            }

            //DO THE EMPLOYEES SEEDING

            app.SeedEmployeesData(Configuration);

            app.UseHttpsRedirection();
            app.UseRouting();
            app.UseAuthorization();
            app.UseEndpoints(endpoints =>
            {
                endpoints.MapControllers();
            });
        }
    }
}
    

Once we have dependency injection and the pipeline extension methods invocation in place, we can just get the data from the database

namespace Sample.Seeding.Api.Controllers
{
    [ApiController]
    [Route("[controller]")]
    public class EmployeesController : ControllerBase
    {
        readonly EmployeesDatabaseContext _employeesDatabaseContext;
        public EmployeesController(EmployeesDatabaseContext employeesDatabaseContext)
        {
            _employeesDatabaseContext = employeesDatabaseContext;
        }

        [HttpGet]
        public async Task<IEnumerable<Employee>> Get()
        {
            return await _employeesDatabaseContext.Employees.ToListAsync();
        }
    }
}
    

Now once we run the project and try to access our employees endpoint, we will get all the data inserted by the seeding scripts


[
    {
        "id": "26a638a6-4bf1-4cd5-8260-57d0f13f3cb8",
        "firstName": "Michael",
        "lastName": "Sullivan",
        "email": "m.sullivan@sample.com",
        "phoneNumber": "697-555-0142"
    },
    {
        "id": "ca728139-e56e-4d4a-946d-61d7b6458a1b",
        "firstName": "David",
        "lastName": "Bradley",
        "email": "d.bradley@sample.com",
        "phoneNumber": null
    },
    {
        "id": "192b9869-f709-4b6c-8bc4-680c6bf41952",
        "firstName": "Kevin",
        "lastName": "Brown",
        "email": "k.brown@sample.com",
        "phoneNumber": null
    }
]


    
Note

Seeding using SQL scripts does not necessary need to execute only INSERT commands. You can use scripts to do conditional INSERT/UPDATE/DELETE based on the present data on different environments

Making it easier with CLI global tool

If you have used command line to create migration with code-first approach with EF Core you are definately familiar with CLI global tools. It is actually required that you have installed dotnet-ef global tool in order to create migrations from the command line

dotnet tool install dotnet-ef --global

To make generating of seeding SQL script files easier for your team you can use .NET Core CLI global tool. A part of the repository on GitHub for this article is global tool project which you can use to generate SQL script files. More on how to build, debug, install and use your own global tools you can checkout in article Building and using advanced .NET Core CLI global tools.

Global tool code for creating seedig SQL script files and adding it as embedded resource to your .csproj if quite simple as it has only one command which is add.

This command is responsible for two major operations:

  • creating SQL script file with proper name prefixed by timestamp
  • adding file as embeded resource to .csproj file

As a nice addition it uses a sample Template.sql file which behaves similar to NotImplementedException empty method body that Visual Studio generates.

/******** Add you seeeding script here ********/


/******* Comment out or remove following line once you add your script ********/
RAISERROR ('No seeding script implemented', 16,1)
    

This forces you to alter the file with your logic. If you by accident hit F5 and run the project it will break by throwing and SQL exception when it tries to execute seeding file body.

More on this along with functional code you can find on Github https://github.com/dejanstojanovic/sample-seeding-sql/tree/master/Sample.Seeding.Tool.

References

Disclaimer

Purpose of the code contained in snippets or available for download in this article is solely for learning and demo purposes. Author will not be held responsible for any failure or damages caused due to any other usage.


About the author

DEJAN STOJANOVIC

Dejan is a passionate Software Architect/Developer. He is highly experienced in .NET programming platform including ASP.NET MVC and WebApi. He likes working on new technologies and exciting challenging projects

CONNECT WITH DEJAN  Loginlinkedin Logintwitter Logingoogleplus Logingoogleplus

JavaScript

read more

SQL/T-SQL

read more

Umbraco CMS

read more

PowerShell

read more

Comments for this article