[
{
"objectID": "about.html",
"href": "about.html",
"title": "About",
"section": "",
"text": "My musings in my journey for better software design. I am a huge fan of Domain Driven Design, Object-Oriented programming, and Functional Programming. I am constantly looking for new ways to improve myself and my designs."
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html",
"title": "PySpark Unit Testing",
"section": "",
"text": "One of the features that I love about PySpark is the data frame abstraction is agnostic about data sources and destinations. This allows unit testing data transformations without connecting to a database or file system. As a bonus data frames are immutable and each transformation returns a new one. This means we can test transformations on the data in bite-sized chunks.\nAll we need to do is create an in-memory data frame, perform a set of transformations, then check the data by executing an action. As an added bonus, the transformations can be tested without data. Given an input schema, the resulting output schema can be confirmed alone. No data needed. This allows us to make sure that our PySpark script fulfills its contract. The unit tests can verify preconditions and postconditions by comparing schemas."
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#introduction",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#introduction",
"title": "PySpark Unit Testing",
"section": "",
"text": "One of the features that I love about PySpark is the data frame abstraction is agnostic about data sources and destinations. This allows unit testing data transformations without connecting to a database or file system. As a bonus data frames are immutable and each transformation returns a new one. This means we can test transformations on the data in bite-sized chunks.\nAll we need to do is create an in-memory data frame, perform a set of transformations, then check the data by executing an action. As an added bonus, the transformations can be tested without data. Given an input schema, the resulting output schema can be confirmed alone. No data needed. This allows us to make sure that our PySpark script fulfills its contract. The unit tests can verify preconditions and postconditions by comparing schemas."
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#example",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#example",
"title": "PySpark Unit Testing",
"section": "Example",
"text": "Example\nLet’s start with a simple example. Our dataset is a simple list of musicians with their bands and roles. It’s just enough to keep explanations clear.\n\nfrom pyspark.sql import SparkSession, DataFrame, Column\nimport pyspark.sql.types as sqlt\nimport pyspark.sql.functions as sqlf\n\n\nspark = SparkSession.builder.getOrCreate();\n\n\ndata = [\n (\"Rob\", \"Halford\", \"Judas Priest\", \"Singer\"),\n (\"Alice\", \"Cooper\", \"Hollywood Vampires\", \"Singer\"),\n (\"Steve\", \"Harris\", \"Iron Maiden\", \"Bassist\"),\n (\"James\", \"Hetfield\", \"Metallica\", \"Singer\"),\n (\"Bernie\", \"Worrell\", \"Parliament\", \"Keyboardist\"),\n]\n\nschema = sqlt.StructType(\n [\n sqlt.StructField(\"first_name\", sqlt.StringType(), True),\n sqlt.StructField(\"last_name\", sqlt.StringType(), True),\n sqlt.StructField(\"band\", sqlt.StringType(), True),\n sqlt.StructField(\"role\", sqlt.StringType(), True),\n ]\n)\n\ndf = spark.createDataFrame(data=data, schema=schema)\ndf.printSchema()\ndf.show(truncate=False)\n\nroot\n |-- first_name: string (nullable = true)\n |-- last_name: string (nullable = true)\n |-- band: string (nullable = true)\n |-- role: string (nullable = true)\n\n+----------+---------+------------------+-----------+\n|first_name|last_name|band |role |\n+----------+---------+------------------+-----------+\n|Rob |Halford |Judas Priest |Singer |\n|Alice |Cooper |Hollywood Vampires|Singer |\n|Steve |Harris |Iron Maiden |Bassist |\n|James |Hetfield |Metallica |Singer |\n|Bernie |Worrell |Parliament |Keyboardist|\n+----------+---------+------------------+-----------+"
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#implement-transformations",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#implement-transformations",
"title": "PySpark Unit Testing",
"section": "Implement Transformations",
"text": "Implement Transformations\nLet’s create some transformation functions that we can use to test our data.\n\ndef drop_unnecessary(df: DataFrame) -> DataFrame:\n \"\"\"\n Removed the role column\n \"\"\"\n return df.drop(\"role\")\n\n\ndef only_singers(df: DataFrame) -> DataFrame:\n \"\"\"\n Filter out only the singers\n \"\"\"\n return df.where(sqlf.col(\"role\") == \"Singer\")\n\n\ndef combine_names(first: Column, last: Column) -> Column:\n \"\"\"\n Take the first name and last name columns and create a single name structure\n \"\"\"\n return sqlf.struct(first.alias(\"first\"), last.alias(\"last\"))\n\n\ndef fix_name(df: DataFrame) -> DataFrame:\n \"\"\"\n Fix names from two columns to one\n \"\"\"\n return df.select(\n \"band\",\n combine_names(sqlf.col(\"first_name\"), sqlf.col(\"last_name\")).alias(\"name\"),\n )"
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#run-transformations",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#run-transformations",
"title": "PySpark Unit Testing",
"section": "Run Transformations",
"text": "Run Transformations\nLet’s run the transformations that we just implemented. The “transform” method on DataFrame allows us to call the transformations in sequence without having to create temporary variables. It also makes the code easier to read with less noise.\n\n\"\"\"\nSince data frames are lazy, we can filter out the singers after dropping the role column. Notice that\nthe code below runs and returns the correct answer.\n\"\"\"\nnew_df = df.transform(drop_unnecessary).transform(only_singers).transform(fix_name)\nnew_df.show()\n\n+------------------+-----------------+\n| band| name|\n+------------------+-----------------+\n| Judas Priest| {Rob, Halford}|\n|Hollywood Vampires| {Alice, Cooper}|\n| Metallica|{James, Hetfield}|\n+------------------+-----------------+"
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#tests",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#tests",
"title": "PySpark Unit Testing",
"section": "Tests",
"text": "Tests\nNow, let’s write some tests that verify the output schema for each transformation and that each transformation changes the input data as expected.\n\nimport pytest\nimport ipytest\n\nipytest.autoconfig()\nipytest.clean()\n\n\n@pytest.fixture\ndef spark() -> SparkSession:\n return SparkSession.builder.getOrCreate()\n\n\n@pytest.fixture\ndef schema() -> sqlt.StructType:\n \"\"\"\n Create the input schema to be used for tests\n \"\"\"\n return sqlt.StructType(\n [\n sqlt.StructField(\"first_name\", sqlt.StringType(), True),\n sqlt.StructField(\"last_name\", sqlt.StringType(), True),\n sqlt.StructField(\"band\", sqlt.StringType(), True),\n sqlt.StructField(\"role\", sqlt.StringType(), True),\n ]\n )\n\n\n@pytest.fixture\ndef data(spark: SparkSession, schema: sqlt.StructType) -> DataFrame:\n \"\"\"\n Sample input data to use in tests\n \"\"\"\n data = [\n (\"Rob\", \"Halford\", \"Judas Priest\", \"Singer\"),\n (\"Alice\", \"Cooper\", \"Hollywood Vampires\", \"Singer\"),\n (\"Steve\", \"Harris\", \"Iron Maiden\", \"Bassist\"),\n (\"James\", \"Hetfield\", \"Metallica\", \"Singer\"),\n (\"Bernie\", \"Worrell\", \"Parliament\", \"Keyboardist\"),\n ]\n return spark.createDataFrame(data=data, schema=schema)\n\n\n# Schema tests\ndef test_drop_unnecessary_schema(spark: SparkSession, schema: sqlt.StructType):\n \"\"\"\n This test verifies only the column names\n \"\"\"\n empty = spark.createDataFrame(data=[], schema=schema)\n result = empty.transform(drop_unnecessary)\n assert [\"first_name\", \"last_name\", \"band\"] == result.columns\n\n\ndef test_only_singers_schema(spark: SparkSession, schema: sqlt.StructType):\n \"\"\"\n You can also verify the contract by comparing schema instances.\n In this case, the schemas should be the same.\n \"\"\"\n empty = spark.createDataFrame(data=[], schema=schema)\n result = empty.transform(only_singers)\n assert result.schema == schema\n\n\ndef test_fix_name_schema(spark: SparkSession, schema: sqlt.StructType):\n \"\"\"\n Schema can also be verified with a simple string\n \"\"\"\n empty = spark.createDataFrame(data=[], schema=schema)\n result = empty.transform(fix_name)\n assert [\"band\", \"name\"] == result.columns\n assert (\n result.schema[\"name\"].simpleString() == \"name:struct<first:string,last:string>\"\n )\n\n\n# Data test\ndef test_end_to_end(spark: SparkSession, data: DataFrame):\n result = (\n data.transform(drop_unnecessary).transform(only_singers).transform(fix_name)\n )\n # call an action\n output = result.collect()\n assert len(output) == 3\n\n assert output[0][\"band\"] == \"Judas Priest\"\n assert output[0][\"name\"][\"first\"] == \"Rob\"\n assert output[0][\"name\"][\"last\"] == \"Halford\"\n\n assert output[1][\"band\"] == \"Hollywood Vampires\"\n assert output[2][\"band\"] == \"Metallica\"\n\n\nipytest.run();\n\n.... [100%]\n4 passed in 0.20s"
},
{
"objectID": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#conclusion",
"href": "posts/2023-09-28-pyspark-testing/pyspark-testing.html#conclusion",
"title": "PySpark Unit Testing",
"section": "Conclusion",
"text": "Conclusion\nI hope this gave you some ideas on how to test the transformations in your own pipelines. It’s helped me simplify my tests and verify edge cases more easily. It’s caught many errors sooner without doing a full run of the script or using all of the data. It also makes debugging quicker since the tests are smaller."
},
{
"objectID": "posts/2023-07-29-welcome/index.html",
"href": "posts/2023-07-29-welcome/index.html",
"title": "Welcome To My New Blog",
"section": "",
"text": "This is the first post in my new Quarto blog. Welcome!\nIf you are looking for my older posts, please visit here."
},
{
"objectID": "posts/2023-08-27-julia-hanoi/blog.html",
"href": "posts/2023-08-27-julia-hanoi/blog.html",
"title": "Tower Of Hanoi Solver",
"section": "",
"text": "I’ve always loved the Towers of Hanoi ever since I was introduced to it as a kid by chance. I thought it would be perfect to write a solver using Julia to learn it. This is my first project using Julia and I found it enjoyable. I’m still learning the ropes so to speak. For future projects, I would like to stick to a more strict functional style of programming for the solvers and use the naming convention of using “!” postfix to denote stateful functions.\nJulia initially picqued my interest with its take on types, multiple dispatch, and no objects. It forces you away from generic types which can make for more readable code. The typing system gets you thinking about the problem differently and how you can use types to better document intent without relying on problematic “if” statements. The packaging system, REPL, and jupyter integration make developing and trying out packages easy. There is a learning curve and some frustrations with how code is loaded that gets in the way of development initially. But, nothing a Google search can’t fix. All part of learning something new and a different take on programming.\nThe lack of encapsulation is worrisome for large projects. I have the same issue with Python, but documentation and a few team guidelines help. It’s always good to have strict enforcement though. Minimizing mutable state structures and documenting the functions that mutate to use goes a long way. All in all a minor quibble.\nOverall, I enjoyed coding in Julia and plan to implement more projects with it. The type system is the intriguing piece and I did some experiements in the code to try to push my understanding. Look at the implementation of moves in the domain for an example. This was a quick weekend project and comments are minimal. There are several emergent design patterns in Julia that I would love to explore as well like the Holy Trait Pattern and work more through the excellent “Hands on Design Patterns and Best Practices with Julia” book.\nIf you would like to follow along, I have provided the source for the domain along with the tests. I implemented two solvers and provided the source for the naive version and the A* with heurstic version.\n\nimport Pkg\nPkg.activate(\".\")\n\n Activating project at `~/Documents/julia/TowerOfHanoi`"
},
{
"objectID": "posts/2023-08-27-julia-hanoi/blog.html#experimenting-with-julia",
"href": "posts/2023-08-27-julia-hanoi/blog.html#experimenting-with-julia",
"title": "Tower Of Hanoi Solver",
"section": "",
"text": "I’ve always loved the Towers of Hanoi ever since I was introduced to it as a kid by chance. I thought it would be perfect to write a solver using Julia to learn it. This is my first project using Julia and I found it enjoyable. I’m still learning the ropes so to speak. For future projects, I would like to stick to a more strict functional style of programming for the solvers and use the naming convention of using “!” postfix to denote stateful functions.\nJulia initially picqued my interest with its take on types, multiple dispatch, and no objects. It forces you away from generic types which can make for more readable code. The typing system gets you thinking about the problem differently and how you can use types to better document intent without relying on problematic “if” statements. The packaging system, REPL, and jupyter integration make developing and trying out packages easy. There is a learning curve and some frustrations with how code is loaded that gets in the way of development initially. But, nothing a Google search can’t fix. All part of learning something new and a different take on programming.\nThe lack of encapsulation is worrisome for large projects. I have the same issue with Python, but documentation and a few team guidelines help. It’s always good to have strict enforcement though. Minimizing mutable state structures and documenting the functions that mutate to use goes a long way. All in all a minor quibble.\nOverall, I enjoyed coding in Julia and plan to implement more projects with it. The type system is the intriguing piece and I did some experiements in the code to try to push my understanding. Look at the implementation of moves in the domain for an example. This was a quick weekend project and comments are minimal. There are several emergent design patterns in Julia that I would love to explore as well like the Holy Trait Pattern and work more through the excellent “Hands on Design Patterns and Best Practices with Julia” book.\nIf you would like to follow along, I have provided the source for the domain along with the tests. I implemented two solvers and provided the source for the naive version and the A* with heurstic version.\n\nimport Pkg\nPkg.activate(\".\")\n\n Activating project at `~/Documents/julia/TowerOfHanoi`"
},
{
"objectID": "posts/2023-08-27-julia-hanoi/blog.html#naive",
"href": "posts/2023-08-27-julia-hanoi/blog.html#naive",
"title": "Tower Of Hanoi Solver",
"section": "Naive",
"text": "Naive\nA sample run with 4 discs is shown below. Keep scrolling for the A* version.\n\ninclude(\"src/naive_solver.jl\")\n\nMoves to solve: 79\n *** \n ***** \n ******* \n ********* \nheurstic value: 16\nMove from 1 to 3\n \n ***** \n ******* \n ********* *** \nheurstic value: 14\nMove from 3 to 2\n \n ***** \n ******* \n ********* *** \nheurstic value: 14\nMove from 1 to 3\n \n \n ******* \n ********* *** ***** \nheurstic value: 14\nMove from 2 to 3\n \n \n ******* *** \n ********* ***** \nheurstic value: 16\nMove from 3 to 1\n \n *** \n ******* \n ********* ***** \nheurstic value: 15\nMove from 3 to 2\n \n *** \n ******* \n ********* ***** \nheurstic value: 15\nMove from 1 to 3\n \n \n ******* \n ********* ***** *** \nheurstic value: 14\nMove from 3 to 2\n \n \n ******* *** \n ********* ***** \nheurstic value: 16\nMove from 1 to 3\n \n \n *** \n ********* ***** ******* \nheurstic value: 18\nMove from 2 to 3\n \n \n *** \n ********* ***** ******* \nheurstic value: 18\nMove from 3 to 1\n \n \n *** \n ********* ***** ******* \nheurstic value: 16\nMove from 2 to 3\n \n \n *** ***** \n ********* ******* \nheurstic value: 18\nMove from 1 to 3\n \n *** \n ***** \n ********* ******* \nheurstic value: 22\nMove from 3 to 2\n \n \n ***** \n ********* *** ******* \nheurstic value: 18\nMove from 3 to 1\n \n \n ***** \n ********* *** ******* \nheurstic value: 15\nMove from 2 to 3\n \n \n ***** *** \n ********* ******* \nheurstic value: 17\nMove from 3 to 1\n \n *** \n ***** \n ********* ******* \nheurstic value: 16\nMove from 3 to 2\n \n *** \n ***** \n ********* ******* \nheurstic value: 16\nMove from 1 to 3\n \n \n ***** \n ********* ******* *** \nheurstic value: 15\nMove from 3 to 2\n \n \n ***** *** \n ********* ******* \nheurstic value: 17\nMove from 1 to 3\n \n \n *** \n ********* ******* ***** \nheurstic value: 18\nMove from 2 to 3\n \n \n *** \n ********* ******* ***** \nheurstic value: 18\nMove from 3 to 1\n \n \n *** \n ********* ******* ***** \nheurstic value: 16\nMove from 3 to 2\n \n \n *** ***** \n ********* ******* \nheurstic value: 18\nMove from 1 to 3\n \n \n ***** \n ********* ******* *** \nheurstic value: 18\nMove from 3 to 2\n \n *** \n ***** \n ********* ******* \nheurstic value: 22\nMove from 1 to 3\n \n *** \n ***** \n ******* ********* \nheurstic value: 26\nMove from 2 to 3\n \n \n ***** *** \n ******* ********* \nheurstic value: 24\nMove from 3 to 1\n \n \n ***** \n *** ******* ********* \nheurstic value: 21\nMove from 2 to 3\n \n \n ***** \n *** ******* ********* \nheurstic value: 21\nMove from 1 to 3\n \n *** \n ***** \n ******* ********* \nheurstic value: 26\nMove from 3 to 2\n \n \n *** ***** \n ******* ********* \nheurstic value: 24\nMove from 3 to 1\n \n \n *** \n ***** ******* ********* \nheurstic value: 20\nMove from 2 to 3\n \n \n *** \n ***** ******* ********* \nheurstic value: 20\nMove from 3 to 1\n \n \n *** \n ***** ******* ********* \nheurstic value: 18\nMove from 2 to 3\n \n \n *** ******* \n ***** ********* \nheurstic value: 20\nMove from 1 to 3\n \n *** \n ******* \n ***** ********* \nheurstic value: 24\nMove from 3 to 2\n \n \n ******* \n ***** *** ********* \nheurstic value: 20\nMove from 1 to 3\n \n ***** \n ******* \n *** ********* \nheurstic value: 26\nMove from 2 to 3\n *** \n ***** \n ******* \n ********* \nheurstic value: 32\nMove from 3 to 1\n \n ***** \n ******* \n *** ********* \nheurstic value: 25\nMove from 3 to 2\n \n \n ******* \n *** ***** ********* \nheurstic value: 21\nMove from 1 to 3\n \n *** \n ******* \n ***** ********* \nheurstic value: 26\nMove from 3 to 2\n \n \n *** ******* \n ***** ********* \nheurstic value: 24\nMove from 3 to 1\n \n \n *** \n ******* ***** ********* \nheurstic value: 19\nMove from 2 to 3\n \n \n *** \n ******* ***** ********* \nheurstic value: 19\nMove from 3 to 1\n \n \n *** \n ******* ***** ********* \nheurstic value: 17\nMove from 2 to 3\n \n \n *** ***** \n ******* ********* \nheurstic value: 19\nMove from 1 to 3\n \n *** \n ***** \n ******* ********* \nheurstic value: 23\nMove from 3 to 2\n \n \n ***** \n ******* *** ********* \nheurstic value: 19\nMove from 3 to 1\n \n \n ***** \n ******* *** ********* \nheurstic value: 16\nMove from 2 to 3\n \n \n ***** *** \n ******* ********* \nheurstic value: 18\nMove from 3 to 1\n \n *** \n ***** \n ******* ********* \nheurstic value: 17\nMove from 3 to 2\n \n *** \n ***** \n ******* ********* \nheurstic value: 17\nMove from 1 to 3\n \n \n ***** \n ******* ********* *** \nheurstic value: 16\nMove from 3 to 2\n \n \n ***** *** \n ******* ********* \nheurstic value: 18\nMove from 1 to 3\n \n \n *** \n ******* ********* ***** \nheurstic value: 19\nMove from 2 to 3\n \n \n *** \n ******* ********* ***** \nheurstic value: 19\nMove from 3 to 1\n \n \n *** \n ******* ********* ***** \nheurstic value: 17\nMove from 3 to 2\n \n \n *** ***** \n ******* ********* \nheurstic value: 19\nMove from 1 to 3\n \n \n ***** \n ******* ********* *** \nheurstic value: 19\nMove from 3 to 2\n \n *** \n ***** \n ******* ********* \nheurstic value: 23\nMove from 1 to 3\n \n *** \n ***** \n ********* ******* \nheurstic value: 26\nMove from 2 to 3\n \n \n ***** *** \n ********* ******* \nheurstic value: 24\nMove from 3 to 1\n \n \n ***** \n *** ********* ******* \nheurstic value: 21\nMove from 2 to 3\n \n \n ***** \n *** ********* ******* \nheurstic value: 21\nMove from 1 to 3\n \n *** \n ***** \n ********* ******* \nheurstic value: 26\nMove from 3 to 2\n \n \n *** ***** \n ********* ******* \nheurstic value: 24\nMove from 3 to 1\n \n \n *** \n ***** ********* ******* \nheurstic value: 20\nMove from 2 to 3\n \n \n *** \n ***** ********* ******* \nheurstic value: 20\nMove from 3 to 1\n \n \n *** \n ***** ********* ******* \nheurstic value: 18\nMove from 3 to 2\n \n \n *** ******* \n ***** ********* \nheurstic value: 20\nMove from 1 to 3\n \n \n ******* \n ***** ********* *** \nheurstic value: 20\nMove from 3 to 2\n \n *** \n ******* \n ***** ********* \nheurstic value: 24\nMove from 1 to 3\n \n *** \n ******* \n ********* ***** \nheurstic value: 26\nMove from 2 to 3\n \n \n ******* *** \n ********* ***** \nheurstic value: 24\nMove from 3 to 1\n \n \n ******* \n *** ********* ***** \nheurstic value: 21\nMove from 3 to 2\n \n ***** \n ******* \n *** ********* \nheurstic value: 25\nMove from 1 to 2\nfinal:\n *** \n ***** \n ******* \n *********"
},
{
"objectID": "posts/2023-08-27-julia-hanoi/blog.html#a-heuristic",
"href": "posts/2023-08-27-julia-hanoi/blog.html#a-heuristic",
"title": "Tower Of Hanoi Solver",
"section": "A* Heuristic",
"text": "A* Heuristic\nI implemented the A* algorigithm and used the following heurstic function:\nfunction heuristic(initial::Tower, state::Tower)\n disc_cache = Dict()\n for (r, rod) in enumerate(initial.rods)\n for (d, disc) in enumerate(rod.discs)\n disc_cache[disc]=(r,d)\n end\n end\n num_discs = length(disc_cache)\n result = 0\n for (r, rod) in enumerate(state.rods)\n for (d, disc) in enumerate(rod.discs)\n (ir, id) = disc_cache[disc]\n rod_diff = abs(ir - r) > 0 ? 2 : 1\n disc_diff = num_discs - abs(id - d)\n result += rod_diff * disc_diff\n end\n end\n result\nend\nIt simply sums how many discs are not in the goal state from the given intial. Thus, a higher score is closer to the goal and the A* algorithm maximizes search for it.\nA sample run with 4 discs is shown below.\nThe naive solver took 79 moves and the A* solver took 21 moves. A huge improvement in finding the solution. This was a great project to learn the basics of Julia.\n\ninclude(\"src/heuristic_solver.jl\")\n\n\nNumber of moves: 21\n *** \n ***** \n ******* \n ********* \nheurstic value: 16\nMove from 1 to 2\n \n ***** \n ******* \n ********* *** \nheurstic value: 14\nMove from 2 to 3\n \n ***** \n ******* \n ********* *** \nheurstic value: 14\nMove from 1 to 2\n \n \n ******* \n ********* ***** *** \nheurstic value: 14\nMove from 3 to 2\n \n \n ******* *** \n ********* ***** \nheurstic value: 16\nMove from 1 to 3\n \n \n *** \n ********* ***** ******* \nheurstic value: 18\nMove from 2 to 3\n \n \n *** \n ********* ***** ******* \nheurstic value: 18\nMove from 3 to 1\n \n \n *** \n ********* ***** ******* \nheurstic value: 16\nMove from 2 to 3\n \n \n *** ***** \n ********* ******* \nheurstic value: 18\nMove from 1 to 3\n \n *** \n ***** \n ********* ******* \nheurstic value: 22\nMove from 1 to 2\n \n *** \n ***** \n ********* ******* \nheurstic value: 26\nMove from 3 to 2\n \n \n *** ***** \n ********* ******* \nheurstic value: 24\nMove from 3 to 1\n \n \n *** \n ***** ********* ******* \nheurstic value: 20\nMove from 2 to 3\n \n \n *** \n ***** ********* ******* \nheurstic value: 20\nMove from 3 to 1\n \n \n *** \n ***** ********* ******* \nheurstic value: 18\nMove from 3 to 2\n \n \n *** ******* \n ***** ********* \nheurstic value: 20\nMove from 1 to 2\n \n *** \n ******* \n ***** ********* \nheurstic value: 24\nMove from 1 to 3\n \n *** \n ******* \n ********* ***** \nheurstic value: 26\nMove from 2 to 3\n \n \n ******* *** \n ********* ***** \nheurstic value: 24\nMove from 3 to 1\n \n \n ******* \n *** ********* ***** \nheurstic value: 21\nMove from 3 to 2\n \n ***** \n ******* \n *** ********* \nheurstic value: 25\nMove from 1 to 2\n *** \n ***** \n ******* \n *********"
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html",
"title": "Einstein Puzzle Solver",
"section": "",
"text": "I love number puzzles and bought one inspired by Einstein on holiday recently. The idea is simple. There’s 16 tiles with numbers that can be placed in a 4x4 grid. The goal is for each row, column, and diagonal to sum up to 264. The tiles can be flipped so if tile has “66” imprinted on it, it can also represent “99”.\nHere’s an example layout of tiles.\n---------------------\n| 18 | 89 | 98 | 61 |\n-----+----+----+-----\n| 68 | 91 | 88 | 16 |\n-----+----+----+-----\n| 81 | 19 | 66 | 99 |\n-----+----+----+-----\n| 96 | 69 | 11 | 86 |\n---------------------\nThings quickly escalated when I got the idea to write a solver that would use genetic programming concepts. The algorithm is simple. Start with a population of all scrambled states, then pick the best layouts out of the population. Exchange a few tiles in each of samples from the the best to create a new population along with a few new scrambled states. Repeat until we get a layout of tiles that satisfies the goal."
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#puzzle-and-idea",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#puzzle-and-idea",
"title": "Einstein Puzzle Solver",
"section": "",
"text": "I love number puzzles and bought one inspired by Einstein on holiday recently. The idea is simple. There’s 16 tiles with numbers that can be placed in a 4x4 grid. The goal is for each row, column, and diagonal to sum up to 264. The tiles can be flipped so if tile has “66” imprinted on it, it can also represent “99”.\nHere’s an example layout of tiles.\n---------------------\n| 18 | 89 | 98 | 61 |\n-----+----+----+-----\n| 68 | 91 | 88 | 16 |\n-----+----+----+-----\n| 81 | 19 | 66 | 99 |\n-----+----+----+-----\n| 96 | 69 | 11 | 86 |\n---------------------\nThings quickly escalated when I got the idea to write a solver that would use genetic programming concepts. The algorithm is simple. Start with a population of all scrambled states, then pick the best layouts out of the population. Exchange a few tiles in each of samples from the the best to create a new population along with a few new scrambled states. Repeat until we get a layout of tiles that satisfies the goal."
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#puzzle-domain",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#puzzle-domain",
"title": "Einstein Puzzle Solver",
"section": "Puzzle Domain",
"text": "Puzzle Domain\nFirst things first. A domain to model the puzzle was needed.\n\n\nSource\nimport itertools\nimport random\nfrom typing import Iterable, Sequence\n\n# Type to represent the layout of tiles\nState = list[int]\n\n# Map the values of a number when flipped\nFLIP_MAP: dict[int, int] = {\n 1: 1,\n 6: 9,\n 8: 8,\n 9: 6,\n}\n\n# All the tiles represented\nALL_TILES: State = [\n 66,\n 66,\n 19,\n 19,\n 86,\n 86,\n 89,\n 89,\n 16,\n 16,\n 18,\n 18,\n 88,\n 96,\n 69,\n 11,\n]\n\n# The goal that each row, column, and diagonal should sum up to\nGOAL = 264\n\n\ndef flip(tile: int, flip_map: dict[int, int] = FLIP_MAP) -> int:\n \"\"\"\n Take a tile and flip it. So, 89 will become 68.\n Calling it twice should return the original tile. 89->68->89\n \"\"\"\n assert tile > 9 and tile < 100\n ones = tile % 10\n tens = tile // 10\n return flip_map[ones] * 10 + flip_map[tens]\n\n\ndef add_rows(tiles: State) -> list[int]:\n \"\"\"\n Return the sum of all rows\n \"\"\"\n return [sum(tiles[0:4]), sum(tiles[4:8]), sum(tiles[8:12]), sum(tiles[12:16])]\n\n\ndef add_columns(tiles: State) -> list[int]:\n \"\"\"\n Return the sum of all columns\n \"\"\"\n return [\n sum(each for i, each in enumerate(tiles) if i % 4 == 0),\n sum(each for i, each in enumerate(tiles) if i % 4 == 1),\n sum(each for i, each in enumerate(tiles) if i % 4 == 2),\n sum(each for i, each in enumerate(tiles) if i % 4 == 3),\n ]\n\n\ndef add_diagonals(tiles: State) -> list[int]:\n \"\"\"\n Return the sum of the two diagonals\n \"\"\"\n return [\n tiles[0] + tiles[5] + tiles[10] + tiles[15],\n tiles[12] + tiles[9] + tiles[6] + tiles[3],\n ]\n\n\ndef score_iter(tiles: State) -> Iterable[int]:\n \"\"\"\n Return a score for each row, column, and diagonal.\n The score is the distance away from the goal.\n \"\"\"\n return (\n abs(each - GOAL)\n for each in itertools.chain(\n add_rows(tiles), add_columns(tiles), add_diagonals(tiles)\n )\n )\n\n\ndef score(tiles: State) -> float:\n \"Return a sum of how far away from goal for each row, column, and diagonal. This score is divided by the goal.\"\n return sum(each / float(GOAL) for each in score_iter(tiles))\n\n\ndef score_breakdown(tiles: State) -> list[float]:\n \"Return a list of all the scores\"\n return list(score_iter(tiles))\n\n\ndef chance(probability: float) -> bool:\n \"Return True with probability between 1 and 0\"\n return random.random() < probability\n\n\ndef exchange(\n tiles: State,\n probability_to_flip: float = 0.5,\n probability_to_exchange: float = 0.95,\n) -> State:\n \"Return a new state of tiles after exchanging two and possibly flipping either\"\n choices = random.sample(range(len(tiles)), k=2)\n new_tiles = tiles.copy()\n for index in choices:\n if chance(probability_to_flip):\n new_tiles[index] = flip(new_tiles[index])\n if chance(probability_to_exchange):\n new_tiles[choices[0]], new_tiles[choices[1]] = (\n new_tiles[choices[1]],\n new_tiles[choices[0]],\n )\n return new_tiles\n\n\ndef scramble(tiles: State, probability_to_flip: float = 0.5) -> State:\n \"Scramble all of the tiles\"\n result = random.sample(tiles, k=len(tiles))\n return [flip(each) if chance(probability_to_flip) else each for each in result]"
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#domain-tests",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#domain-tests",
"title": "Einstein Puzzle Solver",
"section": "Domain Tests",
"text": "Domain Tests\nTo make sure the domain is correct, some tests were needed.\n\n\nSource\nimport ipytest\n\nipytest.autoconfig()\nipytest.clean()\n\nimport math\n\n\ndef test_flip():\n def assert_flip(input: int, expected: int):\n assert flip(input) == expected\n assert flip(expected) == input\n\n assert_flip(66, 99)\n assert_flip(19, 61)\n assert_flip(86, 98)\n assert_flip(89, 68)\n assert_flip(16, 91)\n assert_flip(18, 81)\n assert_flip(88, 88)\n assert_flip(96, 96)\n assert_flip(69, 69)\n assert_flip(11, 11)\n\n\ndef test_add_rows():\n tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]\n actual = add_rows(tiles)\n assert actual == [10, 26, 42, 58]\n\n\ndef test_add_columns():\n tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]\n actual = add_columns(tiles)\n assert actual == [28, 32, 36, 40]\n\n\ndef test_add_diagonals():\n tiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]\n actual = add_diagonals(tiles)\n assert actual == [1 + 6 + 11 + 16, 4 + 7 + 10 + 13]\n\n\ndef test_score():\n actual = score(ALL_TILES)\n assert math.isclose(actual, 2.708, abs_tol=0.001)\n\n\ndef test_solution():\n actual = score([91, 89, 68, 16, 18, 66, 81, 99, 86, 98, 19, 61, 69, 11, 96, 88])\n assert math.isclose(actual, 0.0, abs_tol=0.001)\n\n\ndef test_solution_another():\n actual = score([18, 86, 69, 91, 99, 61, 88, 16, 81, 19, 96, 68, 66, 98, 11, 89])\n assert math.isclose(actual, 0.0, abs_tol=0.001)\n\n\ndef test_score_breakdown():\n actual = score_breakdown(ALL_TILES)\n assert actual == [94, 86, 196, 0, 8, 0, 69, 127, 83, 52]\n\n\ndef test_sum():\n \"Example to show sum of four tiles\"\n assert 264 == sum([88, 96, 69, 11])\n\n\nipytest.run();\n\n\n......... [100%]\n9 passed in 0.01s"
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#genetic-programming-inspired-solution",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#genetic-programming-inspired-solution",
"title": "Einstein Puzzle Solver",
"section": "Genetic Programming Inspired Solution",
"text": "Genetic Programming Inspired Solution\nThe implementation creates a generation of tile layouts for each iteration until a layout is found that sums up to 264 across each row, column, and diagonal. There are few hyper parameters that were tweaked by hand while researching. The most important provided are what percentage of the top population to keep for mutations and how many new layouts to add. Finally, probabilities can be changed for the chances of flipping a tile in a mutation and if the tiles will be exchanged. Using 10% of the top tile layouts seemed to work the best, while generating a new population that is 90% of the original size by sampling with replacement mutating each. Finally, adding new scrambled layouts for the remaining 10%. Repeat until a solution is found.\nIt’s not perfect and it can get stuck in local minima. But, it is able find a solution in most cases in under 100,000 generations. Further research is needed for improvement.\n\n\n\n\n\n\n\n\ng\n\n \n\nnode0\n\n Best 10% Rest 90% \n\nnode1\n\n Sample with Replacement and Exchange 90% New Population (Random Shuffle) 10% \n\nnode0:f0->node1:f0\n\n \n\nnode2\n\n Discard \n\nnode0:f1->node2:f0\n\n \n\n\nFigure 1: How New Generations Are Created.\n\n\n\n\n\nimport random\nimport math\nfrom dataclasses import dataclass\nfrom typing import Callable, Optional\n\n\ndef initial_pop(tiles: State = ALL_TILES, size: int = 1000) -> list[list[int]]:\n \"Create a population of size of scrambled tiles\"\n return [scramble(tiles) for _ in range(size)]\n\n\ndef sample_pop_best(pop: list[State], best: list[State]) -> list[State]:\n \"\"\"\n Sample with replacement to create a new population of the best and add a population of new fully scrambled tiles.\n \"\"\"\n return [\n exchange(each) for each in random.choices(best, k=len(pop) - len(best))\n ] + initial_pop(size=len(best))\n\n\n@dataclass(frozen=True)\nclass Progress:\n \"\"\"\n Simple holder for progress\n \"\"\"\n\n generation: int\n mean_score: float\n best_score: float\n best_thus_far: State\n\n\ndef run(\n keep_percent: float = 0.1,\n progress_update: Callable[[Progress], None] = lambda progress: None,\n max_generations: int = 100_000,\n) -> Optional[State]:\n \"\"\"\n Simple solution using genetic programming concepts.\n \"\"\"\n pop = initial_pop()\n generation = 1\n to_keep = int(len(pop) * keep_percent)\n while generation <= max_generations:\n scores = list(map(score, pop))\n top_best = sorted(zip(scores, pop), key=lambda each: each[0])[:to_keep]\n best_score = top_best[0][0]\n best = top_best[0][1]\n mean_score = sum(each[0] for each in top_best) / len(top_best)\n progress = Progress(generation, mean_score, mean_score, best)\n progress_update(progress)\n if math.isclose(best_score, 0.0):\n solution_index = scores.index(best_score)\n return pop[solution_index]\n pop = sample_pop_best(pop, [each[1] for each in top_best])\n generation += 1\n return None\n\n\n@dataclass\nclass UpdatePrinter:\n \"\"\"\n Simple object to be used for printing periodic progress\n \"\"\"\n\n generation: int = 0\n how_often: int = 100\n\n def __call__(self, progress: Progress) -> None:\n self.generation = progress.generation\n if progress.generation % self.how_often == 0:\n print(\n \"gen:\",\n progress.generation,\n \"mean:\",\n round(progress.mean_score, 4),\n \"best:\",\n round(progress.best_score, 4),\n progress.best_thus_far,\n )"
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#sample-run",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#sample-run",
"title": "Einstein Puzzle Solver",
"section": "Sample run",
"text": "Sample run\n\n%%time\nseed = 1690419104\nupdate = UpdatePrinter()\nrandom.seed(seed)\nsolution = run(progress_update=update)\nprint(\"generations: \", update.generation)\nprint(\"solution: \", solution)\n\ngen: 100 mean: 0.1423 best: 0.1423 [18, 89, 98, 61, 68, 91, 88, 16, 81, 19, 66, 99, 96, 69, 11, 86]\ngen: 200 mean: 0.1404 best: 0.1404 [18, 91, 86, 69, 61, 88, 96, 18, 98, 11, 66, 89, 89, 66, 19, 91]\ngen: 300 mean: 0.1322 best: 0.1322 [18, 89, 96, 61, 66, 88, 91, 19, 91, 18, 66, 86, 89, 69, 11, 98]\ngen: 400 mean: 0.1441 best: 0.1441 [19, 89, 91, 66, 66, 88, 98, 11, 91, 18, 61, 89, 86, 69, 18, 96]\ngenerations: 423\nsolution: [18, 99, 86, 61, 66, 81, 98, 19, 91, 16, 69, 88, 89, 68, 11, 96]\nCPU times: user 3.03 s, sys: 11.1 ms, total: 3.04 s\nWall time: 3.04 s"
},
{
"objectID": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#more-solutions",
"href": "posts/2023-08-20-einstein-puzzle/einstein_blog.html#more-solutions",
"title": "Einstein Puzzle Solver",
"section": "More solutions",
"text": "More solutions\n\nipytest.clean()\n\nimport pytest\nimport math\nimport random\n\nSOLUTIONS = (\n (1669425992, [16, 98, 61, 89, 69, 81, 18, 96, 88, 66, 99, 11, 91, 19, 86, 68]),\n (1669426298, [68, 16, 81, 99, 89, 91, 66, 18, 96, 88, 19, 61, 11, 69, 98, 86]),\n (1690419104, [18, 99, 86, 61, 66, 81, 98, 19, 91, 16, 69, 88, 89, 68, 11, 96]),\n)\n\n\n@pytest.mark.parametrize(\"seed, expected\", SOLUTIONS)\ndef test_solution(seed: int, expected: State):\n random.seed(seed)\n actual = run()\n assert actual == expected\n assert math.isclose(score(actual), 0.0, abs_tol=0.001)\n\n\nipytest.run();\n\n... [100%]\n3 passed in 109.71s (0:01:49)"
},
{
"objectID": "posts/2023-07-30-optional-typing-presentation/index.html",
"href": "posts/2023-07-30-optional-typing-presentation/index.html",
"title": "Optional Typing in Python",
"section": "",
"text": "Gave a talk at the Tampa Python User Group on Optional Typing. The talk was last year in May, but I wanted to share it on my new blog."
},
{
"objectID": "index.html",
"href": "index.html",
"title": "My Journey of Software Improvement",
"section": "",
"text": "PySpark Unit Testing\n\n\n\n\n\n\n\npython\n\n\npyspark\n\n\n\n\n\n\n\n\n\n\n\nSep 28, 2023\n\n\nBlaine Buxton\n\n\n\n\n\n\n \n\n\n\n\nTower Of Hanoi Solver\n\n\n\n\n\n\n\njulia\n\n\n\n\n\n\n\n\n\n\n\nAug 27, 2023\n\n\nBlaine Buxton\n\n\n\n\n\n\n \n\n\n\n\nEinstein Puzzle Solver\n\n\n\n\n\n\n\npython\n\n\n\n\n\n\n\n\n\n\n\nAug 20, 2023\n\n\nBlaine Buxton\n\n\n\n\n\n\n \n\n\n\n\nOptional Typing in Python\n\n\n\n\n\n\n\ntalks\n\n\npython\n\n\n\n\n\n\n\n\n\n\n\nJul 30, 2023\n\n\nBlaine Buxton\n\n\n\n\n\n\n \n\n\n\n\nWelcome To My New Blog\n\n\n\n\n\n\n\nnews\n\n\n\n\n\n\n\n\n\n\n\nJul 29, 2023\n\n\nBlaine Buxton\n\n\n\n\n\n\nNo matching items"
}
]