Sunday, June 22, 2025

CST363: Week 8 (Week 24)

 The three most important things I learned in this course:

1) What databases are and their advantages over for example, a flat file or spreadsheet. Learning about tables, collections, schemas, etc. helps me imagine how ubiquitous applications are designed and about design considerations that are at work. The exposure in this class bridges the gap so that in the future if I need to design a database into a system that I am working on, I will at least have a foundational understanding of what it would entail. 

2) Optimizations and considerations that come into play when choosing and designing databases. For example, indexes can speed up queries when dealing with larger databases but might be a hinderance with smaller ones. Another one is how normalization can be used as a way to preserve data integrity. All this to say that there are better and worse ways to design a database and there are well-researched approaches.

3) The libraries for creating and accessing databases in our programs was very important. I don't see myself pivoting and going into database administration/design/engineering but I can imagine having to deploy a database in a future software development project on the job. Having now worked on programming assignments that use popular database libraries, I at least have some idea of what those solutions might look like.

CST363: Week 7 (Week 23)

 MySQL and MongoDB are both databases designed to store data and support operations that allow users to access and modify the contents. Speed, scalability, data integrity, and security are among the reasons why databases are preferred over spreadsheets and flat files in applications with large data sets and more complex transactions.

MongoDB is a document-ordered database where data is stored in files written using a markup lanugage like XML, JSON, etc. It is not as rigid so the flexibility lends itself to implementations where data tends to be unstructured or changing frequently. For example, someone designing a database using MySQL and finds that there will be wide variation between the number of columns that each row entry will use/need, it would be worth looking into MongoDB instead. MongoDB is also better suited for applications where a lot write activity is expected. That is for a number of reasons including restrictive locking and overhead that comes with maintaining data integrity.

MySQL is a relational database where the relationships between data elements dictates how the data is store and accessed/presented. It is more tightly structured than MongoDB so it is better suited for well organized data that is fairly consistent.

CST363: Week 6 (Week 22)

 For this week, the focus was on database programming and embedded SQL, as opposed to transactions, design, architecture, etc. We were introduced to JDBC, Java's database API which supports different databases. In our case we used MySQL for Lab 17 and for our Lab 19 group project. We learned how to setup a database connection within our program, update entries, query, and close the connection. The API and the amount of support that it has make it a powerful tool.

CST363: Week 5 (Week 21)

 Mark Winand, the author of the site, "Use the Index Luke," has a page entitled, "slow indexes."  

Winand explains that "slow indexes" is an incorrect descriptor that is used due to misunderstanding how index operations work. In general, we know that for large databases that don't have write-heavy workloads, well designed indexes tend to speed up queries. Winand points out that many of those who attribute slow indexes to unbalanced or malformed trees aren't taking into account that index lookup is constitued by three steps: 1) tree traversal, 2) following the leaf node chain, and 3) extracting the data from the table. Rather, they incorrectly believe that the index look up only includes step 1 (tree traversal).

Winand is just explaining that "slow indexes" is a misnomer. Indexed database queries are faster when deployed for the right applications. The shortcoming is that we aren't always clear on what the underlying process or technology is doing and make incorrect assessments. It's a good example of a case where it pays to look under the hood and take a closer look.

CST363: Week 4 (Week 20)

CST 363 at the half way mark... 

Five things I've learned:

  1. I've learned a little bit about the programmatical underpinnings of databases. Up through Lab 14, using Java to represent data and perform database-like operations has been helpful in getting an idea of what is happening behind the scenes in a database.
  2. I have a much better understanding of why SQL for example, is a relational database. Designs that link and overlap data make sense especially when it comes to preventing data duplication.
  3. Following from number two, normalization was a good lesson in de-duplication and storage efficiency. We learned a bit about how just-in-time operations and binding related data across the database can prevent inconsistencies that might spring up after data is changed. The problem is reminiscent of cache coherency and stale data in operating systems.
  4. We learned how indexes are a strategy to speed up database transactions. Where as the underlying data in a database may be vast and would take time to access some data directly, having sorted indexes as a fast, organized intermediary for lookup operations makes absolute sense intuitively and our lab 14 helped demonstrate what this looks like.
  5. Though I am still confused at times, we've learned how to create database ER diagrams that help us visualize what we are implementing and can be used to to generate schema. I can image that in many cases it would probably be easier to show someone an ER diagram rather than handing them the schema definition or trying to explain the database to them verbally.

Three questions I have:

  1. I don't know anything about multimedia storage and pattern recognition systems. I wonder about the nature of the databases those systems use for fast lookup. For example, what type of data storage and retrieval mechanisms do apps like Google Lens or Shazam use?
  2. How have databases affected storage technologies? Have file system implementations been developed with databases in mind? How about storage firmware/hardware?
  3. I want to learn more about databases deployed on embedded devices and edge computer systems. I imagine that there are specific differences to help them perform with limited processing power and storage capacity.

CST363: Week 8 (Week 24)

 The three most important things I learned in this course: 1) What databases are and their advantages over for example, a flat file or sprea...