Ancora Imparo

Sunday, June 22, 2025

CST363: Week 8 (Week 24)

The three most important things I learned in this course:

1) What databases are and their advantages over for example, a flat file or spreadsheet. Learning about tables, collections, schemas, etc. helps me imagine how ubiquitous applications are designed and about design considerations that are at work. The exposure in this class bridges the gap so that in the future if I need to design a database into a system that I am working on, I will at least have a foundational understanding of what it would entail.

2) Optimizations and considerations that come into play when choosing and designing databases. For example, indexes can speed up queries when dealing with larger databases but might be a hinderance with smaller ones. Another one is how normalization can be used as a way to preserve data integrity. All this to say that there are better and worse ways to design a database and there are well-researched approaches.

3) The libraries for creating and accessing databases in our programs was very important. I don't see myself pivoting and going into database administration/design/engineering but I can imagine having to deploy a database in a future software development project on the job. Having now worked on programming assignments that use popular database libraries, I at least have some idea of what those solutions might look like.

CST363: Week 7 (Week 23)

MySQL and MongoDB are both databases designed to store data and support operations that allow users to access and modify the contents. Speed, scalability, data integrity, and security are among the reasons why databases are preferred over spreadsheets and flat files in applications with large data sets and more complex transactions.

MongoDB is a document-ordered database where data is stored in files written using a markup lanugage like XML, JSON, etc. It is not as rigid so the flexibility lends itself to implementations where data tends to be unstructured or changing frequently. For example, someone designing a database using MySQL and finds that there will be wide variation between the number of columns that each row entry will use/need, it would be worth looking into MongoDB instead. MongoDB is also better suited for applications where a lot write activity is expected. That is for a number of reasons including restrictive locking and overhead that comes with maintaining data integrity.

MySQL is a relational database where the relationships between data elements dictates how the data is store and accessed/presented. It is more tightly structured than MongoDB so it is better suited for well organized data that is fairly consistent.

CST363: Week 6 (Week 22)

For this week, the focus was on database programming and embedded SQL, as opposed to transactions, design, architecture, etc. We were introduced to JDBC, Java's database API which supports different databases. In our case we used MySQL for Lab 17 and for our Lab 19 group project. We learned how to setup a database connection within our program, update entries, query, and close the connection. The API and the amount of support that it has make it a powerful tool.

CST363: Week 5 (Week 21)

Mark Winand, the author of the site, "Use the Index Luke," has a page entitled, "slow indexes."

Winand explains that "slow indexes" is an incorrect descriptor that is used due to misunderstanding how index operations work. In general, we know that for large databases that don't have write-heavy workloads, well designed indexes tend to speed up queries. Winand points out that many of those who attribute slow indexes to unbalanced or malformed trees aren't taking into account that index lookup is constitued by three steps: 1) tree traversal, 2) following the leaf node chain, and 3) extracting the data from the table. Rather, they incorrectly believe that the index look up only includes step 1 (tree traversal).

Winand is just explaining that "slow indexes" is a misnomer. Indexed database queries are faster when deployed for the right applications. The shortcoming is that we aren't always clear on what the underlying process or technology is doing and make incorrect assessments. It's a good example of a case where it pays to look under the hood and take a closer look.

CST363: Week 4 (Week 20)

CST 363 at the half way mark...

Five things I've learned:

I've learned a little bit about the programmatical underpinnings of databases. Up through Lab 14, using Java to represent data and perform database-like operations has been helpful in getting an idea of what is happening behind the scenes in a database.
I have a much better understanding of why SQL for example, is a relational database. Designs that link and overlap data make sense especially when it comes to preventing data duplication.
Following from number two, normalization was a good lesson in de-duplication and storage efficiency. We learned a bit about how just-in-time operations and binding related data across the database can prevent inconsistencies that might spring up after data is changed. The problem is reminiscent of cache coherency and stale data in operating systems.
We learned how indexes are a strategy to speed up database transactions. Where as the underlying data in a database may be vast and would take time to access some data directly, having sorted indexes as a fast, organized intermediary for lookup operations makes absolute sense intuitively and our lab 14 helped demonstrate what this looks like.
Though I am still confused at times, we've learned how to create database ER diagrams that help us visualize what we are implementing and can be used to to generate schema. I can image that in many cases it would probably be easier to show someone an ER diagram rather than handing them the schema definition or trying to explain the database to them verbally.

Three questions I have:

I don't know anything about multimedia storage and pattern recognition systems. I wonder about the nature of the databases those systems use for fast lookup. For example, what type of data storage and retrieval mechanisms do apps like Google Lens or Shazam use?
How have databases affected storage technologies? Have file system implementations been developed with databases in mind? How about storage firmware/hardware?
I want to learn more about databases deployed on embedded devices and edge computer systems. I imagine that there are specific differences to help them perform with limited processing power and storage capacity.

Wednesday, May 21, 2025

CST363: Week 3 (Week 19)

1. What is an SQL view. How is it similar to a table? In what ways is it different (think about primary keys, insert, update, delete operations)?

A view table is generated by a view query - a SELECT statement which uses existing tables as inputs. Views are similar to tables in that they are both organized/presented in row-column format. There are a few advantages that views have over tables. For example, views can omit data that was contained in the base tables. For example, a table containing patient data (patient ID, SSN, home address, medical history,etc) was used in combination with a table specifying zip codes at risk for certain ailments, a view could be created to show only names and addresses for patients who should be notified of the risk without divulging SSNs. Views persist in the database so they are also useful for saving complex queries for future reuse and improving readability by not having to use the verbose definition of the query data every time. Lastly, they are a way to save optimized queries for future use by database users. Similar to to the way that some platform/chip makers distribute software libraries with their developer kits and hardware.

Views are different from standard tables in that the data generated by the view is not stored. Rather, every time the view is referenced, the view query is executed.

The View is also subject to the limitations of the underlying base table. For example, an insert into a view that doesn't include the base table's primary key is not allowed. It would generate a null primary key which is forbidden. For aggregate values in view tables, the problem is that back propagating the insert to the base tables would would lead to undefined behavior so it is forbidden. For example, inserting into a view with rows containing AVG outputs of 3 base table values per entry. If the view insert operation set the AVG for the new entry to 4, how would SQL know whether to populate the 3 input values to (3, 4, 5), (5, 4, 3), (4, 4, 4), (8, 0, 0), etc.? Lastly, in view table entry delete operations the base table primary and/or foreign keys may not be included in the view so this could lead to ambiguities if for example the delete operation matches a view table entry that matches multiple base table entries. This is another forbidden operation that is specific to view tables and not standard tables.

2. We have completed our study of SQL for this course. This is not to imply that we have studied everything in the language. There are many specialized features such as calculating rolling averages, query of spatial data (data with latitude and longitude) coordinates, and more. But take a minute to think about how SQL compares to other programming languages such as Java. What features are similar , and which are present in one language but not in the other? For example, Java has conditional if statements which are similar to SQL WHERE predicates, the SELECT clause is similar to a RETURN statement in that it specifies what data or expression values are to be returned in the query result (although it is strange that a statement should specify the RETURN as the first part of a SELECT.

It's not Java but the "drop database if exists <database name>" directive is reminiscent of the C++ directive to prevent include the same code twice:

   #ifndef HEADER_FILE_H
   #define HEADER_FILE_H
   ...
   #endif

As for Java, even though it is not outwardly apparent you can see that there's implicit use of loops and iteration when operations are performed using multiple values in a column. For example, when we use SUM or AVG on numeric values, SQL has to iterate through each row and add each value to the running total. In the case of AVG, it also must have a count to use as a denominator.

The way views are executed when referenced and not cached is reminiscent of volatile variables in C/C++/Java in that they are accessed in real-time.

There are also SQL and database constructs where we can see that something like references(Java) or pointers(C/C++) are used. In single and muli-level indexes the behavior is similar to a using the name of a reference which stores a memory address, to access the memory at that location, which contains the actual entry, or in the case of multi-column indexes, another index.

Wednesday, May 14, 2025

CST363: Week 2 (Week 18)

SQL has the flexibility to join tables on any column(s) using any predicate (=, >, < ). Most of the time the join will use equality between a primary and foreign key. Think of example where joining on something other than keys would be needed. Write the query both as an English sentence and in SQL. If you can't think of your own example, search the textbook or internet for an example.

If a manufacturer imports raw materials, a Materials table migth contain columns for the material name (e.g. aluminum, lithium, lumber, etc.), supplier name, and address fields for the supplier including a column for the country. In that case the primary key would be a composite key formed by the material name and the supplier name.
A second table tracking tariff percentages might have a column for country ID (primary key), country, and tariff rate.
The join could be done using the country columns as follows to show what tariff rate is associated with each supplier:

SELECT *
FROM Materials M
INNER JOIN Purchases P
ON M.country = P.country;

What is your opinion of SQL as a language? Do you think it is easy to learn and use? When translating from an English question to SQL, what kinds of questions do you find most challenging?

So far, SQL hasn't been too difficult to learn. I would rate it as moderate difficulty since on one hand it is completely new to me and on the other hand the reserved words and organization are relatively intuitive. I need to spend more time reviewing the nested queries and the join operations to really get comfortable with them. They don't always translate to English as easily so it takes a bit more thought for me sometimes to see what is happening.