Today we'll review key ideas from last year's seminars to set up this upcoming year.
Specify is a relational database. Each object is a row, but the rows hold references to one another.
How do we get our data back into a spreadsheet form?
Demo for queries.
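As a rough sketch of what such a query does, the example below follows the reference from one row to another and flattens the result into spreadsheet-style rows. The two tables (`specimen`, `collector`), their columns, and the sample values are invented for illustration and are not Specify's actual schema.

```python
import sqlite3

# Hypothetical two-table layout, NOT Specify's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collector (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE specimen (
        id INTEGER PRIMARY KEY,
        catalog_number TEXT,
        collector_id INTEGER REFERENCES collector(id)
    );
    INSERT INTO collector VALUES (1, 'Paul Bucci');
    INSERT INTO specimen VALUES (10, 'V0001', 1);
    INSERT INTO specimen VALUES (11, 'V0002', 1);
""")

# The JOIN follows each specimen's reference to its collector row,
# producing one flat row per specimen, like a spreadsheet.
rows = conn.execute("""
    SELECT specimen.catalog_number, collector.name
    FROM specimen
    JOIN collector ON specimen.collector_id = collector.id
    ORDER BY specimen.catalog_number
""").fetchall()

for row in rows:
    print(row)
```

Each printed tuple corresponds to one spreadsheet row, with the collector's name copied in from the row it referenced.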
AND
OR
NOT
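A minimal sketch of how AND, OR, and NOT combine to pick rows. The records and field names here are made up for illustration.

```python
# Hypothetical rows; field names and values are invented for this example.
records = [
    {"catalog_number": "V0001", "country": "Canada", "year": 2019},
    {"catalog_number": "V0002", "country": "USA", "year": 1985},
    {"catalog_number": "V0003", "country": "Mexico", "year": 2021},
    {"catalog_number": "V0004", "country": "USA", "year": 2005},
]

def matches(record):
    # (country is Canada OR country is USA) AND NOT (year before 2000)
    in_region = record["country"] == "Canada" or record["country"] == "USA"
    too_old = record["year"] < 2000
    return in_region and not too_old

selected = [r["catalog_number"] for r in records if matches(r)]
print(selected)
```

Only rows that satisfy the whole condition are kept, so the filter works at the row level.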
The previous operations are at the "row" level. Once you have your data in a "good enough" spreadsheet, you often need to transform each cell.
If the text follows a regular pattern, it's very easy for a computer to perform a split or find operation on it using regular expressions.
\d\d\d-\d\d\d-\d\d\d\d
604-822-2301
Paul.*Bucci
Paul Alexander Hendrik Bucci
Paul A. H. Bucci
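The two patterns above can be tried out with Python's built-in `re` module. This is only a sketch: the sample sentence and the extra name `"P. Bucci"` are added here for illustration.

```python
import re

# The two patterns from the examples above.
phone_pattern = re.compile(r"\d\d\d-\d\d\d-\d\d\d\d")
name_pattern = re.compile(r"Paul.*Bucci")

# Find: pull every phone number out of a piece of text.
text = "Call 604-822-2301 for Paul Alexander Hendrik Bucci."
phones = phone_pattern.findall(text)

# Match: check which whole cells fit the name pattern.
names = ["Paul Alexander Hendrik Bucci", "Paul A. H. Bucci", "P. Bucci"]
matched = [n for n in names if name_pattern.fullmatch(n)]

# Split: break a cell into parts at each hyphen.
parts = re.split(r"-", "604-822-2301")

print(phones)
print(matched)
print(parts)
```

Note that `"P. Bucci"` does not match, because the pattern requires the text to start with `Paul`.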
If you can structure data as key-value(s) pairs, it is easy for a computer to perform find-and-replace operations.
key
value(s)
"P. Bucci": {"Paul Bucci", "Paul A. H. Bucci"}
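A minimal sketch of find-and-replace using the key-value(s) pair above. It assumes the key is the form you want to keep and the values are the variants to replace; the direction could just as well be reversed. The cell `"S. Example"` is a made-up name with no entry in the table.

```python
# Key: the form to keep (an assumption). Values: variants to replace.
variants_by_key = {
    "P. Bucci": {"Paul Bucci", "Paul A. H. Bucci"},
}

# Flip the table so each variant points directly at its key.
key_by_variant = {
    variant: key
    for key, variants in variants_by_key.items()
    for variant in variants
}

cells = ["Paul Bucci", "Paul A. H. Bucci", "P. Bucci", "S. Example"]

# Replace known variants; leave anything unrecognized unchanged.
cleaned = [key_by_variant.get(cell, cell) for cell in cells]
print(cleaned)
```

Cells that are not in the table pass through untouched, so the operation is safe to run over a whole column.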
Sometimes you can give the computer well-labeled examples of "correct" and "incorrect" data, and it will guess whether a new item is correct or incorrect. Most AI works this way.
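To make the idea concrete, here is a toy sketch that guesses a label for a new item by finding the most similar labeled example. It is a simple nearest-example comparison using Python's built-in `difflib`, not how any particular AI product works, and the labeled examples are invented.

```python
from difflib import SequenceMatcher

def shape(text):
    # Reduce text to its "shape": digits become d, letters become a,
    # and punctuation and spaces are kept as they are.
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("d")
        elif ch.isalpha():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical labeled examples for a phone-number column.
labeled = [
    ("604-822-2301", "correct"),
    ("778-555-0100", "correct"),
    ("unknown", "incorrect"),
    ("n/a", "incorrect"),
]

def guess(text):
    # Pick the label of the example whose shape is most similar.
    def similarity(example):
        return SequenceMatcher(None, shape(text), shape(example[0])).ratio()
    return max(labeled, key=similarity)[1]

print(guess("604-822-9999"))
print(guess("call me"))
```

The guess is only as good as the examples it was given, which is why well-labeled data matters.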
Chaining together multiple operations is called a pipeline. For example, Sheila's Work Learn student needed something that looked like this: