13. Automate

Why automate?

  • Make life simpler.

  • Ensure reproducibility.

  • Enhance efficiency.

  • Foster collaboration.

  • Achieve scalability.

Efficiency Through Command Line

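Bash, the Bourne Again SHell, is a cornerstone of computing mastery. Beyond executing single commands, Bash lets you chain operations into precise, repeatable scripts, which is what makes it a tool for controlled automation. Interactive graphical user interfaces (GUIs) expose only the actions their designers anticipated; Bash gives you a richer, more direct conversation with the system. Open a terminal on your computer: search for ‘Terminal’ on macOS; on Windows, use Windows Subsystem for Linux (WSL) or Git Bash, because the default ‘Command Prompt’ does not run Bash. Then type ls -l to list the files in your current directory. You’ve just executed your first shell command. (Since macOS Catalina the default shell in Terminal is zsh, which the % prompt below reflects; basic commands such as ls and cat behave the same way in zsh and Bash.)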
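A few more everyday commands are worth trying right away. The sketch below works inside a temporary directory created with mktemp, so it leaves your own files alone; the folder and file names (data, output, notes.txt) are illustrative only. The same commands work in Bash and in zsh.

```bash
#!/usr/bin/env bash
# A short tour of everyday commands. Everything happens inside a throwaway
# directory, so nothing else on your machine is touched.
set -euo pipefail

workdir=$(mktemp -d)                  # create a temporary scratch directory
cd "$workdir"                         # move into it
pwd                                   # print the working directory
mkdir -p data output                  # create two folders
echo "first line" > data/notes.txt    # write a new file (> overwrites)
echo "second line" >> data/notes.txt  # append a line (>> appends)
ls -l data                            # list the folder in long format
cp data/notes.txt output/             # copy the file into output/
mv output/notes.txt output/copy.txt   # rename (move) the copy
cat output/copy.txt                   # print the copy's contents
wc -l output/copy.txt                 # count its lines
```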

Bash Basics:
Question: Which of the following commands lists the files in a directory in Linux?

A) dir
B) ls
C) show
D) open

# view from a MacOS terminal

(myenv) d@Poseidon 1.ontology % ls -l
total 1376
-rw-r--r--@   1 d  staff    3993 Aug 25 01:21 README.md
drwxr-xr-x@  13 d  staff     416 Aug 25 02:49 alpha
drwxr-xr-x@  35 d  staff    1120 Aug 24 14:52 bloc
drwxr-xr-x@   9 d  staff     288 Aug 21 12:45 cst
drwxr-xr-x@  16 d  staff     512 Aug 24 17:47 git
drwxr-xr-x@  17 d  staff     544 Aug  7 22:35 git-filter-repo
drwxr-xr-x@   7 d  staff     224 Aug  6 07:33 myenv
drwxr-xr-x@  13 d  staff     416 Aug 14 18:53 nh_projectbeta
-rw-r--r--@   1 d  staff   77753 Aug 23 12:44 particle_animation.gif
-rw-r--r--@   1 d  staff     463 Aug 23 12:43 populate_be.ipynb
-rw-r--r--@   1 d  staff  562412 Aug 25 06:37 python.ipynb
-rw-r--r--@   1 d  staff   34173 Aug 23 12:47 r.ipynb
drwxr-xr-x@  13 d  staff     416 Aug 15 02:12 seasons_projectalpha
drwxr-xr-x@  13 d  staff     416 Aug 17 14:05 sibs
-rw-r--r--@   1 d  staff     231 Aug 24 17:08 stata.do
-rw-r--r--@   1 d  staff    1196 Aug 23 18:43 stata.dta
-rw-r--r--@   1 d  staff    1892 Aug 24 17:03 stata.log
-rwxr-xr-x@   1 d  staff      90 Aug 24 17:08 stata.sh
drwxr-xr-x@ 139 d  staff    4448 Jun 25 08:29 summer
drwxr-xr-x@  25 d  staff     800 Jul 20 20:21 verano
(myenv) d@Poseidon 1.ontology % 

To see the contents of a simple Bash script, type cat stata.sh. The cat command concatenates the contents of one or more files and prints them to the terminal. The stata.sh file contains the following:

(myenv) d@Poseidon 1.ontology % cat stata.sh
export PATH=$PATH:/applications/stata/statase.app/contents/macos/
stata-se -b do stata.do
(myenv) d@Poseidon 1.ontology % 

This tells us that stata.sh adds the directory containing the Stata executable (stata-se) to the PATH and then runs stata.do in batch mode (-b). To run the script, type bash stata.sh, which runs it with Bash. You can also type ./stata.sh: the ./ tells the shell to look for the script in the current directory rather than on the PATH, and this form works only because the file has execute permission (the x in -rwxr-xr-x in the listing above; chmod +x stata.sh adds it). Short as it is, the script shows how Bash can automate the execution of Stata code.
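The difference between bash stata.sh and ./stata.sh comes down to permissions and the interpreter. The sketch below uses a hypothetical hello.sh in a temporary directory: bash runs a file whether or not it is executable, while ./ requires the execute bit (added with chmod +x) and uses the file's first line, the shebang, to pick the interpreter. The original stata.sh has no shebang; in that case the shell that launches it falls back to running it as a shell script, which is fine for a script this simple.

```bash
#!/usr/bin/env bash
# Compare the two ways of running a script, using a hypothetical hello.sh
# created in a temporary directory.
set -euo pipefail

dir=$(mktemp -d)
cd "$dir"

# Write a two-line script. The first line (the "shebang") names the interpreter
# that ./hello.sh will use; $0 expands to the name the script was called by.
printf '#!/usr/bin/env bash\necho "hello from $0"\n' > hello.sh

bash hello.sh      # works without execute permission; prints: hello from hello.sh
ls -l hello.sh     # typically -rw-r--r-- : no "x", so ./hello.sh would fail here
chmod +x hello.sh  # add the execute bit
ls -l hello.sh     # -rwxr-xr-x : now executable
./hello.sh         # prints: hello from ./hello.sh
```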

Before running stata.sh, let’s look at the contents of the stata.do file it calls:

(myenv) d@Poseidon 1.ontology % cat stata.do
if 0 { 

    cd /applications/stata
    ls -kla
    export PATH=$PATH:/applications/stata/statase.app/contents/macos/
    stata-se -b do stata.do

}

webuse auto, clear 
l make price mpg foreign in 1/9
di "`c(N)' obs & `c(k)' vars"
(myenv) d@Poseidon 1.ontology % 

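The if 0 { ... } block never runs, because its condition is always false; it simply stores notes (here, the shell commands for running this file) inside the do-file. The remaining lines load Stata’s example auto dataset, list make, price, mpg, and foreign for the first nine observations (l abbreviates list), and display the number of observations and variables (di abbreviates display). Now let’s run the script and print the log it creates: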

(myenv) d@Poseidon 1.ontology % ./stata.sh

(myenv) d@Poseidon 1.ontology % cat stata.log

  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      18.0
___/   /   /___/   /   /___/       SE—Standard Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: Single-user , expiring 27 Mar 2024
Serial number: 401809200438
  Licensed to: Poseidon
               Johns Hopkins University

Notes:
      1. Stata is running in batch mode.
      2. Unicode is supported; see help unicode_advice.
      3. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.

. do stata.do 

. if 0 { 
. 
.     cd /applications/stata
.     ls -kla
.     export PATH=$PATH:/applications/stata/statase.app/contents/macos/
.     stata-se -b do stata.do
. 
. }

. 
. webuse auto, clear 
(1978 automobile data)

. l make price mpg foreign in 1/9

     +-----------------------------------------+
     | make             price   mpg    foreign |
     |-----------------------------------------|
  1. | AMC Concord      4,099    22   Domestic |
  2. | AMC Pacer        4,749    17   Domestic |
  3. | AMC Spirit       3,799    22   Domestic |
  4. | Buick Century    4,816    20   Domestic |
  5. | Buick Electra    7,827    15   Domestic |
     |-----------------------------------------|
  6. | Buick LeSabre    5,788    18   Domestic |
  7. | Buick Opel       4,453    26   Domestic |
  8. | Buick Regal      5,189    20   Domestic |
  9. | Buick Riviera   10,372    16   Domestic |
     +-----------------------------------------+

. di "`c(N)' obs & `c(k)' vars"
74 obs & 12 vars

. 
end of do-file
(myenv) d@Poseidon 1.ontology % 

The script ran stata.do in batch mode, and Stata wrote the results to stata.log, which we then printed to the terminal with cat. Simple as it is, this pattern (a shell script that launches an analysis and leaves a log behind) is the building block for automating Stata code.
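One weakness of the script above is that it reports nothing about whether the analysis succeeded. In our experience Stata's batch mode can exit with status 0 even when a do-file stops on an error, so checking the exit status alone may not be enough; verify this on your own installation. The sketch below, a possible replacement for stata.sh, checks that the do-file and stata-se exist, runs the do-file, and then scans the log for Stata return codes such as r(111);. The function name run_do and the error-line pattern are assumptions, not part of the original script.

```bash
#!/usr/bin/env bash
# A sketch of a more defensive stata.sh. The Stata path is the one from the
# original script; adjust it for your installation.
set -euo pipefail

export PATH="$PATH:/applications/stata/statase.app/contents/macos/"

# run_do FILE: run a do-file in batch mode, then scan the log for Stata error
# codes. Assumption: batch mode writes <name>.log to the current directory and
# a failed command leaves a line such as "r(111);" in that log.
run_do() {
  local do_file="$1"
  local log_file
  log_file="$(basename "${do_file%.do}").log"

  if [[ ! -f "$do_file" ]]; then
    echo "error: $do_file not found" >&2
    return 1
  fi
  if ! command -v stata-se >/dev/null 2>&1; then
    echo "error: stata-se is not on the PATH" >&2
    return 1
  fi

  stata-se -b do "$do_file" || return 1

  if [[ ! -f "$log_file" ]]; then
    echo "error: expected log $log_file was not created" >&2
    return 1
  fi
  # Treat any r(NNN); line in the log as a failure, whatever the exit status was.
  if grep -Eq '^r\([0-9]+\);' "$log_file"; then
    echo "error: $do_file stopped with an error; see $log_file" >&2
    return 1
  fi
  echo "ok: $do_file ran cleanly"
}

# Run only when a do-file is named on the command line, e.g. ./stata.sh stata.do
if [[ $# -gt 0 ]]; then
  run_do "$1"
fi
```

With this version saved as stata.sh, ./stata.sh stata.do would print "ok: stata.do ran cleanly" when the log contains no error code and return a nonzero status otherwise, which a scheduler or CI step can detect.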

Overshadowed by Aesthetic Simplicity

Next to the intuitive graphical user interfaces (GUIs) of Windows, macOS, and Android, the depth of Bash is often overlooked. GUIs are aesthetically pleasing: colorful icons, immediate visual feedback, and a click is all it takes. For many people, that makes the command line seem daunting by comparison. Yet those willing to embrace Bash will find the experience extremely rewarding.

Early steps

Jupyter

Jupyter Notebooks combine code, narrative text, and visual output in a single document, which makes data analysis easier to automate, share, and reproduce. They are central to this Fena platform, which is built on Jupyter Notebooks and GitHub Pages and aims to automate data analysis and share its results (as the simple example above shows).

Workflow

  1. Initiate a Jupyter Notebook: Use Google Colab, JupyterLab, or VS Code.

  2. Develop Code: Use the built-in code editor in JupyterLab or VS Code. Our workflow even allows code to be written in languages not traditionally used with Jupyter, such as Stata.

  3. Execute the Code.

  4. Save the Notebook.

  5. Share the Notebook: Options range from basic sharing via Google Colab to more advanced methods such as publishing on GitHub Pages.
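Steps 3 and 4 can also happen without opening the notebook, which is what lets a script or scheduler rerun an analysis. The sketch below assumes Jupyter and its nbconvert tool are installed in the active environment; run_notebooks is a hypothetical helper name, and the notebooks you pass are up to you.

```bash
#!/usr/bin/env bash
# Re-execute notebooks from the command line so a script or scheduler can
# refresh their outputs. Assumes Jupyter (with nbconvert) is installed.
set -euo pipefail

# run_notebooks NB...: execute each notebook in place, stopping at the first failure.
run_notebooks() {
  local nb
  for nb in "$@"; do
    if [[ ! -f "$nb" ]]; then
      echo "error: $nb not found" >&2
      return 1
    fi
    echo "executing $nb"
    # --execute runs every cell; --inplace writes the fresh outputs back into
    # the same file. nbconvert exits nonzero if a cell raises an error.
    jupyter nbconvert --to notebook --execute --inplace "$nb" || return 1
  done
}

# Run only when notebooks are named, e.g. bash run_notebooks.sh python.ipynb
if [[ $# -gt 0 ]]; then
  run_notebooks "$@"
fi
```

If saved as run_notebooks.sh (a hypothetical name), bash run_notebooks.sh python.ipynb r.ipynb would re-execute two of the notebooks from the directory listing earlier in this chapter; running r.ipynb additionally assumes an R kernel is installed.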

Python

Python’s large user community and integration with Jupyter make it an ideal choice for automating data analysis.

Libraries Overview:

  • Data Analysis: Use pandas for data manipulation and analysis.

  • Visualization: Matplotlib and seaborn are renowned for plotting and graphing.

  • Machine Learning: scikit-learn simplifies machine learning model creation.

  • Statistical Modeling: Use statsmodels.

  • Bayesian Modeling: Turn to PyMC (formerly PyMC3).

  • Deep Learning: Opt for TensorFlow, Keras, or PyTorch.

  • Web Scraping: Scrapy, Beautiful Soup, Selenium, and Requests are your go-to libraries.

  • Natural Language Processing: Use NLTK, spaCy, or Gensim.

  • Network Analysis: Explore networks with NetworkX, PyGraphviz, or PyTorch Geometric.

See the Python chapter for a simple hands-on exercise.

Later steps

  1. Familiarize yourself with basic Unix commands.

  2. Construct a bash script for executing code in Python, R, Stata, etc.

  3. Set up a cron job to run the bash script on a schedule.
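Steps 2 and 3 might look like the sketch below. The script name run_pipeline.sh, the step files analysis.py and analysis.R, and the /path/to/... placeholders in the crontab line are hypothetical; stata.do and the Stata path come from the earlier example. Because cron runs jobs with a minimal environment, the script adds the Stata directory to PATH itself; depending on your setup you may need to do the same for python3 and Rscript.

```bash
#!/usr/bin/env bash
# run_pipeline.sh (hypothetical name): run one analysis step per language, in
# order, stopping at the first failure. analysis.py and analysis.R are
# placeholders; stata.do is the do-file from earlier in this chapter.
set -euo pipefail

# cron starts jobs with a minimal PATH, so add the Stata directory explicitly
# (path taken from stata.sh; adjust for your machine).
export PATH="$PATH:/applications/stata/statase.app/contents/macos/"

# Print a timestamped message; cron can redirect it into a log file.
log() {
  printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# Run one command, logging its start and outcome.
step() {
  log "start: $*"
  if ! "$@"; then
    log "FAILED: $*"
    return 1
  fi
  log "done: $*"
}

# Run every step from inside the project directory given as $1.
run_pipeline() {
  cd "$1" || return 1
  step python3 analysis.py || return 1
  step Rscript analysis.R || return 1
  step stata-se -b do stata.do || return 1
}

# Run only when a project directory is given, e.g. bash run_pipeline.sh ~/project
if [[ $# -gt 0 ]]; then
  run_pipeline "$1"
fi

# Example crontab entry (install with `crontab -e`): every day at 06:00.
# 0 6 * * * /bin/bash /path/to/run_pipeline.sh /path/to/project >> /path/to/pipeline.log 2>&1
```

The five crontab fields are minute, hour, day of month, month, and day of week, so 0 6 * * * means 06:00 every day. Redirecting with >> ... 2>&1 appends each run's timestamped lines, including errors, to one log file.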

GitHub Actions & Automation

GitHub Actions runs automated workflows, such as tests, builds, or publishing a site to GitHub Pages, whenever events like a push occur in your repository. It becomes more valuable as your project grows. Here are some tutorials and resources that can help you learn more about GitHub Actions:

Official Documentation

  1. GitHub Actions Documentation: This is the official documentation and is quite comprehensive. It provides a great overview and dives into more complex topics.

Online Tutorials

  1. GitHub Actions for Beginners: This is a simple, straightforward tutorial aimed at beginners.

  2. Automate your Workflow with GitHub Actions: A more in-depth look into GitHub Actions, great if you’re already familiar with CI/CD concepts.

Articles and Blog Posts

  1. Intro to GitHub Actions: A Medium article that walks you through the basics.

  2. Automating Workflows with GitHub Actions: This post provides a deep dive into automating various workflows.

Courses and Workshops

  1. GitHub Actions: Continuous Delivery - This course on GitHub’s own training platform is a more structured way to learn GitHub Actions.

  2. Automate your code-to-cloud workflows with GitHub Actions - A workshop that provides hands-on experience.

Sample Repositories

  1. Awesome Actions: A curated list of custom actions, articles, and other resources.

The landscape is constantly changing, so it’s worth keeping an eye out for new tutorials and resources. But these should give you a good starting point.


Quantified Tokens as Checklist

Automation, Iteration, and Token Quantification: Lessons from Icons in Clinical Research

In clinical research, where precision and excellence are paramount, the lessons of Chesley Sullenberger, Peter Pronovost, and Marty Makary hold particular significance. Working in aviation and in medicine, each has shown the enduring relevance of automation, iteration, and token quantification in achieving exceptional outcomes.

Chesley Sullenberger: Precision in the Skies

Captain Chesley “Sully” Sullenberger, celebrated for the “Miracle on the Hudson,” exemplifies the critical role of precision in aviation. Beyond his water landing, Sully’s commitment to safety underscores the importance of automation. In aviation, where the margin for error is slim, automation is a linchpin: autopilot systems and flight management computers help aircraft navigate with great precision and reduce the potential for human error.

Yet, Sully’s story also teaches us the value of iteration. Pilots undergo rigorous, iterative training to sharpen their skills and adapt to evolving technologies. Each flight is a learning experience, and continuous improvement is a non-negotiable ethos in aviation. The lessons from Sully’s iconic landing extend beyond the cockpit, emphasizing that iteration is not confined to one profession but is a universal pursuit of excellence.

Peter Pronovost and Marty Makary: Checklist Champions

In clinical research and healthcare, Peter Pronovost and Marty Makary are renowned for their work on checklists, which has been credited with saving many lives by reducing medical errors and improving patient safety. Their checklist methodology embodies the principles of automation and token quantification.

Checklists make critical steps in complex medical procedures routine, almost automatic, ensuring that healthcare professionals adhere to best practices. They act as safeguards against error in high-stakes environments. They also exemplify token quantification: each item on a checklist is a discrete token in the process, and by making these tokens explicit and countable, Pronovost and Makary provide a clear roadmap for precision in healthcare.

Token Quantification: A Universal Approach

Incorporating token quantification into clinical research can lead to more rigorous and error-free studies. By counting the required elements (tokens) within research papers, scientists can check that their work meets accepted standards. Just as checklists in healthcare represent essential tokens for patient safety, token quantification acts as a checklist for precision in research.
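Taking “token” to mean a word in a manuscript’s text (an assumption made only for illustration), the sketch below counts a paper’s most frequent words and checks that each term on a checklist appears at least once. The helper names top_tokens and check_terms, the file paper.txt, and any terms you pass are hypothetical; a real checklist would come from the reporting standard your study follows.

```bash
#!/usr/bin/env bash
# Illustrative only: treat "tokens" as words in a plain-text manuscript.
# top_tokens and check_terms are hypothetical helper names.
set -euo pipefail

# top_tokens FILE [N]: print the N (default 10) most frequent lowercase words.
top_tokens() {
  local file="$1" n="${2:-10}"
  tr -cs '[:alpha:]' '\n' < "$file" |   # one word per line
    tr '[:upper:]' '[:lower:]' |        # ignore case
    grep -v '^$' |                      # drop empty lines
    sort | uniq -c | sort -rn |         # count, most frequent first
    sed -n "1,${n}p"                    # keep the first N (reads all input)
}

# check_terms FILE TERM...: count whole-word, case-insensitive occurrences of
# each required term; return 1 if any term is missing.
check_terms() {
  local file="$1"; shift
  local term count missing=0
  for term in "$@"; do
    # "|| true" keeps a zero-match grep from aborting the script under pipefail.
    count=$(grep -oiw -- "$term" "$file" | wc -l | tr -d ' ' || true)
    echo "$term: $count"
    if [[ "$count" -eq 0 ]]; then
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_terms paper.txt consent randomized would print a count for each term and return a nonzero status if either is absent, so it can gate an automated build the same way a failed pipeline step does.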

Automation, iteration, and token quantification, as demonstrated by these icons, transcend specific professions. They are universal principles that underpin excellence. Whether you’re navigating a commercial airliner, performing life-saving surgery, or conducting clinical research, these principles serve as beacons of precision. By heeding the lessons of Sully, Pronovost, and Makary, professionals in any field can chart a course toward ever-higher levels of excellence and precision.