Elif Turan, Mathematics and Statistics
Applications of Bayesian Statistics in Industry
Project advisor: Marie Ozanne
Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.1 With the abundance of data in today's world, interpreting data well has become important for achieving better results in many fields. As a result, data science, a discipline rooted in statistics, has become a popular new area. Agriculture, biology, clinical trials, safety, sports, transportation, and various other fields hire data scientists to interpret their data, make better decisions, and improve efficiency.
One of the fields that benefits from data science is supply chain management. Supply chain management focuses on meeting specified goals for the production, storage, and distribution of goods and services. Although the field may not seem familiar at first, Amazon, Walmart, Target, Shell, and Unilever are a few of the companies working on their supply chains to improve efficiency. Improvements in supply chain management provide various benefits: less excess material use, less waste, lower production costs, and products offered at a lower price to a wider population. In the implementation process, there are two commonly used statistical methodologies: frequentist and Bayesian. The two approaches differ in how they treat prior information: the frequentist approach uses only the current data in the analysis, whereas the Bayesian approach also takes prior information into consideration.2 This presentation shares the experience of a one-year internship that applied Bayesian statistics to supply chain management and uses various examples to demonstrate Bayesian statistics-based supply chain management solutions in different industries.
1 “Statistics Definition & Meaning.” Merriam-Webster. Merriam-Webster. Accessed March 20, 2022.
2 Gupta, Sandeep K. “Use of Bayesian Statistics in Drug Development: Advantages and Challenges.” US
National Library of Medicine National Institutes of Health. Medknow Publications & Media Pvt
Ltd, January 2012. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3657986/.
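As a rough illustration of the frequentist/Bayesian contrast described in the abstract above, the sketch below estimates a supplier's on-time delivery rate in two ways. It is not taken from the internship project: the scenario, the numbers, and the function names (beta_binomial_update, beta_mean) are all hypothetical, and a conjugate Beta-Binomial model is assumed purely because it keeps the arithmetic short.

```python
# Hypothetical example: estimating a supplier's on-time delivery rate.
# All numbers are invented for illustration; none come from the abstract.

def beta_binomial_update(prior_alpha, prior_beta, successes, failures):
    """Return Beta posterior parameters after observing binomial data."""
    return prior_alpha + successes, prior_beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed prior information: about 90% on-time, worth roughly 20 past deliveries.
prior_alpha, prior_beta = 18.0, 2.0

# Assumed current data: 7 on-time deliveries out of 10.
on_time, late = 7, 3

# Frequentist point estimate: uses the current data only.
frequentist_estimate = on_time / (on_time + late)

# Bayesian point estimate: posterior mean, blending prior and current data.
post_alpha, post_beta = beta_binomial_update(prior_alpha, prior_beta, on_time, late)
bayesian_estimate = beta_mean(post_alpha, post_beta)

print(f"frequentist estimate: {frequentist_estimate:.3f}")
print(f"Bayesian estimate:    {bayesian_estimate:.3f}")
```

With these made-up inputs the Bayesian estimate sits between the prior mean and the observed proportion, and it moves toward the observed proportion as more current data arrive.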
Grace Rhodes, Mathematics and Statistics
Simulation Study of Markov Chain Composite Likelihood and its Application in Recombination Models
Project advisor: Marie Ozanne
DNA sequencing technologies are advancing rapidly, giving researchers access to data that are both high quality and highly detailed. In particular, these technologies can record the allele found at single nucleotide polymorphism (SNP) sites on individual haplotypes. A SNP is a site on the human genome where there is variation among humans. A haplotype is the genetic material that is inherited together from one parent; an offspring receives one haplotype from their mother and one from their father. These data provide crucial insight into the passing of genetic material from parent to offspring. A central goal for these SNP data is thus to enrich our understanding of human evolutionary history through hierarchical trees of SNP inheritance and to identify SNPs associated with disease. However, while detailed SNP data exist for extant humans, this is not the case for their ancestors. There is therefore a need to estimate backward across generations, and a need for statistical methods that perform this estimation. The statistical question is: if we observe current descendants’ SNP sequences, how can we estimate the unknown ancestor’s SNP sequence while accounting for biological complexities? One such complexity is recombination, an exchange of genetic material between chromosomes during meiosis, which gives descendants haplotypes that do not match their ancestors’ at one or more sites.
Sun (2011)1 proposed the Recombination Model, which estimates the unknown ancestral distribution of SNP sequences from observed descendant sequences while considering a fixed probability of recombination. Here, we run simulations to assess this estimator's performance, focusing on the use of Markov Chain Composite Likelihood and fixed quantities. Simulation results show that Sun’s method yields estimates of the ancestral distribution that are, on average, close in value to the true simulated probabilities and robust to the selected recombination probability. The selected Markov chain order m mediated the success of the estimation: larger m produced estimates closer to the true value, but at the cost of more variability. Overall, this method tends to underestimate the probability of a given sequence, but the misplaced probability is assigned to similar sequences. This suggests that the Recombination Model is a method of estimation that can identify an ancestor’s possible SNP sequences with about 85% accuracy. We therefore conclude that the Recombination Model can provide useful estimates of the ancestor’s SNP sequences for use in genetics research.
1 Jianping Sun. (2011). Composite Likelihood in Long Sequence Data [Dissertation in Statistics for the Degree of
Doctor of Philosophy, The Pennsylvania State University].
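To make the role of recombination in the abstract above concrete, the following toy sketch simulates descendant haplotypes from two ancestral haplotypes, switching the copied source between adjacent SNP sites with a fixed probability r. This is an assumed, simplified crossover mechanism for illustration only; it is not Sun's Recombination Model or the Markov Chain Composite Likelihood estimator, and the function name recombine, the six-site haplotypes, and the value r = 0.1 are all hypothetical.

```python
import random


def recombine(hap_a, hap_b, r, rng):
    """Build one descendant haplotype from two ancestral haplotypes.

    Alleles are copied site by site from one ancestral haplotype; between
    adjacent sites the source switches with probability r (toy assumption).
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must have the same number of sites")
    sources = (hap_a, hap_b)
    current = rng.randrange(2)  # start from a randomly chosen haplotype
    child = []
    for site in range(len(hap_a)):
        if site > 0 and rng.random() < r:
            current = 1 - current  # crossover between site-1 and site
        child.append(sources[current][site])
    return child


rng = random.Random(0)         # fixed seed so the run is repeatable
hap_a = [0, 0, 0, 0, 0, 0]     # hypothetical ancestral SNP alleles
hap_b = [1, 1, 1, 1, 1, 1]
descendants = [recombine(hap_a, hap_b, r=0.1, rng=rng) for _ in range(1000)]

# Fraction of descendants whose haplotype matches neither ancestral haplotype.
mismatch = sum(d != hap_a and d != hap_b for d in descendants) / len(descendants)
print(f"fraction matching neither ancestral haplotype: {mismatch:.2f}")
```

Under this toy mechanism a descendant differs from both ancestral haplotypes whenever at least one switch occurs, which is why ancestral sequences cannot simply be read off from the observed descendants.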
Ayla Osgood, Physics
Using Simulations to Probe the Interfacial Disorder of Organic Solar Cells
Project advisor: Katherine Aidala
Solar cell technology is constantly evolving, pushing the limits of efficiency while working to keep costs down. Organic solar cells are promisingly inexpensive to make but are much more disordered than conventional materials, resulting in lower efficiency. This disorder impedes the movement of electrons by “trapping” them temporarily. These traps greatly influence the efficiency of organic solar cells but are not completely understood; grasping all the intricacies would enable us to build more efficient devices. Typical solar cells are layers of semiconducting materials, with the two most active layers being the donor and acceptor. The disorder at the interface between these layers can significantly impact device efficiency, but it is still unclear exactly how.
To investigate the role of this interface, we created five devices in the solar cell modeling software gpvdm, with the same structure and materials as five physical devices previously fabricated in the Arango lab. Importantly, our simulated devices have an interface layer between the donor and acceptor whose properties we can adjust independently. By fitting the data from our physical devices with the simulation, we could find the values of many device parameters that are hard or impossible to measure experimentally. Once the output from a simulated device matched that of the experimental device, we separately adjusted the trap parameter values of the interface and those of the donor and acceptor together to determine the relative impact of each on the curve. We found the impact of the interface to be much greater than that of the donor and acceptor. Furthermore, a number of our fits provided suggestive evidence that there was more disorder at the interface than in the donor and acceptor, though further research must be done to confirm this.
Nina Gilkyson, Physics
Structural and Vibrational Investigation of Römerite under Icy Satellite and Martian Conditions
Project advisor: Darby Dyar
In their spectral investigations of the surface and interior compositions of Jupiter’s icy satellites, researchers have proposed that hydrated salts may be a principal constituent. Hydrated sulfate salts have also been identified by in situ studies on Mars. Römerite, Fe²⁺Fe³⁺₂(SO₄)₄·14H₂O, is a mixed-valency hydrated sulfate salt, a natural sample of which was obtained from the Caltech Mineral Collection. Here, single crystal X-ray diffraction, synchrotron infrared spectroscopy, Raman spectroscopy, and Mössbauer spectroscopy are used to characterize römerite in low temperature, high pressure icy satellite and Martian conditions. Single crystal X-ray diffraction measurements were done in 20 K temperature intervals from 300 K to 100 K, including a reverse temperature series. Römerite’s crystal structure was refined at each temperature to an agreement factor 0.030043 ≤ R1 ≤ 0.036565 and a goodness of fit 1.00956 ≤ S ≤ 1.02128 on the basis of 2747-2942 reflections (Fo ≥ 4σF). The evolution of römerite’s unit cell parameters, atomic positions, and bond properties with temperature was quantified. Low temperature and high pressure synchrotron infrared spectroscopy measurements in a diamond anvil cell were performed over the ranges 20-300 K and 0-8 GPa. Bonding environments of the H₂O and SO₄ groups were characterized with temperature and pressure. These results are compared with other hydrated sulfate phases relevant to icy satellite and Martian surfaces and interiors.
Ruozhen Gong, Physics
Braiding Dynamics in Active Nematic Systems
Project advisor: Spencer Smith
Active nematics are non-equilibrium fluids composed of rod-shaped microtubule bundles that turn ATP energy into extensile motion. For 2D active nematic microtubule systems, flows are largely characterized by the dynamics of mobile defects in the nematic director field. As these defects wind about each other, their trajectories trace out braids. We propose a coarse-grained model of the system that focuses on the topological braids formed by the space-time trajectories of topological defects. Instead of equations of motion, we study the types of braids that these trajectories form, and we show that there exist restrictions on the types of braids that can form. In particular, we consider the topological entropy of braids, which quantifies how chaotic the system must be. We conjecture that the emergent defect dynamics are optimal in that they produce braids which maximize the topological entropy. Our model of the active nematic system using topological braids is a minimal description, in the sense that the braids effectively capture the motion of defects and the mixing of the system while requiring far fewer degrees of freedom than traditional approaches.
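For readers unfamiliar with the topological entropy of a braid, the sketch below computes it for the simplest nontrivial case, braids on three strands. It relies on a standard result for three strands: each generator can be represented by an integer 2x2 matrix (the reduced Burau representation at t = -1), and the entropy of a braid word is the logarithm of the spectral radius of the matrix product. This is an illustrative calculation only and is not the model proposed in the abstract; the integer encoding of generators and the function names matmul and braid_entropy are assumptions made here, and for more than three strands this matrix approach gives only a lower bound.

```python
import math

# 2x2 integer matrices for the generators of the 3-strand braid group.
# Keys: 1 = sigma_1, 2 = sigma_2, negative = inverse (hypothetical encoding).
GENERATORS = {
    1: ((1, 1), (0, 1)),
    -1: ((1, -1), (0, 1)),
    2: ((1, 0), (-1, 1)),
    -2: ((1, 0), (1, 1)),
}


def matmul(a, b):
    """Multiply two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def braid_entropy(word):
    """Topological entropy of a 3-strand braid word (list of generator keys)."""
    m = ((1, 0), (0, 1))
    for g in word:
        m = matmul(m, GENERATORS[g])
    # The product has determinant 1, so its eigenvalues are determined by the
    # trace: they lie on the unit circle when |trace| <= 2 (zero entropy).
    trace = abs(m[0][0] + m[1][1])
    if trace <= 2:
        return 0.0
    return math.log((trace + math.sqrt(trace * trace - 4)) / 2)


print(braid_entropy([1, 2]))    # sigma_1 sigma_2: a rigid rotation of the strands
print(braid_entropy([1, -2]))   # sigma_1 sigma_2^-1: the classic mixing braid
```

The word sigma_1 sigma_2 has zero entropy, while sigma_1 sigma_2^-1 yields log((3 + sqrt(5)) / 2), so braids of the same length can differ sharply in how much mixing they force.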