    1. While this extensive assemblage of actors, spread across documents and offices, is deployed to stabilize the fine and solve her problem, Ms. Freire might come to the conclusion that the PSIU is either corrupt or inefficient—perhaps both!

      This story about Ms. Freire's complaint shows just how slow and frustrating the government process is. Even though the system is technically working, it's so slow that the person who made the complaint feels like the government is failing her. This highlights the massive gap between rules on paper and real-world results.

    2. the heterogeneity of “noise” as an umbrella concept, the complexity of its scientific mensuration, and the unsteadiness of its legal encoding make this a particularly difficult object for the state to grasp.

      This is the core argument of the whole article. The author claims the government struggles to handle noise because "noise" itself is a vague concept, it's hard to measure scientifically, and the laws for it are shaky. This sets the stage for everything else.

    3. I present law enforcement assemblages as both unstable and heterogeneous, managed by people with different (and often diverging) expectations regarding how the city should sound.

      The author doesn't see law enforcement as a single, unified body. Instead, he describes it as an "unstable and heterogeneous" network of people who often disagree on how the city should sound. This means the police and the anti-noise agency have different goals and ideas.


    1. “[Prisons censor] the things that you would probably think about as most threatening — materials that might enhance violence or maybe encourage some other sorts of subversive behavior,”

      Subversive meaning what? If inmates stand up against abuse?

    1. For many of us, the main audience of our research is our colleagues. But colleagues is about as vague and useless a term as “the public” and so when you develop a new project you should pin down (for yourself) the scholarly topics that your work will contribute to, and some examples of individuals who do related work.

      I have never trained an LLM to do a specific function for myself before. It will be interesting to see if I will have to fully direct the program to research different sites specifically (narrow the hypothesis down for the program).

    1. formerness of the “West” as a progressive theodicy, leveled down by necrocapitalist extraction, while it still exerts a disproportionate capacity to project violence across the globe and on its residents.

      necrocapitalism as institution =/= discordant with ideals of progressive theodicy, built environment "frayed"(?) as a result of necrocapitalism & global violence?

    2. Between Not Everything and Not Nothing: Cuts Toward Infrastructural Critique



    1. “Who gets to use what I make? Who am I leaving out? How does what I make facilitate or hinder access?”

      This ethical question stresses me out less about the implications of work created by an LLM trained by myself: the work is meant for me and not meant to be shared among many people. Although this shouldn't deter people from using digital means, it should remain an important thought throughout the creation process.

    1. A better approach, from a pedagogical point of view, is to encourage students to explore and try things out, with the grading being focused on documenting the process rather than on the final outcome.

      The whole purpose of this course is exactly this - we are introduced to digital archaeology, new note-taking methods, programs, etc., create a hypothesis and start an experiment. There is no end goal of a proven hypothesis but instead a focus on the journey to an answer.

    1. By manipulating the code that produces these images in both random and patterned ways, we manipulate the meaning of the image and the way in which these images communicate information to the viewer.

      This directly correlates with my last annotation: if someone is looking for answers about something, they must not rely directly on the information given by a program. Programs are trained by humans, and we are easily susceptible to perpetuating our own biases, which exhibits itself in our work.

    1. This is the place. And I am here, the mermaid whose dark hair streams black, the merman in his armored body.

      The speaker finally reaches the wreck and becomes part of it. Calling herself both the mermaid and the merman could suggest that she feels connected to different sides of human experience, not limited by gender.

    2. I came to explore the wreck. The words are purposes. The words are maps.

      This part might mean that the speaker’s goal is to truly explore and understand the wreck, not just read about it. The “words” could represent stories or explanations that guide her, like maps, but she still needs to see the truth for herself.

    3. I go down. My flippers cripple me, I crawl like an insect down the ladder and there is no one to tell me when the ocean will begin.

      This might describe how she feels awkward and alone while going deeper into the sea. The image of crawling “like an insect” could suggest weakness or discomfort.

    1. Digital tools and their use are not theory-free nor without theoretical implications. There is no such thing as neutral, when digital tools are employed.

      This is a common theme among papers about the digital being incorporated more and more into different fields. Relying more and more on digital means opens up opportunities for theoretical change.

      One of the things I am skeptical about when using AI-trained programs is their authenticity regarding sourcing, biases, etc., which definitely relates to my topic, as it pertains to religious beliefs, ethics, etc.

    1. IV
      • Informativo 1062
      • RE 964659 / RS
      • Deciding body: Tribunal Pleno (Full Court)
      • Rapporteur: Min. DIAS TOFFOLI
      • Judgment: 05/08/2022 (Virtual)
      • Area of law: Administrative, Constitutional
      • Subject: Public servants; Remuneration / Fundamental rights and guarantees; Minimum wage

      Public servant: reduced working hours and remuneration below the minimum wage

      Holding: It is prohibited to pay a public servant remuneration below the minimum wage, even when they work a <u>reduced schedule</u>.

      Summary: It is unconstitutional to pay a public servant less than one minimum wage, even if they work reduced hours.

      • The fundamental right to the minimum wage is constitutionally guaranteed in order to secure human dignity by improving living conditions (CF/1988, art. 7º, IV), and this guarantee was extended to public servants with no indication that it could be relaxed for reduced working hours or by infra-constitutional legislation (CF/1988, art. 39, § 3º).

      • Reading the relevant constitutional provisions together, combined with the principle barring the rollback of social rights, shows the aim of assuring a minimum standard of existence to members of the direct and indirect Public Administration by fixing the lowest admissible pay level, especially considering the limitations inherent in the legal regime of public servants, whose characteristics differ from those of temporary hires or of the employment relationships created by the recent labor reforms.

      • On this understanding, the Full Court, by majority, in deciding Tema 900 of general repercussion, granted the extraordinary appeal and returned the case records to the court of origin for the trial to continue, so that the remaining issues raised in the appeal may be decided in line with the parameters now established.

      Legislation: CF/1988: arts. 7º, IV; and 39, § 3º.

    2. When there is an actuarial deficit
      • Considering that, as § 18 of art. 40 stipulates, the contribution to the RPPS applies only to the amount exceeding the RGPS ceiling, where there is an actuarial deficit a contribution may also be levied on retirement benefits and pensions that exceed the minimum wage.
    1. To make studies auditable

      Auditing can reveal falsified data, as shown by the exposure of LaCour's fraudulent Science paper. Journals could adopt basic verification practices to confirm data authenticity and prevent misconduct.

    2. Have the results been selectively reported? Techniques like funnel plots and excessive significance tests

      These detect missing results; they do not reveal the underlying effects, but rather confirm that there is publication bias.
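      A tiny simulation (hypothetical numbers, not from the reading) can make the selective-reporting point concrete: when only "significant" estimates get published, small studies overestimate the true effect far more than large ones, which is exactly the asymmetry a funnel plot exposes.

```python
# Hypothetical sketch: selective reporting inflates published effects,
# and more so for small studies -- the asymmetry a funnel plot reveals.
import numpy as np

rng = np.random.default_rng(3)
true_effect = 0.2

published_means = {}
for n in (20, 200):                       # a small and a large study size
    se = 1 / np.sqrt(n)                   # standard error shrinks with n
    estimates = rng.normal(true_effect, se, size=5000)
    # Only estimates that reach "significance" survive into the literature.
    published = estimates[estimates / se > 1.96]
    published_means[n] = published.mean()

print(published_means)
```

      Under these assumptions, the mean published effect from the small studies lands well above the true value of 0.2, while the large studies sit much closer to it.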

    3. The F-Statistics

      These are computed from residual mean squares but are not independent, so the Type I error rate is not automatically controlled. If the tests were independent, the Type I error rate would be controlled directly.

    4. a simple effects analysis

      A way of analyzing in which you look at the effect of one independent variable at every level of another, letting you break down and interpret the interaction; for example, the effect of face type at each level of alcohol. Each comparison yields an F statistic to test whether there is a real difference.
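      A small simulation (hypothetical data, loosely modeled on the face-type by alcohol example) shows what a simple effects analysis looks like in practice: one F test for the face-type effect at each alcohol level.

```python
# Hypothetical data: the effect of face type tested separately at each
# alcohol level, one F test per level (a simple effects analysis sketch).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)

# Assumed effect sizes: the face-type difference shrinks as the dose rises,
# i.e. the simulated data contain an interaction.
face_effect = {"none": 3.0, "low": 1.5, "high": 0.0}

results = {}
for dose, effect in face_effect.items():
    attractive = rng.normal(5.0 + effect, 1.0, size=20)
    unattractive = rng.normal(5.0, 1.0, size=20)
    f_stat, p_val = f_oneway(attractive, unattractive)  # one F per dose level
    results[dose] = (f_stat, p_val)
    print(f"dose={dose}: F={f_stat:.2f}, p={p_val:.4f}")
```

      With these assumed effects, the face-type F statistic is large at the "none" level and shrinks toward chance at the "high" level, which is how the interaction gets decomposed.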

    5. In one plot, the gap between bars for attractive and unattractive faces shrinks at higher alcohol doses—indicating an interaction.  In another, the gaps remain similar across all alcohol levels—indicating no interaction

      In plot B you see gaps that are comparable across all alcohol levels, while in plot A a real difference suddenly appears at a higher alcohol dose.

    6. Much of the outcome variance remains unexplained.

      Without a covariate, much of the noise goes unexplained, so it plays a large role and the group differences come out smaller.

    7. ANCOVA

      Controlling for the covariate is the regression part, and comparing group means is the ANOVA part; ANCOVA corrects group comparisons for the influence of a covariate so that the effects are estimated more cleanly. These corrected group means, with the covariate's influence removed, are called adjusted means.
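      A minimal sketch of those two parts on simulated data (the group sizes and coefficients here are assumptions for illustration): regress the outcome on a group dummy plus a covariate, then get adjusted means by predicting each group's outcome at the grand covariate mean.

```python
# ANCOVA-style sketch: the regression part removes the covariate's influence,
# and the adjusted means are the group predictions at the grand covariate mean.
import numpy as np

rng = np.random.default_rng(1)
n = 50
group = np.repeat([0, 1], n)                         # two groups
covariate = rng.normal(10, 2, size=2 * n) + group    # groups also differ on the covariate
y = 2.0 * group + 0.5 * covariate + rng.normal(0, 1, size=2 * n)

# Design matrix: intercept, group dummy, covariate (the "regression part").
X = np.column_stack([np.ones(2 * n), group, covariate])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adjusted means: each group's predicted outcome at the grand covariate mean.
grand_mean = covariate.mean()
adjusted = {g: beta[0] + beta[1] * g + beta[2] * grand_mean for g in (0, 1)}
print(adjusted)
```

      The difference between the two adjusted means recovers the simulated group effect of about 2.0, which is what the covariate-corrected comparison is supposed to isolate.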

    8. that a hypothesis is true given the data, not just whether the result could occur by chance

      This is exactly what the p-value does not say: it is not the probability that the hypothesis is true given the data; p gives the probability of the result occurring by chance.
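      An illustrative simulation of this distinction (my own toy example, not from the text): when the null hypothesis is actually true, p-values come out roughly uniform, so p only measures how extreme the data are assuming H0, not how probable H0 is given the data.

```python
# Under a true null, p-values are approximately uniform: p measures how
# extreme the data are assuming H0 -- not the probability that H0 is true.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
pvals = np.array([
    ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
    for _ in range(2000)                  # both samples share one mean: H0 true
])

# By chance alone, about 5% of null p-values fall below 0.05.
print(f"fraction below 0.05: {(pvals < 0.05).mean():.3f}")
```

      Roughly one in twenty true-null tests comes out "significant" here, which is why a single small p cannot be read as the probability that the hypothesis is true.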

    1. 1940s and 1950s,

      In the mid-1900s, studies were done to find out whether feelings like same-sex attraction are common. A study of 11,000 women suggested that a significant number of them had felt same-sex attraction at times or all the time, and as the studies continued, the numbers grew because of the recognition of "coming out" and of others seeing that it wasn't just them.

    2. Sexual orientation

      Sexual orientation is about whom one prefers to have sexual relations with, while gender identity is how someone wants to be seen and addressed, such as their pronouns and appearance.

    1. Perspective

      Three things that could help reduce gender inequality are: 1. confronting stereotypes in media and news; 2. having the government fund rape crisis centers to spread awareness and help those who have been victims; and 3. using political support to push efforts to shrink the issue altogether.

    1. family structure

      Basically, if the individual is not a man, they get less pay than their male counterparts do because of glass ceilings and social discrimination.

    2. The poverty rate differences

      The poverty rate depends on how an individual was financially supported and can also depend on what their environment was like when they were growing up. If someone's family was poor, they would have a harder time getting out of that.

    1. You say you do not know the lady's mind: Uneven is the course, I like it not.

      The Friar is warning that the marriage is rushed and that Paris doesn’t really understand Juliet’s feelings. This shows the Friar’s wisdom and foreshadows the tragedy that comes from moving too quickly.

    1. Continuity and Change,

      Continuity and change basically means that topics like social change will continue forever to spread and change, because the world and all of us are also constantly changing.

    1. In the current study, the ASRT system is superseded by the PACC5 cognitive composite for detecting MCI. In the current study the ASRT system is superior to the PACC5 for detecting Aβ positivity.

      Isn't better than PACC5 (current methods) for detecting MCI but works for the presence of amyloid plaques

    2. Alzheimer’s disease is not routinely screened for in clinical practice.1 Instead it is most commonly tested for when patients present with cognitive complaints, or after cognitive impairment interferes with daily functioning. Research indicates that half of individuals aged 65+ with dementia are missed from primary care dementia registers, which suggests that around 50% of cases remain undiagnosed even at the more advanced stages of Alzheimer’s disease.

      No current screening, AD currently massively underdiagnosed

    3. Simulation analyses indicated that in primary care, speech-based screening could modestly improve detection of mild cognitive impairment (+8.5%), while reducing false positives (−59.1%). Furthermore, speech-based amyloid pre-screening was estimated to reduce the number of PET scans required by 35.3% and 35.5%

      Okay pretty significant percentages but what is this based off?

    4. amyloid beta status (primary endpoint)

      Okay, so amyloid beta status is a stand-in for AD diagnosis. The study measures AI performance by how well it can predict amyloid status. If it looks likely, send them for a scan.

      So...does it work? And does it work better than current NPTs?

    5. The automatic story recall task was administered during supervised in-person or telemedicine assessments, where participants were asked to recall stories immediately and after a brief delay

      Data they obtained to feed the AI

    6. Early detection of Alzheimer’s disease is required to identify patients suitable for disease-modifying medications and to improve access to non-pharmacological preventative interventions

      Value

    1. The Vandals crossed the Rhine in 406 CE. After their victory over the Romans at Adrianople, the Visigoths under Alaric became foederati, or allied troops. Roman armies at this time consisted mostly of mercenaries, rather than citizen soldiers. Alaric's demands for land, titles, and money were ignored, and between 408 and 410 he besieged the city several times.

      It’s wild how chaotic this time was! The Vandals crossed the Rhine in 406 CE, and the Visigoths under Alaric became allies of Rome after beating them at Adrianople. But when Rome ignored Alaric’s demands for land and money, he got fed up and ended up besieging the city multiple times between 408 and 410!

    1. Le Prince d’Aquitaine à la tour abolie

      Eliot’s line “Le prince d’Aquitaine à la tour abolie” which translates to “the prince of Aquitaine, his tower in ruins” is a direct reference to the identical line in Gerard de Nerval’s poem El Desdichado. The “tour abolie” or “tower in ruins” references back to the “falling towers” from earlier in the section, and, thus, the “unreal city” referenced in “The Burial of the Dead” and in “A Game of Chess.” In these references, the city, which may seem at first glance to be bustling and full of life, is inverted upon further investigation. With “brown fog” and “tower in ruins,” the images that Eliot portrays of the urban environment are anything but inspiring. The “towers” that make up the city, falling or in ruins, are replicated in the structure of the poem, with the poem itself acting as an autonomous landscape, with the characters, Madame Sosostris, Tiresias, etc., going through the motions of life while surrounded by a world, or words, falling apart. Thus, at the end of The Waste Land, bringing this notion of “la tour abolie” to the forefront, the readers can end the poem, seeing the microcosm of references and urbanity crumbling beneath itself, with no hope of resurrection.

      In his annotation on the same line, Richard Lu compares the “tour abolie” and “seule étoile” (in the following line of El Desdichado) to the tarot deck. He states that “in tarot decks, The Star directly follows The Tower card. The Tower is often called the most dreadfall card as it often implies a sudden disaster which instates a change in your world. This change is extreme. However, after the tower falls, the Star appears, the tarot card of hope. However, Eliot ends with this reference. The Star is dead, there may not be a hope after the destruction of The Tower.” Concluding “The Waste Land,” this line, among others, works to support the desolation laid out since the very first line. Though approached with a dead and barren landscape, throughout the poem one can find a glimmer of hope, an ounce of inspiration between the clever minds and many characters that prolong the narrative. This ending defies all of that. Eliot, through all five parts of the poem, sets up an ending where divinity and faith have no place. There is no hope for the The Star tarot card coming next, it is dead. There is no God, nothing matters. In the wake of destruction, among the “falling towers” in ruins, one finally becomes as desolate and depressing as their surroundings.

    2. DA

      This section depicts an image of drought in India, drawing on the geography of the Ganges River (Ganga) and the Himalayan mountains (Himavant). The "limp leaves" and "sunken" river depict a world parched and waiting for a life-giving force, while the rain clouds gather "far distant," suggesting that salvation or meaning is possible but not yet present. The "jungle crouched, humped in silence" (line 399) adds a sense of primal, coiled tension, as if the entire natural world is holding its breath. This silence is then broken by the thunder, which says "DA." This sound, as a three-part command, structures the rest of this section; each command is met with a complex response, revealing a deep spiritual inadequacy. The first is "Datta," or "give." The speaker creates a self-interrogation by asking "what have we given?" (line 402). They later claim that "By this, and this only, we have existed." This "truth" is unrecordable though, and won't be found in "memories draped by the beneficent spider / Or under seals broken by the lean solicitor" (line 408). It exists only in the haunting echo of "our empty rooms." The thunder speaks again, saying "Dayadhvam," or "be compassionate." This triggers a scene from Dante's Inferno: "I have heard the key / Turn in the door once and turn once only" (line 412-413). This is the sound of Count Ugolino being locked in the tower to starve, a symbol of irrevocable imprisonment. Eliot then layers this with a philosophical idea from F.H. Bradley, who saw every individual consciousness as an isolated prison. Individuals are all trapped in their subjective selves, "each in his prison" (line 414), aware of their isolation. Lastly, the thunder commands "Damyata." Unlike the previous two sections, which dwell on failure and isolation, this one offers a glimpse of harmony. It paints a picture of perfect control: a boat responding "gaily" to an "expert" hand.
The speaker then poignantly extends this metaphor to a human relationship, saying "your heart would have responded / Gaily, when invited." The conditional tense "would have" is key: it reveals this harmonious control not as a reality, but as a lost opportunity or a poignant "what if." It's a vision of a relationship that could have been obedient to "controlling hands," a symbiosis that was never achieved.

    3. Turn in the door once and turn once only

      Response to Sophie Perkel

      Reading the Brihadaranyaka source that connects to this section of the poem, I notice in Chapter five that the descriptions of "truth" and "falsehood" remind me of the third/fourth person from our previous readings. "The gods worship truth (satya), pronounced with three syllables, 'sa-ti-yam.' Sa and yam represent Truth, ti represents falsehood. Falsehood is surrounded on both sides by truth, and becomes truth." Falsehood being in-between is similar to how the "third person" is in-between, and if falsehood becomes truth by being surrounded by truth, then what does the entity that is the "third person" become?

      Sophie has an intriguing point here about the "third (or fourth) person." Following her analysis of the three syllables "sa-ti-yam" of satya (truth), where "ti" means falsehood while the other two represent truth, it would be logical to surmise that Eliot believes a third entity can become truthful if surrounded by truth. I don't necessarily believe that Eliot thinks falsehood can become true, but maybe that falsehood can become normalized and authentic to an individual. For example, we can build on my past annotation, where I argued that there are three entities - God, an individual, and a third entity which tempts or distracts the individual from a clear path to Heaven or spirituality. In this case, the true entities that "count" (in the words of "the Thunder" earlier in this section) in life are God and the individual. Thus, that third entity, which cannot be defined as it would be different for each individual, is the falsehood, because it should not be the core of living. Once we understand that, we can apply Sophie's theory and conclude that Eliot believes that although a distraction will never truly live up to the importance of an individual or the deity which they follow, enough prioritization of a sin will make it a habit and a necessity. In this way, that third entity of sin and temptation can become an authentic aspect of someone's life, because they have placed a falsehood on such a pedestal that it has become (to them) as truthful as God.

    4. Who is the third who walks always beside you? / When I count, there are only you and I together / But when I look ahead up the white road / There is always another one walking beside you

      In this section, there is no water, indicating a drought of spirituality in modern society. This makes sense, considering that “What the Thunder Said” likely refers to the words of God (or gods?) lamenting what the world he created has become. Although many might refer to “the third who walks always beside you” as Jesus or the Devil, I interpreted it as whatever temptation or distraction pulls people away from “holy” values. If we read the entire section as words said by “the Thunder” or God, then it would be logical to assume that this stanza portrays God questioning society on their downfall. This deity believes that all that counts/matters is “you and I together,” but even when looking at people who claim to be religious and on a path to heaven (the white road), they still live with other sins that they are not willing to give up. Hesse and Dostoevsky wrestle with the possibility, or lack thereof, of evil in a world with God, and the guilt one feels (or does not feel) after being an accomplice or partaker in sin. This is increasingly apparent with the rise of overconsumption and personal liberty in the 20th century, so it is no surprise that Eliot, in his analysis of modern society’s problems, views the material and relational temptations (described many times throughout the poem) as barriers to true spirituality.

    5. DA

      I have been tracking agency across the poem in many of my annotations. Something that I think somehow fits with this—also at play across the poem—is perception. Someone touched on this the other day (I think it might have been William, but I can’t quite remember), and it made me think of this connection.

      “Da” means “be self-controlled,” “give,” or “be compassionate.” But how can these three be distinguished between when just “Da” is used (and not Damyata or Datta or Dayadhvam, and if the relationship—to gods, to human beings, or to demons—is not defined)? How does the second voice in the exchange in the poem discern which? Zooming out, readers, in a pause before reading what follows after each “Da,” can interpret it as all three at once, or choose one. Is this desired? How does the effect change?

      The Bradley source clearly speaks very much to perception. How can the exchange between the two figures (one being thunder) here at the end, with the three repetitions, and meanings, of “Da,” be informed or elucidated by this? At the very least, it seems to exemplify the potential disconnect (and differing interpretation, etc.) Bradley lays out. Is this then to spread to other characters and their exchanges in the poem? All of this makes me want to loop back to Madame Sosostris, and the tarot deck…

      On a slightly different note, I think this ending (or at least the beginning of the end; I don’t want to speak too soon) is hopeful. The order of “Da”s goes from human beings to demons to gods, ending with self-control—and a boat responding “Gaily, to the hand expert with sail and oar / the sea was calm…” This seems to lay out the journey, ending on top, and with established human agency. But of course I am now seeing some other ways this could be read…

    6. He who was living is now dead / We who were living are now dying

      Response to Marisin McLain (and Sophie Perkel)

      Last year, scholar Sophie Perkel noted how the ambiguous “he” who plagues this poem makes its final appearance in this line, “He who was living is now dead.” There is a finality of death, different from the amorphous cycle of life and rebirth that has pervaded the poem thus far, as this “he” dies not only because Eliot wrote it so, but because “he” never appears again; the pronoun dies out of the remaining stanzas. This finality is also emphasized by the following line, which follows a similar grammatical pattern but differs in the number of the subject (singular versus plural) and the form of the final word (dead vs. dying). The two lines follow this grammatical structure: pronoun (singular or plural) - imperfect verb in relative clause - present verb - adjective/present participle. The imperfect verb is the most recent form of the past tense, indicating a freshness to the living, and the present provides vividness for the current state of death. But dead and dying have far more contrast: dead is an adjective, used to directly describe and define “he.” There is no verby-ness in its form; it is instead more analogous to the noun death. Conversely, “we” are “dying,” a present participle, a verbal adjective meaning continuous action. In this moment of time we are still going through the process of dying; it is not yet complete. With this continuous form, the first person pronoun continues to appear in live use throughout the remainder. “We” might be dying, but we are not yet dead.

      I found it very interesting how Sophie and Marisin noted that “He who was living is now dead” is the final appearance of the pronoun “He” in the poem. Now that “He” is dead, the pronoun’s presence has also died, or ceased to exist, in the poem. This makes the later points in the poem feel more personal, now that Eliot has removed the ambiguous “he.” I would like to build off of Marisin’s point that “we” are still going through the process of dying, but it is not yet complete, so the pronoun “we” is not eliminated. Technically, one might argue that, from birth, we are always dying, since each moment of aging is a step towards death. This makes me wonder: is “he” always dead, or just for the remainder of this poem? What is Eliot’s perspective on the cycle of life – is there rebirth, like I originally thought from his frequent imagery of water, or is there one finite life, like the previous Buddha source argued? Eliot’s small but intentional grammatical choices give insight into his wider arguments, ones that cannot be missed.

    7. DA

      “Look at this stuff, isn’t it neat?” she asks, surrounded by objects that don’t speak her language. Ariel’s voice is stolen so she can walk on land, and that is the question of Psalm 137: “How shall we sing the Lord’s song in a strange land?” Eliot asks the same thing, except there’s no sea witch to blame, only silence. The Ganga is sunken, the clouds are distant, and the thunder can barely form a word: DA. The syllable stammers toward meaning. Datta. Dayadhvam. Damyata. Give. Sympathize. Control. Commands that echo the psalm’s plea for song but return only fragments. The captives in Babylon hung their harps on the willows; Eliot’s speakers hang their words on static. The thunder speaks, but its language is splintered, a sacred tongue reduced to consonants. Both rivers are holy, both are broken. The Ganges without rain is as desolate as Babylon without Zion. In both, sound becomes the only remaining form of faith, the echo of what once was music. The psalmist threatens vengeance, but Eliot offers obedience; both are desperate to reclaim voice through rhythm. When the thunder says DA, it is the ghost of a hymn, a cracked psalm vibrating through dry air. The rain hesitates at the edge of speech. What’s left is a choir of lost voices, each trying to sing in a language it no longer believes in. Ariel traded her song for legs; Eliot traded his for survival. Neither ever gets it back.

    8. Who is the third who walks always beside you? / When I count, there are only you and I together / But when I look ahead up the white road / There is always another one walking beside you

      “It’s a glitch in the Matrix.” That is what this moment feels like, the instant the world repeats itself and perception breaks. The voice counts two but sees three. The brown-mantled figure glides beside them, half real, half reflection. The road is bright enough to blind. This is not revelation but error, a tear in vision. The “third” is not a companion or a god. It is the self split open, awareness doubled until it can no longer tell which part is moving forward. The desert becomes circuitry, the white road a frozen current. The “hooded hordes” that follow are copied bodies, corrupted code replaying itself. Every pilgrim in the poem—Roland, Dracula, Eliot’s wanderer—exists in this duplication. The chapel, the tower, the city: all illusions rendered again and again. The thunder speaks in fragments because speech itself has been divided. Even rain feels artificial, a false reset. The “third” is what remains after too much seeing, the echo of thought that keeps walking when the body stops. It is both the error and the evidence, the ghost produced by perception’s glitch.

    9. Elizabeth and Leicester

      Response to Anthony Hu's Annotation from 2024

      "...Both sexes have been unanimously subjected to the similar degenerative consequences of contemporary industrialization and mechanization of love. Here, however, I want to highlight a disparity between the presentation of the two sexes. Why does Eliot include here the hidden affair between Queen Elizabeth I and Robert Dudley, 1st Earl of Leicester? ... Recall that a consistent motif throughout the poem is the identification of characters with the Fisher King, who, due to a physical impotence, leads to sterility throughout his country’s land. Queen Elizabeth, however, serves as a complete antithesis in “The Fire Sermon.” Publicly known as the “Virgin Queen,” her physical chastity is often celebrated along with the glamor of the country during her rule. In fact, we can easily identify causal relationships between the two – for instance, it is precisely to ensure Britain’s political stability that members of the nobility propagated rumors that prevented a marriage between the queen and the earl. Unlike for the Fisher King, whose sexual potency is restored through the renewal of the land, it seems like female virility is negatively related to the prosperity (and therefore metaphorical fertility) of the land. ...Despite all of that, perhaps there is no disparity after all. If we interpret the queen’s virginity not as a sterile state, but the potential for future fertility, she perfectly demonstrates the pattern embodied in the Fisher King. In a historical context, this interpretation indeed holds some truth – for other European powers at the time, Elizabeth’s chastity meant she was constantly available for a future arrangement of political marriage; as such, the stability and prosperity of the nation was maintained. To reach a final answer on this matter of gender portrayal, a comparison of patterns throughout the poem will be ideal.

Anthony brought up a comparison which I did not consider between Queen Elizabeth I (or the Virgin Queen) and the Fisher King (the Maimed King). He recalls how the Fisher King’s sterilization due to injury causes his country to suffer in success and land fertility. Queen Elizabeth’s virginity is celebrated as having brought more success to England. She was revered for being “married to her country” – a selfless leader who prioritized the good of the nation over the possibility of building a family. Although Eliot chastises both men and women for the modern lack of sexual restraint, there is clearly still a difference in how the separate genders are portrayed. The Fisher King is called “the Maimed King,” so that his title and identity become fused with his infertile state. Queen Elizabeth is titled “the Virgin Queen,” almost equating her to the Virgin Mary, who is celebrated for having a child without sacrificing her “purity.” Maybe the point of contrast here is gender, that men should be allowed sexual freedom while women should not. The authority of male leaders is often measured by their virility. We see this with Queen Elizabeth’s father, Henry VIII, who may not have been praised for his many wives, but he was still obeyed and seen as a powerful leader. Female leaders, on the other hand, are expected to devote their entire lives to their career. Often, when a woman is pregnant with a child, the public is given an explicit reminder of her sex, and with that a level of subconscious bias that she is no longer physically or emotionally fit for this role. Since men were seen as the preferential leaders and men cannot carry children, a woman who does not carry a child is seen as closer to the ideal leader than one who does, explaining the praise of a virgin queen. Contrastingly, the difference between the two royals may be based on their ability and choice. The Fisher King is forced to be infertile against his will while Queen Elizabeth chooses to be celibate.
Although this may be a large reason for the discrepancy, we cannot remove gender from the equation. One might say that infertility is not a choice while virginity is, and though that is true for the most part, Eliot references sexual assault throughout the poem as an unfortunate exception to that rule. An infertile man could still succeed, albeit with challenges, because his value is not tied to a spouse, but a woman who has experienced sexual assault was labeled as “ruined,” despite having no choice in the matter. Finally, power and societal standing cannot be ignored. A single woman living in poverty would not be famed for her virginity to the level that Queen Elizabeth was. Similarly, an infertile man not in power would not be publicly shamed for his infertility if he stood his ground in other areas of life. So, there are both drawbacks and benefits to power which the Fisher King and Queen Elizabeth experience. In conclusion, while we can theorize Eliot’s stance on sexuality based on textual references, it is clear that power, gender, and consent have an unmistakable influence on the relationship between a person’s reputation and their romantic endeavors.

    10. I do not know whether a man or a woman

Lucas, I really enjoyed reading your annotation, and I agree with your analysis. Throughout The Waste Land Eliot explores the fate of men and the fate of women, but there are several central points of almost overwhelming union(/fusion?) of the binary—the first being Tiresias, and the second being here, with this nebulous “third.” This figure, while amorphous, is key, pulling together a number of sources (Shackleton, Marudanayagam, Luke, Weston (the Black Hand?)) and clearly standing as some kind of higher power.

      The Visuddhi-Magga boils all beings down to the same—the absolute physical: bones and then working outwards. In so directly linking this to the mysterious third person, Eliot seems finally to have settled in his back and forth between defined gender roles and fates vs. none at all. And, as Lucas notes, bones run through the Waste Land, persisting beyond all outside layers. But how, then, to reconcile that other side still present? Perhaps the way Eliot doesn’t directly quote from the Visuddhi-Magga, and punctuates differently, and switches the order of “man” and “woman” shows that he isn’t in fact wholly drawing from the source.

      Something else interesting to note, in a broader / more general sense, is that three functions as a number of disruption—it unsettles the stability of two, breaking the pair or the binary or the dipole. And it is from two that a third can be formed—from both, as a mix. I have been inspired to think more about the significance of this number (which happens to be my lucky number :) ) by Dr. Blevins’ paper on numbers in TWL and Celina ’23’s note that “the number three plays an important role in ancient scholars’ perception of the world, and it is hard to explain this weird infatuation with this specific number across universal applications” (she cites Lao Zi and Daoism, to start). I also think there is perhaps a connection somewhere in here to Hesse’s idea of “the downfall of Europe” which is to result in a fusing of the European man and the Russian man—or perhaps just the European man becoming the Russian man, who in himself is already the third, a mix of good and bad, moral and immoral, etc.

    1. (r) are .10, .30, and .50, respectively, while small, medium, and large mean differences (d) are .20, .50, and .80

The benchmarks for size differ between correlations and mean differences: .10/.30/.50 for r versus .20/.50/.80 for d. The effect-size benchmarks step up by .30, starting from .20.
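A tiny helper makes the quoted benchmarks concrete (the function name and labels are mine, following Cohen's conventions as stated in the text):

```python
# Cohen's conventional benchmarks as quoted in the text:
# correlation r: .10 / .30 / .50; mean difference d: .20 / .50 / .80.
def label_effect(value, kind):
    """Rough size label for an effect estimate (illustrative helper)."""
    thresholds = {"r": (0.10, 0.30, 0.50), "d": (0.20, 0.50, 0.80)}
    small, medium, large = thresholds[kind]
    v = abs(value)  # sign does not affect magnitude
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "negligible"
```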

    2. the result is broadly applicable or meaningfu

This is not what "significant" means; it only means that the probability the result is due to chance is small.

    3. Games–Howell,

Can be liberal in small samples, i.e. it has a tendency to find too many significant differences, even when they are not there.

    4. Planned Contrast

The advantage is that you have a specific hypothesis before seeing the data, which leads to higher power because you limit the number of tests and need to apply fewer corrections.
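A single planned contrast can be sketched directly from the group means and the pooled ANOVA error term; the data, groups, and weights below are all invented for illustration:

```python
import numpy as np

# One pre-specified contrast: group A vs. the average of groups B and C.
groups = [np.array([4.1, 5.0, 4.6, 5.2]),   # A
          np.array([3.0, 3.4, 2.8, 3.2]),   # B
          np.array([2.9, 3.1, 3.3, 2.7])]   # C
weights = np.array([1.0, -0.5, -0.5])
assert np.isclose(weights.sum(), 0.0)        # a valid contrast sums to zero

means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])

# Pooled within-group variance (MS_within from the one-way ANOVA)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
df_error = ns.sum() - len(groups)
ms_within = ss_within / df_error

estimate = weights @ means                           # contrast of the group means
se = np.sqrt(ms_within * (weights ** 2 / ns).sum())  # standard error of the contrast
t = estimate / se  # compare against a t distribution with df_error degrees of freedom
```

Because this is one test decided in advance, no multiple-comparison correction is needed, which is exactly the power advantage the note describes.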

    5. Polynomial contrasts

These are linear combinations of group means that are tested when there is a natural ordering from low to medium to high. These contrasts require at least 3 groups, and they come in several forms.
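A minimal sketch of the standard linear and quadratic coefficient sets for three ordered groups (the group means are invented):

```python
import numpy as np

# Standard polynomial contrast coefficients for 3 ordered groups
# (low, medium, high): one linear and one quadratic trend.
linear = np.array([-1.0, 0.0, 1.0])
quadratic = np.array([1.0, -2.0, 1.0])

# Both are valid contrasts (weights sum to zero) and orthogonal to each other.
assert linear.sum() == 0 and quadratic.sum() == 0
assert linear @ quadratic == 0

# Applied to hypothetical group means, each contrast estimates one trend:
means = np.array([2.0, 3.5, 5.0])  # a perfectly linear increase
print(linear @ means)              # linear trend estimate: 3.0
print(quadratic @ means)           # quadratic trend estimate: 0.0
```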

    6. Non-orthogonal

Contrasts that are dependent and therefore inflate the error rate; you end up with these when you use the same group multiple times.
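A quick sketch of why reusing a group breaks orthogonality (the weights are invented):

```python
import numpy as np

# Two contrasts that both reuse group 1:
c1 = np.array([1.0, -1.0, 0.0])   # group 1 vs. group 2
c2 = np.array([1.0, 0.0, -1.0])   # group 1 vs. group 3

# Their dot product is nonzero, so the contrasts are non-orthogonal:
# the two tests share information and the familywise error rate inflates.
print(c1 @ c2)  # 1.0, not 0
```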

1. ge means "greater than or equal to" (inclusive), lt means "less than" (exclusive)

Just to be safe, I feel this should be spelled out properly.

ge is short for "greater than or equal to", meaning the bound value is included, and lt is short for "less than", meaning strictly below the bound

Something like that, perhaps?

    2. BaseModel

Since BaseModel is the foundational piece here, it would be good to link to its documentation: https://docs.pydantic.dev/latest/api/base_model/
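A minimal sketch of what such a link would document (the class and field names here are made up):

```python
from pydantic import BaseModel

# Minimal illustration of subclassing BaseModel (field names are invented):
class Staff(BaseModel):
    name: str
    age: int

# Input is validated and coerced against the annotated types.
staff = Staff(name="Tanaka", age="30")  # the string "30" is coerced to int
print(staff.age)
```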

3. the Annotated pattern and the decorator (@field_validator) pattern.

The explanation below covers them in the order "decorator pattern", then "Annotated pattern", so the wording here should match that order.

4. it becomes easier to manage in Python, and you gain the major benefit of type hints when coding in Python.

I'd like this part to be more concrete as well.

Something like: "◯◯ becomes easier to manage in Python, and when coding in Python you get the benefits of type hints, such as △△ being displayed in ◯◯"

Please turn it into a concrete sentence along those lines.

5. Change the data types as follows.

I'm not sure it's right to say the data type is being "changed."

Annotated literally translates to "annotation," so...

6. the format of … is,

This seems better split into two rules:

      • The time format is H:MM
      • Times are in 30-minute units
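As a sketch of how the two rules could be checked separately (the Shift model and field names are hypothetical):

```python
from pydantic import BaseModel, ValidationError, field_validator

# Hypothetical sketch encoding the two rules separately:
# (1) the time format is H:MM, (2) times fall on 30-minute boundaries.
class Shift(BaseModel):
    start: str

    @field_validator("start")
    @classmethod
    def check_time(cls, v: str) -> str:
        hours, sep, minutes = v.partition(":")
        if sep != ":" or not (hours.isdigit() and minutes.isdigit()
                              and len(minutes) == 2):
            raise ValueError("time must be in H:MM format")      # rule 1
        if int(minutes) % 30 != 0:
            raise ValueError("time must be in 30-minute units")  # rule 2
        return v

print(Shift(start="9:30").start)  # passes both rules
```

Splitting the rules this way also gives the reader two distinct error messages instead of one combined one.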
7. I didn't understand this either, but does specifying ... mean the field is required but has no default value?
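That reading can be checked with a small sketch (the Item model and its fields are made up for illustration):

```python
from pydantic import BaseModel, Field, ValidationError

# `...` (Ellipsis) as the default marks a field as required with no default:
class Item(BaseModel):
    name: str = Field(...)      # required, no default value
    note: str = Field("none")   # optional, defaults to "none"

print(Item(name="sample").note)  # optional field falls back to its default
try:
    Item()                       # `name` is missing, so validation fails
except ValidationError as e:
    print(e.errors())
```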

8. Data validation

Is it the data validation itself that is being made concrete?

My understanding is that it's the data format (schema) that is made concrete, and as a result the data gets validated against those rules.

https://docs.pydantic.dev/latest/concepts/models/ also puts it this way: "One of the primary ways of defining schema in Pydantic is via models. Models are simply classes which inherit from BaseModel and define fields as annotated attributes."

9. In this way, Pydantic validates all the data and reports everything in an easy-to-understand form.

I'd like a sentence along these lines.

    10. branch = Branch(**data)

Readers will wonder where Branch comes from, so I thought it would be better to put it in a separate module and write

from model import Branch

or something along those lines.

    11. import json

Please add an explanation of this code, e.g. "when an exception occurs, the details are printed via the errors() method"

or something along those lines.
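For instance, the explanation could be backed by a sketch like this (the Branch fields here are invented for illustration, not taken from the tutorial):

```python
import json
from pydantic import BaseModel, ValidationError

# When validation fails, ValidationError.errors() lists every failing
# field together with its location and message.
class Branch(BaseModel):
    name: str
    staff_count: int

data = json.loads('{"name": "Shibuya", "staff_count": "many"}')
try:
    branch = Branch(**data)
except ValidationError as e:
    for err in e.errors():           # one entry per failed field
        print(err["loc"], err["msg"])
```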

12. Let's load what we defined as a dictionary from a JSON file and check it.

This is a bit hard to follow.

The first example was a Python dictionary, and what's written here is JSON, right? Please explain this more clearly, or better yet, start with JSON from the beginning.

13. the elements are dictionaries are contained inside the list

How about something like: "Inside staff, dictionaries representing each staff member are contained in a list."?

As written, the original sentence didn't make sense grammatically.

    14. branch = {

Please add captions to all code blocks:

```{code-block} python
:caption: caption here

code here
```

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

We thank the reviewers for their thoughtful and constructive feedback, which helped us strengthen the study on both the computational and biological sides. In response, we added substantial new analyses and results in a total of 26 new supplementary figures and a new supplementary note. Importantly, we demonstrated that our approach generalizes beyond tissue outcomes by predicting final-timepoint morphology clusters from early frames with good accuracy (new Figure 4C). Furthermore, we completely restructured and expanded the human expert panel: six experts now provided >30,000 annotations across evenly spaced time intervals, allowing us to benchmark human predictions against CNNs and classical models under comparable conditions. We verified that morphometric trajectories are robust: PCA-based reductions and nearest-neighbor checks confirmed that patterns seen in t-SNE/UMAP are genuine, not projection artifacts. To test whether z-stacks are required, we redid all analyses with sum- and maximum-intensity projections across five slices; results were unchanged, showing that single-slice imaging is sufficient. From a bioinformatics perspective, we performed negative-label baselines, downsampling analyses to quantify dataset needs, and statistical tests confirming CNNs significantly outperform classical models. Biologically, we clarified that each well contains one organoid, further introduced the Latent Determination Horizon concept tied to expert visibility thresholds, and discussed limits in cross-experiment transfer alongside strategies for domain adaptation and adaptive interventions. Finally, we clarified methods, corrected terminology and a scaler leak, and made all code and raw data publicly available.

      Together, these revisions in our opinion provide an even clearer, more reproducible, and stronger case for the utility of predictive modeling in retinal organoid development.


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

This study presents predictive modeling for developmental outcome in retinal organoids based on high-content imaging. Specifically, it compares the predictive performance of an ensemble of deep learning models with classical machine learning based on morphometric image features and predictions from human experts for four different tasks: prediction of RPE presence and lens presence (at the end of development) as well as the respective sizes. It finds that the DL model outperforms the other approaches and is predictive from early timepoints on, strongly indicating a time-frame for important decision steps in the developmental trajectory.

Response: We thank the reviewer for the constructive and thoughtful feedback. In response to the review as found below, we have made substantial revisions and additions to the manuscript. Specifically, we clarified key aspects of the experimental setup, changed terminology regarding training/validation/test sets, and restructured our human expert baseline analysis by collecting and integrating a substantially larger dataset of expert annotations, as suggested. We introduced the Latent Determination Horizon concept with clearer rationale and grounding. Most importantly, we significantly expanded our interpretability analyses across three CNN architectures and eight attribution methods, providing comprehensive quantitative evaluations and supplementary figures that extend beyond the initial DenseNet121 examples (new Supplementary Figures S29-S37). We also ensured full reproducibility by making both code and raw data publicly available with documentation. While certain advanced interpretability methods (e.g., Discover) could not be integrated despite considerable effort, we believe the revised manuscript presents a robust, well-documented, and carefully qualified analysis of CNN predictions in retinal organoid development.

Major comments: I find the paper overall well written and easy to understand. The findings are relevant (see significance statement for details) and well supported. However, I have some remarks on the description and details of the experimental set-up, data availability, and the reproducibility/re-usability of the data.

      1. Some details about the experimental set-up are unclear to me. In particular, it seems like there is a single organoid per well, as the manuscript does not mention any need for instance segmentation or tracking to distinguish organoids in the images and associate them over time. Is that correct? If yes, it should be explicitly stated so. Are there any specific steps in the organoid preparation necessary to avoid multiple organoids per well? Having multiple organoids per well would require the aforementioned image analysis steps (instance segmentation and tracking) and potentially add significant complexity to the analysis procedure, so this information is important to estimate the effort for setting up a similar approach in other organoid cultures (for example cancer organoids, where multiple organoids per well are common / may not be preventable in certain experimental settings).

      Response: We thank the reviewer for this question. We agree that these preprocessing steps would add more complexity to our presented preprocessing steps and would definitely be required in some organoid systems. In our experimental setup, there is only one organoid per well which forms spontaneously after cell seeding from (almost) all seeded cells. There are no additional steps necessary in order to ensure this behaviour in our setup. We amended the Methods section to now explicitly state this accordingly (paragraph ‘Organoid timelapse imaging’).

The terminology used with respect to the test and validation set is contrary to the field, and reporting results on the test set (which should be called the validation set) should be avoided since it is used to select models. In more detail: the terms "test set" and "validation set" (introduced in 213-221) are used with the opposite meaning to their typical use in the deep learning literature. Typically, the validation set refers to a separate split that is used to monitor convergence / avoid overfitting during training, and the test set refers to an external set that is used to evaluate the performance of trained models. The study uses these terms in an opposite manner, which becomes apparent from line 624: "best performing model ... judged by the loss of the test set.". Please exchange this terminology; it is confusing to a machine learning domain expert. Furthermore, the performance on the test set (which should be called the validation set) is typically not reported in graphs, as this data was used for model selection, and thus does not provide an unbiased estimate of model performance. I would remove the respective curves from Figures 3 and 4.

Response: We are thankful for the reviewer's comments on this matter. Indeed, we were using an opposite terminology compared to what is commonly used within the field. We have adjusted the Results, Discussion and Methods sections as well as the figures accordingly. Further, we added a corresponding disclaimer for the code base in the github repository. However, we prefer not to remove the respective curves from the figures. We think that this information is crucial to interpret the variability in accuracy between organoids from the same experiments and organoids acquired from a different, independent experiment. The results suggest that the accuracy for organoids within the same experiments is still higher, indicating to users the potential accuracy drop resulting from independent experiments. As we think that this is crucial information for the interpretation of our results, we would like to still include it side-by-side with the test data in the figures.

The experimental set-up for the human expert baseline is quite different to the evaluation of the machine learning models. The former is based on the annotation of 4,000 images by seven experts, the latter on cross-validation experiments on a larger dataset. First of all, the details on the human expert labeling procedure are very sparse; I could only find a very short description in the paragraph 136-144, but did not find any further details in the methods section. Please add a methods section paragraph that explains in more detail how the images were chosen, how they were assigned to annotators, and if there was any redundancy in annotation, and if yes how this was resolved / evaluated. Second, the fact that the set-up for human experts and ML models is quite different means that these values are not quite comparable in a statistical sense. Ideally, human estimators would follow the same set-up as in ML (as in, evaluate the same test sets). However, this would likely be prohibitive in the required effort, so I think it's enough to state this fact clearly, for example by adding a comment on this to the captions of Figures 3 and 4.

Response: We thank the reviewer for this constructive suggestion. We agree that the curves for human evaluations in the original draft were calculated differently compared to the curves for the classification algorithms, mostly stemming from the feasibility of data set annotation at the time. In order to address this suggestion, we repeated and substantially expanded the human expert annotation. Each of 6 human experts was asked to predict/interpret 6 images of each organoid within the full dataset. In order to select the images, we divided the time course (0-72h) into 6 evenly spaced intervals of 12 hours. For each interval, one image per organoid and human expert was randomly selected and assigned. This resulted in a total of 31,626 classified images (up from 4,000 in the original version of the manuscript); the assigned images overlapped between experts at the level of source intervals but not at the level of individual images. We then changed the calculation of the curves to match the classification analysis: F1 data were calculated for each experiment over 6 timeframes and all experts, and plotted within the respective figure. We have amended the Methods section accordingly and replaced the respective curves within Figures 3 and 4 and Supplementary Figures S1, S8 and S19.

      It is unclear to me where the theoretical time window for the Latent Determination Horizon in Figure 5 (also mentioned in line 350) comes from? Please explain this in more detail and provide a citation for it.

      Response: We thank the reviewer for this important point. The Latent Determination Horizon (LDH) is a conceptual framework we introduced in this study to describe the theoretical period during which the eventual presence of a tissue outcome of interest (TOI) is being determined but not yet detectable. It is derived from two main observations in our dataset: (i) the inherent intra- and inter-experimental heterogeneity of organoid outcomes despite standardized protocols, and (ii) the progressive increase in predictive performance of our deep learning models over time, which suggests that informative morphological features only emerge gradually. We have now clarified this rationale in the manuscript (Discussion section) further and explicitly stated that the LDH is a concept we introduce here, rather than a previously described or cited term.

The time window is defined by TOI visibility, which is determined empirically from the results of our human expert panel (compare also Supplementary Figure S1).

The interpretability analysis (Figure 4, 634-639) based on relevance backpropagation was performed on DenseNet121 only. Why did you choose this model and not the ResNet / MobileNet? I think it is quite crucial to see if there are any differences between these models, as this would show how much weight can be put on the evidence from this analysis, and I would suggest adding an additional experiment and supplementary figure on this.

Response: We thank the reviewer for this important comment regarding the interpretability analysis and the choice of model. In the original submission, we restricted the attribution analyses shown in original Figure 4C to DenseNet121, which served as our main reference model throughout the study. This choice was made primarily for clarity and to avoid redundancy in the main figures, as all three convolutional neural network (CNN) architectures (DenseNet121, ResNet50, MobileNetV3_Large) achieved comparable classification performance on our tasks.

      In response to the reviewer’s concern, we have now extended the interpretability analyses to include all three CNN architectures and a total of eight attribution methods (new Supplementary Note 1). Specifically, we generated saliency maps for DenseNet121, ResNet50, and MobileNetV3_Large across multiple time points and evaluated them using a systematic set of metrics: pairwise method agreement within each model (new Supplementary Figure S29), cross-model consistency per method (new Supplementary Figure S34), entropy and diffusion of saliencies over time (new Supplementary Figure S35), regional voting overlap across methods (new Supplementary Figure S36), and spatial drift of saliency centers of mass (new Supplementary Figure S37).

      These pooled analyses consistently showed that attribution methods differ markedly in the regions they prioritize, but that their relative behaviors were mostly stable across the three CNN architectures. For example, Grad-CAM and Guided Grad-CAM exhibited strong internal agreement and progressively focused relevance into smaller regions, while gradient-based methods such as DeepLiftSHAP and Integrated Gradients maintained broader and more diffuse relevance patterns but were the most consistent across models. Perturbation-based methods like Feature Ablation and Kernel SHAP often showed decreasing entropy and higher spatial drift, again similarly across architectures.

To further address the reviewer’s point, we visualized the organoid depicted in original Figure 4C across all three CNNs and all eight attribution methods (new Supplementary Figures S30-S33). These comparisons confirm and extend the analysis of the qualitative patterns described in original Figure 4C and show that they are not specific to DenseNet121, but are representative of the general behavior across architectures.

In sum, we observed notable differences in how relevance was assigned and how consistently these assignments aligned. Highlighted organoid patterns were not consistent enough across attribution methods for us to be comfortable basing unequivocal biological interpretation on them. Nevertheless, we believe that the analyses performed in response to the reviewer’s suggestions (new Supplementary Note 1 and new Supplementary Figures S29-S37) add valuable context to what can be expected from machine learning models in an organoid research setting.

      As we did not base further unequivocal biological claims on the relevance backpropagation, we decided to move the analyses to the Supporting Information and now show a new model predicting organoid morphology by morphometrics clustering at the final imaging timepoint in new Figure 4C in line with suggestions by Reviewer #3.

The code referenced in the code availability statement is not yet present. Please make it available and ensure good documentation for reproducibility. Similarly, it is unclear to me what is meant by "The data that supports the findings will be made available on HeiDoc". Does this only refer to the intermediate results used for statistical analysis? I would also recommend making the image data of this study available. This could for example be done through a dedicated data deposition service such as BioImageArchive or BioStudies, or with less effort via zenodo. This would ensure both reproducibility and potential re-use of the data. I think the latter point is quite interesting in this context; as the authors state themselves, it is unclear whether prediction of the TOIs might be possible at an even earlier timepoint through model advances, which could be studied by making this data available.

      Response: We thank the reviewer for this comment. We have now made the repository and raw data public on the suggested platform (Zenodo) and apologize for this oversight. The links are contained within the github repository which is stated in the manuscript under “Data availability”.

      Minor comments:

      Line 315: Please add a citation for relevance backpropagation here.

      Response: We have included citations for all relevance backpropagation methods used in the paper.

      Line 591: There seems to be typo: "[...] classification of binary classification [...]"

      Response: Corrected as suggested.

      Line 608: "[...] where the images of individual organoids served as groups [...]" It is unclear to me what this means.

      Response: We wanted to express that organoid images belonging to one organoid were assigned in full to a training/validation set. We have now stated this more clearly in the Methods section.
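One way to implement such organoid-level grouping (not necessarily the authors' implementation) is scikit-learn's GroupShuffleSplit; the frame-to-organoid IDs below are invented:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Grouped splitting: every frame of a given organoid goes entirely
# to one side of the split, so no organoid leaks across splits.
organoid_id = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])  # hypothetical frame -> organoid map
X = np.arange(len(organoid_id)).reshape(-1, 1)          # placeholder features

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(X, groups=organoid_id))

# No organoid contributes frames to both sides.
print(set(organoid_id[train_idx]), set(organoid_id[val_idx]))
```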

      Reviewer #1 (Significance (Required)):

General assessment: This study demonstrates that (retinal) organoid development can be predicted with deep learning from early timepoints at which the outcomes cannot yet be discerned by human experts or simpler machine learning models. This fact is very interesting in itself due to its implications for organoid development, and could provide a valuable tool for molecular analysis of different organoid populations, as outlined by the authors. The contribution could be strengthened by providing a more thorough investigation of what features in the image are predictive at early timepoints, using a more sophisticated approach than relevance backprop, e.g. Discover (https://www.nature.com/articles/s41467-024-51136-9). This could provide further biological insight into the underlying developmental processes and enhance the understanding of retinal organoid development.

      Response: We thank the reviewer for this assessment and suggestion. We agree that identifying image features predictive at early timepoints would add important biological context. We therefore attempted to apply Discover to our dataset. However, we were unable to get the system to run successfully. After considerable effort, we concluded that this approach could not be integrated into our current analysis. Instead, we report our substantially expanded results obtained with relevance backpropagation, which provided the most interpretable and reproducible insights for our study as described above (New Supplementary Note 1, new Supplementary Figures S29-S37).

Advance: similar studies that predict outcomes from image data, for example cell proliferation or developmental outcome, exist. However, to the best of my knowledge, this study is the first to apply such a methodology to organoids; it convincingly shows its efficacy and argues for its potential practical benefits. It thus constitutes a solid technical advance that could be especially impactful if it could be translated to other organoid systems in the future.

      Response: We thank the reviewer for this positive assessment of our work and for highlighting its novelty and potential impact. We are encouraged that the reviewer recognizes the value of applying predictive modeling to organoids and the opportunities this creates for translation to other organoid systems.

      Audience: This research is of interest to a technical audience. It will be of immediate interest to researchers working on retinal organoids, who could adapt and use the proposed system to support experiments by better distinguishing organoids during development. To enable this application, code and data availability should be ensured (see above comments on reproducibility). It is also of interest to researchers in other organoid systems, who may be able to adapt the methodology to different developmental outcome predictions. Finally, it may also be of interest to image analysis / deep learning researchers as a dataset to improve architectures for predictive time series modeling.

      My research background: I am an expert in computer vision and deep learning for biomedical imaging, especially in microscopy. I have some experience developing image analysis for (cancer) organoids. I don't have any experience on the wet lab side of this work.

      Response: We thank the reviewer for this encouraging feedback and for recognizing the broad relevance of our work across retinal organoid research, other organoid systems, and the image analysis community. We are pleased that the potential utility of our dataset and methodology is appreciated by experts in computer vision and biomedical imaging. We have now made the repository and raw data public and apologize for this oversight. The links are provided in the manuscript under “Data availability”.

      Constantin Pape


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

Summary: Afting et al. present a computational pipeline for analyzing timelapse brightfield images of retinal organoids derived from Medaka fish. Their pipeline processes images along two paths: 1) morphometrics (based on computer vision features from skimage) and 2) deep learning. They discovered, through extensive manual annotation of ground truth, that their deep learning method could predict retinal pigmented epithelium and lens tissue emergence at time points earlier than either morphometrics or expert predictions. Our review is formatted based on the Review Commons recommendation.

      Response: We thank the reviewer for the detailed and constructive feedback, which has greatly improved the clarity and rigor of our manuscript. In response, we have corrected a potential data leakage issue, re-ran the affected analyses, and confirmed that results remain unchanged. We clarified the use of data augmentation in CNN training, tempered some claims throughout the text, and provided stronger justification for our discretization approach together with new supplementary analyses (New Supplementary Figures S26, S27). We substantially expanded our interpretability analyses across three CNN architectures and eight attribution methods, quantified their consistency and differences (new Supplementary Figures S29, S34-S37, new Supplementary Note 1), and added comprehensive visualizations (New S30-S33). We also addressed technical artifact controls, provided downsampling analyses to support our statement on sample size sufficiency (new Supplementary Figure S28), and included negative-control baselines with shuffled labels in Figures 3 and 4. Furthermore, we improved the clarity of terminology, figures, and methodological descriptions, and we have now made both code and raw data publicly available with documentation. Together, we believe these changes further strengthen the robustness, reproducibility, and interpretability of our study while carefully qualifying the claims.

      Major comments:

      Are the key conclusions convincing?

      Yes, the key conclusion that deep learning outperforms morphometric approaches is convincing. However, several methodological details require clarification. For instance, were the data splitting procedures conducted in the same manner for both approaches? Additionally, the authors note in the methods: "The validation data were scaled to the same range as the training data using the fitted scalers obtained from the training data." This represents a classic case of data leakage, which could artificially inflate performance metrics in traditional machine learning models. It is unclear whether the deep learning model was subject to the same issue. Furthermore, the convolutional neural network was trained with random augmentations, effectively increasing the diversity of the training data. Would the performance advantage still hold if the sample size had not been artificially expanded through augmentation?

      Response: We thank the reviewer for raising these important methodological points. As Reviewer #1 correctly noted, our use of the terms validation and test may have contributed to confusion. To clarify: in the original analysis the scalers were fitted on the training and validation data and then applied to the test data. This indeed constitutes a form of data leakage. We have corrected the respective code, re-ran all analyses that were potentially affected, and did not observe any meaningful change in the reported results. The Methods section has been amended to clarify this important detail.

      For the neural networks, each image was normalized independently (per image), without using dataset-level statistics, thereby avoiding any risk of data leakage.
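For illustration only (not the manuscript's actual pipeline; data and variable names are hypothetical), the corrected, leakage-free scaling logic described above can be sketched as:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train_feats = rng.normal(5.0, 2.0, size=(80, 10))  # morphometric features, training split
test_feats = rng.normal(5.0, 2.0, size=(20, 10))   # held-out test split

# Corrected workflow: fit the scaler on the training split only, then apply
# the frozen statistics to the test split, so no information flows back.
scaler = StandardScaler().fit(train_feats)
train_scaled = scaler.transform(train_feats)
test_scaled = scaler.transform(test_feats)

# CNN inputs: each image is normalized independently (per image), so no
# dataset-level statistics exist that could leak between splits.
def normalize_per_image(img):
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)

img = rng.integers(0, 255, size=(64, 64))
img_norm = normalize_per_image(img)
```

The key distinction is where the normalization statistics come from: a shared scaler must be fitted on the training split alone, whereas per-image normalization has no cross-sample statistics at all.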

      Regarding data augmentation, the convolutional neural network was indeed trained with augmentations. Early experiments without augmentation led to severe overfitting, confirming that the performance advantage would not hold without artificially increasing the effective sample size. We have added a clarifying statement in the Methods section to make this explicit.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Their claims are currently preliminary, pending increased clarity and additional computational experiments described below.

      Response: We believe the additional computational experiments performed for this revision substantiate all claims made in the revised version of the manuscript.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      • The authors discretize continuous variables into four bins for classification. However, a regression framework may be more appropriate for preserving the full resolution of the data. At a minimum, the authors should provide a stronger justification for this binning strategy and include an analysis of bin performance. For example, do samples near bin boundaries perform comparably to those near the bin centers? This would help determine whether the discretization introduces artifacts or obscures signals.

      Response: We thank the reviewer for this thoughtful suggestion. We agree that regression frameworks can, in principle, preserve the full resolution of continuous outcome variables. However, in our setting we deliberately chose a discretization approach. First, the discretized outcome categories correspond to ranges of tissue sizes that are biologically meaningful and allow direct comparison to expert annotations. In practice, human experts also tend to judge tissue presence and size in categorical rather than strictly continuous terms, which was mirrored by our human expert annotation strategy. As we aimed to compare deep learning with classical machine learning models and with expert annotations across the same prediction tasks, a categorical outcome formulation provided the most consistent and fair framework. Second, the underlying outcome variables did not follow a normal distribution, but instead exhibited a skewed and heterogeneous spread. Regression models trained on such distributions often show biases toward the most frequent value ranges, which may obscure less common but biologically important outcomes. Discretization mitigated this issue by balancing the prediction task across defined size categories.

      In line with the reviewer’s request, we have now analyzed the performance in relation to the distance of each sample from the bin center. These results are provided as new Supplementary Figures S26 and S27. Interestingly, for the classical machine learning classifiers, F1 scores tended to be somewhat higher for samples close to bin edges. For the convolutional neural networks, however, F1 scores were more evenly distributed across distances from bin centers. While the reason for this difference remains unclear, the analysis demonstrates that the discretization did not obscure predictive signals in either framework. We have amended the results section accordingly.
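As a sketch of the underlying computation (bin edges and sample values are hypothetical, not the manuscript's data), the distance of each sample from its bin center can be derived as:

```python
import numpy as np

values = np.array([0.10, 0.30, 0.62, 0.80, 0.99])  # hypothetical continuous tissue sizes
edges = np.linspace(0.0, 1.0, 5)                   # four equal-width bins

bins = np.clip(np.digitize(values, edges) - 1, 0, 3)  # bin index 0..3
centers = (edges[:-1] + edges[1:]) / 2
half_width = (edges[1] - edges[0]) / 2

# Normalized distance from the bin center: 0 at the center, 1 at a bin edge.
dist = np.abs(values - centers[bins]) / half_width

# Performance (e.g., F1) can then be stratified by this distance,
# comparing samples near bin edges against samples near bin centers.
near_edge = dist > 0.5
```

Stratifying F1 scores by `dist` is one straightforward way to check whether boundary samples behave differently from center samples.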

      • The relevance backpropagation interpretation analysis is not convincing. The authors argue that the model's use of pixels across the entire image (rather than just the RPE region) indicates that the deep learning approach captures holistic information. However, only three example images are shown out of hundreds, with no explanation for their selection, limiting the generalizability of the interpretation. Additionally, it is unclear how this interpretability approach would work at all in earlier time points, particularly before the model begins making confident predictions around the 8-hour mark. It is also not specified whether the input used for GradSHAP matches the input used during CNN training. The authors should consider expanding this analysis by quantifying pixel importance inside versus outside annotated regions over time. Lastly, Figure 4C is missing a scale bar, which would aid in interpretability.

      Response: We thank the reviewer for raising these important concerns. In the initial version we showed examples of relevance backpropagation that suggested CNNs rely on visible RPE or lens tissue for their predictions (original Figure 4C). Following the reviewer’s comment, we expanded the analysis extensively across all models and attribution methods (compare new Supplementary Note 1), and quantified agreement, consistency, entropy, regional overlap, and drift (new Supplementary Figures S29 and S34-S37), as well as providing comprehensive visualizations across models and methods (new Supplementary Figures S30-S33).

      This extended analysis showed that attribution methods behave very differently from each other, but consistently so across the three CNN architectures. Each method displayed characteristic patterns, for example in entropy or center-of-mass drift, but the overlap between methods was generally low. While integrated gradients and DeepLiftSHAP tended to concentrate on tissue regions, other methods produced broader or shifting relevance patterns, and overall we could not establish robust or interpretable signals from a biological point of view that would support stronger conclusions.

      We have therefore revised the text to focus on descriptive results only, without making claims about early structural information or tissue-specific cues being used by the networks. We also added missing scale bars and clarified methodological details. Together, the revised section now reflects the extensive work performed while remaining cautious about what can and cannot be inferred from saliency methods in this setting.

      • The authors claim that they removed technical artifacts to the best of their ability, but it is unclear if the authors performed any adjustment beyond manual quality checks for contamination. Did the authors observe any illumination artifacts (either within a single image or over time)? Any other artifacts or procedures to adjust?

      Response: We thank the reviewer for this comment. We have not performed any adjustment beyond manual quality control post organoid seeding. The aforementioned removal of technical artifacts included, among others, seeding at the same time of day, seeding and cell processing by the same investigator according to a standardized protocol, usage of reproducible chemicals (same LOT, frozen only once, etc.) and temperature control during image acquisition. We adhered strictly to internal, previously published workflows that were aimed to reduce any variability due to technical variations during cell harvesting, organoid preparation and imaging. We have clarified this important point in the Methods section.

      • In line 434-436 the authors state "In this work, we used 1,000 organoids in total, to achieve the reported prediction accuracies. Yet, we suspect that as little as ~500 organoids are sufficient to reliably recapitulate our findings." It is unclear what evidence the authors use to support this claim? The authors could perform a downsampling analysis to determine tradeoff between performance and sample size.

      Response: We thank the reviewer for this important comment. To clarify, our statement regarding the sufficiency of ~500 organoids was based on a downsampling-style analysis we had already performed. In this analysis, we systematically reduced the number of experiments used for training and assessed predictive performance for both CNN- and classifier-based approaches (former Supplementary Figure S11, new Supplementary Figure S28). For CNNs, performance curves plateaued at approximately six experiments (corresponding to ~500 organoids), suggesting that increasing the sample size further only marginally improved prediction accuracy. In contrast, we did not observe a clear plateau for the machine learning classifiers, indicating that these models can achieve comparable performance with fewer training experiments. We have revised the manuscript text to clarify that this conclusion is derived from these analyses, and continue to include Supplementary Figure S11 as new Supplementary Figure S28 for transparency (compare Supplementary Note 1).

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. Yes, we believe all experiments are realistic in terms of time and resources. We estimate all experiments could be completed in 3-6 months.

      Response: We confirm that the suggested experiments are realistic in terms of time and resources and have been able to complete them within 6 months.

      Are the data and the methods presented in such a way that they can be reproduced? No, the code is not currently available. We were not able to review the source code.

      Response: We have now made the repository public. We apologize for this initial oversight. The links are provided in the revised version of the manuscript under “Data availability”.

      Are the experiments adequately replicated and statistical analysis adequate?

      • The experiments are adequately replicated.

      • The statistical analysis (deep learning) is lacking a negative control baseline, which would be helpful to observe if performance is inflated.

      Response: We thank the reviewer for this comment. We have calculated the respective curves with neural networks and machine learning classifiers that were trained on data with shuffled labels and have included these results as a separate curve in the respective Figures 3 and 4. We have also amended the Methods section accordingly.
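A minimal sketch of such a shuffled-label negative control (synthetic data and a simple classifier, not the manuscript's actual models):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # labels carry real signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Real labels: the model should clearly beat chance level.
real_acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# Negative control: permute the training labels, keep everything else fixed.
# Performance above chance here would indicate leakage or inflated metrics.
null_acc = LogisticRegression().fit(X_tr, rng.permutation(y_tr)).score(X_te, y_te)
```

The null curve establishes the chance-level baseline against which the real-label curves in Figures 3 and 4 can be read.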

      Minor comments:

      Specific experimental issues that are easily addressable.

      Are prior studies referenced appropriately?

      Yes.

      Are the text and figures clear and accurate?

      The authors must improve clarity on terminology. For example, they should define a comprehensive dataset, significant, and provide clarity on their morphometrics feature space. They should elaborate on what they mean by "confounding factor of heterogeneity".

      Response: We thank the reviewer for highlighting the need to clarify terminology. We have revised the manuscript accordingly. Specifically, we now explicitly define "comprehensive dataset" as longitudinal brightfield imaging of ~1,000 organoids from 11 independent experiments, imaged every 30 minutes over several days, covering a wide range of developmental outcomes at high temporal resolution. Furthermore, we replaced the term "significantly" with wording that avoids implying statistical significance, where appropriate. Finally, we describe the morphometrics feature space in the Methods section in more detail, including the custom parameters that we used to extend the regionprops_table function of skimage.

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions? - Figure 2C describes a distance between what? The y axis is likely too simple. Same confusion over Figure 2D. Was distance computed based on tsne coordinates?

      Response: We thank the reviewer for pointing out this potential source of confusion. The distances shown in original Figures 2C and 2D were not calculated in tSNE space. Instead, morphometrics features were first Z-scaled, and then dimensionality reduction by PCA was applied, with the first 20 principal components retaining ~93% of the variance. Euclidean distances were subsequently computed in this 20-dimensional PC space. For inter-organoid distances (Figure 2C), we calculated mean pairwise Euclidean distances between all organoids at each imaging time point, capturing the global divergence of organoid morphologies over time in an experiment-specific manner. For intra-organoid distances (Figure 2D), we calculated Euclidean distances between consecutive time points (n vs. n+1) for each individual organoid, thereby quantifying the extent of morphological change within organoids over time. We have revised the Figure legend and Methods section to make these definitions clearer.
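For clarity, the two distance definitions can be sketched as follows (synthetic data; only the dimensions mirror the manuscript, and the reshape assumes rows are ordered organoid by organoid):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
n_org, n_t, n_feat = 30, 5, 165  # organoids, time points, morphometric features
feats = rng.normal(size=(n_org * n_t, n_feat))

# Z-scale the features, then reduce to the first 20 principal components.
pcs = PCA(n_components=20).fit_transform(StandardScaler().fit_transform(feats))
pcs = pcs.reshape(n_org, n_t, 20)

# Inter-organoid (cf. Figure 2C): mean pairwise Euclidean distance
# between all organoids at each time point.
inter = np.array([pdist(pcs[:, t, :]).mean() for t in range(n_t)])

# Intra-organoid (cf. Figure 2D): Euclidean distance between consecutive
# time points (n vs. n+1) for each organoid.
intra = np.linalg.norm(np.diff(pcs, axis=1), axis=2)
```

Both quantities are computed in the 20-dimensional PC space, never on tSNE coordinates.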

      • The authors perform a Herculean analysis comparing dozens of different machine learning classifiers. They select two, but they should provide justification for this decision.

      Response: We thank the reviewer for this comment. In our initial machine learning analyses, we systematically benchmarked a broad set of classifiers on the morphometrics feature space, using cross-validation and hyperparameter tuning where appropriate. The classifiers that we ultimately focused on were those that consistently achieved the best performance in these comparisons. This process is described in the Methods and summarized in the Supplementary Figures S4 and S15 (for sum- and maximum-intensity z-projections new Supplementary Figures S5/6 and S16/17), which show the results of the benchmarking. We have clarified the text to state that the selected classifiers were chosen on the basis of their superior performance in these evaluations.

      • It would be good to get a sense for how these retinal organoids grow - are they moving all over the place? They are in Matrigel so maybe not, but are they rotating?

      Can the author's approach predict an entire non-emergence experiment? The authors tried to standardize protocol, but ultimately if It's deriving this much heterogeneity, then how well it will actually generalize to a different lab is a limitation.

      Response: We thank the reviewer for these thoughtful questions. The retinal organoids in our study were embedded in low concentrations of Matrigel and remained relatively stable in position throughout imaging. We did not observe substantial displacement or lateral movement of organoids, and no systematic rotation could be detected in our dataset. Small morphological rearrangements within organoids were observed, but the gross positioning of organoids within the wells remained consistent across time-lapse recordings.

      Regarding generalization across laboratories, we agree with the reviewer that this is an important limitation. While we minimized technical variability by adhering to a highly standardized, published protocol (see Methods), considerable heterogeneity remained at both intra- and inter-experimental levels. This variability likely reflects inherent properties of the system, similar to reports in the literature across organoid systems, rather than technical artifacts, and poses a potential challenge for applying our models to independently generated datasets. We therefore highlight the need for future work to test the robustness of our models across laboratories, which will be essential to determine the true generalizability of our approach. We have amended the Discussion accordingly.

      • The authors should dampen claims throughout. For example, in the abstract they state, "by combining expert annotations with advanced image analysis". The image analysis pipelines use common approaches.

      Response: We thank the reviewer for this comment. We agree that the individual image analysis steps we used, such as morphometric feature extraction, are based on well-established algorithms. By referring to “advanced image analysis,” we intended to highlight not the novelty of each single algorithm, but rather the way in which we systematically combined a large number of quantitative parameters and leveraged them through machine learning models to generate predictive insights into organoid development.

      • The authors state: "the presence of RPE and lenses were disagreed upon by the two independently annotating experts in a considerable fraction of organoids (3.9 % for RPE, 2.9% for lenses).", but it is unclear why there were two independently annotating experts. The supplements say images were split between nine experts for annotation.

      Response: We thank the reviewer for pointing out this ambiguity. To clarify, the ground truth definition at the final time point was established by two experts who annotated all organoids. These two annotators were part of the larger group of six experts who contributed to the earlier human expert annotation tasks. Thus, while six experts provided annotations for subsets of images during the expert prediction experiments, the final annotation for every single organoid at its last time frame was consistently performed by the same two experts to ensure a uniform ground truth. We have amended this in the revised manuscript to make this distinction clear.

      • Details on the image analysis pipeline would be helpful to clarify. For example, why did they choose to measure these 165 morphology features? Which descriptors were used to quantify blur? Did the authors apply blur metrics per FOV or per segmented organoid?

      Response: We thank the reviewer for this comment. To clarify, we extracted 165 morphometric features per segmented organoid, combining standard scikit-image region properties with custom implementations (e.g., blur quantified as the variance of the Laplace filter response within the organoid mask). All metrics, including blur, were calculated per segmented organoid rather than per full field of view. This broad feature space was deliberately chosen to capture size, shape, and intensity distributions in a comprehensive and unbiased manner. A more detailed description of the preprocessing steps, the full feature list, and the exact code implementations are now provided in the Methods section (“Large-scale time-lapse Image analysis”) of the revised version of the manuscript, as well as in the source code GitHub repository.
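The blur metric described above can be sketched as follows (illustrative images and mask; the manuscript's exact implementation lives in its code repository):

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
sharp = rng.normal(size=(128, 128))                 # stand-in for an in-focus frame
blurred = ndimage.gaussian_filter(sharp, sigma=3)   # stand-in for a defocused frame

# Circular "organoid" mask: metrics are computed per segmented organoid,
# not per full field of view.
yy, xx = np.mgrid[:128, :128]
mask = (yy - 64) ** 2 + (xx - 64) ** 2 < 50 ** 2

def blur_score(img, mask):
    # Variance of the Laplace filter response inside the mask:
    # high variance indicates many sharp edges, low variance indicates blur.
    return ndimage.laplace(img)[mask].var()
```

An in-focus frame yields a much higher score than its blurred counterpart, which is what makes the metric usable as a per-organoid quality indicator.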

      • The description of the number of images is confusing and distracts from the number of organoids. The number of organoids and number of timepoints used would provide a better description of the data with more value. For example, does this image count include all five z slices?

      Response: We thank the reviewer for this comment. The reported image count includes slice 3 only, which we based our models on. The five z-slices that we used to create the MAX- and SUM-intensity z-projections would increase this number 5-fold. While we agree that the number of organoids and time points are highly informative metrics and have provided these details in the manuscript, we also believe that reporting the image count is valuable, as it directly reflects the size of the dataset processed by our analysis pipelines. For this reason, we prefer to keep the current description.

      • The authors should consider applying a maximum projection across the five z slices (rather than the middle z) as this is a common procedure in image analysis. Why not analyze three-dimensional morphometrics or deep learning features? Might this improve performance further?

      Response: We thank the reviewer for this valuable suggestion. To address this point, we repeated all analyses using both sum- and maximum-intensity z-projections and have included the results as new Supplementary Figures S8-S10, S13/S14 for TOI emergence and new Supplementary Figures S19-S21, S24/S25 for TOI sizes (classifier benchmarking and hyperparameter tuning in new Supplementary Figures S5/S6 and S16/S17). These additional analyses did not reveal a noticeable improvement in performance, suggesting that projections incorporating all slices are not strictly necessary in our setting. An analysis that included all five z-slices separately for classification would indeed be of interest, but was not feasible within the scope of this study, as it would substantially increase the computational demands beyond the available resources and timeframe.
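The three input variants compared above (middle slice versus the two projection types) can be sketched as, with hypothetical array shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
stack = rng.integers(0, 255, size=(5, 64, 64))  # five z-slices of one organoid

middle = stack[2]             # middle slice, used for the main models
max_proj = stack.max(axis=0)  # maximum-intensity z-projection
sum_proj = stack.sum(axis=0)  # sum-intensity z-projection
```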

      • There is a lot of manual annotation performed in this work, the authors could speculate how this could be streamlined for future studies. How does the approach presented enable streamlining?

      Response: We thank the reviewer for raising this important point. The current study relied on expert visual review, which is time-intensive, but our findings suggest several ways to streamline future work. For instance, model-assisted prelabeling could be used to automatically accept high-confidence cases while routing only uncertain cases to experts. Active sampling strategies, focusing expert review on boundary cases or rare classes, as well as programmatic checks from morphometrics (e.g., blur or contrast to flag low-quality frames), could further reduce effort. Consensus annotation could be reserved only for cases where the model and expert disagree or confidence is low. Finally, new experiments could be bootstrapped with a small seed set of annotated organoids for fine-tuning before switching to such a model-assisted workflow. These possibilities are enabled by our approach, where organoids are imaged individually, morphometrics provide automated quality indicators, and the CNN achieves reliable performance at early developmental stages, making model-in-the-loop annotation a feasible and efficient strategy for future studies. We have added a clarifying paragraph to the Discussion accordingly.

      Reviewer #2 (Significance (Required)):

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. The paper's advance is technical (providing new methods for organoid quality control) and conceptual (providing proof of concept that earlier time points contain information to predict specific future outcomes in retinal organoids)

      Place the work in the context of the existing literature (provide references, where appropriate).

      • The authors do a good job of placing their work in context in the introduction.
      • The work presents a simple image analysis pipeline (using only the middle z slice) to process timelapse organoid images. So not a 4D pipeline (time and space), just 3D (time). It is likely that more and more of these approaches will be developed over time, and this article is one of the early attempts.

      • The work uses standard convolutional neural networks.

      Response: We thank the reviewer for this assessment. We agree that our work represents one of the early attempts in this direction, applying a straightforward pipeline with standard convolutional neural networks, and we appreciate the reviewer’s acknowledgment of how the study has been placed in context within the Introduction.

      State what audience might be interested in and influenced by the reported findings. - Data scientists performing image-based profiling for time lapse imaging of organoids.

      • Retinal organoid biologists

      • Other organoid biologists who may have long growth times with indeterminate outcomes.

      Response: We thank the reviewer for outlining the relevant audiences. We agree that the reported findings will be of interest to data scientists working on image-based profiling, retinal organoid biologists, and more broadly to organoid researchers facing long culture times with uncertain developmental outcomes.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. - Image-based profiling/morphometrics

      • Organoid image analysis

      • Computational biology

      • Cell biology

      • Data science/machine learning

      • Software

      This is a signed review:

      Gregory P. Way, PhD

      Erik Serrano

      Jenna Tomkinson

      Michael J. Lippincott

      Cameron Mattson

      Department of Biomedical Informatics, University of Colorado


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      This manuscript by Afting et al. addresses the challenge of heterogeneity in retinal organoid development by using deep learning to predict eventual tissue outcomes from early-stage images. The central hypothesis is that deep learning can forecast which tissues an organoid will form (specifically retinal pigmented epithelium, RPE, and lens) well before those tissues become visibly apparent. To test this, the authors assembled a large-scale time-lapse imaging dataset of ~1,000 retinal organoids (~100,000 images) with expert annotations of tissue outcomes. They characterized the variability in organoid morphology and tissue formation over time, focusing on two tissues: RPE (which requires induction) and lens (which appears spontaneously). The core finding is that a deep learning model can accurately predict the emergence and size of RPE and lens in individual organoids at very early developmental stages. Notably, a convolutional neural network (CNN) ensemble achieved high predictive performance (F1-scores ~0.85-0.9) hours before the tissues were visible, significantly outperforming human experts and classical image-analysis-based classifiers. This approach effectively bypasses the issue of stochastic developmental heterogeneity and defines an early "determination window" for fate decisions. Overall, the study demonstrates a proof-of-concept that artificial intelligence can forecast organoid differentiation outcomes non-invasively, which could revolutionize how organoid experiments are analyzed and interpreted.

      Recommendation:

      While this manuscript addresses an important and timely scientific question using innovative deep learning methodologies, it currently cannot be recommended for acceptance in its present form. The authors must thoroughly address several critical limitations highlighted in this report. In particular, significant issues remain regarding the generalizability of the predictive models across different experimental conditions, the interpretability of deep learning predictions, and the use of Euclidean distance metrics in high-dimensional morphometric spaces-potentially leading to distorted interpretations of organoid heterogeneity. These revisions are essential for validating the general applicability of their approach and enhancing biological interpretability. After thoroughly addressing these concerns, the manuscript may become suitable for future consideration.

      Response: We thank the reviewer for the thoughtful and constructive comments. In response, we expanded our analyses in several key ways. We clarified limitations regarding external datasets. Interpretability analyses were greatly extended across three CNN architectures and eight attribution methods (new Supplementary Figures S29-S37, new Supplementary Note 1), showing consistent but method-specific behaviors; as no reproducible biologically interpretable signals emerged, we now present these results descriptively and clearly state their limitations. We further demonstrated the flexibility of our framework by predicting morphometric clusters in addition to tissue outcomes (new Figure 4C), confirmed robustness of the morphometrics space using PCA and nearest-neighbor analyses (new Supplementary Figure S3), and added statistical tests confirming CNNs significantly outperform classical classifiers (Supplementary File 1). Finally, we made all code and raw data publicly available, clarified species context, and added forward-looking discussion on adaptive interventions. We believe these revisions now further improve the rigor and clarity of our work.

      Major Issues (with Suggestions):

      1. Generalization to Other Batches or Protocols: The drop in performance on independent validation experiments suggests the model may partially overfit to specific experimental conditions. A major concern is how well this approach would work on organoids from a different batch or produced by a slightly different differentiation protocol. Suggestion: The authors should clarify the extent of variability between their "independent experiment" and training data (e.g., were these done months apart, with different cell lines or minor protocol tweaks?). To strengthen confidence in the model's robustness, I recommend testing the trained model on one or more truly external datasets, if available (for instance, organoids generated in a separate lab or under a modified protocol). Even a modest analysis showing the model can be adapted (via transfer learning or re-training) to another dataset would be valuable. If new data cannot be added, the authors should explicitly discuss this limitation and perhaps propose strategies (like domain adaptation techniques or more robust training with diverse conditions) to handle batch effects in future applications.

      Response: We thank the reviewer for this important comment. We fully agree that such an external validation would be a valuable addition to the manuscript; unfortunately, we are not able to obtain the requested external dataset. Although retinal organoid systems exist and are widely used across different species, to the best of our knowledge our laboratory is the only one currently raising retinal organoids from primary embryonic pluripotent stem cells of Oryzias latipes, and there is currently only one published differentiation protocol that allows the successful generation of these organoids. We note that our datasets were collected over the course of nine months, which already introduces variability across time and thus partially addresses concerns regarding batch effects. While we did not have access to truly external datasets (e.g., from other laboratories), we have clarified this limitation in the revised version of the manuscript and outlined strategies such as domain adaptation and training on more diverse conditions as promising future directions to improve robustness.

      2. Biological Interpretation of Early Predictive Features: The study currently concludes that the CNN picks up on complex, non-intuitive features that neither human experts nor conventional analysis could identify. However, from a biological perspective, it would be highly insightful to know what these features are (e.g., subtle texture, cell distribution patterns, etc.). Suggestion: I encourage the authors to delve deeper into interpretability. They might try complementary explainability techniques (for example, occlusion tests where parts of the image are masked to see if predictions change, or activation visualization to see what patterns neurons detect) beyond GradientSHAP. Additionally, analyzing false predictions might provide clues: if the model is confident but wrong for certain organoids, what visual traits did those have? If possible, correlating the model's prediction confidence with measured morphometrics or known markers (if any early marker data exist) could hint at what the network sees. Even if definitive features remain unidentified, providing the reader with any hypothesis (for instance, "the network may be sensing a subtle rim of pigmentation or differences in tissue opacity") would add value. This would connect the AI predictions back to biology more strongly.

      Response: We thank the reviewer for this thoughtful suggestion. We agree that linking CNN predictions to specific biological features would be highly valuable. In response, we expanded our interpretability analyses beyond GradientSHAP to a broad set of attribution methods and quantified their behavior across models and timepoints (new Supplementary Figures S29-S37, new Supplementary Note 1). While some methods (e.g., Integrated Gradients, DeepLiftSHAP) occasionally highlighted visible tissue regions, others produced diffuse or shifting relevance, and overall overlap was low. Therefore, our results did not yield reproducible, interpretable biological signals.

      Given these results, we have refrained from speculating about specific early image features and now present the interpretability analyses descriptively. We agree that future studies integrating imaging with molecular markers will be required to directly link early predictive cues to defined biological processes.

      Expansion to Other Outcomes or Multi-Outcome Prediction: The focus on RPE and lens is well-justified, but these are two outcomes within retinal organoids. A major question is whether the approach could be extended to predict other cell types or structures (e.g., presence of certain retinal neurons, or malformations) or even multiple outcomes at once. Suggestion: The authors should discuss the generality of their approach. Could the same pipeline be trained to predict, say, photoreceptor layer formation or other features if annotated? Are there limitations (like needing binary outcomes vs. multi-class)? Even if outside the scope of this study, a brief discussion would reassure readers that the method is not intrinsically limited to these two tissues. If data were available, it would be interesting to see a multi-label classification (predict both RPE and lens presence simultaneously) or an extension to other organoid systems in future. Including such commentary would highlight the broad applicability of this platform.

      Response: We thank the reviewer for this helpful and important suggestion. While our study focused on RPE and lens as the most readily accessible tissues of interest in retinal organoids, our new analyses demonstrate that the pipeline is not limited to these outcomes. In addition to tissue-specific predictions, we trained both a convolutional neural network (on image data) and a decision tree classifier (on morphometric features) to predict more abstract morphological clusters, defined at the final timepoint from the morphometric features, and showed that both approaches successfully capture non-tissue features from early frames (new Figure 4C). This illustrates that the framework can be extended beyond binary tissue outcomes to multi-class problems and can predict relevant outcomes such as overall organoid morphology. Given appropriate annotations, the framework could in principle be trained to detect additional structures such as photoreceptor layers or malformations. Furthermore, the CNN architecture we employed and the morphometric feature space are compatible with multi-label classification, meaning simultaneous prediction of several outcomes would also be feasible. We have clarified this point in the discussion to highlight the methodological flexibility and potential generality of our approach, and we are excited to share this additional model with the readership.
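As a minimal illustration of this multi-outcome flexibility (with randomly generated placeholder features, not the study's data), scikit-learn decision trees natively accept a multi-column target, so presence of RPE and lens could be predicted jointly:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 165))  # 200 organoids x 165 morphometric features (placeholder)
# Two binary outcomes per organoid: [RPE present, lens present]; tied to
# simple feature thresholds here so the tree has a signal to learn.
y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0]).astype(int)

clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
pred = clf.predict(X[:5])  # shape (5, 2): one column per outcome
```

A CNN can be adapted analogously by giving the final layer one sigmoid unit per outcome and training with a binary cross-entropy loss.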

      Curse of high dimensionality: Using Euclidean distance in a 165-dimensional morphometric space likely suffers from the curse of dimensionality, which diminishes the meaning of distances as dimensionality increases. In such high-dimensional settings, the range of pairwise distances tends to collapse, undermining the ability to discern meaningful intra- vs. inter-organoid differences. Suggestion: To address this, I would encourage the authors to apply principal component analysis (PCA) in place of (or prior to) tSNE. PCA would reduce the data to a few dominant axes of variation that capture most of the morphometric variance, directly revealing which features drive differences between organoids. These principal components are linear combinations of the original 165 parameters, so one can examine their loadings to identify which morphometric traits carry the most information - yielding interpretable axes of biological variation (e.g., organoid size, shape complexity, etc.). In addition, I would like to mention an important cautionary remark regarding tSNE embeddings. tSNE does not preserve global geometry of the data. Distances and cluster separations in a tSNE map are therefore not faithful to the original high-dimensional distances and should be interpreted with caution. See Chari T, Pachter L (2023), The specious art of single-cell genomics, PLoS Comput Biol 19(8): e1011288, for an enlightening discussion in the context of single cell genomics. The authors have shown that extreme dimensionality reduction to 2D can introduce significant distortions in the data's structure, meaning the apparent proximity or separation of points in a tSNE plot may be an artifact of the algorithm rather than a true reflection of morphometric similarity. Implementing PCA would mitigate high-dimensional distance issues by focusing on the most informative dimensions, while also providing clear, quantitative axes that summarize organoid heterogeneity. 
This change would strengthen the analysis by making the results more robust (avoiding distance artifacts) and biologically interpretable, as each principal component can be traced back to specific morphometric features of interest.
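The suggested PCA step can be sketched as follows. The data here are synthetic placeholders in which one feature is given dominant variance purely to illustrate how component loadings expose the drivers of variation:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 165))  # 300 organoid snapshots x 165 morphometric features
X[:, 0] *= 10.0                  # pretend feature 0 (e.g. organoid area) dominates

pca = PCA(n_components=20).fit(X)
scores = pca.transform(X)                   # low-dimensional input for tSNE/UMAP
explained = pca.explained_variance_ratio_   # variance captured per component
# Loadings: which original features drive each principal component.
top_feature_pc1 = int(np.argmax(np.abs(pca.components_[0])))
```

Feeding `scores` rather than the raw 165-dimensional data into tSNE mitigates the distance-concentration problem, while the loadings give each retained axis a morphometric interpretation.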

      Response: We thank the reviewer for raising this point. Indeed, high dimensionality and dimensionality reduction can lead to misinterpretations. We approached this issue as follows: First, we calculated the same tSNE projections and distances using the first 20 PCs and supplied these data as the new Figure 2 and new Supplementary Figure 2. While the scale of the data shifted slightly, there were no differences in the data distribution that would contradict our prior conclusions.

      To confirm these findings and further support the validity of our dimensionality reduction, we calculated the intersection of the 30 nearest neighbors in raw data space (or PCA space) with the 30 nearest neighbors in reduced space (tSNE or UMAP; we included UMAP to emphasize that the effect is not specific to tSNE projections and also holds for a dimensionality reduction better known for preserving global rather than local structure). As shown in the new Supplementary Figure S3 (A-D), the high Jaccard index confirmed that our projections accurately reflect the data structure obtained from raw distance measurements. Moreover, the Jaccard index generally increased over time, which is best explained by the strong morphological similarity of organoids at timepoint 0, reflected by the dense point cloud in the tSNE projections at that timepoint. These effects were independent of whether the data were derived from the 20 PCs or from all 165 dimensions.
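A neighborhood-overlap check of this kind can be sketched as below; the random data and k=10 are illustrative stand-ins, not the values or pipeline from the study:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_jaccard(high, low, k=30):
    """Mean Jaccard overlap between each point's k nearest neighbours
    in the original space and in the reduced (e.g. tSNE/UMAP) space."""
    idx_hi = NearestNeighbors(n_neighbors=k + 1).fit(high).kneighbors(
        high, return_distance=False)[:, 1:]  # drop the point itself
    idx_lo = NearestNeighbors(n_neighbors=k + 1).fit(low).kneighbors(
        low, return_distance=False)[:, 1:]
    overlaps = [len(set(a) & set(b)) / len(set(a) | set(b))
                for a, b in zip(idx_hi.tolist(), idx_lo.tolist())]
    return float(np.mean(overlaps))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 165))         # placeholder morphometric data
perfect = knn_jaccard(X, X, k=10)       # identical spaces -> 1.0
lossy = knn_jaccard(X, X[:, :2], k=10)  # naive 2D "embedding" -> lower overlap
```

A score near 1 means local neighborhoods survive the embedding; comparing tSNE against UMAP (or raw against PCA space) is then just a matter of swapping the `low` argument.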

      We next wanted to confirm the conclusion that, at later timepoints, data points from the same organoid were more closely related to each other than to data points from different organoids. We therefore identified the 30 nearest neighbors of each data point and found that, at later timepoints, almost all of them were attributable to the same organoid (new Supplementary Figure S3 E/F). The only exceptions were experiments lacking intermediate timepoints (E007 and E002), which misaligned the organoids in the reduced space and confounded the nearest-neighbor analysis.

      We have included the respective new Figures and new Supplementary Figures and linked them in the main manuscript.

      Statistical Reporting and Significance: The manuscript focuses on F1-score as the metric to report accuracy over time, which is appropriate. However, it's not explicitly stated whether any statistical significance tests were performed on the differences between methods (e.g., CNN vs human, CNN vs classical ML). Suggestion: The authors could report statistical significance of the performance differences, perhaps using a permutation test or McNemar's test on predictions. For example, is the improvement of the CNN ensemble over the Random Forest/QDA classifier statistically significant across experiments? Given the n of organoids, this should be assessable. Demonstrating significance would add rigor to the analysis.

      Response: We thank the reviewer for this helpful suggestion. Following the recommendation, we quantified per-experiment differences in predictive performance by calculating the area under the F1-score curves (AUC) for each classifier and experiment. We then compared methods using paired Wilcoxon signed-rank tests across experiments, with Holm-Bonferroni correction for multiple comparisons. This analysis confirmed that the CNN consistently and significantly outperformed the baseline models and classical machine learning classifiers in validation and test organoids; for RPE area and lens size in test organoids, the CNNs performed notably, though not significantly, better than the machine learning classifiers. These analyses add the requested statistical rigor to our findings, and the results of the tests are now provided in the Supplementary Material as Supplementary File 1.
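The described procedure (trapezoidal AUC over each experiment's F1 curve, paired Wilcoxon signed-rank tests, Holm-Bonferroni correction) can be sketched as follows; the F1 values are invented placeholders, not the manuscript's results:

```python
import numpy as np
from scipy.stats import wilcoxon

def curve_auc(f1, t):
    """Trapezoidal area under each row's F1-over-time curve."""
    return ((f1[:, 1:] + f1[:, :-1]) / 2 * np.diff(t)).sum(axis=1)

def holm(pvals):
    """Holm-Bonferroni adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    adj, running, m = np.empty_like(p), 0.0, len(p)
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * p[i])
        adj[i] = min(1.0, running)
    return adj

timepoints = np.array([0.0, 4.0, 8.0, 12.0, 24.0])        # hours (placeholder)
f1_cnn = np.tile([0.60, 0.78, 0.85, 0.90, 0.92], (6, 1))  # 6 experiments
f1_rf = f1_cnn - np.linspace(0.05, 0.10, 6)[:, None]      # uniformly worse baseline

stat, p = wilcoxon(curve_auc(f1_cnn, timepoints), curve_auc(f1_rf, timepoints))
```

Each experiment contributes one paired AUC observation, so significance is assessed across experiments rather than across individual images; `holm` would then be applied to the p-values from all pairwise method comparisons.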

      Minor Issues (with Suggestions):

      1. Data Availability: Given the resource-intensive nature of the work, the value to the community will be highest if the data is made publicly available. I understand that this is of course at the discretion of the authors, and they do mention that they will make the data available upon publication of the manuscript. In the meantime, the authors could consider sharing at least a representative subset of the data or the trained model weights. This would allow others to build on their work and test the method in other contexts, amplifying the impact of the study.

      Response: We have now made the repository and raw data public and apologize for this oversight. The link to the GitHub repository is provided in the manuscript under “Data availability”, and the links to the datasets are contained within the GitHub repository.

      Discussion - Future Directions: The Discussion does a good job of highlighting applications (like guiding molecular analysis). One minor addition could be speculation on using this approach to actively intervene: for example, could one imagine altering culture conditions mid-course for organoids predicted not to form RPE, to see if their fate can be changed? The authors touch on reducing variability by focusing on the window of determination; extending that thought to an experimental test (though not done here) would inspire readers. This is entirely optional, but a sentence or two envisioning how predictive models enable dynamic experimental designs (not just passive prediction) would be a forward-looking note to end on.

      Response: We thank the reviewer for this constructive suggestion. We have expanded the discussion to briefly address how predictive modeling could go beyond passive observation. Specifically, we now discuss that predictive models may enable dynamic interventions, such as altering culture conditions mid-course for organoids predicted not to form RPE, to test whether their developmental trajectory can be redirected. While outside the scope of the present work, this forward-looking perspective emphasizes how predictive modeling could inspire adaptive experimental strategies in future studies.

      I believe with the above clarifications and enhancements - especially regarding generalizability and interpretability - the paper will be suitable for broad readership. The work represents an exciting intersection of developmental biology and AI, and I commend the authors for this contribution.

      Response: We thank the reviewer for the positive assessment and their encouraging remarks regarding the contribution of our work to these fields.

      Novelty and Impact:

      This work fills an important gap in organoid biology and imaging. Previous studies have used deep learning to link imaging with molecular profiles or spatial patterns in organoids, but there remained a "notable gap" in predicting whether and to what extent specific tissues will form in organoids. The authors' approach is novel in applying deep learning to prospectively predict organoid tissue outcomes (RPE and lens) on a per-organoid basis, something not previously demonstrated in retinal organoids. Conceptually, this is a significant advance: it shows that fate decisions in a complex 3D culture model can be predicted well in advance, suggesting the existence of subtle early morphogenetic cues that only a sophisticated model can discern. The findings will be of broad interest to researchers in organoid technology, developmental biology, and biomedical AI.

      Response: We thank the reviewer for this thoughtful and encouraging assessment. We agree that our study addresses an important gap by prospectively predicting tissue outcomes at the single-organoid level, and we appreciate the recognition that this represents a conceptual advance with relevance not only for retinal organoids but also for broader applications in organoid biology, developmental biology, and biomedical AI.

      Methodological Rigor and Technical Quality:

      The study is methodologically solid and carefully executed. The authors gathered a uniquely large dataset under consistent conditions, which lends statistical power to their analyses. They employ rigorous controls: an expert panel provided human predictions as a baseline, and a classical machine learning pipeline using quantitative image-derived features was implemented for comparison. The deep learning approach is well-chosen and technically sound. They use an ensemble of CNN architectures (DenseNet121, ResNet50, and MobileNetV3) pre-trained on large image databases, fine-tuning them on organoid images. The use of image segmentation (DeepLabV3) to isolate the organoid from background is appropriate to ensure the models focus on the relevant morphology. Model training procedures (data augmentation, cross-entropy loss with class balancing, learning rate scheduling, and cross-validation) are thorough and follow best practices. The evaluation metrics (primarily F1-score) are suitable for the imbalanced outcomes and emphasize prediction accuracy in a biologically relevant way. Importantly, the authors separate training, test, and validation sets in a meaningful manner: images of each organoid are grouped to avoid information leakage, and an independent experiment serves as a validation to test generalization. The observation that performance is slightly lower on independent validation experiments underscores both the realism of their evaluation and the inherent heterogeneity between experimental batches. In addition, the study integrates interpretability (using GradientSHAP-based relevance backpropagation) to probe what image features the network uses. Although the relevance maps did not reveal obvious human-interpretable features, the attempt reflects a commendable thoroughness in analysis. Overall, the experimental design, data analysis, and reporting are of high quality, supporting the credibility of the conclusions.

      Response: We thank the reviewer for their very positive and detailed assessment. We appreciate the recognition of our efforts to ensure methodological rigor and reproducibility, and we agree that interpretability remains an important but challenging area for future work.

      Reviewer #3 (Significance (Required)):

      Scientific Significance and Conceptual Advances:

      Biologically, the ability to predict organoid outcomes early is quite significant. It means researchers can potentially identify when and which organoids will form a given tissue, allowing them to harvest samples at the right moment for molecular assays or to exclude organoids that will not form the desired structure. The manuscript's results indicate that RPE and lens fate decisions in retinal organoids are made much earlier than visible differentiation, with predictive signals detectable as early as ~11 hours for RPE and ~4-5 hours for lens. This suggests a surprising synchronization or early commitment in organoid development that was not previously appreciated. The authors' introduction of deep learning-derived determination windows refines the concept of a developmental "point of no return" for cell fate in organoids. Focusing on these windows could help in pinpointing the molecular triggers of these fate decisions. Another conceptual advance is demonstrating that non-invasive imaging data can serve a predictive role akin to (or better than) destructive molecular assays. The study highlights that classical morphology metrics and even expert eyes capture mainly recognition of emerging tissues, whereas the CNN detects subtler, non-intuitive features predictive of future development. This underlines the power of deep learning to uncover complex phenotypic patterns that elude human analysis, a concept that could be extended to other organoid systems and developmental biology contexts. In sum, the work not only provides a tool for prediction but also contributes conceptual insights into the timing of cell fate determination in organoids.

      Response: We thank the reviewer for this thoughtful and positive assessment. We agree that the determination windows provide a valuable framework to study early fate decisions in organoids, and we have emphasized this point in the discussion to highlight the biological significance of our findings.

      Strengths:

      The combination of high-resolution time-lapse imaging with advanced deep learning is innovative. The authors effectively leverage AI to solve a biological uncertainty problem, moving beyond qualitative observations to quantitative predictions. The study uses a remarkably large dataset (1,000 organoids, >100k images), which is a strength as it captures variability and provides robust training data. This scale lends confidence that the model isn't overfit to a small sample. By comparing deep learning with classical machine learning and human predictions, the authors provide context for the model's performance. The CNN ensemble consistently outperforms both the classical algorithms and human experts, highlighting the value added by the new method. The deep learning model achieves high accuracy (F1 > 0.85) at impressively early time points. The fact that it can predict lens formation just ~4.5 hours into development with confidence is striking. Performance remained strong and exceeded human capability at all assessed times. Key experimental and analytical steps (segmentation, cross-validation between experiments, model calibration, use of appropriate metrics) are executed carefully. The manuscript is transparent about training procedures and even provides source code references, enhancing reproducibility. The manuscript is generally well-written with a logical flow from the problem (organoid heterogeneity) to the solution (predictive modeling) and clear figures referenced.

      Response: We thank the reviewer for this very positive and encouraging assessment of our study, particularly regarding the scale of our dataset, the methodological rigor, and the reproducibility of our approach.

      Weaknesses and Limitations:

      Generalizability Across Batches/Conditions: One limitation is the variability in model performance on organoids from independent experiments. The CNN did slightly worse on a validation set from a separate experiment, indicating that differences in the experimental batch (e.g., slight protocol or environmental variations) can affect accuracy. This raises the question of how well the model would generalize to organoids generated under different protocols or by other labs. While the authors do employ an experiment-wise cross-validation, true external validation (on a totally independent dataset or a different organoid system) would further strengthen the claim of general applicability.

      Response: We thank the reviewer for this important point. We agree that generalizability across batches and experimental conditions is a key consideration. We have carefully revised the discussion to explicitly address this limitation and to highlight the variability observed between independent experiments.

      Interpretability of the Predictions: Despite using relevance backpropagation, the authors were unable to pinpoint clear human-interpretable image features that drive the predictions. In other words, the deep learning model remains somewhat of a "black box" in terms of what subtle cues it uses at early time points. This limits the biological insight that can be directly extracted regarding early morphological indicators of RPE or lens fate. It would be ideal if the study could highlight specific morphological differences (even if minor) correlated with fate outcomes, but currently those remain elusive.

      Response: We thank the reviewer for raising this important point. Indeed, while our models achieved robust predictive performance, the underlying morphological cues remained difficult to interpret using relevance backpropagation. We believe this limitation reflects both the subtlety of the early predictive signals and the complexity of the features captured by deep learning models, which may not correspond to human-intuitive descriptors. We have clarified this limitation in the Discussion and Supplementary Note 1 and emphasize that further methodological advances in interpretability, or integration with complementary molecular readouts, will be essential to uncover the precise morphological correlates of fate determination.

      Scope of Outcomes: The study focuses on two particular tissues (RPE and lens) as the outcomes of interest. These were well-chosen as examples (one induced, one spontaneous), but they do not encompass the full range of retinal organoid fates (e.g., neural retina layers). It's not a flaw per se, but it means the platform as presented is specialized. The method might need adaptation to predict more complex or multiple tissue outcomes simultaneously.

      Response: We agree with the reviewer that our study focuses on two specific tissues, RPE and lens, which served as proof-of-concept outcomes representing both induced and spontaneous differentiation events. While this scope is necessarily limited, we believe it demonstrates the general feasibility of our approach. We have clarified in the Discussion that the same framework could, in principle, be extended to additional retinal fates such as neural retina layers, or even to multi-label prediction tasks, provided appropriate annotations are available. We now provide additional experiments showing that even abstract morphological classes are well predictable. This will be an important next step to broaden the applicability of our platform.

      Requirement of Large Data and Annotations: Practically, the approach required a very large imaging dataset and extensive manual annotation: each organoid's RPE and lens outcome, plus manual masking for training the segmentation model. This is a substantial effort that may be challenging to reproduce widely. The authors suggest that perhaps ~500 organoids might suffice to achieve similar results, but the data requirement is still high. Smaller labs or studies with fewer organoids might not immediately reap the full benefits of this approach without access to such imaging throughput.

      Response: We thank the reviewer for highlighting this important point. We agree that the generation of a large imaging dataset and the associated annotations represent a substantial investment of time and resources. At the same time, we consider this effort highly relevant, as it reflects the intrinsic heterogeneity of organoid systems rather than technical artifacts, and therefore ensures robust model training. We have clarified this limitation in the discussion. While our full dataset included ~1,000 organoids, our downsampling analysis suggests that as few as ~500 organoids may already be sufficient to reproduce the key findings, which we believe makes the approach feasible for many organoid systems (compare new Supplementary Note 1). Moreover, as we outline in the Discussion, future refinements such as combining image- and tabular-based features or incorporating fluorescence data could further enhance predictive power and reduce annotation effort.

      Medaka Fish vs. Other Systems: The retinal organoids in this study appear to be from medaka fish, whereas much organoid research uses human iPSC-derived organoids. It is not fully clear from the manuscript how the findings translate to mammalian or human organoids. If there are species-specific differences, the applicability to human retinal organoids (which are important for disease modeling) might need discussion. This is a minor point if the biology is conserved, but worth noting as a potential limitation.

      Response: We thank the reviewer for pointing out this important consideration. We have now explicitly clarified in the Discussion that our proof-of-concept study was performed in medaka organoids, which offer high reproducibility and rapid development. While species-specific differences may exist, the predictive framework is not inherently restricted to medaka and should, in principle, be transferable to mammalian or human iPSC/ESC-derived organoids, provided sufficiently annotated datasets are available. We have amended the Discussion accordingly.

      Predicting Tissue Size is Harder: The model's accuracy in predicting how much tissue (relative area) an organoid will form, while good, is notably lower than for simply predicting presence/absence. Final F1 scores for size classes (~0.7) indicate moderate success. This implies that quantitatively predicting organoid phenotypic severity or extent is more challenging, perhaps due to more continuous variation in size. The authors do acknowledge the lower accuracy for size and treat it carefully.

      Response: We thank the reviewer for this observation and agree with their interpretation. We have already acknowledged in the manuscript that predicting tissue size is more challenging than predicting tissue presence/absence, and we believe we have treated these results with appropriate caution in the revised version of the manuscript.

      Latency vs. Determination: While the authors narrow down the time window of fate determination, it remains somewhat unclear whether the times at which the model reaches high confidence truly correspond to the biological "decision point" or are just the earliest detection of its consequences. The manuscript discusses this caveat, but it's an inherent limitation that the predictive time point might lag the actual internal commitment event. Further work might be needed to link these predictions to molecular events of commitment.

      Response: We agree with the reviewer. As noted in the Discussion, the time points identified by our models likely reflect the earliest detectable morphological consequences of fate determination, rather than the exact molecular commitment events themselves. Establishing a direct link between predictive signals and underlying molecular mechanisms will require future experimental work.

      Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      This manuscript by Afting et al. addresses the challenge of heterogeneity in retinal organoid development by using deep learning to predict eventual tissue outcomes from early-stage images. The central hypothesis is that deep learning can forecast which tissues an organoid will form (specifically retinal pigmented epithelium, RPE, and lens) well before those tissues become visibly apparent. To test this, the authors assembled a large-scale time-lapse imaging dataset of ~1,000 retinal organoids (~100,000 images) with expert annotations of tissue outcomes. They characterized the variability in organoid morphology and tissue formation over time, focusing on two tissues: RPE (which requires induction) and lens (which appears spontaneously). The core finding is that a deep learning model can accurately predict the emergence and size of RPE and lens in individual organoids at very early developmental stages. Notably, a convolutional neural network (CNN) ensemble achieved high predictive performance (F1-scores ~0.85-0.9) hours before the tissues were visible, significantly outperforming human experts and classical image-analysis-based classifiers. This approach effectively bypasses the issue of stochastic developmental heterogeneity and defines an early "determination window" for fate decisions. Overall, the study demonstrates a proof-of-concept that artificial intelligence can forecast organoid differentiation outcomes non-invasively, which could revolutionize how organoid experiments are analyzed and interpreted.

      Recommendation:

      While this manuscript addresses an important and timely scientific question using innovative deep learning methodologies, it cannot currently be recommended for acceptance in its present form. The authors must thoroughly address several critical limitations highlighted in this report. In particular, significant issues remain regarding the generalizability of the predictive models across different experimental conditions, the interpretability of the deep learning predictions, and the use of Euclidean distance metrics in the high-dimensional morphometric space, which could lead to distorted interpretations of organoid heterogeneity. These revisions are essential for validating the general applicability of the approach and enhancing biological interpretability. After these concerns have been thoroughly addressed, the manuscript may become suitable for future consideration.

      Major Issues (with Suggestions):

      1. Generalization to Other Batches or Protocols: The drop in performance on independent validation experiments suggests the model may partially overfit to specific experimental conditions. A major concern is how well this approach would work on organoids from a different batch or produced by a slightly different differentiation protocol. Suggestion: The authors should clarify the extent of variability between their "independent experiment" and training data (e.g., were these done months apart, with different cell lines or minor protocol tweaks?). To strengthen confidence in the model's robustness, I recommend testing the trained model on one or more truly external datasets, if available (for instance, organoids generated in a separate lab or under a modified protocol). Even a modest analysis showing the model can be adapted (via transfer learning or re-training) to another dataset would be valuable. If new data cannot be added, the authors should explicitly discuss this limitation and perhaps propose strategies (like domain adaptation techniques or more robust training with diverse conditions) to handle batch effects in future applications.
      2. Biological Interpretation of Early Predictive Features: The study currently concludes that the CNN picks up on complex, non-intuitive features that neither human experts nor conventional analysis could identify. However, from a biological perspective, it would be highly insightful to know what these features are (e.g., subtle texture, cell distribution patterns, etc.). Suggestion: I encourage the authors to delve deeper into interpretability. They might try complementary explainability techniques (for example, occlusion tests where parts of the image are masked to see if predictions change, or activation visualization to see what patterns neurons detect) beyond GradientSHAP. Additionally, analyzing false predictions might provide clues: if the model is confident but wrong for certain organoids, what visual traits did those have? If possible, correlating the model's prediction confidence with measured morphometrics or known markers (if any early marker data exist) could hint at what the network sees. Even if definitive features remain unidentified, providing the reader with any hypothesis (for instance, "the network may be sensing a subtle rim of pigmentation or differences in tissue opacity") would add value. This would connect the AI predictions back to biology more strongly.
      3. Expansion to Other Outcomes or Multi-Outcome Prediction: The focus on RPE and lens is well-justified, but these are two outcomes within retinal organoids. A major question is whether the approach could be extended to predict other cell types or structures (e.g., presence of certain retinal neurons, or malformations) or even multiple outcomes at once. Suggestion: The authors should discuss the generality of their approach. Could the same pipeline be trained to predict, say, photoreceptor layer formation or other features if annotated? Are there limitations (like needing binary outcomes vs. multi-class)? Even if outside the scope of this study, a brief discussion would reassure readers that the method is not intrinsically limited to these two tissues. If data were available, it would be interesting to see a multi-label classification (predict both RPE and lens presence simultaneously) or an extension to other organoid systems in future. Including such commentary would highlight the broad applicability of this platform.
      4. Curse of high dimensionality: Using Euclidean distance in a 165-dimensional morphometric space likely suffers from the curse of dimensionality, which diminishes the meaning of distances as dimensionality increases. In such high-dimensional settings, the range of pairwise distances tends to collapse, undermining the ability to discern meaningful intra- vs. inter-organoid differences. Suggestion: To address this, I would encourage the authors to apply principal component analysis (PCA) in place of (or prior to) tSNE. PCA would reduce the data to a few dominant axes of variation that capture most of the morphometric variance, directly revealing which features drive differences between organoids. These principal components are linear combinations of the original 165 parameters, so one can examine their loadings to identify which morphometric traits carry the most information - yielding interpretable axes of biological variation (e.g., organoid size, shape complexity, etc.). In addition, I would like to mention an important cautionary remark regarding tSNE embeddings. tSNE does not preserve the global geometry of the data. Distances and cluster separations in a tSNE map are therefore not faithful to the original high-dimensional distances and should be interpreted with caution. See Chari T, Pachter L (2023), The specious art of single-cell genomics, PLoS Comput Biol 19(8): e1011288, for an enlightening discussion in the context of single-cell genomics. Chari and Pachter show that extreme dimensionality reduction to 2D can introduce significant distortions in the data's structure, meaning the apparent proximity or separation of points in a tSNE plot may be an artifact of the algorithm rather than a true reflection of morphometric similarity. Implementing PCA would mitigate high-dimensional distance issues by focusing on the most informative dimensions, while also providing clear, quantitative axes that summarize organoid heterogeneity.
This change would strengthen the analysis by making the results more robust (avoiding distance artifacts) and biologically interpretable, as each principal component can be traced back to specific morphometric features of interest.
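      If helpful, the PCA-plus-loadings analysis suggested above can be sketched in a few lines; the morphometric matrix, the planted block of correlated traits, and the component count below are illustrative stand-ins, not the authors' actual data or pipeline.

```python
# Hedged sketch: PCA on a hypothetical morphometric matrix (organoids x 165
# features), inspecting loadings to see which traits drive the top component.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 165))          # stand-in: 200 organoids x 165 morphometrics
latent = rng.normal(size=(200, 1))
X[:, :5] += 5 * latent                   # pretend 5 traits share one axis of variation

Xz = StandardScaler().fit_transform(X)   # z-score so no single unit dominates
pca = PCA(n_components=10).fit(Xz)

loadings = pca.components_               # (10, 165): trait weights per component
top_traits_pc1 = np.argsort(np.abs(loadings[0]))[::-1][:5]
print("variance explained by PC1-10:", round(pca.explained_variance_ratio_.sum(), 3))
print("traits loading most on PC1:", top_traits_pc1)
```

      On real data, the indices returned here would map back to named morphometric traits (area, eccentricity, texture, etc.), giving the interpretable axes argued for above.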
      5. Statistical Reporting and Significance: The manuscript focuses on the F1-score as the metric to report accuracy over time, which is appropriate. However, it is not explicitly stated whether any statistical significance tests were performed on the differences between methods (e.g., CNN vs. human, CNN vs. classical ML). Suggestion: The authors could report the statistical significance of the performance differences, perhaps using a permutation test or McNemar's test on predictions. For example, is the improvement of the CNN ensemble over the Random Forest/QDA classifier statistically significant across experiments? Given the number of organoids, this should be assessable. Demonstrating significance would add rigor to the analysis.
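      The McNemar test suggested here needs only the counts of discordant organoids (those one classifier gets right and the other wrong); a minimal sketch of one common exact form, with made-up counts:

```python
# Hedged sketch of an exact two-sided McNemar test comparing two classifiers
# on the same organoids. b = cases the CNN got right and the RF/QDA got
# wrong; c = the reverse. The counts below are purely illustrative.
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant-pair counts."""
    n = b + c
    k = min(b, c)
    # P(X <= k) under Binomial(n, 0.5), doubled for two-sidedness, capped at 1
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)

# Illustrative: CNN correct / RF wrong on 30 organoids, the reverse on 10
print(round(mcnemar_exact(30, 10), 4))
```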

      Minor Issues (with Suggestions):

      1. Data Availability: Given the resource-intensive nature of the work, the value to the community will be highest if the data is made publicly available. I understand that this is, of course, at the authors' discretion, and they do mention that they will make the data available upon publication of the manuscript. For the time being, the authors can consider sharing at least a representative subset of the data or the trained model weights. This will allow others to build on their work and test the method in other contexts, amplifying the impact of the study.
      2. Discussion - Future Directions: The Discussion does a good job of highlighting applications (like guiding molecular analysis). One minor addition could be speculation on using this approach to actively intervene: for example, could one imagine altering culture conditions mid-course for organoids predicted not to form RPE, to see if their fate can be changed? The authors touch on reducing variability by focusing on the window of determination; extending that thought to an experimental test (though not done here) would inspire readers. This is entirely optional, but a sentence or two envisioning how predictive models enable dynamic experimental designs (not just passive prediction) would be a forward-looking note to end on.

      I believe that with the above clarifications and enhancements - especially regarding generalizability and interpretability - the paper will be suitable for a broad readership. The work represents an exciting intersection of developmental biology and AI, and I commend the authors for this contribution.

      Novelty and Impact:

      This work fills an important gap in organoid biology and imaging. Previous studies have used deep learning to link imaging with molecular profiles or spatial patterns in organoids, but there remained a "notable gap" in predicting whether and to what extent specific tissues will form in organoids. The authors' approach is novel in applying deep learning to prospectively predict organoid tissue outcomes (RPE and lens) on a per-organoid basis, something not previously demonstrated in retinal organoids. Conceptually, this is a significant advance: it shows that fate decisions in a complex 3D culture model can be predicted well in advance, suggesting the existence of subtle early morphogenetic cues that only a sophisticated model can discern. The findings will be of broad interest to researchers in organoid technology, developmental biology, and biomedical AI.

      Methodological Rigor and Technical Quality:

      The study is methodologically solid and carefully executed. The authors gathered a uniquely large dataset under consistent conditions, which lends statistical power to their analyses. They employ rigorous controls: an expert panel provided human predictions as a baseline, and a classical machine learning pipeline using quantitative image-derived features was implemented for comparison. The deep learning approach is well-chosen and technically sound. They use an ensemble of CNN architectures (DenseNet121, ResNet50, and MobileNetV3) pre-trained on large image databases, fine-tuning them on organoid images. The use of image segmentation (DeepLabV3) to isolate the organoid from background is appropriate to ensure the models focus on the relevant morphology. Model training procedures (data augmentation, cross-entropy loss with class balancing, learning rate scheduling, and cross-validation) are thorough and follow best practices. The evaluation metrics (primarily F1-score) are suitable for the imbalanced outcomes and emphasize prediction accuracy in a biologically relevant way. Importantly, the authors separate training, test, and validation sets in a meaningful manner: images of each organoid are grouped to avoid information leakage, and an independent experiment serves as a validation to test generalization. The observation that performance is slightly lower on independent validation experiments underscores both the realism of their evaluation and the inherent heterogeneity between experimental batches. In addition, the study integrates interpretability (using GradientSHAP-based relevance backpropagation) to probe what image features the network uses. Although the relevance maps did not reveal obvious human-interpretable features, the attempt reflects a commendable thoroughness in analysis. Overall, the experimental design, data analysis, and reporting are of high quality, supporting the credibility of the conclusions.
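      As a point of reference for readers less familiar with CNN ensembling, averaging per-model class probabilities is one common way to combine networks such as these; the exact combination rule the authors use is not restated here, so the sketch below is purely illustrative.

```python
# Hedged sketch of probability-averaging ensembling: stand-in logit arrays
# play the role of DenseNet121 / ResNet50 / MobileNetV3 outputs.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
logits = rng.normal(size=(3, 8, 2))          # 3 models x 8 organoids x 2 classes
probs = softmax(logits)                      # per-model class probabilities
ensemble = probs.mean(axis=0)                # average over the 3 models
pred = ensemble.argmax(axis=1)               # illustrative: 0 = no RPE, 1 = RPE
print(pred)
```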

      Significance

      Scientific Significance and Conceptual Advances:

      Biologically, the ability to predict organoid outcomes early is quite significant. It means researchers can potentially identify when and which organoids will form a given tissue, allowing them to harvest samples at the right moment for molecular assays or to exclude organoids that will not form the desired structure. The manuscript's results indicate that RPE and lens fate decisions in retinal organoids are made much earlier than visible differentiation, with predictive signals detectable as early as ~11 hours for RPE and ~4-5 hours for lens. This suggests a surprising synchronization or early commitment in organoid development that was not previously appreciated. The authors' introduction of deep learning-derived determination windows refines the concept of a developmental "point of no return" for cell fate in organoids. Focusing on these windows could help in pinpointing the molecular triggers of these fate decisions. Another conceptual advance is demonstrating that non-invasive imaging data can serve a predictive role akin to (or better than) destructive molecular assays. The study highlights that classical morphology metrics and even expert eyes capture mainly recognition of emerging tissues, whereas the CNN detects subtler, non-intuitive features predictive of future development. This underlines the power of deep learning to uncover complex phenotypic patterns that elude human analysis, a concept that could be extended to other organoid systems and developmental biology contexts. In sum, the work not only provides a tool for prediction but also contributes conceptual insights into the timing of cell fate determination in organoids.

      Strengths:

      The combination of high-resolution time-lapse imaging with advanced deep learning is innovative. The authors effectively leverage AI to solve a biological uncertainty problem, moving beyond qualitative observations to quantitative predictions. The study uses a remarkably large dataset (1,000 organoids, >100k images), which is a strength as it captures variability and provides robust training data. This scale lends confidence that the model isn't overfit to a small sample. By comparing deep learning with classical machine learning and human predictions, the authors provide context for the model's performance. The CNN ensemble consistently outperforms both the classical algorithms and human experts, highlighting the value added by the new method. The deep learning model achieves high accuracy (F1 > 0.85) at impressively early time points. The fact that it can predict lens formation just ~4.5 hours into development with confidence is striking. Performance remained strong and exceeded human capability at all assessed times. Key experimental and analytical steps (segmentation, cross-validation between experiments, model calibration, use of appropriate metrics) are executed carefully. The manuscript is transparent about training procedures and even provides source code references, enhancing reproducibility. The manuscript is generally well-written with a logical flow from the problem (organoid heterogeneity) to the solution (predictive modeling) and clear figures referenced.

      Weaknesses and Limitations:

      Generalizability Across Batches/Conditions: One limitation is the variability in model performance on organoids from independent experiments. The CNN did slightly worse on a validation set from a separate experiment, indicating that differences in the experimental batch (e.g., slight protocol or environmental variations) can affect accuracy. This raises the question of how well the model would generalize to organoids generated under different protocols or by other labs. While the authors do employ an experiment-wise cross-validation, true external validation (on a totally independent dataset or a different organoid system) would further strengthen the claim of general applicability.

      Interpretability of the Predictions: Despite using relevance backpropagation, the authors were unable to pinpoint clear human-interpretable image features that drive the predictions. In other words, the deep learning model remains somewhat of a "black box" in terms of what subtle cues it uses at early time points. This limits the biological insight that can be directly extracted regarding early morphological indicators of RPE or lens fate. It would be ideal if the study could highlight specific morphological differences (even if minor) correlated with fate outcomes, but currently those remain elusive.

      Scope of Outcomes: The study focuses on two particular tissues (RPE and lens) as the outcomes of interest. These were well-chosen as examples (one induced, one spontaneous), but they do not encompass the full range of retinal organoid fates (e.g., neural retina layers). It's not a flaw per se, but it means the platform as presented is specialized. The method might need adaptation to predict more complex or multiple tissue outcomes simultaneously.

      Requirement of Large Data and Annotations: Practically, the approach required a very large imaging dataset and extensive manual annotation; each organoid's RPE and lens outcome, plus manual masking for training the segmentation model. This is a substantial effort that may be challenging to reproduce widely. The authors suggest that perhaps ~500 organoids might suffice to achieve similar results, but the data requirement is still high. Smaller labs or studies with fewer organoids might not immediately reap the full benefits of this approach without access to such imaging throughput.

      Medaka Fish vs. Other Systems: The retinal organoids in this study appear to be from medaka fish, whereas much organoid research uses human iPSC-derived organoids. It is not fully clear from the manuscript how the findings translate to mammalian or human organoids. If there are species-specific differences, the applicability to human retinal organoids (which are important for disease modeling) might need discussion. This is a minor point if the biology is conserved, but worth noting as a potential limitation.

      Predicting Tissue Size is Harder: The model's accuracy in predicting how much tissue (relative area) an organoid will form, while good, is notably lower than for simply predicting presence/absence. Final F1 scores for size classes (~0.7) indicate moderate success. This implies that quantitatively predicting organoid phenotypic severity or extent is more challenging, perhaps due to more continuous variation in size. The authors do acknowledge the lower accuracy for size and treat it carefully.

      Latency vs. Determination: While the authors narrow down the time window of fate determination, it remains somewhat unclear whether the times at which the model reaches high confidence truly correspond to the biological "decision point" or are just the earliest detection of its consequences. The manuscript discusses this caveat, but it's an inherent limitation that the predictive time point might lag the actual internal commitment event. Further work might be needed to link these predictions to molecular events of commitment.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary: Afting et al. present a computational pipeline for analyzing timelapse brightfield images of retinal organoids derived from Medaka fish. Their pipeline processes images along two paths: 1) morphometrics (based on computer vision features from skimage) and 2) deep learning. They discovered, through extensive manual annotation of ground truth, that their deep learning method could predict retinal pigmented epithelium and lens tissue emergence at time points earlier than either morphometrics or expert predictions. Our review is formatted based on the Review Commons recommendations.

      Major comments:

      Are the key conclusions convincing?

      Yes, the key conclusion that deep learning outperforms morphometric approaches is convincing. However, several methodological details require clarification. For instance, were the data splitting procedures conducted in the same manner for both approaches? Additionally, the authors note in the methods: "The validation data were scaled to the same range as the training data using the fitted scalers obtained from the training data." This represents a classic case of data leakage, which could artificially inflate performance metrics in traditional machine learning models. It is unclear whether the deep learning model was subject to the same issue. Furthermore, the convolutional neural network was trained with random augmentations, effectively increasing the diversity of the training data. Would the performance advantage still hold if the sample size had not been artificially expanded through augmentation?

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Their claims are currently preliminary, pending increased clarity and additional computational experiments described below.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      • The authors discretize continuous variables into four bins for classification. However, a regression framework may be more appropriate for preserving the full resolution of the data. At a minimum, the authors should provide a stronger justification for this binning strategy and include an analysis of bin performance. For example, do samples near bin boundaries perform comparably to those near the bin centers? This would help determine whether the discretization introduces artifacts or obscures signals.
      • The relevance backpropagation interpretation analysis is not convincing. The authors argue that the model's use of pixels across the entire image (rather than just the RPE region) indicates that the deep learning approach captures holistic information. However, only three example images are shown out of hundreds, with no explanation for their selection, limiting the generalizability of the interpretation. Additionally, it is unclear how this interpretability approach would work at all in earlier time points, particularly before the model begins making confident predictions around the 8-hour mark. It is also not specified whether the input used for GradSHAP matches the input used during CNN training. The authors should consider expanding this analysis by quantifying pixel importance inside versus outside annotated regions over time. Lastly, Figure 4C is missing a scale bar, which would aid in interpretability.
      • The authors claim that they removed technical artifacts to the best of their ability, but it is unclear if the authors performed any adjustment beyond manual quality checks for contamination. Did the authors observe any illumination artifacts (either within a single image or over time)? Any other artifacts or procedures to adjust?
      • In lines 434-436 the authors state "In this work, we used 1,000 organoids in total, to achieve the reported prediction accuracies. Yet, we suspect that as little as ~500 organoids are sufficient to reliably recapitulate our findings." It is unclear what evidence the authors use to support this claim. The authors could perform a downsampling analysis to determine the tradeoff between performance and sample size.
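      The downsampling analysis proposed in the last point is straightforward to sketch; here it is on synthetic data, with a simple classifier standing in for the authors' pipeline (all sizes, data, and models are placeholders):

```python
# Hedged sketch of a downsampling (learning-curve) analysis: retrain on
# nested subsets of the training data and track F1 on a fixed test split
# to see where performance saturates.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

scores = {}
for n in (50, 100, 250, 500, len(y_tr)):
    clf = LogisticRegression(max_iter=1000).fit(X_tr[:n], y_tr[:n])
    scores[n] = f1_score(y_te, clf.predict(X_te))
print(scores)
```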

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Yes, we believe all experiments are realistic in terms of time and resources. We estimate all experiments could be completed in 3-6 months.

      Are the data and the methods presented in such a way that they can be reproduced?

      No, the code is not currently available. We were not able to review the source code.

      Are the experiments adequately replicated and statistical analysis adequate?

      • The experiments are adequately replicated.
      • The statistical analysis (deep learning) lacks a negative-control baseline, which would help determine whether the reported performance is inflated.
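      A shuffled-label (negative control) baseline of the kind requested can be sketched as follows; the data and model below are placeholders for the authors' pipeline, and performance on shuffled labels should collapse toward chance if the real model is learning genuine signal:

```python
# Hedged sketch: retrain on permuted outcome labels as a negative control
# and compare held-out F1 against the model trained on the real labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

rng = np.random.default_rng(0)
y_shuf = rng.permutation(y_tr)               # break the image-outcome link

real = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
null = LogisticRegression(max_iter=1000).fit(X_tr, y_shuf)
f1_real = f1_score(y_te, real.predict(X_te))
f1_null = f1_score(y_te, null.predict(X_te))
print("real F1:", round(f1_real, 3), "| shuffled-label F1:", round(f1_null, 3))
```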

      Minor comments:

      Specific experimental issues that are easily addressable.

      Are prior studies referenced appropriately?

      Yes.

      Are the text and figures clear and accurate?

      The authors must improve clarity on terminology. For example, they should define terms such as "comprehensive dataset" and "significant", and provide clarity on their morphometrics feature space. They should also elaborate on what they mean by "confounding factor of heterogeneity".

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      • Figure 2C describes a distance between what? The y-axis label is likely too simplistic. The same confusion applies to Figure 2D. Was the distance computed from the tSNE coordinates?
      • The authors perform a Herculean analysis comparing dozens of different machine learning classifiers. They select two, but they should provide justification for this decision.
      • It would be good to get a sense for how these retinal organoids grow - are they moving all over the place? They are in Matrigel, so maybe not, but are they rotating? Can the authors' approach predict an entire non-emergence experiment? The authors tried to standardize the protocol, but if it still produces this much heterogeneity, then how well the approach will actually generalize to a different lab is a limitation.
      • The authors should dampen claims throughout. For example, in the abstract they state, "by combining expert annotations with advanced image analysis". The image analysis pipelines use common approaches.
      • The authors state: "the presence of RPE and lenses were disagreed upon by the two independently annotating experts in a considerable fraction of organoids (3.9 % for RPE, 2.9% for lenses).", but it is unclear why there were two independently annotating experts. The supplements say images were split between nine experts for annotation.
      • Details on the image analysis pipeline would be helpful to clarify. For example, why did they choose to measure these 165 morphology features? Which descriptors were used to quantify blur? Did the authors apply blur metrics per FOV or per segmented organoid?
      • The description of the number of images is confusing and distracts from the number of organoids. The number of organoids and number of timepoints used would provide a better description of the data with more value. For example, does this image count include all five z slices?
      • The authors should consider applying a maximum projection across the five z slices (rather than the middle z) as this is a common procedure in image analysis. Why not analyze three-dimensional morphometrics or deep learning features? Might this improve performance further?
      • There is a lot of manual annotation performed in this work; the authors could speculate how this could be streamlined for future studies. How does the approach presented enable streamlining?
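      For reference, the maximum projection suggested in one of the points above is a one-liner once the five z-slices sit in a single array (shapes and values here are illustrative):

```python
# Hedged sketch of a maximum-intensity projection across z-slices; a
# stand-in random stack plays the role of one brightfield field of view.
import numpy as np

rng = np.random.default_rng(0)
stack = rng.random(size=(5, 512, 512))   # 5 z-slices of one FOV
mip = stack.max(axis=0)                  # maximum projection across z
print(mip.shape)
```

      Whether a maximum, minimum, or focus-based projection is most informative for brightfield images is for the authors to evaluate; the operation itself is inexpensive.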

      Significance

      Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      The paper's advance is technical (providing new methods for organoid quality control) and conceptual (providing proof of concept that earlier time points contain information to predict specific future outcomes in retinal organoids).

      Place the work in the context of the existing literature (provide references, where appropriate).

      • The authors do a good job of placing their work in context in the introduction.
      • The work presents a simple image analysis pipeline (using only the middle z slice) to process timelapse organoid images. It is therefore not a 4D pipeline (3D space plus time), but a 3D one (2D images plus time). It is likely that more and more of these approaches will be developed over time, and this article is one of the early attempts.
      • The work uses standard convolutional neural networks.

      State what audience might be interested in and influenced by the reported findings.

      • Data scientists performing image-based profiling for time lapse imaging of organoids.
      • Retinal organoid biologists
      • Other organoid biologists who may have long growth times with indeterminate outcomes.

      Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      • Image-based profiling/morphometrics
      • Organoid image analysis
      • Computational biology
      • Cell biology
      • Data science/machine learning
      • Software

      This is a signed review: Gregory P. Way, PhD; Erik Serrano; Jenna Tomkinson; Michael J. Lippincott; Cameron Mattson. Department of Biomedical Informatics, University of Colorado.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This study presents predictive modeling of developmental outcome in retinal organoids based on high-content imaging. Specifically, it compares the predictive performance of an ensemble of deep learning models with classical machine learning based on morphometric image features and with predictions from human experts on four different tasks: prediction of RPE presence and lens presence (at the end of development) as well as the respective sizes. It finds that the DL model outperforms the other approaches and is predictive from early timepoints on, strongly indicating a time-frame for important decision steps in the developmental trajectory.

      Major comments: I find the paper overall well written and easy to understand. The findings are relevant (see significance statement for details) and well supported. However, I have some remarks on the description and details of the experimental set-up, on data availability, and on the reproducibility / re-usability of the data.

      1. Some details about the experimental set-up are unclear to me. In particular, it seems like there is a single organoid per well, as the manuscript does not mention any need for instance segmentation or tracking to distinguish organoids in the images and associate them over time. Is that correct? If yes, it should be explicitly stated so. Are there any specific steps in the organoid preparation necessary to avoid multiple organoids per well? Having multiple organoids per well would require the aforementioned image analysis steps (instance segmentation and tracking) and potentially add significant complexity to the analysis procedure, so this information is important to estimate the effort for setting up a similar approach in other organoid cultures (for example cancer organoids, where multiple organoids per well are common / may not be preventable in certain experimental settings).
      2. The terminology used with respect to the test and validation set is contrary to the field, and reporting the results on the test set (which should be called the validation set) should be avoided, since it is used to select models. In more detail: the terms "test set" and "validation set" (introduced in 213-221) are used with the opposite meaning to their typical use in the deep learning literature. Typically, the validation set refers to a separate split that is used to monitor convergence / avoid overfitting during training, and the test set refers to an external set that is used to evaluate the performance of trained models. The study uses these terms in an opposite manner, which becomes apparent from line 624: "best performing model ... judged by the loss of the test set." Please swap this terminology; it is confusing to a machine learning domain expert. Furthermore, the performance on the test set (which should be called the validation set) is typically not reported in graphs, as this data was used for model selection and thus does not provide an unbiased estimate of model performance. I would remove the respective curves from Figures 3 and 4.
      3. The experimental set-up for the human expert baseline is quite different from the evaluation of the machine learning models. The former is based on the annotation of 4,000 images by seven experts, the latter on cross-validation experiments on a larger dataset. First of all, the details on the human expert labeling procedure are very sparse; I could only find a very short description in paragraph 136-144, but did not find any further details in the methods section. Please add a methods section paragraph that explains in more detail how the images were chosen, how they were assigned to annotators, and whether there was any redundancy in annotation, and if yes, how this was resolved / evaluated. Second, the fact that the set-up for human experts and ML models is quite different means that these values are not quite comparable in a statistical sense. Ideally, human estimators would follow the same set-up as the ML models (as in, evaluate the same test sets). However, this would likely be prohibitive in the required effort, so I think it is enough to state this fact clearly, for example by adding a comment on this to the captions of Figures 3 and 4.
      4. It is unclear to me where the theoretical time window for the Latent Determination Horizon in Figure 5 (also mentioned in line 350) comes from. Please explain this in more detail and provide a citation for it.
      5. The interpretability analysis (Figure 4, 634-639) based on relevance backpropagation was performed on DenseNet121 only. Why did you choose this model and not the ResNet / MobileNet? I think it is quite crucial to see if there are any differences between these models, as this would show how much weight can be put on the evidence from this analysis, and I would suggest adding an additional experiment and a supplementary figure on this.
      6. The code referenced in the code availability statement is not yet present. Please make it available and ensure good documentation for reproducibility. Similarly, it is unclear to me what is meant by "The data that supports the findings will be made available on HeiDoc". Does this only refer to the intermediate results used for statistical analysis? I would also recommend making the image data of this study available. This could for example be done through a dedicated data deposition service such as BioImageArchive or BioStudies, or with less effort via Zenodo. This would ensure both reproducibility and potential re-use of the data. I think the latter point is quite interesting in this context; as the authors state themselves, it is unclear whether prediction of the TOIs might be possible at an even earlier timepoint through model advances, which could be studied by making this data available.
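      For reference, the conventional naming requested in point 2 can be sketched as follows: a validation split used for model selection during training and a held-out test split for the final, unbiased estimate, with all images of one organoid kept in a single split. All ids, shapes, and labels below are illustrative.

```python
# Hedged sketch of a conventionally named, organoid-grouped three-way split.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
organoid_id = np.repeat(np.arange(100), 10)      # 100 organoids x 10 timepoints
X = rng.normal(size=(1000, 8))                   # placeholder features
y = rng.integers(0, 2, size=100)[organoid_id]    # one outcome per organoid

# First carve out the held-out test set, then split the rest into train/val
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
trainval_idx, test_idx = next(gss.split(X, y, groups=organoid_id))
gss2 = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=1)
tr, va = next(gss2.split(X[trainval_idx], y[trainval_idx],
                         groups=organoid_id[trainval_idx]))
train_idx, val_idx = trainval_idx[tr], trainval_idx[va]

print(len(train_idx), len(val_idx), len(test_idx))
```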

      Minor comments:

      Line 315: Please add a citation for relevance backpropagation here.

      Line 591: There seems to be a typo: "[...] classification of binary classification [...]"

      Line 608: "[...] where the images of individual organoids served as groups [...]" It is unclear to me what this means.

      Significance

      General assessment: This study demonstrates that (retinal) organoid development can be predicted with deep learning from early timepoints, even where the outcome cannot be discerned by human experts or simpler machine learning models. This fact is very interesting in itself due to its implications for organoid development, and could provide a valuable tool for molecular analysis of different organoid populations, as outlined by the authors. The contribution could be strengthened by providing a more thorough investigation of which features in the image are predictive at early timepoints, using a more sophisticated approach than relevance backpropagation, e.g. Discover (https://www.nature.com/articles/s41467-024-51136-9). This could provide further biological insight into the underlying developmental processes and enhance the understanding of retinal organoid development.

      Advance: Similar studies that predict outcomes such as cell proliferation or developmental fate from image data exist. However, to the best of my knowledge, this study is the first to apply such a methodology to organoids; it convincingly shows its efficacy and argues its potential practical benefits. It thus constitutes a solid technical advance that could be especially impactful if it can be translated to other organoid systems in the future.

      Audience: This research is of interest to a technical audience. It will be of immediate interest to researchers working on retinal organoids, who could adapt and use the proposed system to support experiments by better distinguishing organoids during development. To enable this application, code and data availability should be ensured (see above comments on reproducibility). It is also of interest to researchers in other organoid systems, who may be able to adapt the methodology to different developmental outcome predictions. Finally, it may also be of interest to image analysis / deep learning researchers as a dataset to improve architectures for predictive time series modeling.

      My research background: I am an expert in computer vision and deep learning for biomedical imaging, especially in microscopy. I have some experience developing image analysis for (cancer) organoids. I don't have any experience on the wet lab side of this work.

      Constantin Pape

    1. Briefing Note: Activism and Critical Thinking

      Summary

      This briefing document analyzes the tensions between activist engagement and the rigor of critical thinking, drawing on the analyses of Laurent Puech, a social worker.

      It shows that while activism is essential for social progress, an approach focused exclusively on "the cause" can lead to methodological lapses, the manipulation of data, and counterproductive results.

      Through two in-depth case studies (domestic violence, and children killed by their parents),

      Laurent Puech highlights how certain activist discourses, often amplified by the media and by institutions, spread alarmist and factually false statistics.

      For example, the idea of a rise in "femicides," or the figure of "two children killed per day," is directly contradicted by official data, which instead show a significant decline in these phenomena.

      This gap between perception and reality reveals that figures are being used not as measurement tools but as moral arguments meant to stir emotion and validate a pre-existing ideology.

      This approach, though often sincere, hinders an accurate understanding of the problems, generates unfounded fear, and risks paralyzing the very victims it claims to help.

      In conclusion, Laurent Puech argues for an activism grounded in method, fact-checking, and intellectual honesty, even on the most sensitive subjects.

      1. Profile of the Speaker: Laurent Puech

      Laurent Puech is a trained social worker who has developed expertise in applying critical thinking to social work and activism.

      1.1. Professional Background

      Training and early career: After a career change around the age of thirty, he trained as a social worker.

      Varied experience: His career has taken him through a range of settings, including schools (middle and high schools), neighborhood social services ("polyvalence de secteur"), and a secondment to the gendarmerie.

      Specialization: These experiences brought him close to issues of child protection and the protection of vulnerable people, in particular women experiencing domestic violence.

      His role with the gendarmerie was to assist, on a voluntary basis, members of the public in contact with law enforcement.

      1.2. Activist Background and Evolution

      Laurent Puech describes himself as an activist; his career has been marked by union, political, and associative commitments (notably with the Association Nationale des Assistants de Service Social - ANAS).

      He describes a significant evolution in the way he engages in activism:

      From activism of ideas...: In his youth (the 1980s), his commitment was driven mainly by "ideas" and grand principles.

      He cites his membership in the MRAP (Mouvement contre le Racisme et pour l'Amitié entre les Peuples) as an example of activism centered on defending values (equality, dignity) without any deep questioning of method.

      ...to activism of method: Today, his activism centers on defending a method based on critical thinking, the analysis of information, and the deconstruction of argumentative logic.

      He no longer defends a particular banner but a rigorous approach.

      His punk youth played a formative role, instilling in him a distrust of unjustified authority and a critical eye as a precondition for recognizing any authority.

      2. The Central Role of Critical Thinking

      Laurent Puech's interest in critical thinking emerged outside social work, during a stint in the dietetics sector in Belgium.

      2.1. Origins of His Interest

      Confronted with people in distress using so-called "alternative" therapies (for example, vitamin cures based on an astrologer's advice), he began to question the impact of beliefs.

      He came to understand that the sincerity or benevolence of a practitioner (astrologer, guru) was not enough to guarantee the quality of their approach.

      Reading works such as Henri Broch's "Le paranormal" was a turning point, giving him the methodological tools to analyze how an argument is constructed and how valid its evidence is.

      2.2. Zetetics as a Method

      He adopted the approach of "zététique" (zetetics), defined as a skepticism that uses the scientific method to test claims through investigation, tracing information back to its sources, and experimentation.

      He applied this method by deconstructing the predictions of the astrologer Élizabeth Teissier, showing that they were either factually wrong (in their dates) or so vague that they were open to any interpretation.

      This analysis also exposed the complacency of the media, which relayed her claims without any critical scrutiny.

      3. Application to Social Work: The Paradox of Protection

      Laurent Puech brings this critical analysis to bear on his own field, social work, through his websites SecretPro.fr (on professional confidentiality) and protections-critiques.org.

      3.1. Social Assistance as "Intrusion"

      He describes certain facets of social assistance, particularly in child protection, as a form of "intrusion" ("effraction"). When a "worrying report" ("information préoccupante") is filed, a social inquiry is triggered.

      A family cannot refuse this contact without risking referral to the judicial authorities. Even if the intervention is not physically forced, it is symbolically so.

      He notes an increase in these procedures, which raises the question of the balance between help and control, with part of that control becoming increasingly "brutal and violent."

      3.2. The Blind Spot of the Protective System

      The main risk is that "the protector can become abusive." In his view, protection systems suffer from a major blind spot:

      they are designed to see violence in others (families) but struggle to conceive of their own potential violence.

      This phenomenon is reinforced by a vocabulary that insists on being exclusively positive:

      Protection: a "horizon" concept, a promise that can never be fully achieved.

      Ethics, Respect, Benevolence: terms that armor professionals with moral certainties and prevent them from questioning the potentially destructive effects of their actions.

      For Puech, benevolence cannot be decreed in the present; it is measured by the effects produced, and therefore always in the past.

      The paradoxical injunction placed on professionals ("Be helpful while controlling people") completes the blurring of reference points and complicates daily practice.

      4. Case Study 1: Domestic Violence

      Laurent Puech applies his critical method to the highly publicized issue of domestic violence, analyzing activist discourse and the available data.

      4.1. Definitions and Data

      Key distinction: He recalls the Henrion report's distinction between marital conflict (where the parties are on an equal footing, even when violent acts occur) and domestic violence, which is characterized by one partner's domination over the other.

      Types of violence: While severe physical violence is mostly committed by men against women, studies (notably from Quebec) show near parity in psychological violence.

      Lack of data in France: The Enveff survey (early 2000s) covered only women.

      The full report of the new Virage survey (covering both men and women), due in 2017, had still not been published by 2019.

      4.2. Analysis of Activist Discourse

      Current activist discourse focuses on men's physical violence against women, using the term "femicide." This approach has several biases:

      Rendering part of reality invisible: It obscures psychological violence, violence committed by women, and male victims.

      The latter also face an incredulity that makes it even harder for them to speak out ("Oh, sir! Who's the man of the house?").

      Ideological simplification: By retaining only patriarchal violence (man against woman), this discourse places situations of a very different nature on the same level (e.g., a violent homicide and the euthanasia of a spouse with Alzheimer's) solely because the victim is a woman.

      Focus on the aggressor: By concentrating on "femicides," activist discourse is less interested in the women who are victims than in proving that "men are bastards."

      The proof is that homicides of women by women within a couple are not counted by some collectives.

      4.3. What the Figures Actually Show

      Activist discourse spreads the idea of a dramatic rise in "femicides," relying on short-lived statistical spikes (e.g., January-February 2019) and ignoring periods of decline.

      Underlying trend: Official data from the Interior Ministry's Délégation aux victimes, collected since 2006, show a 25% drop in intimate partner homicides (men and women) between 2006 and 2017.

      Broader context: This decline is part of a wider drop in homicides in France (from 1,500 to 800 per year over 15 years).

      Explanations: This improvement is the result of multiple factors: better knowledge of the phenomenon (thanks, paradoxically, to the initial activist alerts), strengthened criminal law, and the creation of support and shelter services.

      Effects of alarmist discourse: By claiming that "nothing is being done," current activist discourse is deemed "depressive" and can "paralyze women living with violence" by sending them the message that society has abandoned them.

      5. Case Study 2: Children Killed by Their Parents

      Another subject where emotion anesthetizes critical thinking is that of children killed by their parents.

      5.1. The Myth of "Two Children Killed per Day"

      The figure of "two children killed per day" (roughly 700 per year) is widely circulated by associations, the media, and even institutions (parliamentary reports, ministers, etc.).

      Laurent Puech traces its origin, which he compares to that of iridology (a pseudoscience founded on a single unverified anecdote).

      Origin (1980s): The figure comes from a "senseless" extrapolation based on scattered data from a single hospital department.

      Legitimization (2000s): An Inserm study, covering only children aged 0 to 1, popularized an extrapolation method consisting of multiplying known cases by a factor of up to 15 to estimate hidden cases (e.g., shaken baby syndrome).

      Absurd generalization: This method, already highly questionable for infants, was then applied to all minors, as if concealing the murder of a 14-year-old were as easy as concealing that of a baby.

      5.2. What the Figures Actually Show

      Flagrant contradiction: The figure of 700 children killed per year exceeded the total number of homicides recorded in France across all age groups. This absurdity did not, however, prevent its circulation.

      Actual data: A rigorous census covering the period 2012-2016 put the average number of cases at around 70 per year, ten times lower than the activist figure.

      5.3. The Figure as a Moral Argument

      Laurent Puech's analysis shows that, on these highly emotional subjects, figures are used not to describe reality but to support a moral position.

      They serve to say "I am right" and to dismiss any dissenting voice as "immoral."

      Those who dispute the figure are accused of minimizing the seriousness of the problem and of placing themselves "on the side of evil," even though the criticism concerns not the sincerity of the actors but the rigor of their method and the reliability of the information they spread.

    1. ABSTRACT

      Oh okay, so yes, the author is promoting multinational federalism as one of the best forms for upholding democracy because it gives minority nations the ability to be heard. But the majority nation must show a level of self-restraint, or there won't be state stability, with the minority nation fighting back for more autonomy. (Additionally, their level of autonomy is not a one-conversation deal; it's negotiated day-to-day.) The author takes the federalism formula and applies it to complex political systems to demonstrate that if the majority nation cannot uphold deep diversity, minority nations will get angry and be placed in a situation of domination.

    2. .

      Multinational federations are not a panacea, but they are the best arrangement for fulfilling democratic principles and giving voice to minority nations and a large number of political communities. "Complementary to this," multinational federalism can add value to the principle of self-government in democratic settings through the implementation of innovative political practices that maintain multinational unity.

    3. .

      Multinational federations allow a minority nation to institutionalize a politics of recognition, and minority nations are able to develop policy instruments that put limits on the domination of the majority nation. Therefore, to the extent that minority nations are treated fairly, we can expect state stability to increase and constitutional loyalty to stay intact (because the minority nation won't want to escape). All of this means the majority nation must exercise a level of self-restraint.

    4. .

      OH! Resistance to deep diversity is everywhere and well documented, but what makes multinational federations "better" is their ability to consider the voice of minority nations, because those nations are built into the political fabric of the country.

    5. .

      The author reiterates the importance of finding an equilibrium between self-rule and shared rule in order to achieve respect for language protections, hiring practices, and the provision of fair representation in the Legislative Assembly and Upper House for minority nations.

    6. .

      There are many cases where a regional state regained its autonomy (East Timor, Eritrea, Kosovo, Montenegro, and South Sudan) and became independent. (*Which is a point for the claim that multinational federations are doomed to collapse...)

    7. .

      Shout out to Philip Resnick; might have to check out his views. The author highlights that the central challenge lies in finding a proper equilibrium between self-rule and shared rule. The Canadian federation is complex, and Quebec desires its share of political competencies. Some minority nations, though, are fine with the central state taking care of certain business because the minority nation doesn't have the resources to self-rule. But this may lead to situations where the majority nation takes more and more control away from the minority.

    8. .

      This is a lot of what the author is arguing, but in this essay it is framed as backing up Pierre Trudeau's point about multinational federalism (that, for the minority nation, the advantages of staying integrated must outweigh the downsides and the opportunities that might come with splitting). The author adds that this success depends on how much the central state is willing to accommodate deep diversity, and lists examples of how the central state can accommodate: mainly political power-sharing, political autonomy, and the adoption of collaboration.

    1. The urban setting, saturated with the vacant images of late capitalism, leaves Hamlet in an un-Shakespearean universe.

      Throughout the movie, the sounds of traffic flowing through the city remind the viewer of the busy urban setting, painting the backdrop of many scenes. This particular aspect of modernity strongly contrasts with the Shakespearean language used by the characters.

    1. the university where the present research took place is now engaging with the findings from the research and the implications for enhancing praxis

      The university where this research took place is now applying the research findings and considering how to improve practice.

    1. It makes a lot of sense that the choice of words in questions impacts the results and answers people give. Asking "how good/bad" something was instead of "what do you think about it" directs people toward a specific answer. I wonder how many deceptive surveys do that on purpose just to obtain the result they want.

  2. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. Kurt Wagner. This is how Facebook collects data on you even if you don’t have an account. Vox, April 2018. URL: https://www.vox.com/2018/4/20/17254312/facebook-shadow-profiles-data-collection-non-users-mark-zuckerberg (visited on 2023-12-05).

      I feel like this breaks a basic idea of consent: if you've never joined, you never agreed to be tracked. And yet the only options seem to be to accept it or avoid the web entirely, which is basically impossible. So I question how much control I really have. If the system can build a "shadow profile" of my interests without me opting in, is there truly any privacy left?

    1. The AT Protocol API lets you access a lot of the data that Bluesky tracks (since Bluesky is a more open social media protocol), but Bluesky probably tracks much more than they let you have access to (like what other social media platforms do).

      I think it’s interesting how the Bluesky API gives researchers access to certain data but still limits what they can see. It reminds me that even when a platform claims to be “open,” it still controls what kind of information we’re allowed to analyze. I wonder how much bias this creates in research if the data we get only shows part of the picture.

    1. After this day, Coyote ran away and never came back for he was afraid of what he had done. He always looked over his shoulder, afraid that someone was pursuing him. Since then he has been starving because no one will give him anything to eat.

      I kind of feel bad for Coyote, because he made people sad, and now he is sad too. I think it shows that when you do something bad, it can come back to you later.

    2. When at last he heard the whirlwind coming he closed the door before the whirlwind could enter.

      Why did Coyote close the door? Was he jealous or mad? Or did he just want to be right? I don’t understand what he was thinking here.

    3. Coyote jumped up and said that people ought to die forever because there was not enough food or room for everyone to live forever.

      This part reminds me of Lewis Hyde’s idea that tricksters change the world. Coyote breaks the rule and makes a big change. He brings death, so now life is different for everyone.

    1. . What are the implications of compromised immune function as a result of exposure to chronic stress?

      Chronic stress can weaken the immune system, making the body less able to fight off infections and illnesses. This means that people under long-term stress are more likely to get sick from viruses or bacteria. Stress can also slow down the body’s healing process and increase the risk of chronic diseases like heart problems, diabetes, or autoimmune disorders. In addition, prolonged stress can make vaccines less effective because the immune system doesn’t respond as strongly. Overall, chronic stress can seriously reduce overall health and the body’s ability to protect itself.

    1. To be witnessed is what gives one’s life meaning; that is what gives death its cause.

      This is a personal belief I was taught in church since I was a kid, and one I then taught the kids in Sunday school during the persecution of Coptic Christians by the Muslim Brotherhood and ISIS. They used to burn the churches, steal, beat the Copts (as they did to my aunt), and kill on purpose in the name of God. So I became personally attached to this sentence, because I used to believe in it every time I walked to my church in Sharm or Cairo from 2013 to 2016.

    2. The procedure must be as delicate, as cognizant of the original conditions of creation in order to nurture and ensure a continuation of life.

      This sentence beautifully captures the truth of translation in Lina's eyes. As a pianist, she uses the word "transposed" as if describing words as musical notes: to transpose is to shift a piece of music from its original key to a different one, higher or lower, while keeping all the relationships between the notes the same. So translating is not only changing words but keeping the relationships and context the same.

    1. As you accumulate sources, make sure you create a bibliography, or a list of sources that you’ve used in your research and writing process (keeping track of those sources will help you to create you annotated bibliography, should your instructor require one.

      keep track of the resources you use.

    2. Next, in step four, you generate sub-questions from your main question. For instance, “During the 19th century, what were some of the competing theories about how life is created?,” and “Did any of Mary Shelley’s other works relate to the creation of life?” After you know what sub-questions you want to pursue, you’ll be able to move to step five.

      create sub questions from your main question.

    3. Your main research question should be substantial enough to form the guiding principle of your paper—but focused enough to guide your research. A strong research question requires you not only to find information but also to put together different pieces of information, interpret and analyze them, and figure out what you think. As you consider potential research questions, ask yourself whether they would be too hard or too easy to answer.

      your research question should be good enough to lead your essay.

    4. Once you have a list of potential topics, you will need to choose one as the focus of your essay. You will also need to narrow your topic. Most writers find that the topics listed during the brainstorming or idea mapping stage are broad

      pick one of the topics in the list to start

    5. important to know how to narrow down your ideas into a concise, manageable thesis. You may also use the list as a starting point to help you identify additional, related topics. Discussing your ideas with your instructor will help ensure that you choose a manageable topic that fits the requirements of the assignment.

      it's important to narrow down your ideas, and it's helpful to double-check your ideas with your instructor.

    6. A successful research process should go through these steps: Decide on the topic. Narrow the topic in order to narrow search parameters. Consider a question that your research will address. Generate sub-questions from your main question. Determine what kind of sources are best for your argument. Create a bibliography as you gather and reference sources.

      successful research process.

    7. You can also limit the time period from which you will draw resources. Do you only want articles written in the past ten or twenty years? Do you want them from a specific span of time? Again, most search engines will allow you to limit results to anything written within the years you specify, and the choice to limit the time period will depend on your topic. Determining these factors will help you form a specific research plan to guide your process.

      you can limit your sources to a specific time period if it suits your topic.

    8. A research plan should begin after you can clearly identify the focus of your argument. Narrow the scope of your argument by identifying the specific subtopic you will research. A broad search will yield thousands of sources, which makes it difficult to form a focused, coherent argument, and it is not possible to include every topic in your research. If you narrow your focus, however, you can find targeted resources that can be synthesized into a new argument. After narrowing your focus, think about key search terms that will apply only to your subtopic. Develop specific questions that can be answered through your research process, but be careful not to choose a focus that is overly narrow. You should aim for a question that will limit search results to sources that relate to your topic, but will still result in a varied pool of sources to explore.

      you need to identify the focus of your argument

    9. Another part of your research plan should include the type of sources you want to gather. The possibilities include articles, scholarly journals, primary sources, textbooks, encyclopedias, and more.

      cite in your research plan what sources you are wanting to gather.

    10. You would also not want to search for a single instance of surgery because you might not be able to find enough information on it. Find a happy medium between a too-broad or too-specific topic to research.

      find a happy medium between a too-broad and a too-specific topic.

    11. A research plan should begin after you can clearly identify the focus of your argument. Narrow the scope of your argument by identifying the specific subtopic you will research. A broad search will yield thousands of sources, which makes it difficult to form a focused, coherent argument, and it is not possible to include every topic in your research. If you narrow your focus, however, you can find targeted resources that can be synthesized into a new argument. After narrowing your focus, think about key search terms that will apply only to your subtopic. Develop specific questions that can be answered through your research process, but be careful not to choose a focus that is overly narrow. You should aim for a question that will limit search results to sources that relate to your topic, but will still result in a varied pool of sources to explore.

      first identify the focus of the argument, then make subtopics, and after that make some questions that might help you do your research.

    1. Differences in language (numbers) reflect subtle differences in cultures' tools of intellectual adaptation; they affect how a child learns to think.

      Asian children tend to be better at arithmetic because they learn numbers differently than Westerners ("ninety-two" vs. "nine tens and a two").

    2. Habituation is a decrease in looking at a stimulus

      Babies will look at new stimuli longer, not because they like them better but because they want to explore them. Once they get used to a stimulus (habituation), they will not give it that much attention anymore.

    3. embryonic phase, the risk of birth defects is highest (since organs are forming). In the fetal stage, alcohol mainly affects growth and brain development.

      There is a chance that a child can be born without abnormalities if the mother consumed alcohol only after the organs had developed (the first 10 weeks).